# Deployments API

Deployments serve trained models as real-time prediction endpoints. The MLOps lifecycle supports staging and production stages, promotion, rollback, inference logging, and data drift detection.

All deployment endpoints are prefixed with `/api/mlops`.
## Create Deployment

`POST /api/mlops/projects/{project_id}/deployments`

Deploy a model to either staging or production. If deploying to production, any existing production deployment in the same project is automatically archived.
### Path Parameters

| Parameter | Type | Description |
|---|---|---|
| `project_id` | UUID | Project that owns the model. |
### Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `model_id` | string | Yes | UUID of the model to deploy. Must belong to the same project. |
| `name` | string | Yes | Deployment name. |
| `stage` | string | Yes | Target stage: `staging` or `production`. |
| `traffic_percent` | integer | No | Traffic allocation percentage (1–100, default 100). |
| | string | No | Attach a privacy policy for runtime anonymization. |
| | boolean | No | Anonymize PII in inference logs. |
| | boolean | No | Anonymize PII in prediction responses. |
| | object | No | Map input column names to PII types. |
| | float | No | Confidence threshold for PII detection (0.0–1.0). |
### Example

```bash
curl -X POST "$BASE_URL/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "name": "fraud-detector-staging",
    "stage": "staging",
    "traffic_percent": 100
  }'
```

```python
import requests

resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
        "name": "fraud-detector-staging",
        "stage": "staging",
        "traffic_percent": 100,
    },
)
deployment_id = resp.json()["deployment_id"]
print("Deployed:", deployment_id)
```
### Response: 201 Created

```json
{
  "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "stage": "staging",
  "is_active": true
}
```
## List Deployments

`GET /api/mlops/projects/{project_id}/deployments`

Return all deployments for a project.
### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | integer | 50 | Max items (1–200). |
| `offset` | integer | 0 | Pagination offset. |
### Example

```bash
curl "$BASE_URL/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
for dep in resp.json()["items"]:
    print(f"{dep['id']} stage={dep['stage']} active={dep['is_active']}")
```
### Response: 200 OK

```json
{
  "items": [
    {
      "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
      "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
      "stage": "staging",
      "is_active": true,
      "config": {"name": "fraud-detector-staging", "traffic_percent": 100},
      "created_at": "2026-02-25T09:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
```
## Get Deployment Detail

`GET /api/mlops/deployments/{deployment_id}`

Return deployment metadata and the associated model details.

### Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
### Response: 200 OK

```json
{
  "deployment": {
    "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "stage": "staging",
    "is_active": true,
    "config": {"name": "fraud-detector-staging", "traffic_percent": 100},
    "created_at": "2026-02-25T09:00:00Z"
  },
  "model": {
    "id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "name": "GBM_1_AutoML",
    "algo": "GBM"
  }
}
```
## Predict via Deployment

`POST /api/mlops/deployments/{deployment_id}/predict`

Send input features and receive predictions from the deployed model. This is the primary real-time prediction endpoint. Inference requests are logged automatically for monitoring and drift detection.
### Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `inputs` | dict or list[dict] | Yes | Feature values. Single record or batch. |
| `options` | object | No | Options (see table below). |

### Predict Options

| Key | Type | Description |
|---|---|---|
| `include_contributions` | boolean | Include SHAP-style feature contributions (not available for StackedEnsemble models). Default: |
| | boolean | Store input payload in inference logs. Default: |
| | string | Override the deployment-level privacy policy for this request. |
| | boolean | Override log anonymization for this request. |
| | boolean | Override response anonymization for this request. |
### Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/predict" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
    "options": {"include_contributions": true}
  }'
```

```python
import requests

resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/predict",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
        "options": {"include_contributions": True},
    },
)
result = resp.json()
print("Prediction:", result["predictions"][0]["prediction"])
```
### Response: 200 OK

```json
{
  "predictions": [
    {"prediction": "0"}
  ],
  "contributions": [
    [
      {"feature": "amount", "value": -0.32},
      {"feature": "merchant_id", "value": -0.15},
      {"feature": "BiasTerm", "value": -1.20}
    ]
  ],
  "latency_ms": 45
}
```
## Promote Deployment

`POST /api/mlops/deployments/{deployment_id}/promote`

Promote a deployment to production. Any existing production deployment in the same project is automatically archived.
### Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `to_stage` | string | Yes | Must be `production`. |
### Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/promote" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"to_stage": "production"}'
```

```python
import requests

requests.post(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/promote",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"to_stage": "production"},
)
```
### Response: 200 OK

```json
{
  "ok": true,
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "stage": "production"
}
```
## Rollback Deployment

`POST /api/mlops/deployments/{deployment_id}/rollback`

Roll back production to a previous deployment or model. Provide either `to_deployment_id` (reactivate an archived deployment) or `to_model_id` (create a new production deployment for the given model).
### Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `to_deployment_id` | string | No* | UUID of the archived deployment to reactivate. |
| `to_model_id` | string | No* | UUID of the model to deploy as the new production version. |

\* At least one of `to_deployment_id` or `to_model_id` is required.
### Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/rollback" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"to_model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456"}'
```

### Response: 200 OK

```json
{
  "ok": true
}
```
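The either/or requirement is easy to get wrong on the client side. A minimal helper sketch (not part of the API, the function name is ours) that builds the rollback body and enforces the at-least-one rule before sending:

```python
def rollback_payload(to_deployment_id=None, to_model_id=None):
    """Build a rollback request body, enforcing the at-least-one rule."""
    if to_deployment_id is None and to_model_id is None:
        raise ValueError("provide to_deployment_id or to_model_id")
    payload = {}
    if to_deployment_id is not None:
        payload["to_deployment_id"] = to_deployment_id
    if to_model_id is not None:
        payload["to_model_id"] = to_model_id
    return payload
```

The resulting dict is passed as the `json=` body of the POST shown above.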
## Deactivate Deployment

`POST /api/mlops/deployments/{deployment_id}/deactivate`

Deactivate a deployment without promoting a replacement. The deployment moves to the archived stage.
### Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/deactivate" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

### Response: 200 OK

```json
{
  "ok": true
}
```
## Drift Metrics

`GET /api/mlops/deployments/{deployment_id}/drift`

Return drift detection results comparing the training distribution to recent inference data. Returns per-feature PSI scores and overall drift status.
### Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
drift = resp.json()
for feature, info in drift.get("features", {}).items():
    print(f"{feature}: PSI={info['psi']:.4f} ({info['status']})")
```
### Response: 200 OK

```json
{
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "overall_status": "low",
  "features": {
    "amount": {"psi": 0.023, "status": "low"},
    "merchant_id": {"psi": 0.15, "status": "moderate"},
    "hour": {"psi": 0.008, "status": "low"}
  },
  "computed_at": "2026-02-28T12:00:00Z",
  "inference_rows": 5000
}
```
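The per-feature `psi` values are Population Stability Index scores. To make the numbers interpretable, here is a sketch of the standard PSI formula (the server's actual binning strategy and bin count are not documented in this section, so treat this as an illustration, not the exact implementation):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of a numeric feature.

    Bins are derived from the expected (training) sample's range; a small
    floor keeps empty bins from producing log(0).
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = sum(v > e for e in edges)  # index of the bin containing v
            counts[i] += 1
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p, q))
```

Identical distributions score near 0; a widely used rule of thumb (which matches the low/moderate statuses above, though the exact thresholds here are an assumption) treats PSI below 0.1 as low drift, 0.1 to 0.25 as moderate, and above 0.25 as high.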
## Run Drift Detection

`POST /api/mlops/deployments/{deployment_id}/drift/run`

Enqueue a background job to compute drift metrics against recent inference data. The result can be retrieved via the drift metrics endpoint once the job completes.
### Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift/run" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

### Response: 200 OK

```json
{
  "ok": true,
  "job_id": "e1f2a3b4-c5d6-7890-1234-ef5678901234"
}
```
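The response carries a `job_id`, but no job-status endpoint is documented in this section, so one practical pattern is to poll the drift metrics endpoint until `computed_at` changes. A sketch with the HTTP call injected as a callable so it works with any client (function and parameter names are ours):

```python
import time

def wait_for_drift(fetch_metrics, previous_computed_at=None,
                   timeout_s=300.0, interval_s=5.0):
    """Poll the drift metrics endpoint until a fresh result appears.

    fetch_metrics: callable returning the GET .../drift payload as a dict.
    Returns the new payload, or raises TimeoutError on expiry.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        payload = fetch_metrics()
        if payload.get("computed_at") and payload["computed_at"] != previous_computed_at:
            return payload
        time.sleep(interval_s)
    raise TimeoutError("drift job did not complete in time")
```

With `requests`, `fetch_metrics` would wrap the GET shown under Drift Metrics above.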
## Inference Logs

`GET /api/mlops/deployments/{deployment_id}/inference-logs`

Return a paginated list of inference requests and responses for the deployment. Useful for monitoring, auditing, and debugging.
### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | integer | 50 | Max items (1–500). |
| `offset` | integer | 0 | Pagination offset. |
| | string | – | ISO 8601 timestamp. Only return logs after this time. |
| | string | – | ISO 8601 timestamp. Only return logs before this time. |
### Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/inference-logs?limit=5" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

### Response: 200 OK

```json
{
  "items": [
    {
      "id": "f2a3b4c5-d6e7-8901-2345-f67890123456",
      "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
      "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
      "prediction": "0",
      "latency_ms": 45,
      "status_code": 200,
      "created_at": "2026-02-28T14:30:00Z"
    }
  ],
  "total": 5000,
  "limit": 5,
  "offset": 0
}
```
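To walk the full log history, page with `limit` and `offset` until `total` is exhausted. A sketch with the page fetch injected as a callable, so the loop is independent of the HTTP client (function names are ours):

```python
def iter_inference_logs(fetch_page, page_size=50):
    """Yield every inference log by walking limit/offset pages.

    fetch_page: callable (limit, offset) -> response dict with
    "items" and "total" keys, as in the response above.
    """
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        items = page["items"]
        yield from items
        offset += len(items)
        if not items or offset >= page["total"]:
            break
```

With `requests`, `fetch_page` would wrap a GET with `params={"limit": ..., "offset": ...}` against the inference-logs URL.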
## Deployment Stages

| Stage | Description |
|---|---|
| `staging` | Pre-production testing. Use this to validate model behavior before going live. |
| `production` | Live serving. Predictions from this deployment are production traffic. |
## Deployment Status Values

| Status | Description |
|---|---|
| | Deployment is running and serving predictions. |
| | Deployment has been deactivated. Not serving predictions. |
| | Deployment failed to initialize or load the model. |
## Alert Metrics Reference

| Metric | Description |
|---|---|
| `psi` | Population Stability Index. Measures feature distribution drift between training and production data. |
| | Degradation in prediction accuracy over time. |
| | Percentage of predictions returning errors. |
| | 99th percentile prediction latency in milliseconds. |
| | Days since the model was last retrained. |
| | Anomalous prediction patterns detected. |
## Alert Condition Operators

| Operator | Symbol | Description |
|---|---|---|
| | `>` | Greater than threshold. |
| | `<` | Less than threshold. |
| | `>=` | Greater than or equal to threshold. |
| | `<=` | Less than or equal to threshold. |
| | `==` | Equal to threshold. |
| | `!=` | Not equal to threshold. |
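Evaluating an alert condition reduces to a lookup over these symbols. A client-side sketch (the mapping mirrors the table above; the server's own evaluation code is not shown in this section):

```python
import operator

# Map each condition symbol to its comparison; evaluating a rule is
# then a single dictionary lookup plus a call.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}

def condition_met(metric_value, symbol, threshold):
    """True if metric_value satisfies the condition against threshold."""
    return OPS[symbol](metric_value, threshold)
```

For example, a PSI alert with operator `>` and threshold 0.25 fires when the computed PSI exceeds 0.25.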
## Alert Severity Levels

| Severity | Description |
|---|---|
| | Informational alert. No action required. |
| | Warning. Should be investigated. |
| | Critical issue requiring immediate attention. |
## Notification Channels

| Channel | Description |
|---|---|
| | Send alert to one or more email addresses. |
| | Post alert to a Slack channel via webhook. |
| | Send alert payload to a custom HTTP endpoint. |
## Alert Rule Options

| Option | Type | Description |
|---|---|---|
| | integer | Minimum minutes between repeated alerts (1–1440, default 60). |
| | integer | Minutes before escalating severity (optional, 1–1440). |
| | string | Severity to escalate to: |
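How the cooldown and escalation options interact can be sketched as client-side logic (the severity strings and function names here are illustrative, not the API's own values):

```python
from datetime import datetime, timedelta

def should_fire(last_fired, now, cooldown_minutes=60):
    """Suppress repeat alerts inside the cooldown window."""
    if last_fired is None:
        return True
    return now - last_fired >= timedelta(minutes=cooldown_minutes)

def effective_severity(base, first_fired, now,
                       escalate_after=None, escalate_to=None):
    """Escalate severity once the condition has persisted long enough."""
    if escalate_after is not None and escalate_to is not None and first_fired is not None:
        if now - first_fired >= timedelta(minutes=escalate_after):
            return escalate_to
    return base
```

So a rule with a 60-minute cooldown fires at most once per hour, and one with a 60-minute escalation window is upgraded once the condition has held for an hour.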
## Deployment Lifecycle Example

The following shows the full lifecycle: create a staging deployment, test it, promote to production, and roll back if needed.
```python
import requests

BASE_URL = "http://localhost:8888"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
PROJECT_ID = "d4e5f6a7-b8c9-0123-def4-567890123456"
MODEL_ID = "a7b8c9d0-e1f2-3456-7890-abcdef123456"

# 1. Deploy to staging
resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/{PROJECT_ID}/deployments",
    headers=HEADERS,
    json={"model_id": MODEL_ID, "name": "v2-staging", "stage": "staging"},
)
deployment_id = resp.json()["deployment_id"]

# 2. Test with a sample prediction
resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
    headers=HEADERS,
    json={"inputs": {"amount": 50.0, "merchant_id": 10, "hour": 12}},
)
assert resp.status_code == 200
print("Staging test passed:", resp.json()["predictions"])

# 3. Promote to production
requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/promote",
    headers=HEADERS,
    json={"to_stage": "production"},
)
print("Promoted to production")

# 4. Later: rollback if needed
# requests.post(
#     f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
#     headers=HEADERS,
#     json={"to_model_id": OLD_MODEL_ID},
# )
```
## See also

- Models API – Model details and direct prediction.
- ML Studio (What-If) API – What-if analysis against a deployment.
- Reports API – Generate deployment reports.