Deployments API

Deployments serve trained models as real-time prediction endpoints. The MLOps lifecycle supports staging and production stages, promotion, rollback, inference logging, and data drift detection.

All deployment endpoints are prefixed with `/api/mlops`.


Create Deployment

POST /api/mlops/projects/{project_id}/deployments

Deploy a model to either staging or production. If deploying to production, any existing production deployment in the same project is automatically archived.

Path Parameters

| Parameter | Type | Description |
|---|---|---|
| `project_id` | UUID | Project that owns the model. |

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `model_id` | string | Yes | UUID of the model to deploy. Must belong to the same project. |
| `name` | string | Yes | Deployment name. |
| `stage` | string | Yes | `staging` or `production`. |
| `traffic_percent` | integer | No | Traffic allocation percentage (1–100, default 100). |
| `privacy_policy_id` | string | No | Attach a privacy policy for runtime anonymization. |
| `privacy_anonymize_logs` | boolean | No | Anonymize PII in inference logs. |
| `privacy_anonymize_response` | boolean | No | Anonymize PII in prediction responses. |
| `privacy_column_map` | object | No | Map input column names to PII types. |
| `privacy_threshold` | float | No | Confidence threshold for PII detection (0.0–1.0). |

Example

```bash
curl -X POST "$BASE_URL/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "name": "fraud-detector-staging",
    "stage": "staging",
    "traffic_percent": 100
  }'
```

```python
import requests

resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
        "name": "fraud-detector-staging",
        "stage": "staging",
        "traffic_percent": 100,
    },
)
deployment_id = resp.json()["deployment_id"]
print("Deployed:", deployment_id)
```

Response 201 Created

```json
{
  "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "stage": "staging",
  "is_active": true
}
```
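The optional privacy fields from the request-body table can be combined in a single deployment request. A sketch of such a payload (the policy ID, column names, and PII type labels below are illustrative placeholders, not values defined by this API):

```python
# Illustrative create-deployment payload combining the optional privacy
# fields. Policy ID, column names, and PII type labels are placeholders.
payload = {
    "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "name": "fraud-detector-staging",
    "stage": "staging",
    "privacy_policy_id": "YOUR_POLICY_ID",      # placeholder policy UUID
    "privacy_anonymize_logs": True,             # scrub PII from inference logs
    "privacy_anonymize_response": False,        # leave prediction responses untouched
    "privacy_column_map": {"email": "EMAIL"},   # column -> PII type (labels illustrative)
    "privacy_threshold": 0.8,                   # PII detection confidence cutoff (0.0-1.0)
}
# The payload would then be posted as in the example above:
# requests.post(f"{BASE_URL}/api/mlops/projects/{PROJECT_ID}/deployments",
#               headers=HEADERS, json=payload)
print(sorted(payload))
```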

List Deployments

GET /api/mlops/projects/{project_id}/deployments

Return all deployments for a project.

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | integer | 50 | Max items (1–200). |
| `offset` | integer | 0 | Pagination offset. |

Example

```bash
curl "$BASE_URL/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
resp = requests.get(
    f"{BASE_URL}/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
for dep in resp.json()["items"]:
    print(f"{dep['id']}  stage={dep['stage']}  active={dep['is_active']}")
```

Response 200 OK

```json
{
  "items": [
    {
      "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
      "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
      "stage": "staging",
      "is_active": true,
      "config": {"name": "fraud-detector-staging", "traffic_percent": 100},
      "created_at": "2026-02-25T09:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
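Because promoting archives the previous production deployment, a project's list can contain several production entries of which at most one is active. A small helper for picking it out (our sketch, using only fields shown in the response above):

```python
def active_production(items):
    """Return the active production deployment from a list response, or None."""
    return next(
        (d for d in items if d["stage"] == "production" and d["is_active"]),
        None,
    )

items = [
    {"id": "a", "stage": "staging", "is_active": True},
    {"id": "b", "stage": "production", "is_active": True},
    {"id": "c", "stage": "production", "is_active": False},  # archived predecessor
]
print(active_production(items)["id"])  # -> b
```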

Get Deployment Detail

GET /api/mlops/deployments/{deployment_id}

Return deployment metadata and the associated model details.

Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Response 200 OK

```json
{
  "deployment": {
    "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "stage": "staging",
    "is_active": true,
    "config": {"name": "fraud-detector-staging", "traffic_percent": 100},
    "created_at": "2026-02-25T09:00:00Z"
  },
  "model": {
    "id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "name": "GBM_1_AutoML",
    "algo": "GBM"
  }
}
```
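In Python, the detail response can be reduced to a one-line summary for dashboards or scripts. A sketch (the helper is ours; the sample dict mirrors the response shape shown above):

```python
def summarize_deployment(detail: dict) -> str:
    """One-line summary of a deployment-detail response."""
    d, m = detail["deployment"], detail["model"]
    state = "active" if d["is_active"] else "inactive"
    return f"{d['id'][:8]}  {d['stage']} ({state}) -> {m['name']} [{m['algo']}]"

# Sample response body, shaped like the 200 OK above.
sample = {
    "deployment": {"id": "d0e1f2a3-b4c5-6789-0123-def456789012",
                   "stage": "staging", "is_active": True},
    "model": {"name": "GBM_1_AutoML", "algo": "GBM"},
}
print(summarize_deployment(sample))
```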

Predict via Deployment

POST /api/mlops/deployments/{deployment_id}/predict

Send input features and receive predictions from the deployed model. This is the primary real-time prediction endpoint. Inference requests are logged automatically for monitoring and drift detection.

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `inputs` | dict or list[dict] | Yes | Feature values. Single record or batch. |
| `options` | object | No | Prediction options (see table below). |

Predict Options

| Key | Type | Description |
|---|---|---|
| `include_contributions` | boolean | Include SHAP-style feature contributions (not available for StackedEnsemble models). Default: `false`. |
| `store_payload` | boolean | Store input payload in inference logs. Default: `true`. |
| `privacy_policy_id` | string | Override the deployment-level privacy policy for this request. |
| `anonymize_logs` | boolean | Override log anonymization for this request. |
| `anonymize_response` | boolean | Override response anonymization for this request. |

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/predict" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
    "options": {"include_contributions": true}
  }'
```

```python
resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/predict",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
        "options": {"include_contributions": True},
    },
)
result = resp.json()
print("Prediction:", result["predictions"][0]["prediction"])
```

Response 200 OK

```json
{
  "predictions": [
    {"prediction": "0"}
  ],
  "contributions": [
    [
      {"feature": "amount", "value": -0.32},
      {"feature": "merchant_id", "value": -0.15},
      {"feature": "BiasTerm", "value": -1.20}
    ]
  ],
  "latency_ms": 45
}
```

Promote Deployment

POST /api/mlops/deployments/{deployment_id}/promote

Promote a deployment to production. Any existing production deployment in the same project is automatically archived.

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `to_stage` | string | Yes | Must be `production`. |

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/promote" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"to_stage": "production"}'
```

```python
requests.post(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/promote",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"to_stage": "production"},
)
```

Response 200 OK

```json
{
  "ok": true,
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "stage": "production"
}
```

Rollback Deployment

POST /api/mlops/deployments/{deployment_id}/rollback

Roll back production to a previous deployment or model. Provide either `to_deployment_id` (reactivate an archived deployment) or `to_model_id` (create a new production deployment for the given model).

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `to_deployment_id` | string | No\* | UUID of the archived deployment to reactivate. |
| `to_model_id` | string | No\* | UUID of the model to deploy as the new production version. |

\* At least one of `to_deployment_id` or `to_model_id` is required.

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/rollback" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"to_model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456"}'
```

Response 200 OK

```json
{
  "ok": true
}
```
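In Python, the at-least-one-target requirement can be checked before sending the request. A sketch (the validation helper is ours, not part of the API):

```python
def rollback_payload(to_deployment_id=None, to_model_id=None):
    """Build a rollback body, enforcing that at least one target is given."""
    if not (to_deployment_id or to_model_id):
        raise ValueError("provide to_deployment_id or to_model_id")
    body = {}
    if to_deployment_id:
        body["to_deployment_id"] = to_deployment_id
    if to_model_id:
        body["to_model_id"] = to_model_id
    return body

body = rollback_payload(to_model_id="a7b8c9d0-e1f2-3456-7890-abcdef123456")
print(body)
# The body would then be posted to the rollback endpoint:
# requests.post(f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
#               headers=HEADERS, json=body)
```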

Deactivate Deployment

POST /api/mlops/deployments/{deployment_id}/deactivate

Deactivate a deployment without promoting a replacement. The deployment moves to the archived stage.

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/deactivate" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Response 200 OK

```json
{
  "ok": true
}
```

Drift Metrics

GET /api/mlops/deployments/{deployment_id}/drift

Return drift detection results comparing the training distribution to recent inference data. Returns per-feature PSI scores and overall drift status.

Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
resp = requests.get(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
drift = resp.json()
for feature, info in drift.get("features", {}).items():
    print(f"{feature}: PSI={info['psi']:.4f} ({info['status']})")
```

Response 200 OK

```json
{
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "overall_status": "low",
  "features": {
    "amount": {"psi": 0.023, "status": "low"},
    "merchant_id": {"psi": 0.15, "status": "moderate"},
    "hour": {"psi": 0.008, "status": "low"}
  },
  "computed_at": "2026-02-28T12:00:00Z",
  "inference_rows": 5000
}
```

Run Drift Detection

POST /api/mlops/deployments/{deployment_id}/drift/run

Enqueue a background job to compute drift metrics against recent inference data. The result can be retrieved via the drift metrics endpoint once the job completes.

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift/run" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Response 200 OK

```json
{
  "ok": true,
  "job_id": "e1f2a3b4-c5d6-7890-1234-ef5678901234"
}
```

Inference Logs

GET /api/mlops/deployments/{deployment_id}/inference-logs

Return a paginated list of inference requests and responses for the deployment. Useful for monitoring, auditing, and debugging.

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | integer | 50 | Max items (1–500). |
| `offset` | integer | 0 | Pagination offset. |
| `since` | string | — | ISO 8601 timestamp. Only return logs after this time. |
| `until` | string | — | ISO 8601 timestamp. Only return logs before this time. |

Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/inference-logs?limit=5" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Response 200 OK

```json
{
  "items": [
    {
      "id": "f2a3b4c5-d6e7-8901-2345-f67890123456",
      "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
      "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
      "prediction": "0",
      "latency_ms": 45,
      "status_code": 200,
      "created_at": "2026-02-28T14:30:00Z"
    }
  ],
  "total": 5000,
  "limit": 5,
  "offset": 0
}
```
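Large log windows can be walked with the `limit`/`offset` parameters. A sketch of the paging arithmetic (the helper is ours; the request itself is shown commented, with an optional `since` filter):

```python
def page_params(total, limit=500):
    """Yield (limit, offset) pairs covering `total` rows."""
    for offset in range(0, total, limit):
        yield limit, offset

# A response like the one above reports "total": 5000.
pages = list(page_params(5000, limit=500))
print(len(pages), pages[0], pages[-1])  # -> 10 (500, 0) (500, 4500)
# Each page would then be fetched as:
# requests.get(f"{BASE_URL}/api/mlops/deployments/{deployment_id}/inference-logs",
#              headers=HEADERS,
#              params={"limit": 500, "offset": offset,
#                      "since": "2026-02-28T00:00:00Z"})  # optional ISO 8601 window
```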

Deployment Stages

| Stage | Description |
|---|---|
| `staging` | Pre-production testing. Use this to validate model behavior before going live. |
| `production` | Live serving. Predictions from this deployment are production traffic. |

Deployment Status Values

| Status | Description |
|---|---|
| `active` | Deployment is running and serving predictions. |
| `inactive` | Deployment has been deactivated. Not serving predictions. |
| `failed` | Deployment failed to initialize or load the model. |


Alert Metrics Reference

| Metric | Description |
|---|---|
| `drift_psi` | Population Stability Index. Measures feature distribution drift between training and production data. |
| `accuracy_degradation` | Degradation in prediction accuracy over time. |
| `error_rate` | Percentage of predictions returning errors. |
| `latency_p99` | 99th percentile prediction latency in milliseconds. |
| `model_staleness` | Days since the model was last retrained. |
| `prediction_anomaly` | Anomalous prediction patterns detected. |

Alert Condition Operators

| Operator | Symbol | Description |
|---|---|---|
| `gt` | > | Greater than threshold. |
| `lt` | < | Less than threshold. |
| `gte` | >= | Greater than or equal to threshold. |
| `lte` | <= | Less than or equal to threshold. |
| `eq` | == | Equal to threshold. |
| `neq` | != | Not equal to threshold. |

Alert Severity Levels

| Severity | Description |
|---|---|
| `info` | Informational alert. No action required. |
| `warning` | Warning. Should be investigated. |
| `critical` | Critical issue requiring immediate attention. |

Notification Channels

| Channel | Description |
|---|---|
| `email` | Send alert to one or more email addresses. |
| `slack` | Post alert to a Slack channel via webhook. |
| `webhook` | Send alert payload to a custom HTTP endpoint. |

Alert Rule Options

| Option | Type | Description |
|---|---|---|
| `cooldown_minutes` | integer | Minimum minutes between repeated alerts (1–1440, default 60). |
| `escalation_minutes` | integer | Minutes before escalating severity (optional, 1–1440). |
| `escalation_severity` | string | Severity to escalate to: `warning` or `critical`. |
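The reference tables above compose into a single alert-rule definition. A hypothetical payload showing how the pieces fit together (the field grouping is our assumption; the alert-rule endpoint itself is not part of this section):

```python
# Hypothetical alert rule combining the reference tables above:
# a metric, a condition operator and threshold, a severity,
# notification channels, and rule options.
rule = {
    "metric": "drift_psi",            # from Alert Metrics Reference
    "operator": "gt",                 # from Alert Condition Operators
    "threshold": 0.2,                 # illustrative PSI threshold
    "severity": "warning",            # from Alert Severity Levels
    "channels": ["email", "slack"],   # from Notification Channels
    "cooldown_minutes": 60,           # from Alert Rule Options
    "escalation_minutes": 120,
    "escalation_severity": "critical",
}
print(sorted(rule))
```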


Deployment Lifecycle Example

The following shows the full lifecycle: create a staging deployment, test it, promote to production, and rollback if needed.

```python
import requests

BASE_URL = "http://localhost:8888"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
PROJECT_ID = "d4e5f6a7-b8c9-0123-def4-567890123456"
MODEL_ID = "a7b8c9d0-e1f2-3456-7890-abcdef123456"

# 1. Deploy to staging
resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/{PROJECT_ID}/deployments",
    headers=HEADERS,
    json={"model_id": MODEL_ID, "name": "v2-staging", "stage": "staging"},
)
deployment_id = resp.json()["deployment_id"]

# 2. Test with a sample prediction
resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
    headers=HEADERS,
    json={"inputs": {"amount": 50.0, "merchant_id": 10, "hour": 12}},
)
assert resp.status_code == 200
print("Staging test passed:", resp.json()["predictions"])

# 3. Promote to production
requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/promote",
    headers=HEADERS,
    json={"to_stage": "production"},
)
print("Promoted to production")

# 4. Later: rollback if needed
# requests.post(
#     f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
#     headers=HEADERS,
#     json={"to_model_id": OLD_MODEL_ID},
# )
```

See also