Deployments API

Deployments serve trained models as real-time prediction endpoints. The MLOps lifecycle supports staging and production stages, promotion, rollback, inference logging, and data drift detection.

All deployment endpoints are prefixed with `/api/mlops`.


Create Deployment

POST /api/mlops/projects/{project_id}/deployments

Deploy a model to either staging or production. If deploying to production, any existing production deployment in the same project is automatically archived.

Path Parameters

| Parameter | Type | Description |
|---|---|---|
| `project_id` | UUID | Project that owns the model. |

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `model_id` | string | Yes | UUID of the model to deploy. Must belong to the same project. |
| `name` | string | Yes | Deployment name. |
| `stage` | string | Yes | `staging` or `production`. |
| `traffic_percent` | integer | No | Traffic allocation percentage (1–100, default 100). |
| `privacy_policy_id` | string | No | Attach a privacy policy for runtime anonymization. |
| `privacy_anonymize_logs` | boolean | No | Anonymize PII in inference logs. |
| `privacy_anonymize_response` | boolean | No | Anonymize PII in prediction responses. |
| `privacy_column_map` | object | No | Map input column names to PII types. |
| `privacy_threshold` | float | No | Confidence threshold for PII detection (0.0–1.0). |

Example

```bash
curl -X POST "$BASE_URL/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "name": "fraud-detector-staging",
    "stage": "staging",
    "traffic_percent": 100
  }'
```

```python
import requests

resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
        "name": "fraud-detector-staging",
        "stage": "staging",
        "traffic_percent": 100,
    },
)
deployment_id = resp.json()["deployment_id"]
print("Deployed:", deployment_id)
```

Response 201 Created

```json
{
  "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "stage": "staging",
  "is_active": true
}
```
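The optional privacy fields from the request-body table can be combined in a single deployment request. A sketch of such a payload (the policy ID, column names, and PII type labels below are illustrative placeholders, not values defined by this API):

```python
# Illustrative create-deployment payload combining the optional privacy
# fields. Policy ID, column names, and PII type labels are placeholders.
payload = {
    "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "name": "fraud-detector-staging",
    "stage": "staging",
    "privacy_policy_id": "YOUR_POLICY_ID",      # placeholder policy UUID
    "privacy_anonymize_logs": True,             # scrub PII from inference logs
    "privacy_anonymize_response": False,        # leave prediction responses untouched
    "privacy_column_map": {"email": "EMAIL"},   # column -> PII type (labels illustrative)
    "privacy_threshold": 0.8,                   # PII detection confidence cutoff (0.0-1.0)
}
# The payload would then be posted as in the example above:
# requests.post(f"{BASE_URL}/api/mlops/projects/{PROJECT_ID}/deployments",
#               headers=HEADERS, json=payload)
print(sorted(payload))
```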

List Deployments

GET /api/mlops/projects/{project_id}/deployments

Return all deployments for a project.

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | integer | 50 | Max items (1–200). |
| `offset` | integer | 0 | Pagination offset. |

Example

```bash
curl "$BASE_URL/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
resp = requests.get(
    f"{BASE_URL}/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
for dep in resp.json()["items"]:
    print(f"{dep['id']}  stage={dep['stage']}  active={dep['is_active']}")
```

Response 200 OK

```json
{
  "items": [
    {
      "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
      "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
      "stage": "staging",
      "is_active": true,
      "config": {"name": "fraud-detector-staging", "traffic_percent": 100},
      "created_at": "2026-02-25T09:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
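Because promoting archives the previous production deployment, a project's list can contain several production entries of which at most one is active. A small helper for picking it out (our sketch, using only fields shown in the response above):

```python
def active_production(items):
    """Return the active production deployment from a list response, or None."""
    return next(
        (d for d in items if d["stage"] == "production" and d["is_active"]),
        None,
    )

items = [
    {"id": "a", "stage": "staging", "is_active": True},
    {"id": "b", "stage": "production", "is_active": True},
    {"id": "c", "stage": "production", "is_active": False},  # archived predecessor
]
print(active_production(items)["id"])  # -> b
```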

Get Deployment Detail

GET /api/mlops/deployments/{deployment_id}

Return deployment metadata and the associated model details.

Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Response 200 OK

```json
{
  "deployment": {
    "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "stage": "staging",
    "is_active": true,
    "config": {"name": "fraud-detector-staging", "traffic_percent": 100},
    "created_at": "2026-02-25T09:00:00Z"
  },
  "model": {
    "id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
    "name": "GBM_1_AutoML",
    "algo": "GBM"
  }
}
```
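In Python, the detail response can be reduced to a one-line summary for dashboards or scripts. A sketch (the helper is ours; the sample dict mirrors the response shape shown above):

```python
def summarize_deployment(detail: dict) -> str:
    """One-line summary of a deployment-detail response."""
    d, m = detail["deployment"], detail["model"]
    state = "active" if d["is_active"] else "inactive"
    return f"{d['id'][:8]}  {d['stage']} ({state}) -> {m['name']} [{m['algo']}]"

# Sample response body, shaped like the 200 OK above.
sample = {
    "deployment": {"id": "d0e1f2a3-b4c5-6789-0123-def456789012",
                   "stage": "staging", "is_active": True},
    "model": {"name": "GBM_1_AutoML", "algo": "GBM"},
}
print(summarize_deployment(sample))
```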

Predict via Deployment

POST /api/mlops/deployments/{deployment_id}/predict

Send input features and receive predictions from the deployed model. This is the primary real-time prediction endpoint. Inference requests are logged automatically for monitoring and drift detection.

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `inputs` | dict or list[dict] | Yes | Feature values. Single record or batch. |
| `options` | object | No | Prediction options (see table below). |

Predict Options

| Key | Type | Description |
|---|---|---|
| `include_contributions` | boolean | Include SHAP-style feature contributions (not available for StackedEnsemble models). Default: `false`. |
| `store_payload` | boolean | Store input payload in inference logs. Default: `true`. |
| `privacy_policy_id` | string | Override the deployment-level privacy policy for this request. |
| `anonymize_logs` | boolean | Override log anonymization for this request. |
| `anonymize_response` | boolean | Override response anonymization for this request. |

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/predict" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
    "options": {"include_contributions": true}
  }'
```

```python
resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/predict",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
        "options": {"include_contributions": True},
    },
)
result = resp.json()
print("Prediction:", result["predictions"][0]["prediction"])
```

Response 200 OK

```json
{
  "predictions": [
    {"prediction": "0"}
  ],
  "contributions": [
    [
      {"feature": "amount", "value": -0.32},
      {"feature": "merchant_id", "value": -0.15},
      {"feature": "BiasTerm", "value": -1.20}
    ]
  ],
  "latency_ms": 45
}
```

Promote Deployment

POST /api/mlops/deployments/{deployment_id}/promote

Promote a deployment to production. Any existing production deployment in the same project is automatically archived.

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `to_stage` | string | Yes | Must be `production`. |

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/promote" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"to_stage": "production"}'
```

```python
requests.post(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/promote",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"to_stage": "production"},
)
```

Response 200 OK

```json
{
  "ok": true,
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "stage": "production"
}
```

Rollback Deployment

POST /api/mlops/deployments/{deployment_id}/rollback

Roll back production to a previous deployment or model. Provide either `to_deployment_id` (reactivate an archived deployment) or `to_model_id` (create a new production deployment for the given model).

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `to_deployment_id` | string | No\* | UUID of the archived deployment to reactivate. |
| `to_model_id` | string | No\* | UUID of the model to deploy as the new production version. |

\* At least one of `to_deployment_id` or `to_model_id` is required.

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/rollback" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"to_model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456"}'
```

Response 200 OK

```json
{
  "ok": true
}
```
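In Python, the at-least-one-target requirement can be checked before sending the request. A sketch (the validation helper is ours, not part of the API):

```python
def rollback_payload(to_deployment_id=None, to_model_id=None):
    """Build a rollback body, enforcing that at least one target is given."""
    if not (to_deployment_id or to_model_id):
        raise ValueError("provide to_deployment_id or to_model_id")
    body = {}
    if to_deployment_id:
        body["to_deployment_id"] = to_deployment_id
    if to_model_id:
        body["to_model_id"] = to_model_id
    return body

body = rollback_payload(to_model_id="a7b8c9d0-e1f2-3456-7890-abcdef123456")
print(body)
# The body would then be posted to the rollback endpoint:
# requests.post(f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
#               headers=HEADERS, json=body)
```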

Deactivate Deployment

POST /api/mlops/deployments/{deployment_id}/deactivate

Deactivate a deployment without promoting a replacement. The deployment moves to the archived stage.

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/deactivate" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Response 200 OK

```json
{
  "ok": true
}
```

Drift Metrics

GET /api/mlops/deployments/{deployment_id}/drift

Return drift detection results comparing the training distribution to recent inference data. Returns per-feature PSI scores and overall drift status.

Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
resp = requests.get(
    f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
drift = resp.json()
for feature, info in drift.get("features", {}).items():
    print(f"{feature}: PSI={info['psi']:.4f} ({info['status']})")
```

Response 200 OK

```json
{
  "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
  "overall_status": "low",
  "features": {
    "amount": {"psi": 0.023, "status": "low"},
    "merchant_id": {"psi": 0.15, "status": "moderate"},
    "hour": {"psi": 0.008, "status": "low"}
  },
  "computed_at": "2026-02-28T12:00:00Z",
  "inference_rows": 5000
}
```

Run Drift Detection

POST /api/mlops/deployments/{deployment_id}/drift/run

Enqueue a background job to compute drift metrics against recent inference data. The result can be retrieved via the drift metrics endpoint once the job completes.

Example

```bash
curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift/run" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Response 200 OK

```json
{
  "ok": true,
  "job_id": "e1f2a3b4-c5d6-7890-1234-ef5678901234"
}
```

Inference Logs

GET /api/mlops/deployments/{deployment_id}/inference-logs

Return a paginated list of inference requests and responses for the deployment. Useful for monitoring, auditing, and debugging.

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | integer | 50 | Max items (1–500). |
| `offset` | integer | 0 | Pagination offset. |
| `since` | string | — | ISO 8601 timestamp. Only return logs after this time. |
| `until` | string | — | ISO 8601 timestamp. Only return logs before this time. |

Example

```bash
curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/inference-logs?limit=5" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Response 200 OK

```json
{
  "items": [
    {
      "id": "f2a3b4c5-d6e7-8901-2345-f67890123456",
      "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
      "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
      "prediction": "0",
      "latency_ms": 45,
      "status_code": 200,
      "created_at": "2026-02-28T14:30:00Z"
    }
  ],
  "total": 5000,
  "limit": 5,
  "offset": 0
}
```
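Large log windows can be walked with the `limit`/`offset` parameters. A sketch of the paging arithmetic (the helper is ours; the request itself is shown commented, with an optional `since` filter):

```python
def page_params(total, limit=500):
    """Yield (limit, offset) pairs covering `total` rows."""
    for offset in range(0, total, limit):
        yield limit, offset

# A response like the one above reports "total": 5000.
pages = list(page_params(5000, limit=500))
print(len(pages), pages[0], pages[-1])  # -> 10 (500, 0) (500, 4500)
# Each page would then be fetched as:
# requests.get(f"{BASE_URL}/api/mlops/deployments/{deployment_id}/inference-logs",
#              headers=HEADERS,
#              params={"limit": 500, "offset": offset,
#                      "since": "2026-02-28T00:00:00Z"})  # optional ISO 8601 window
```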

Deployment Stages

| Stage | Description |
|---|---|
| `staging` | Pre-production testing. Use this to validate model behavior before going live. |
| `production` | Live serving. Predictions from this deployment are production traffic. |

Deployment Status Values

| Status | Description |
|---|---|
| `active` | Deployment is running and serving predictions. |
| `inactive` | Deployment has been deactivated. Not serving predictions. |
| `failed` | Deployment failed to initialize or load the model. |


Alert Metrics Reference

| Metric | Description |
|---|---|
| `drift_psi` | Population Stability Index. Measures feature distribution drift between training and production data. |
| `accuracy_degradation` | Degradation in prediction accuracy over time. |
| `error_rate` | Percentage of predictions returning errors. |
| `latency_p99` | 99th percentile prediction latency in milliseconds. |
| `model_staleness` | Days since the model was last retrained. |
| `prediction_anomaly` | Anomalous prediction patterns detected. |

Alert Condition Operators

| Operator | Symbol | Description |
|---|---|---|
| `gt` | > | Greater than threshold. |
| `lt` | < | Less than threshold. |
| `gte` | >= | Greater than or equal to threshold. |
| `lte` | <= | Less than or equal to threshold. |
| `eq` | == | Equal to threshold. |
| `neq` | != | Not equal to threshold. |

Alert Severity Levels

| Severity | Description |
|---|---|
| `info` | Informational alert. No action required. |
| `warning` | Warning. Should be investigated. |
| `critical` | Critical issue requiring immediate attention. |

Notification Channels

| Channel | Description |
|---|---|
| `email` | Send alert to one or more email addresses. |
| `slack` | Post alert to a Slack channel via webhook. |
| `webhook` | Send alert payload to a custom HTTP endpoint. |

Alert Rule Options

| Option | Type | Description |
|---|---|---|
| `cooldown_minutes` | integer | Minimum minutes between repeated alerts (1–1440, default 60). |
| `escalation_minutes` | integer | Minutes before escalating severity (optional, 1–1440). |
| `escalation_severity` | string | Severity to escalate to: `warning` or `critical`. |
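The reference tables above compose into a single alert-rule definition. A hypothetical payload showing how the pieces fit together (the field grouping is our assumption; the alert-rule endpoint itself is not part of this section):

```python
# Hypothetical alert rule combining the reference tables above:
# a metric, a condition operator and threshold, a severity,
# notification channels, and rule options.
rule = {
    "metric": "drift_psi",            # from Alert Metrics Reference
    "operator": "gt",                 # from Alert Condition Operators
    "threshold": 0.2,                 # illustrative PSI threshold
    "severity": "warning",            # from Alert Severity Levels
    "channels": ["email", "slack"],   # from Notification Channels
    "cooldown_minutes": 60,           # from Alert Rule Options
    "escalation_minutes": 120,
    "escalation_severity": "critical",
}
print(sorted(rule))
```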


Deployment Lifecycle Example

The following shows the full lifecycle: create a staging deployment, test it, promote to production, and rollback if needed.

```python
import requests

BASE_URL = "http://localhost:8888"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
PROJECT_ID = "d4e5f6a7-b8c9-0123-def4-567890123456"
MODEL_ID = "a7b8c9d0-e1f2-3456-7890-abcdef123456"

# 1. Deploy to staging
resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/{PROJECT_ID}/deployments",
    headers=HEADERS,
    json={"model_id": MODEL_ID, "name": "v2-staging", "stage": "staging"},
)
deployment_id = resp.json()["deployment_id"]

# 2. Test with a sample prediction
resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
    headers=HEADERS,
    json={"inputs": {"amount": 50.0, "merchant_id": 10, "hour": 12}},
)
assert resp.status_code == 200
print("Staging test passed:", resp.json()["predictions"])

# 3. Promote to production
requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/promote",
    headers=HEADERS,
    json={"to_stage": "production"},
)
print("Promoted to production")

# 4. Later: rollback if needed
# requests.post(
#     f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
#     headers=HEADERS,
#     json={"to_model_id": OLD_MODEL_ID},
# )
```

See also