===============
Deployments API
===============

Deployments serve trained models as real-time prediction endpoints. The MLOps
lifecycle supports staging and production stages, promotion, rollback,
inference logging, and data drift detection.

All deployment endpoints are prefixed with ``/api/mlops``.

.. contents:: Endpoints
   :local:
   :depth: 1

----

Create Deployment
-----------------

.. code-block:: text

   POST /api/mlops/projects/{project_id}/deployments

Deploy a model to either staging or production. If deploying to
``production``, any existing production deployment in the same project is
automatically archived.

**Path Parameters**

.. list-table::
   :header-rows: 1
   :widths: 20 10 70

   * - Parameter
     - Type
     - Description
   * - ``project_id``
     - UUID
     - Project that owns the model.

**Request Body**

.. list-table::
   :header-rows: 1
   :widths: 25 10 10 55

   * - Field
     - Type
     - Required
     - Description
   * - ``model_id``
     - string
     - Yes
     - UUID of the model to deploy. Must belong to the same project.
   * - ``name``
     - string
     - Yes
     - Deployment name.
   * - ``stage``
     - string
     - Yes
     - ``staging`` or ``production``.
   * - ``traffic_percent``
     - integer
     - No
     - Traffic allocation percentage (1--100, default 100).
   * - ``privacy_policy_id``
     - string
     - No
     - Attach a privacy policy for runtime anonymization.
   * - ``privacy_anonymize_logs``
     - boolean
     - No
     - Anonymize PII in inference logs.
   * - ``privacy_anonymize_response``
     - boolean
     - No
     - Anonymize PII in prediction responses.
   * - ``privacy_column_map``
     - object
     - No
     - Map input column names to PII types.
   * - ``privacy_threshold``
     - float
     - No
     - Confidence threshold for PII detection (0.0--1.0).

**Example**

.. code-block:: bash

   curl -X POST "$BASE_URL/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
       "name": "fraud-detector-staging",
       "stage": "staging",
       "traffic_percent": 100
     }'
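The optional privacy fields in the request body can be attached at creation
time. A minimal sketch of assembling such a payload, using only the field
names documented above; the ``build_create_payload`` helper is illustrative,
not part of the API:

.. code-block:: python

   def build_create_payload(model_id, name, stage,
                            traffic_percent=100, privacy_policy_id=None,
                            privacy_anonymize_logs=False):
       """Assemble a create-deployment body; privacy fields are included
       only when a policy is attached (field names from the table above)."""
       payload = {
           "model_id": model_id,
           "name": name,
           "stage": stage,
           "traffic_percent": traffic_percent,
       }
       if privacy_policy_id is not None:
           payload["privacy_policy_id"] = privacy_policy_id
           payload["privacy_anonymize_logs"] = privacy_anonymize_logs
       return payload

The resulting dict is then POSTed as ``json=`` exactly as in the examples on
this page.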
.. code-block:: python

   import requests

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
       json={
           "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
           "name": "fraud-detector-staging",
           "stage": "staging",
           "traffic_percent": 100,
       },
   )
   deployment_id = resp.json()["deployment_id"]
   print("Deployed:", deployment_id)

**Response** ``201 Created``

.. code-block:: json

   {
     "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
     "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
     "stage": "staging",
     "is_active": true
   }

----

List Deployments
----------------

.. code-block:: text

   GET /api/mlops/projects/{project_id}/deployments

Return all deployments for a project.

**Query Parameters**

.. list-table::
   :header-rows: 1
   :widths: 20 10 10 60

   * - Parameter
     - Type
     - Default
     - Description
   * - ``limit``
     - integer
     - 50
     - Max items (1--200).
   * - ``offset``
     - integer
     - 0
     - Pagination offset.

**Example**

.. code-block:: bash

   curl "$BASE_URL/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments" \
     -H "Authorization: Bearer YOUR_API_KEY"

.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/mlops/projects/d4e5f6a7-b8c9-0123-def4-567890123456/deployments",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
   )
   for dep in resp.json()["items"]:
       print(f"{dep['id']} stage={dep['stage']} active={dep['is_active']}")

**Response** ``200 OK``

.. code-block:: json

   {
     "items": [
       {
         "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
         "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
         "stage": "staging",
         "is_active": true,
         "config": {"name": "fraud-detector-staging", "traffic_percent": 100},
         "created_at": "2026-02-25T09:00:00Z"
       }
     ],
     "total": 1,
     "limit": 50,
     "offset": 0
   }

----

Get Deployment Detail
---------------------

.. code-block:: text

   GET /api/mlops/deployments/{deployment_id}

Return deployment metadata and the associated model details.

**Example**
.. code-block:: bash

   curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "deployment": {
       "id": "d0e1f2a3-b4c5-6789-0123-def456789012",
       "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
       "model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
       "stage": "staging",
       "is_active": true,
       "config": {"name": "fraud-detector-staging", "traffic_percent": 100},
       "created_at": "2026-02-25T09:00:00Z"
     },
     "model": {
       "id": "a7b8c9d0-e1f2-3456-7890-abcdef123456",
       "name": "GBM_1_AutoML",
       "algo": "GBM"
     }
   }

----

Predict via Deployment
----------------------

.. code-block:: text

   POST /api/mlops/deployments/{deployment_id}/predict

Send input features and receive predictions from the deployed model. This is
the primary real-time prediction endpoint. Inference requests are logged
automatically for monitoring and drift detection.

**Request Body**

.. list-table::
   :header-rows: 1
   :widths: 20 10 10 60

   * - Field
     - Type
     - Required
     - Description
   * - ``inputs``
     - dict or list[dict]
     - Yes
     - Feature values. Single record or batch.
   * - ``options``
     - object
     - No
     - Options (see table below).

**Predict Options**

.. list-table::
   :header-rows: 1
   :widths: 25 10 65

   * - Key
     - Type
     - Description
   * - ``include_contributions``
     - boolean
     - Include SHAP-style feature contributions (not available for
       StackedEnsemble models). Default: ``false``.
   * - ``store_payload``
     - boolean
     - Store input payload in inference logs. Default: ``true``.
   * - ``privacy_policy_id``
     - string
     - Override the deployment-level privacy policy for this request.
   * - ``anonymize_logs``
     - boolean
     - Override log anonymization for this request.
   * - ``anonymize_response``
     - boolean
     - Override response anonymization for this request.

**Example**
.. code-block:: bash

   curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/predict" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
       "options": {"include_contributions": true}
     }'

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/predict",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
       json={
           "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
           "options": {"include_contributions": True},
       },
   )
   result = resp.json()
   print("Prediction:", result["predictions"][0]["prediction"])

**Response** ``200 OK``

.. code-block:: json

   {
     "predictions": [
       {"prediction": "0"}
     ],
     "contributions": [
       [
         {"feature": "amount", "value": -0.32},
         {"feature": "merchant_id", "value": -0.15},
         {"feature": "BiasTerm", "value": -1.20}
       ]
     ],
     "latency_ms": 45
   }

----

Promote Deployment
------------------

.. code-block:: text

   POST /api/mlops/deployments/{deployment_id}/promote

Promote a deployment to production. Any existing production deployment in the
same project is automatically archived.

**Request Body**

.. list-table::
   :header-rows: 1
   :widths: 20 10 10 60

   * - Field
     - Type
     - Required
     - Description
   * - ``to_stage``
     - string
     - Yes
     - Must be ``production``.

**Example**

.. code-block:: bash

   curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/promote" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"to_stage": "production"}'

.. code-block:: python

   requests.post(
       f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/promote",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
       json={"to_stage": "production"},
   )

**Response** ``200 OK``

.. code-block:: json

   {
     "ok": true,
     "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
     "stage": "production"
   }

----

Rollback Deployment
-------------------
.. code-block:: text

   POST /api/mlops/deployments/{deployment_id}/rollback

Roll back production to a previous deployment or model. Provide either
``to_deployment_id`` (reactivate an archived deployment) or ``to_model_id``
(create a new production deployment for the given model).

**Request Body**

.. list-table::
   :header-rows: 1
   :widths: 25 10 10 55

   * - Field
     - Type
     - Required
     - Description
   * - ``to_deployment_id``
     - string
     - No*
     - UUID of the archived deployment to reactivate.
   * - ``to_model_id``
     - string
     - No*
     - UUID of the model to deploy as the new production version.

\* At least one of ``to_deployment_id`` or ``to_model_id`` is required.

**Example**

.. code-block:: bash

   curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/rollback" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"to_model_id": "a7b8c9d0-e1f2-3456-7890-abcdef123456"}'

**Response** ``200 OK``

.. code-block:: json

   {
     "ok": true
   }

----

Deactivate Deployment
---------------------

.. code-block:: text

   POST /api/mlops/deployments/{deployment_id}/deactivate

Deactivate a deployment without promoting a replacement. The deployment moves
to the ``archived`` stage.

**Example**

.. code-block:: bash

   curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/deactivate" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "ok": true
   }

----

Drift Metrics
-------------

.. code-block:: text

   GET /api/mlops/deployments/{deployment_id}/drift

Return drift detection results comparing the training distribution to recent
inference data. Returns per-feature PSI scores and an overall drift status.

**Example**

.. code-block:: bash

   curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift" \
     -H "Authorization: Bearer YOUR_API_KEY"
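Beyond printing scores, the per-feature statuses can drive automation, such as
gating a promotion or triggering a rollback. A minimal sketch, assuming only
the status values shown in the example response (``low``, ``moderate``); the
``drifted_features`` helper is illustrative, not part of the API:

.. code-block:: python

   def drifted_features(drift, ignore=("low",)):
       """Return the names of features whose PSI status is not in `ignore`,
       given a response body from the drift endpoint."""
       return sorted(
           name
           for name, info in drift.get("features", {}).items()
           if info.get("status") not in ignore
       )

Applied to the example response in this section, this returns
``['merchant_id']``.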
.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
   )
   drift = resp.json()
   for feature, info in drift.get("features", {}).items():
       print(f"{feature}: PSI={info['psi']:.4f} ({info['status']})")

**Response** ``200 OK``

.. code-block:: json

   {
     "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
     "overall_status": "low",
     "features": {
       "amount": {"psi": 0.023, "status": "low"},
       "merchant_id": {"psi": 0.15, "status": "moderate"},
       "hour": {"psi": 0.008, "status": "low"}
     },
     "computed_at": "2026-02-28T12:00:00Z",
     "inference_rows": 5000
   }

----

Run Drift Detection
-------------------

.. code-block:: text

   POST /api/mlops/deployments/{deployment_id}/drift/run

Enqueue a background job to compute drift metrics against recent inference
data. The result can be retrieved via the drift metrics endpoint once the job
completes.

**Example**

.. code-block:: bash

   curl -X POST "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/drift/run" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "ok": true,
     "job_id": "e1f2a3b4-c5d6-7890-1234-ef5678901234"
   }

----

Inference Logs
--------------

.. code-block:: text

   GET /api/mlops/deployments/{deployment_id}/inference-logs

Return a paginated list of inference requests and responses for the
deployment. Useful for monitoring, auditing, and debugging.

**Query Parameters**

.. list-table::
   :header-rows: 1
   :widths: 20 10 10 60

   * - Parameter
     - Type
     - Default
     - Description
   * - ``limit``
     - integer
     - 50
     - Max items (1--500).
   * - ``offset``
     - integer
     - 0
     - Pagination offset.
   * - ``since``
     - string
     - --
     - ISO 8601 timestamp. Only return logs after this time.
   * - ``until``
     - string
     - --
     - ISO 8601 timestamp. Only return logs before this time.

**Example**
.. code-block:: bash

   curl "$BASE_URL/api/mlops/deployments/d0e1f2a3-b4c5-6789-0123-def456789012/inference-logs?limit=5" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "items": [
       {
         "id": "f2a3b4c5-d6e7-8901-2345-f67890123456",
         "deployment_id": "d0e1f2a3-b4c5-6789-0123-def456789012",
         "inputs": {"amount": 29.99, "merchant_id": 42, "hour": 14},
         "prediction": "0",
         "latency_ms": 45,
         "status_code": 200,
         "created_at": "2026-02-28T14:30:00Z"
       }
     ],
     "total": 5000,
     "limit": 5,
     "offset": 0
   }

----

Deployment Stages
^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Stage
     - Description
   * - ``staging``
     - Pre-production testing. Use this to validate model behavior before
       going live.
   * - ``production``
     - Live serving. Predictions from this deployment are production traffic.

Deployment Status Values
^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Status
     - Description
   * - ``active``
     - Deployment is running and serving predictions.
   * - ``inactive``
     - Deployment has been deactivated. Not serving predictions.
   * - ``failed``
     - Deployment failed to initialize or load the model.

----

Alert Metrics Reference
^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Metric
     - Description
   * - ``drift_psi``
     - Population Stability Index. Measures feature distribution drift
       between training and production data.
   * - ``accuracy_degradation``
     - Degradation in prediction accuracy over time.
   * - ``error_rate``
     - Percentage of predictions returning errors.
   * - ``latency_p99``
     - 99th percentile prediction latency in milliseconds.
   * - ``model_staleness``
     - Days since the model was last retrained.
   * - ``prediction_anomaly``
     - Anomalous prediction patterns detected.

Alert Condition Operators
^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 15 20 65

   * - Operator
     - Symbol
     - Description
   * - ``gt``
     - >
     - Greater than threshold.
   * - ``lt``
     - <
     - Less than threshold.
   * - ``gte``
     - >=
     - Greater than or equal to threshold.
   * - ``lte``
     - <=
     - Less than or equal to threshold.
   * - ``eq``
     - ==
     - Equal to threshold.
   * - ``neq``
     - !=
     - Not equal to threshold.

Alert Severity Levels
^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Severity
     - Description
   * - ``info``
     - Informational alert. No action required.
   * - ``warning``
     - Warning. Should be investigated.
   * - ``critical``
     - Critical issue requiring immediate attention.

Notification Channels
^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Channel
     - Description
   * - ``email``
     - Send alert to one or more email addresses.
   * - ``slack``
     - Post alert to a Slack channel via webhook.
   * - ``webhook``
     - Send alert payload to a custom HTTP endpoint.

Alert Rule Options
^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Option
     - Type
     - Description
   * - ``cooldown_minutes``
     - integer
     - Minimum minutes between repeated alerts (1--1440, default 60).
   * - ``escalation_minutes``
     - integer
     - Minutes before escalating severity (optional, 1--1440).
   * - ``escalation_severity``
     - string
     - Severity to escalate to: ``warning`` or ``critical``.

----

Deployment Lifecycle Example
----------------------------

The following shows the full lifecycle: create a staging deployment, test it,
promote it to production, and roll back if needed.

.. code-block:: python

   import requests

   BASE_URL = "http://localhost:8888"
   HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
   PROJECT_ID = "d4e5f6a7-b8c9-0123-def4-567890123456"
   MODEL_ID = "a7b8c9d0-e1f2-3456-7890-abcdef123456"

   # 1. Deploy to staging
   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{PROJECT_ID}/deployments",
       headers=HEADERS,
       json={"model_id": MODEL_ID, "name": "v2-staging", "stage": "staging"},
   )
   deployment_id = resp.json()["deployment_id"]
   # 2. Test with a sample prediction
   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
       headers=HEADERS,
       json={"inputs": {"amount": 50.0, "merchant_id": 10, "hour": 12}},
   )
   assert resp.status_code == 200
   print("Staging test passed:", resp.json()["predictions"])

   # 3. Promote to production
   requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/promote",
       headers=HEADERS,
       json={"to_stage": "production"},
   )
   print("Promoted to production")

   # 4. Later: roll back if needed
   # requests.post(
   #     f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
   #     headers=HEADERS,
   #     json={"to_model_id": OLD_MODEL_ID},
   # )

----

.. seealso::

   - :doc:`models` -- Model details and direct prediction.
   - :doc:`studio` -- What-if analysis against a deployment.
   - :doc:`reports` -- Generate deployment reports.