======================
MLOps & Model Serving
======================

This guide covers model deployment, prediction serving, monitoring, alerting,
and automatic retraining through the CorePlexML API.

.. contents:: Sections
   :local:
   :depth: 2

Overview
--------

CorePlexML MLOps provides:

- **Deployments** -- Serve any trained model as a REST prediction endpoint
- **Staging/Production** -- Promote models through deployment stages
- **Canary Rollouts** -- Gradual traffic shifting between model versions
- **Inference Logging** -- Track every prediction for auditing and drift analysis
- **Alerting** -- Configure rules for latency, error rate, and metric thresholds
- **Auto-Retraining** -- Trigger model retraining based on drift or schedules
- **Privacy Integration** -- Attach privacy policies to anonymize predictions

----

Creating a Deployment
---------------------

Deploy a trained model to a prediction endpoint:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
       headers=HEADERS,
       json={
           "model_id": model_id,
           "name": "Churn Predictor v1",
           "stage": "staging",
           "traffic_percent": 100,
       },
   )
   deployment_id = resp.json()["id"]

Listing Deployments
^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
       headers=HEADERS,
   )
   for d in resp.json()["items"]:
       print(f"  {d['name']}: stage={d['stage']}, status={d['status']}")

Making Predictions
------------------

Single Prediction
^^^^^^^^^^^^^^^^^

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
       headers=HEADERS,
       json={
           "inputs": {
               "tenure_months": 12,
               "monthly_charges": 65.0,
               "contract": "Month-to-month",
           },
       },
   )
   pred = resp.json()
   print(f"Prediction: {pred['prediction']}")
   print(f"Probabilities: {pred.get('probabilities', {})}")

Batch Predictions
^^^^^^^^^^^^^^^^^

Pass a list of input rows:
.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
       headers=HEADERS,
       json={
           "inputs": [
               {"tenure_months": 12, "monthly_charges": 65.0, "contract": "Month-to-month"},
               {"tenure_months": 48, "monthly_charges": 95.0, "contract": "Two year"},
               {"tenure_months": 2, "monthly_charges": 25.0, "contract": "One year"},
           ],
       },
   )
   for row in resp.json()["predictions"]:
       print(f"  -> {row['prediction']}")

Feature Contributions (SHAP)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Request SHAP contributions with predictions:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
       headers=HEADERS,
       json={
           "inputs": {"tenure_months": 3, "monthly_charges": 80.0},
           "options": {"contributions": True},
       },
   )
   for feat, val in resp.json().get("contributions", {}).items():
       print(f"  {feat}: {val:+.4f}")

.. note:: Contributions are not available for StackedEnsemble models.

----

Promoting and Rolling Back
--------------------------

Promote to Production
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/promote",
       headers=HEADERS,
       json={"to_stage": "production"},
   )

Rollback
^^^^^^^^

Roll back to a previous model version:

.. code-block:: python

   requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
       headers=HEADERS,
       json={"to_model_id": previous_model_id},
   )

Deactivate
^^^^^^^^^^

Take a deployment offline:

.. code-block:: python

   requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/deactivate",
       headers=HEADERS,
   )

----

Inference Logging
-----------------

Every prediction is logged automatically. Query the logs:
.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/inference-logs",
       headers=HEADERS,
       params={"limit": 50, "offset": 0},
   )
   for log in resp.json()["items"]:
       print(f"  {log['timestamp']}: {log['prediction']} (latency: {log['latency_ms']}ms)")

----

Data Drift Detection
--------------------

Run drift analysis comparing current predictions against the training data
distribution:

.. code-block:: python

   # Trigger drift analysis
   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/drift/run",
       headers=HEADERS,
   )

   # Retrieve drift metrics
   resp = requests.get(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/drift",
       headers=HEADERS,
   )
   drift = resp.json()
   print(f"Drift score: {drift.get('overall_score', 'N/A')}")
   for col, metrics in drift.get("columns", {}).items():
       # Guard against missing PSI values: formatting the fallback
       # string with :.4f would raise a ValueError.
       psi = metrics.get("psi")
       print(f"  {col}: PSI={psi:.4f}" if psi is not None else f"  {col}: PSI=N/A")

----

Alerting
--------

Configure alerts that trigger when metrics cross thresholds:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/alerts",
       headers=HEADERS,
       json={
           "deployment_id": deployment_id,
           "name": "High Error Rate",
           "metric": "error_rate",
           "condition": "gt",
           "threshold": 0.05,
           "channel": "email",
           "recipients": ["ml-team@company.com"],
       },
   )

Available metrics: ``error_rate``, ``latency_p95``, ``latency_p99``,
``drift_score``, ``prediction_volume``.

----

Auto-Retraining
---------------

Configure automatic retraining triggers:
.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/retraining",
       headers=HEADERS,
       json={
           "deployment_id": deployment_id,
           "trigger": "drift",
           "config": {
               "drift_threshold": 0.15,
               "check_interval_hours": 24,
               "auto_deploy": False,
           },
       },
   )

Trigger types:

- ``drift`` -- Retrain when drift exceeds a threshold
- ``schedule`` -- Retrain on a cron-like schedule
- ``manual`` -- Retrain only when explicitly triggered

----

Deployment with Privacy
-----------------------

Attach a privacy policy to a deployment to automatically anonymize inference
logs and optionally sanitize prediction responses:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
       headers=HEADERS,
       json={
           "model_id": model_id,
           "name": "HIPAA-Compliant Predictor",
           "stage": "production",
           "privacy_policy_id": policy_id,
           "privacy_anonymize_logs": True,
           "privacy_anonymize_response": False,
       },
   )

----

SDK Example
-----------

.. code-block:: python

   from coreplexml import CorePlexMLClient

   client = CorePlexMLClient(base_url=BASE_URL, api_key=API_KEY)

   # Deploy
   dep = client.deployments.create(
       project_id=project_id,
       model_id=model_id,
       name="Churn Predictor",
       stage="staging",
   )

   # Predict
   pred = client.deployments.predict(dep["id"], inputs={
       "tenure_months": 12,
       "monthly_charges": 65.0,
   })

   # Promote
   client.deployments.promote(dep["id"], to_stage="production")

----

.. seealso::

   - :doc:`/api-reference/deployments` -- Full MLOps API reference
   - :doc:`what-if-analysis` -- Scenario-based model exploration
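----

Schedule-Based Retraining Sketch
--------------------------------

The Auto-Retraining section lists a ``schedule`` trigger type but only shows a
worked example for ``drift``. The sketch below builds a request body for a
schedule trigger, modeled on the drift example's ``config`` block; the ``cron``
key and the placeholder deployment ID are assumptions, not confirmed by this
guide, so verify the exact field names against the API reference before use.

```python
def build_schedule_retraining_payload(deployment_id, cron="0 3 * * 0",
                                      auto_deploy=False):
    """Build a body for POST /api/mlops/projects/{project_id}/retraining.

    Mirrors the drift-trigger example's shape; the ``cron`` config key
    is an assumption -- confirm the exact name in the API reference.
    """
    return {
        "deployment_id": deployment_id,
        "trigger": "schedule",
        "config": {"cron": cron, "auto_deploy": auto_deploy},
    }

# Weekly retrain (Sundays at 03:00) without automatic redeployment;
# "dep_123" is a hypothetical deployment ID.
payload = build_schedule_retraining_payload("dep_123")
```

POST the resulting body to the same ``/retraining`` endpoint shown above;
keeping ``auto_deploy`` off lets you review each retrained model before
promoting it.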