MLOps & Model Serving
This guide covers model deployment, prediction serving, monitoring, alerting, and automatic retraining through the CorePlexML API.
Overview
CorePlexML MLOps provides:
Deployments – Serve any trained model as a REST prediction endpoint
Staging/Production – Promote models through deployment stages
Canary Rollouts – Gradual traffic shifting between model versions
Inference Logging – Track every prediction for auditing and drift analysis
Alerting – Configure rules for latency, error rate, and metric thresholds
Auto-Retraining – Trigger model retraining based on drift or schedules
Privacy Integration – Attach privacy policies to anonymize predictions
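Every REST snippet in this guide assumes `requests` plus `BASE_URL` and `HEADERS` constants. A minimal setup sketch (the bearer-token `Authorization` scheme shown is an assumption; check your CorePlexML instance's auth documentation for the exact header):

```python
import requests

# Assumed throughout this guide. The auth header below is illustrative,
# not authoritative -- substitute your instance's URL and credentials.
BASE_URL = "https://coreplexml.example.com"
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}
```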
Creating a Deployment
Deploy a trained model to a prediction endpoint:
resp = requests.post(
f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
headers=HEADERS,
json={
"model_id": model_id,
"name": "Churn Predictor v1",
"stage": "staging",
"traffic_percent": 100,
},
)
deployment_id = resp.json()["id"]
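The snippets in this guide omit error handling for brevity. In practice it helps to fail fast with the server's error body included; a small helper along these lines (not part of the CorePlexML API):

```python
def ensure_ok(resp):
    """Raise a descriptive error for non-2xx responses, else return parsed JSON.

    Works with any requests.Response-like object exposing .ok, .status_code,
    .text, and .json().
    """
    if not resp.ok:
        raise RuntimeError(f"CorePlexML API error {resp.status_code}: {resp.text}")
    return resp.json()
```

With it, the line above becomes `deployment_id = ensure_ok(resp)["id"]`.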
Listing Deployments
resp = requests.get(
f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
headers=HEADERS,
)
for d in resp.json()["items"]:
print(f" {d['name']}: stage={d['stage']}, status={d['status']}")
Making Predictions
Single Prediction
resp = requests.post(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
headers=HEADERS,
json={
"inputs": {
"tenure_months": 12,
"monthly_charges": 65.0,
"contract": "Month-to-month",
},
},
)
pred = resp.json()
print(f"Prediction: {pred['prediction']}")
print(f"Probabilities: {pred.get('probabilities', {})}")
Batch Predictions
Pass a list of input rows:
resp = requests.post(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
headers=HEADERS,
json={
"inputs": [
{"tenure_months": 12, "monthly_charges": 65.0, "contract": "Month-to-month"},
{"tenure_months": 48, "monthly_charges": 95.0, "contract": "Two year"},
{"tenure_months": 2, "monthly_charges": 25.0, "contract": "One year"},
],
},
)
for row in resp.json()["predictions"]:
print(f" -> {row['prediction']}")
Feature Contributions (SHAP)
Request SHAP contributions with predictions:
resp = requests.post(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
headers=HEADERS,
json={
"inputs": {"tenure_months": 3, "monthly_charges": 80.0},
"options": {"contributions": True},
},
)
for feat, val in resp.json().get("contributions", {}).items():
print(f" {feat}: {val:+.4f}")
Note
Contributions are not available for StackedEnsemble models.
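When a model has many features, it can help to surface only the strongest contributions. A small sorting helper (plain Python, no API assumptions):

```python
def top_contributions(contributions, n=5):
    """Return the n (feature, value) pairs with the largest absolute contribution."""
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]
```

For example, `top_contributions(resp.json().get("contributions", {}), n=3)` yields the three features that moved the prediction most, in either direction.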
Promoting and Rolling Back
Promote to Production
requests.post(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/promote",
headers=HEADERS,
json={"to_stage": "production"},
)
Rollback
Roll back to a previous model version:
requests.post(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
headers=HEADERS,
json={"to_model_id": previous_model_id},
)
Deactivate
Take a deployment offline:
requests.post(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/deactivate",
headers=HEADERS,
)
Inference Logging
Every prediction is logged automatically. Query the logs:
resp = requests.get(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/inference-logs",
headers=HEADERS,
params={"limit": 50, "offset": 0},
)
for log in resp.json()["items"]:
print(f" {log['timestamp']}: {log['prediction']} (latency: {log['latency_ms']}ms)")
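The `limit`/`offset` parameters follow standard pagination. A sketch of a generator that walks every page, assuming a page shorter than `limit` marks the end (verify this convention against the API reference):

```python
def iter_inference_logs(fetch_page, limit=50):
    """Yield every inference log by paging until a short page is returned.

    fetch_page(limit, offset) should return one parsed page of results,
    e.g. requests.get(url, headers=HEADERS,
                      params={"limit": limit, "offset": offset}).json()
    """
    offset = 0
    while True:
        items = fetch_page(limit, offset)["items"]
        yield from items
        if len(items) < limit:
            return
        offset += limit
```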
Data Drift Detection
Run drift analysis comparing current predictions against the training data distribution:
# Trigger drift analysis
resp = requests.post(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/drift/run",
headers=HEADERS,
)
# Retrieve drift metrics
resp = requests.get(
f"{BASE_URL}/api/mlops/deployments/{deployment_id}/drift",
headers=HEADERS,
)
drift = resp.json()
print(f"Drift score: {drift.get('overall_score', 'N/A')}")
for col, metrics in drift.get("columns", {}).items():
    psi = metrics.get("psi")
    print(f" {col}: PSI={psi:.4f}" if psi is not None else f" {col}: PSI=N/A")
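A common rule of thumb treats PSI above 0.2 as significant drift (the 0.2 cutoff is an industry convention, not a CorePlexML default). A helper that picks out the drifting columns from the response above:

```python
def columns_over_psi(drift, threshold=0.2):
    """Return column names whose PSI exceeds the threshold, skipping
    columns for which no PSI was computed."""
    return [
        col
        for col, metrics in drift.get("columns", {}).items()
        if metrics.get("psi") is not None and metrics["psi"] > threshold
    ]
```

The result can feed an alerting or retraining decision, e.g. trigger a retrain only when more than a handful of columns drift.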
Alerting
Configure alerts that trigger when metrics cross thresholds:
resp = requests.post(
f"{BASE_URL}/api/mlops/projects/{project_id}/alerts",
headers=HEADERS,
json={
"deployment_id": deployment_id,
"name": "High Error Rate",
"metric": "error_rate",
"condition": "gt",
"threshold": 0.05,
"channel": "email",
"recipients": ["ml-team@company.com"],
},
)
Available metrics: error_rate, latency_p95, latency_p99, drift_score, prediction_volume.
Auto-Retraining
Configure automatic retraining triggers:
resp = requests.post(
f"{BASE_URL}/api/mlops/projects/{project_id}/retraining",
headers=HEADERS,
json={
"deployment_id": deployment_id,
"trigger": "drift",
"config": {
"drift_threshold": 0.15,
"check_interval_hours": 24,
"auto_deploy": False,
},
},
)
Trigger types:
drift – Retrain when drift exceeds a threshold
schedule – Retrain on a cron-like schedule
manual – Retrain only when explicitly triggered
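For the schedule trigger, a config sketch; the `cron` key and its expression format are assumptions, so check the retraining API reference for the exact field names:

```python
# Hypothetical schedule config -- the field names under "config" are assumed.
retraining_request = {
    "deployment_id": "dep_123",  # placeholder deployment id
    "trigger": "schedule",
    "config": {
        "cron": "0 2 * * 0",     # assumed format: weekly, Sundays at 02:00
        "auto_deploy": False,
    },
}
```

POST this to `/api/mlops/projects/{project_id}/retraining` as in the drift-trigger example above.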
Deployment with Privacy
Attach a privacy policy to a deployment to automatically anonymize inference logs and optionally sanitize prediction responses:
resp = requests.post(
f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
headers=HEADERS,
json={
"model_id": model_id,
"name": "HIPAA-Compliant Predictor",
"stage": "production",
"privacy_policy_id": policy_id,
"privacy_anonymize_logs": True,
"privacy_anonymize_response": False,
},
)
SDK Example
from coreplexml import CorePlexMLClient
client = CorePlexMLClient(base_url=BASE_URL, api_key=API_KEY)
# Deploy
dep = client.deployments.create(
project_id=project_id,
model_id=model_id,
name="Churn Predictor",
stage="staging",
)
# Predict
pred = client.deployments.predict(dep["id"], inputs={
"tenure_months": 12,
"monthly_charges": 65.0,
})
# Promote
client.deployments.promote(dep["id"], to_stage="production")
See also
Deployments API – Full MLOps API reference
What-If Analysis – Scenario-based model exploration