MLOps & Model Serving

This guide covers model deployment, prediction serving, monitoring, alerting, and automatic retraining through the CorePlexML API.
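The snippets below assume the requests library is available and that BASE_URL, API_KEY, and HEADERS are already defined (as in the earlier guides). A minimal setup with placeholder values:

```python
# Placeholder connection settings -- substitute your own values.
BASE_URL = "http://localhost:8000"   # assumed local server address
API_KEY = "your-api-key"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
```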

Overview

CorePlexML MLOps provides:

  • Deployments – Serve any trained model as a REST prediction endpoint

  • Staging/Production – Promote models through deployment stages

  • Canary Rollouts – Gradual traffic shifting between model versions

  • Inference Logging – Track every prediction for auditing and drift analysis

  • Alerting – Configure rules for latency, error rate, and metric thresholds

  • Auto-Retraining – Trigger model retraining based on drift or schedules

  • Privacy Integration – Attach privacy policies to anonymize predictions


Creating a Deployment

Deploy a trained model to a prediction endpoint:

resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
    headers=HEADERS,
    json={
        "model_id": model_id,
        "name": "Churn Predictor v1",
        "stage": "staging",        # "staging" or "production"
        "traffic_percent": 100,    # share of traffic routed to this deployment
    },
)
deployment_id = resp.json()["id"]

Listing Deployments

resp = requests.get(
    f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
    headers=HEADERS,
)
for d in resp.json()["items"]:
    print(f"  {d['name']}: stage={d['stage']}, status={d['status']}")

Making Predictions

Single Prediction

resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
    headers=HEADERS,
    json={
        "inputs": {
            "tenure_months": 12,
            "monthly_charges": 65.0,
            "contract": "Month-to-month",
        },
    },
)
pred = resp.json()
print(f"Prediction: {pred['prediction']}")
print(f"Probabilities: {pred.get('probabilities', {})}")

Batch Predictions

Pass a list of input rows:

resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
    headers=HEADERS,
    json={
        "inputs": [
            {"tenure_months": 12, "monthly_charges": 65.0, "contract": "Month-to-month"},
            {"tenure_months": 48, "monthly_charges": 95.0, "contract": "Two year"},
            {"tenure_months": 2, "monthly_charges": 25.0, "contract": "One year"},
        ],
    },
)
for row in resp.json()["predictions"]:
    print(f"  -> {row['prediction']}")

Feature Contributions (SHAP)

Request SHAP contributions with predictions:

resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
    headers=HEADERS,
    json={
        "inputs": {"tenure_months": 3, "monthly_charges": 80.0},
        "options": {"contributions": True},
    },
)
for feat, val in resp.json().get("contributions", {}).items():
    print(f"  {feat}: {val:+.4f}")

Note

Contributions are not available for StackedEnsemble models.


Promoting and Rolling Back

Promote to Production

requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/promote",
    headers=HEADERS,
    json={"to_stage": "production"},
)

Rollback

Roll back to a previous model version:

requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
    headers=HEADERS,
    json={"to_model_id": previous_model_id},
)

Deactivate

Take a deployment offline:

requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/deactivate",
    headers=HEADERS,
)

Inference Logging

Every prediction is logged automatically. Query the logs:

resp = requests.get(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/inference-logs",
    headers=HEADERS,
    params={"limit": 50, "offset": 0},
)
for log in resp.json()["items"]:
    print(f"  {log['timestamp']}: {log['prediction']} (latency: {log['latency_ms']}ms)")
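Because the endpoint takes limit and offset parameters, retrieving the full log history is a simple pagination loop. A sketch with a pluggable fetch function (in practice, fetch would wrap the requests.get call above; the fake fetch here just stands in for it):

```python
def paginate(fetch, limit=50):
    """Yield items from a limit/offset endpoint until a short page appears.

    `fetch(limit, offset)` must return the page's list of items.
    """
    offset = 0
    while True:
        items = fetch(limit, offset)
        yield from items
        if len(items) < limit:
            return
        offset += limit

# A fake fetch over 120 records stands in for the inference-logs call.
records = [{"id": i} for i in range(120)]
fake_fetch = lambda limit, offset: records[offset:offset + limit]
print(len(list(paginate(fake_fetch))))  # 120
```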

Data Drift Detection

Run drift analysis comparing current predictions against the training data distribution:

# Trigger drift analysis
resp = requests.post(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/drift/run",
    headers=HEADERS,
)

# Retrieve drift metrics
resp = requests.get(
    f"{BASE_URL}/api/mlops/deployments/{deployment_id}/drift",
    headers=HEADERS,
)
drift = resp.json()
print(f"Drift score: {drift.get('overall_score', 'N/A')}")
for col, metrics in drift.get("columns", {}).items():
    psi = metrics.get("psi")
    print(f"  {col}: PSI={psi:.4f}" if psi is not None else f"  {col}: PSI=N/A")
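PSI (Population Stability Index) compares the binned distribution of a feature at inference time against its distribution in the training data. A minimal pure-Python sketch of the idea (illustrative only, not CorePlexML's implementation):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Bins both samples on the expected sample's range, then sums
    (a - e) * ln(a / e) over the per-bin fractions.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def frac(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1
        # Smooth empty bins to keep the logarithm finite.
        return [(c or 0.5) / len(sample) for c in counts]
    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Identical distributions score near 0; a shifted one scores much higher.
train = [float(i % 10) for i in range(100)]
shifted = [x + 3.0 for x in train]
print(round(psi(train, train), 4))   # 0.0
print(psi(train, shifted) > 0.1)     # True
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, which is why thresholds like the 0.15 used in the auto-retraining example below sit in between.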

Alerting

Configure alerts that trigger when metrics cross thresholds:

resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/{project_id}/alerts",
    headers=HEADERS,
    json={
        "deployment_id": deployment_id,
        "name": "High Error Rate",
        "metric": "error_rate",
        "condition": "gt",
        "threshold": 0.05,
        "channel": "email",
        "recipients": ["ml-team@company.com"],
    },
)

Available metrics: error_rate, latency_p95, latency_p99, drift_score, prediction_volume.
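The condition field encodes a comparison against the threshold. A local sketch of how such a rule evaluates (only "gt" appears in this guide; the other condition codes are assumptions):

```python
import operator

# Maps condition codes to comparisons; "gt" comes from the example above,
# the rest are assumed for illustration.
CONDITIONS = {"gt": operator.gt, "lt": operator.lt,
              "gte": operator.ge, "lte": operator.le}

def should_fire(rule, value):
    """Return True when the observed metric value crosses the rule's threshold."""
    return CONDITIONS[rule["condition"]](value, rule["threshold"])

rule = {"metric": "error_rate", "condition": "gt", "threshold": 0.05}
print(should_fire(rule, 0.08))  # True
print(should_fire(rule, 0.01))  # False
```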


Auto-Retraining

Configure automatic retraining triggers:

resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/{project_id}/retraining",
    headers=HEADERS,
    json={
        "deployment_id": deployment_id,
        "trigger": "drift",
        "config": {
            "drift_threshold": 0.15,
            "check_interval_hours": 24,
            "auto_deploy": False,
        },
    },
)

Trigger types:

  • drift – Retrain when drift exceeds a threshold

  • schedule – Retrain on a cron-like schedule

  • manual – Retrain only when explicitly triggered
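For the schedule trigger, the payload presumably carries the schedule in its config block. A hypothetical sketch (the "cron" field name is an assumption based on the "cron-like schedule" wording, not a confirmed field):

```python
# Hypothetical schedule-trigger payload; only "trigger", "config", and
# "auto_deploy" are confirmed by the example above.
schedule_payload = {
    "deployment_id": "dep-123",   # placeholder deployment id
    "trigger": "schedule",
    "config": {
        "cron": "0 3 * * 0",      # e.g. weekly, Sundays at 03:00
        "auto_deploy": False,
    },
}
# requests.post(f"{BASE_URL}/api/mlops/projects/{project_id}/retraining",
#               headers=HEADERS, json=schedule_payload)
```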


Deployment with Privacy

Attach a privacy policy to a deployment to automatically anonymize inference logs and optionally sanitize prediction responses:

resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
    headers=HEADERS,
    json={
        "model_id": model_id,
        "name": "HIPAA-Compliant Predictor",
        "stage": "production",
        "privacy_policy_id": policy_id,
        "privacy_anonymize_logs": True,
        "privacy_anonymize_response": False,
    },
)

SDK Example

from coreplexml import CorePlexMLClient

client = CorePlexMLClient(base_url=BASE_URL, api_key=API_KEY)

# Deploy
dep = client.deployments.create(
    project_id=project_id,
    model_id=model_id,
    name="Churn Predictor",
    stage="staging",
)

# Predict
pred = client.deployments.predict(dep["id"], inputs={
    "tenure_months": 12,
    "monthly_charges": 65.0,
})

# Promote
client.deployments.promote(dep["id"], to_stage="production")

See also