======================
MLOps & Model Serving
======================

This guide covers model deployment, prediction serving, monitoring, alerting,
and automatic retraining through the CorePlexML API.

.. contents:: Sections
   :local:
   :depth: 2

Overview
--------

CorePlexML MLOps provides:

- **Deployments** -- Serve any trained model as a REST prediction endpoint
- **Staging/Production** -- Promote models through deployment stages
- **Canary Rollouts** -- Gradual traffic shifting between model versions
- **Inference Logging** -- Track every prediction for auditing and drift analysis
- **Alerting** -- Configure rules for latency, error rate, and metric thresholds
- **Auto-Retraining** -- Trigger model retraining based on drift or schedules
- **Privacy Integration** -- Attach privacy policies to anonymize predictions

----

Creating a Deployment
---------------------

Deploy a trained model to a prediction endpoint:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
       headers=HEADERS,
       json={
           "model_id": model_id,
           "name": "Churn Predictor v1",
           "stage": "staging",
           "traffic_percent": 100,
       },
   )
   deployment_id = resp.json()["id"]

Listing Deployments
^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
       headers=HEADERS,
   )
   for d in resp.json()["items"]:
       print(f"  {d['name']}: stage={d['stage']}, status={d['status']}")

Making Predictions
------------------

Single Prediction
^^^^^^^^^^^^^^^^^

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
       headers=HEADERS,
       json={
           "inputs": {
               "tenure_months": 12,
               "monthly_charges": 65.0,
               "contract": "Month-to-month",
           },
       },
   )
   pred = resp.json()
   print(f"Prediction: {pred['prediction']}")
   print(f"Probabilities: {pred.get('probabilities', {})}")

Batch Predictions
^^^^^^^^^^^^^^^^^

Pass a list of input rows:
.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
       headers=HEADERS,
       json={
           "inputs": [
               {"tenure_months": 12, "monthly_charges": 65.0, "contract": "Month-to-month"},
               {"tenure_months": 48, "monthly_charges": 95.0, "contract": "Two year"},
               {"tenure_months": 2, "monthly_charges": 25.0, "contract": "One year"},
           ],
       },
   )
   for row in resp.json()["predictions"]:
       print(f"  -> {row['prediction']}")

Feature Contributions (SHAP)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Request SHAP contributions with predictions:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/predict",
       headers=HEADERS,
       json={
           "inputs": {"tenure_months": 3, "monthly_charges": 80.0},
           "options": {"contributions": True},
       },
   )
   for feat, val in resp.json().get("contributions", {}).items():
       print(f"  {feat}: {val:+.4f}")

.. note:: Contributions are not available for StackedEnsemble models.

----

Promoting and Rolling Back
--------------------------

Promote to Production
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/promote",
       headers=HEADERS,
       json={"to_stage": "production"},
   )

Rollback
^^^^^^^^

Roll back to a previous model version:

.. code-block:: python

   requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/rollback",
       headers=HEADERS,
       json={"to_model_id": previous_model_id},
   )

Deactivate
^^^^^^^^^^

Take a deployment offline:

.. code-block:: python

   requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/deactivate",
       headers=HEADERS,
   )

----

Inference Logging
-----------------

Every prediction is logged automatically. Query the logs:
.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/inference-logs",
       headers=HEADERS,
       params={"limit": 50, "offset": 0},
   )
   for log in resp.json()["items"]:
       print(f"  {log['timestamp']}: {log['prediction']} (latency: {log['latency_ms']}ms)")

----

Data Drift Detection
--------------------

Run drift analysis comparing current predictions against the training data
distribution:

.. code-block:: python

   # Trigger drift analysis
   resp = requests.post(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/drift/run",
       headers=HEADERS,
   )

   # Retrieve drift metrics
   resp = requests.get(
       f"{BASE_URL}/api/mlops/deployments/{deployment_id}/drift",
       headers=HEADERS,
   )
   drift = resp.json()
   print(f"Drift score: {drift.get('overall_score', 'N/A')}")
   for col, metrics in drift.get("columns", {}).items():
       # Guard against missing PSI values: formatting the fallback
       # string with :.4f would raise a ValueError.
       psi = metrics.get("psi")
       print(f"  {col}: PSI={psi:.4f}" if psi is not None else f"  {col}: PSI=N/A")

----

Alerting
--------

Configure alerts that trigger when metrics cross thresholds:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/alerts",
       headers=HEADERS,
       json={
           "deployment_id": deployment_id,
           "name": "High Error Rate",
           "metric": "error_rate",
           "condition": "gt",
           "threshold": 0.05,
           "channel": "email",
           "recipients": ["ml-team@company.com"],
       },
   )

Available metrics: ``error_rate``, ``latency_p95``, ``latency_p99``,
``drift_score``, ``prediction_volume``.

----

Auto-Retraining
---------------

Configure automatic retraining triggers:
.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/retraining",
       headers=HEADERS,
       json={
           "deployment_id": deployment_id,
           "trigger": "drift",
           "config": {
               "drift_threshold": 0.15,
               "check_interval_hours": 24,
               "auto_deploy": False,
           },
       },
   )

Trigger types:

- ``drift`` -- Retrain when drift exceeds a threshold
- ``schedule`` -- Retrain on a cron-like schedule
- ``manual`` -- Retrain only when explicitly triggered

----

Deployment with Privacy
-----------------------

Attach a privacy policy to a deployment to automatically anonymize inference
logs and optionally sanitize prediction responses:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
       headers=HEADERS,
       json={
           "model_id": model_id,
           "name": "HIPAA-Compliant Predictor",
           "stage": "production",
           "privacy_policy_id": policy_id,
           "privacy_anonymize_logs": True,
           "privacy_anonymize_response": False,
       },
   )

----

SDK Example
-----------

.. code-block:: python

   from coreplexml import CorePlexMLClient

   client = CorePlexMLClient(base_url=BASE_URL, api_key=API_KEY)

   # Deploy
   dep = client.deployments.create(
       project_id=project_id,
       model_id=model_id,
       name="Churn Predictor",
       stage="staging",
   )

   # Predict
   pred = client.deployments.predict(dep["id"], inputs={
       "tenure_months": 12,
       "monthly_charges": 65.0,
   })

   # Promote
   client.deployments.promote(dep["id"], to_stage="production")

----

.. seealso::

   - :doc:`/api-reference/deployments` -- Full MLOps API reference
   - :doc:`what-if-analysis` -- Scenario-based model exploration
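----

Schedule-Based Retraining Sketch
--------------------------------

The Auto-Retraining section lists a ``schedule`` trigger type but only shows a
worked example for ``drift``. The sketch below builds a request body for a
schedule trigger, modeled on the drift example's ``config`` block; the ``cron``
key and the placeholder deployment ID are assumptions, not confirmed by this
guide, so verify the exact field names against the API reference before use.

```python
def build_schedule_retraining_payload(deployment_id, cron="0 3 * * 0",
                                      auto_deploy=False):
    """Build a body for POST /api/mlops/projects/{project_id}/retraining.

    Mirrors the drift-trigger example's shape; the ``cron`` config key
    is an assumption -- confirm the exact name in the API reference.
    """
    return {
        "deployment_id": deployment_id,
        "trigger": "schedule",
        "config": {"cron": cron, "auto_deploy": auto_deploy},
    }

# Weekly retrain (Sundays at 03:00) without automatic redeployment;
# "dep_123" is a hypothetical deployment ID.
payload = build_schedule_retraining_payload("dep_123")
```

POST the resulting body to the same ``/retraining`` endpoint shown above;
keeping ``auto_deploy`` off lets you review each retrained model before
promoting it.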