===========
Quick Start
===========

This guide walks through a complete ML pipeline using the CorePlexML REST API. You will create a project, upload data, train models, deploy the best one, and make predictions -- all via HTTP calls.

.. contents:: Steps
   :local:
   :depth: 1

Prerequisites
-------------

- A running CorePlexML instance (e.g., ``http://localhost:8888``)
- An API key (see :doc:`authentication`)
- A CSV dataset file

Set up shell variables used throughout this guide:

.. code-block:: bash

   export BASE_URL="http://localhost:8888"
   export API_KEY="cpx_live.your-api-key-here"

----

1. Create a Project
-------------------

.. code-block:: bash

   curl -X POST "$BASE_URL/api/projects" \
     -H "Authorization: Bearer $API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"name": "Churn Prediction", "description": "Telco customer churn"}'

.. code-block:: python

   import requests

   BASE_URL = "http://localhost:8888"
   HEADERS = {"Authorization": "Bearer cpx_live.your-api-key"}

   resp = requests.post(f"{BASE_URL}/api/projects", headers=HEADERS, json={
       "name": "Churn Prediction",
       "description": "Telco customer churn",
   })
   project = resp.json()
   project_id = project["project_id"]

Save the ``project_id`` from the response -- you'll need it for every subsequent call.

----

2. Upload a Dataset
-------------------

Upload a CSV file using ``multipart/form-data``:

.. code-block:: bash

   curl -X POST "$BASE_URL/api/datasets/upload" \
     -H "Authorization: Bearer $API_KEY" \
     -F "project_id=PROJECT_ID" \
     -F "name=Customer Churn Data" \
     -F "description=10k records of telco customers" \
     -F "file=@customer_churn.csv"
.. code-block:: python

   with open("customer_churn.csv", "rb") as f:
       resp = requests.post(
           f"{BASE_URL}/api/datasets/upload",
           headers=HEADERS,
           files={"file": ("customer_churn.csv", f, "text/csv")},
           data={
               "project_id": project_id,
               "name": "Customer Churn Data",
               "description": "10k records of telco customers",
           },
       )
   ds = resp.json()
   dataset_id = ds["id"]
   version_id = ds["version_id"]

The platform automatically detects column types, computes statistics, and creates an immutable dataset version.

----

3. Inspect the Schema
---------------------

Check what columns were detected:

.. code-block:: bash

   curl "$BASE_URL/api/dataset-versions/$VERSION_ID" \
     -H "Authorization: Bearer $API_KEY"

.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/dataset-versions/{version_id}",
       headers=HEADERS,
   )
   for col in resp.json().get("columns", []):
       print(f"  {col['name']}: {col['dtype']}")

----

4. Run an AutoML Experiment
---------------------------

Launch H2O AutoML to train and rank multiple model types:

.. code-block:: bash

   curl -X POST "$BASE_URL/api/experiments" \
     -H "Authorization: Bearer $API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "project_id": "PROJECT_ID",
       "dataset_version_id": "VERSION_ID",
       "target_column": "churned",
       "problem_type": "classification",
       "name": "Churn Classifier v1",
       "config": { "max_models": 10, "max_runtime_secs": 300 }
     }'

.. code-block:: python

   resp = requests.post(f"{BASE_URL}/api/experiments", headers=HEADERS, json={
       "project_id": project_id,
       "dataset_version_id": version_id,
       "target_column": "churned",
       "problem_type": "classification",
       "name": "Churn Classifier v1",
       "config": {
           "max_models": 10,
           "max_runtime_secs": 300,
       },
   })
   exp = resp.json()
   experiment_id = exp["id"]

Training runs as a background job. The response includes a ``status`` field (``pending`` → ``running`` → ``completed``).

----

5. Poll for Completion
----------------------
.. code-block:: bash

   # Poll until status is "completed" or "failed"
   curl "$BASE_URL/api/experiments/$EXPERIMENT_ID" \
     -H "Authorization: Bearer $API_KEY"

.. code-block:: python

   import time

   while True:
       resp = requests.get(
           f"{BASE_URL}/api/experiments/{experiment_id}",
           headers=HEADERS,
       )
       status = resp.json()["experiment"]["status"]
       print(f"Status: {status}")
       if status in ("completed", "failed"):
           break
       time.sleep(10)

Or use the SDK, which handles polling automatically:

.. code-block:: python

   from coreplexml import CorePlexMLClient

   client = CorePlexMLClient(base_url=BASE_URL, api_key="cpx_live.your-key")
   result = client.experiments.wait(experiment_id, interval=5.0, timeout=3600.0)

----

6. List Trained Models
----------------------

After training completes, list the ranked models:

.. code-block:: bash

   curl "$BASE_URL/api/experiments/$EXPERIMENT_ID/models" \
     -H "Authorization: Bearer $API_KEY"

.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/experiments/{experiment_id}/models",
       headers=HEADERS,
   )
   models = resp.json()["items"]
   for m in models[:5]:
       print(f"  {m['algorithm']}: AUC={m.get('metrics', {}).get('auc', 'N/A')}")
   best_model_id = models[0]["id"]

Models are sorted by the primary metric (AUC for classification, RMSE for regression).

----

7. Make a Prediction
--------------------

Use the best model directly:

.. code-block:: bash

   curl -X POST "$BASE_URL/api/models/$MODEL_ID/predict" \
     -H "Authorization: Bearer $API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "inputs": {
         "tenure_months": 12,
         "monthly_charges": 65.0,
         "contract": "Month-to-month",
         "payment_method": "Electronic check"
       }
     }'

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/models/{best_model_id}/predict",
       headers=HEADERS,
       json={
           "inputs": {
               "tenure_months": 12,
               "monthly_charges": 65.0,
               "contract": "Month-to-month",
               "payment_method": "Electronic check",
           },
       },
   )
   print(resp.json())

----

8. Deploy to Production
-----------------------

Deploy the model as a persistent REST endpoint:
.. code-block:: bash

   curl -X POST "$BASE_URL/api/mlops/projects/$PROJECT_ID/deployments" \
     -H "Authorization: Bearer $API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "model_id": "MODEL_ID",
       "name": "Churn Predictor",
       "stage": "production"
     }'

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
       headers=HEADERS,
       json={
           "model_id": best_model_id,
           "name": "Churn Predictor",
           "stage": "production",
       },
   )
   deployment_id = resp.json()["id"]

Once deployed, make predictions through the deployment endpoint:

.. code-block:: bash

   curl -X POST "$BASE_URL/api/mlops/projects/$PROJECT_ID/deployments/$DEPLOYMENT_ID/predict" \
     -H "Authorization: Bearer $API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"inputs": {"tenure_months": 48, "monthly_charges": 95.0, "contract": "Two year"}}'

----

Next Steps
----------

- :doc:`/user-guide/ml-pipeline` -- Deep dive into ML pipeline patterns
- :doc:`/user-guide/privacy-suite` -- Scan and protect sensitive data
- :doc:`/user-guide/synthetic-data` -- Generate privacy-safe synthetic datasets
- :doc:`/user-guide/mlops` -- Model monitoring, alerts, and auto-retraining
- :doc:`/sdk/quickstart` -- Full SDK tutorial with the same workflow
- :doc:`/api-reference/index` -- Complete API reference
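The deployment prediction call in step 8 is shown only as a curl command; for symmetry with the earlier steps, it can also be made from Python. This is a minimal sketch using the endpoint path and credentials from steps 1 and 8 -- the helper name ``deployment_predict`` is illustrative, not part of the platform API:

```python
import requests

# Assumed setup, matching step 1 of this guide.
BASE_URL = "http://localhost:8888"
HEADERS = {"Authorization": "Bearer cpx_live.your-api-key"}


def deployment_predict(project_id, deployment_id, inputs):
    """POST one prediction to a deployment endpoint (path from step 8)."""
    url = (f"{BASE_URL}/api/mlops/projects/{project_id}"
           f"/deployments/{deployment_id}/predict")
    resp = requests.post(url, headers=HEADERS, json={"inputs": inputs})
    resp.raise_for_status()  # surface HTTP errors early
    return resp.json()


# Usage, with the IDs returned by steps 1 and 8:
# deployment_predict(project_id, deployment_id,
#                    {"tenure_months": 48, "monthly_charges": 95.0,
#                     "contract": "Two year"})
```

Reusing the session headers from step 1 keeps the example consistent with the rest of the guide; in a real client you would likely pull the IDs from configuration rather than module-level variables.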