Quick Start

This guide walks through a complete ML pipeline using the CorePlexML REST API. You will create a project, upload data, train models, deploy the best one, and make predictions – all via HTTP calls.

Prerequisites

  • A running CorePlexML instance (e.g., http://localhost:8888)

  • An API key (see Authentication)

  • A CSV dataset file

Set up shell variables used throughout this guide:

export BASE_URL="http://localhost:8888"
export API_KEY="cpx_live.your-api-key-here"

1. Create a Project

curl -X POST "$BASE_URL/api/projects" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Churn Prediction", "description": "Telco customer churn"}'
Or in Python:

import requests

BASE_URL = "http://localhost:8888"
HEADERS = {"Authorization": "Bearer cpx_live.your-api-key"}

resp = requests.post(f"{BASE_URL}/api/projects", headers=HEADERS, json={
    "name": "Churn Prediction",
    "description": "Telco customer churn",
})
project = resp.json()
project_id = project["project_id"]

Save the project_id from the response – you’ll need it for every subsequent call.
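If you're following along with curl, export the id so the `$PROJECT_ID`-style variables in later commands resolve; do the same for VERSION_ID, EXPERIMENT_ID, MODEL_ID, and DEPLOYMENT_ID as each step returns them (the value below is a placeholder, not a real id):

```shell
# Replace with the project_id value from the create-project response
export PROJECT_ID="your-project-id"
```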


2. Upload a Dataset

Upload a CSV file using multipart/form-data:

curl -X POST "$BASE_URL/api/datasets/upload" \
  -H "Authorization: Bearer $API_KEY" \
  -F "project_id=$PROJECT_ID" \
  -F "name=Customer Churn Data" \
  -F "description=10k records of telco customers" \
  -F "file=@customer_churn.csv"
Or in Python:

with open("customer_churn.csv", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/api/datasets/upload",
        headers=HEADERS,
        files={"file": ("customer_churn.csv", f, "text/csv")},
        data={
            "project_id": project_id,
            "name": "Customer Churn Data",
            "description": "10k records of telco customers",
        },
    )
ds = resp.json()
dataset_id = ds["id"]
version_id = ds["version_id"]

The platform automatically detects column types, computes statistics, and creates an immutable dataset version.
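If the detected column types look off, a quick local look at the raw CSV helps narrow down whether the file or the detector is at fault. A minimal sketch using only the standard library (csv_summary is our helper, not part of the API):

```python
import csv

def csv_summary(path, max_rows=5):
    """Print the first few data rows and return (header, data_row_count)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        count = 0
        for row in reader:
            if count < max_rows:
                print(row)
            count += 1
    return header, count
```

For example, `csv_summary("customer_churn.csv")` should show the churned column that step 4 uses as the target.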


3. Inspect the Schema

Check what columns were detected:

curl "$BASE_URL/api/dataset-versions/$VERSION_ID" \
  -H "Authorization: Bearer $API_KEY"
Or in Python:

resp = requests.get(
    f"{BASE_URL}/api/dataset-versions/{version_id}",
    headers=HEADERS,
)
for col in resp.json().get("columns", []):
    print(f"  {col['name']}: {col['dtype']}")

4. Run an AutoML Experiment

Launch H2O AutoML to train and rank multiple model types:

curl -X POST "$BASE_URL/api/experiments" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "PROJECT_ID",
    "dataset_version_id": "VERSION_ID",
    "target_column": "churned",
    "problem_type": "classification",
    "name": "Churn Classifier v1",
    "config": {
      "max_models": 10,
      "max_runtime_secs": 300
    }
  }'
Or in Python:

resp = requests.post(f"{BASE_URL}/api/experiments", headers=HEADERS, json={
    "project_id": project_id,
    "dataset_version_id": version_id,
    "target_column": "churned",
    "problem_type": "classification",
    "name": "Churn Classifier v1",
    "config": {
        "max_models": 10,
        "max_runtime_secs": 300,
    },
})
exp = resp.json()
experiment_id = exp["id"]

Training runs as a background job. The response includes a status field (pending → running → completed).


5. Poll for Completion

# Poll until status is "completed" or "failed"
curl "$BASE_URL/api/experiments/$EXPERIMENT_ID" \
  -H "Authorization: Bearer $API_KEY"
Or in Python:

import time

while True:
    resp = requests.get(
        f"{BASE_URL}/api/experiments/{experiment_id}",
        headers=HEADERS,
    )
    status = resp.json()["experiment"]["status"]
    print(f"Status: {status}")
    if status in ("completed", "failed"):
        break
    time.sleep(10)
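The loop above polls forever; for scripts it's safer to cap the wait. A minimal sketch, where get_status stands in for the GET request above:

```python
import time

def wait_for_experiment(get_status, interval=10.0, timeout=3600.0):
    """Call get_status() every `interval` seconds until it returns a
    terminal status, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"experiment still running after {timeout}s")
```

For example: `wait_for_experiment(lambda: requests.get(f"{BASE_URL}/api/experiments/{experiment_id}", headers=HEADERS).json()["experiment"]["status"])`.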

Or use the SDK, which handles polling automatically:

from coreplexml import CorePlexMLClient

client = CorePlexMLClient(base_url=BASE_URL, api_key="cpx_live.your-key")
result = client.experiments.wait(experiment_id, interval=5.0, timeout=3600.0)

6. List Trained Models

After training completes, list the ranked models:

curl "$BASE_URL/api/experiments/$EXPERIMENT_ID/models" \
  -H "Authorization: Bearer $API_KEY"
Or in Python:

resp = requests.get(
    f"{BASE_URL}/api/experiments/{experiment_id}/models",
    headers=HEADERS,
)
models = resp.json()["items"]
for m in models[:5]:
    print(f"  {m['algorithm']}: AUC={m.get('metrics', {}).get('auc', 'N/A')}")

best_model_id = models[0]["id"]

Models are sorted by the primary metric (AUC for classification, RMSE for regression).
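Because the ranking direction depends on the metric (higher AUC is better, lower RMSE is better), picking a model by a different metric than the server's ranking needs a direction flag. A sketch (best_model is our helper, not an API call), assuming each item carries a metrics mapping as in the listing above:

```python
def best_model(models, metric, higher_is_better=True):
    """Return the model dict whose `metric` value is best,
    skipping models that don't report that metric."""
    scored = [m for m in models if metric in m.get("metrics", {})]
    if not scored:
        raise ValueError(f"no model reports metric {metric!r}")
    key = lambda m: m["metrics"][metric]
    return max(scored, key=key) if higher_is_better else min(scored, key=key)
```

For example, `best_model(models, "auc")` for classification, or `best_model(models, "rmse", higher_is_better=False)` for regression.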


7. Make a Prediction

Use the best model directly:

curl -X POST "$BASE_URL/api/models/$MODEL_ID/predict" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "tenure_months": 12,
      "monthly_charges": 65.0,
      "contract": "Month-to-month",
      "payment_method": "Electronic check"
    }
  }'
Or in Python:

resp = requests.post(
    f"{BASE_URL}/api/models/{best_model_id}/predict",
    headers=HEADERS,
    json={
        "inputs": {
            "tenure_months": 12,
            "monthly_charges": 65.0,
            "contract": "Month-to-month",
            "payment_method": "Electronic check",
        },
    },
)
print(resp.json())

8. Deploy to Production

Deploy the model as a persistent REST endpoint:

curl -X POST "$BASE_URL/api/mlops/projects/$PROJECT_ID/deployments" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "MODEL_ID",
    "name": "Churn Predictor",
    "stage": "production"
  }'
Or in Python:

resp = requests.post(
    f"{BASE_URL}/api/mlops/projects/{project_id}/deployments",
    headers=HEADERS,
    json={
        "model_id": best_model_id,
        "name": "Churn Predictor",
        "stage": "production",
    },
)
deployment_id = resp.json()["id"]

Once deployed, make predictions through the deployment endpoint:

curl -X POST "$BASE_URL/api/mlops/projects/$PROJECT_ID/deployments/$DEPLOYMENT_ID/predict" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"inputs": {"tenure_months": 48, "monthly_charges": 95.0, "contract": "Two year"}}'
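The same call in Python, wrapped as a small helper (the function name is ours; the endpoint matches the curl example above):

```python
import requests

def predict_via_deployment(base_url, api_key, project_id, deployment_id, inputs):
    """POST a single record to the deployment's predict endpoint
    and return the parsed JSON prediction."""
    resp = requests.post(
        f"{base_url}/api/mlops/projects/{project_id}/deployments/{deployment_id}/predict",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"inputs": inputs},
    )
    resp.raise_for_status()
    return resp.json()
```

For example: `predict_via_deployment(BASE_URL, "cpx_live.your-key", project_id, deployment_id, {"tenure_months": 48, "monthly_charges": 95.0, "contract": "Two year"})`.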

Next Steps