Example Scripts

The following example scripts demonstrate common CorePlexML SDK workflows. Each script is self-contained, uses argparse for configuration, and can be run directly against a CorePlexML instance.

All examples are located in the docs-site/examples/ directory.

01 – Quick Start

End-to-end ML workflow: create a project, upload data, train a classifier, deploy to staging, and make predictions.

python 01_quickstart.py --base-url https://your-domain.com \
    --api-key YOUR_KEY --csv data.csv --target churned
"""CorePlexML Quick Start -- End-to-end ML workflow.

Creates a project, uploads a dataset, trains a classification model,
deploys it to staging, makes predictions, and cleans up.

Usage:
    python 01_quickstart.py --base-url https://your-domain.com --api-key YOUR_KEY --csv data.csv
"""

import argparse
import sys

from coreplexml import CorePlexMLClient, CorePlexMLError


def main():
    parser = argparse.ArgumentParser(description="CorePlexML Quick Start")
    parser.add_argument("--base-url", default="http://localhost:8888", help="CorePlexML server URL")
    parser.add_argument("--api-key", required=True, help="API key for authentication")
    parser.add_argument("--csv", required=True, help="Path to the training CSV file")
    parser.add_argument("--target", default="target", help="Target column name (default: target)")
    args = parser.parse_args()

    client = CorePlexMLClient(base_url=args.base_url, api_key=args.api_key, timeout=120)
    print("Connected to CorePlexML")

    try:
        # Step 1: Create a project
        project = client.projects.create("Quick Start Project", description="SDK quick start demo")
        project_id = project["id"]
        print(f"[1/8] Created project: {project_id}")

        # Step 2: Upload the dataset
        ds = client.datasets.upload(project_id, args.csv, "Training Data")
        dataset_id = ds["id"]
        version_id = ds["version_id"]
        print(f"[2/8] Uploaded dataset: {dataset_id} (version {version_id})")

        # Step 3: Inspect columns
        cols = client.datasets.columns(dataset_id)
        col_names = [c["name"] for c in cols["columns"]]
        print(f"[3/8] Detected {len(col_names)} columns: {', '.join(col_names[:5])}...")

        # Step 4: Create an experiment
        exp = client.experiments.create(
            project_id=project_id,
            dataset_version_id=version_id,
            target_column=args.target,
            name="Quick Start Classifier",
            problem_type="classification",
            config={"max_models": 5, "max_runtime_secs": 120},
        )
        experiment_id = exp["id"]
        print(f"[4/8] Started experiment: {experiment_id}")

        # Step 5: Wait for training
        print("       Waiting for training to complete...")
        result = client.experiments.wait(experiment_id, interval=5.0, timeout=1800.0)
        print(f"[5/8] Experiment status: {result['status']}")

        if result["status"] != "succeeded":
            print(f"Training did not succeed: {result.get('error', 'unknown')}")
            sys.exit(1)

        # Step 6: Get the best model
        models = client.models.list(experiment_id=experiment_id)
        best = models["items"][0]
        model_id = best["id"]
        print(f"[6/8] Best model: {best.get('algorithm', 'N/A')} ({model_id})")

        # Step 7: Deploy to staging
        dep = client.deployments.create(
            project_id=project_id,
            model_id=model_id,
            name="Quick Start Deployment",
            stage="staging",
        )
        deployment_id = dep["id"]
        print(f"[7/8] Deployed to staging: {deployment_id}")

        # Step 8: Make a prediction
        sample_input = {name: 0 for name in col_names if name != args.target}
        pred = client.deployments.predict(deployment_id, inputs=sample_input)
        print(f"[8/8] Prediction: {pred.get('prediction', pred)}")

        print("\nQuick start complete!")

        # Cleanup
        client.deployments.deactivate(deployment_id)
        client.projects.delete(project_id)
        print("Cleaned up resources.")

    except CorePlexMLError as e:
        print(f"Error ({e.status_code}): {e.message}")
        sys.exit(1)


if __name__ == "__main__":
    main()

02 – Batch Predictions

Read rows from a CSV, send them in configurable batches to a deployment endpoint, and write predictions to an output file.

python 02_batch_predictions.py --base-url https://your-domain.com \
    --api-key YOUR_KEY --deployment-id DEPLOY_UUID \
    --input new_data.csv --output predictions.csv --batch-size 500
"""Batch predictions with deployed models.

Reads rows from a CSV file, sends them in batches to a deployment endpoint,
and writes predictions to an output CSV.

Usage:
    python 02_batch_predictions.py --base-url https://your-domain.com \
        --api-key YOUR_KEY --deployment-id DEPLOY_UUID --input data.csv --output predictions.csv
"""

import argparse
import csv
import sys

from coreplexml import CorePlexMLClient, CorePlexMLError


def chunked(items, size):
    """Yield successive chunks of the given size."""
    for i in range(0, len(items), size):
        yield items[i : i + size]


def main():
    parser = argparse.ArgumentParser(description="Batch predictions")
    parser.add_argument("--base-url", default="http://localhost:8888")
    parser.add_argument("--api-key", required=True)
    parser.add_argument("--deployment-id", required=True, help="UUID of the deployment")
    parser.add_argument("--input", required=True, help="Path to input CSV")
    parser.add_argument("--output", default="predictions.csv", help="Path to output CSV")
    parser.add_argument("--batch-size", type=int, default=500, help="Rows per batch (default: 500)")
    args = parser.parse_args()

    client = CorePlexMLClient(base_url=args.base_url, api_key=args.api_key, timeout=120)

    # Verify the deployment exists
    try:
        dep = client.deployments.get(args.deployment_id)
        print(f"Deployment: {dep.get('name', args.deployment_id)} (stage={dep.get('stage', 'unknown')})")
    except CorePlexMLError as e:
        print(f"Cannot access deployment: {e.message}")
        sys.exit(1)

    # Read input CSV
    with open(args.input, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    print(f"Loaded {len(rows)} rows from {args.input}")

    if not rows:
        print("No rows to predict.")
        sys.exit(0)

    # Run batch predictions
    all_predictions = []
    for i, batch in enumerate(chunked(rows, args.batch_size)):
        try:
            result = client.deployments.predict(args.deployment_id, inputs=batch)
            preds = result.get("predictions", [])
            all_predictions.extend(preds)
            processed = min((i + 1) * args.batch_size, len(rows))
            print(f"  Batch {i + 1}: {processed}/{len(rows)} rows processed")
        except CorePlexMLError as e:
            print(f"  Batch {i + 1} failed: {e.message}")
            # Fill with error markers so row count stays aligned
            all_predictions.extend([{"prediction": "ERROR"}] * len(batch))

    # Write output CSV
    fieldnames = list(rows[0].keys()) + ["prediction", "probability"]
    with open(args.output, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row, pred in zip(rows, all_predictions):
            row["prediction"] = pred.get("prediction", "")
            row["probability"] = pred.get("probability", pred.get("probabilities", ""))
            writer.writerow(row)

    print(f"\nWrote {len(all_predictions)} predictions to {args.output}")


if __name__ == "__main__":
    main()

03 – Experiment Comparison

Run multiple experiments with different AutoML configurations on the same dataset, wait for all to complete, and print a ranked comparison.

python 03_experiment_comparison.py --base-url https://your-domain.com \
    --api-key YOUR_KEY --project-id PROJ_UUID --version-id VER_UUID \
    --target churned --problem-type classification
"""Compare multiple experiments and find the best model.

Runs several experiments with different configurations on the same dataset,
waits for all to complete, and prints a ranked comparison of the best model
from each experiment.

Usage:
    python 03_experiment_comparison.py --base-url https://your-domain.com \
        --api-key YOUR_KEY --project-id PROJ_UUID --version-id VER_UUID --target churned
"""

import argparse
import sys

from coreplexml import CorePlexMLClient, CorePlexMLError


EXPERIMENT_CONFIGS = [
    {"name": "Fast (5 models, 60s)", "max_models": 5, "max_runtime_secs": 60},
    {"name": "Standard (10 models, 180s)", "max_models": 10, "max_runtime_secs": 180},
    {"name": "Thorough (20 models, 300s)", "max_models": 20, "max_runtime_secs": 300},
]


def main():
    parser = argparse.ArgumentParser(description="Experiment comparison")
    parser.add_argument("--base-url", default="http://localhost:8888")
    parser.add_argument("--api-key", required=True)
    parser.add_argument("--project-id", required=True, help="UUID of the project")
    parser.add_argument("--version-id", required=True, help="UUID of the dataset version")
    parser.add_argument("--target", required=True, help="Target column name")
    parser.add_argument("--problem-type", default="classification", choices=["classification", "regression"])
    args = parser.parse_args()

    client = CorePlexMLClient(base_url=args.base_url, api_key=args.api_key, timeout=60)

    # Launch all experiments
    experiments = []
    for cfg in EXPERIMENT_CONFIGS:
        try:
            exp = client.experiments.create(
                project_id=args.project_id,
                dataset_version_id=args.version_id,
                target_column=args.target,
                name=cfg["name"],
                problem_type=args.problem_type,
                config={"max_models": cfg["max_models"], "max_runtime_secs": cfg["max_runtime_secs"]},
            )
            experiments.append({"id": exp["id"], "name": cfg["name"]})
            print(f"Started: {cfg['name']} ({exp['id']})")
        except CorePlexMLError as e:
            print(f"Failed to start '{cfg['name']}': {e.message}")

    if not experiments:
        print("No experiments started.")
        sys.exit(1)

    # Wait for all experiments
    print("\nWaiting for all experiments to complete...")
    for exp in experiments:
        try:
            status = client.experiments.wait(exp["id"], interval=10.0, timeout=3600.0)
            exp["status"] = status.get("status", "unknown")
            print(f"  {exp['name']}: {exp['status']}")
        except CorePlexMLError as e:
            exp["status"] = "timeout"
            print(f"  {exp['name']}: timed out ({e.message})")

    # Collect best model from each successful experiment
    print("\n--- Results ---")
    best_overall = None
    metric_key = "auc" if args.problem_type == "classification" else "rmse"

    for exp in experiments:
        if exp["status"] != "succeeded":
            print(f"  {exp['name']}: SKIPPED ({exp['status']})")
            continue
        models = client.models.list(experiment_id=exp["id"])
        if not models.get("items"):
            print(f"  {exp['name']}: No models produced")
            continue
        top = models["items"][0]
        metric_val = top.get("metrics", {}).get(metric_key, "N/A")
        print(f"  {exp['name']}: {top.get('algorithm', '?')} {metric_key}={metric_val}")

        # Only models with a numeric metric can participate in the comparison;
        # this also keeps a non-numeric "N/A" from occupying the best slot.
        try:
            candidate = float(metric_val)
        except (ValueError, TypeError):
            continue

        if best_overall is None:
            best_overall = (exp, top, metric_val)
        else:
            current_best = float(best_overall[2])
            # For AUC, higher is better; for RMSE, lower is better
            if args.problem_type == "classification" and candidate > current_best:
                best_overall = (exp, top, metric_val)
            elif args.problem_type == "regression" and candidate < current_best:
                best_overall = (exp, top, metric_val)

    if best_overall:
        exp_info, model_info, metric = best_overall
        print(f"\nBest overall: {model_info.get('algorithm')} from '{exp_info['name']}'")
        print(f"  Model ID: {model_info['id']}")
        print(f"  {metric_key}: {metric}")
    else:
        print("\nNo successful experiments to compare.")


if __name__ == "__main__":
    main()

04 – Privacy Workflow

Create a HIPAA compliance policy, scan a dataset for PII, apply privacy transformations (masking, hashing, redaction), and retrieve results.

python 04_privacy_workflow.py --base-url https://your-domain.com \
    --api-key YOUR_KEY --project-id PROJ_UUID --dataset-id DS_UUID \
    --profile hipaa
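
The full script lives in docs-site/examples/04_privacy_workflow.py and is not reproduced here. As a complement, the sketch below shows a naive client-side pre-scan you might run before uploading. The regexes are deliberately simplistic and this is not the SDK's PII scanner; the client.privacy method names mentioned in the comments are assumptions, not confirmed API.

```python
"""Hypothetical pre-upload PII check -- NOT the SDK's scanner."""

import csv
import re

# Naive patterns for two common PII shapes (illustrative only).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def prescan_rows(rows):
    """Return {column: set of PII kinds} for columns that look sensitive."""
    flagged = {}
    for row in rows:
        for col, value in row.items():
            for kind, pattern in PII_PATTERNS.items():
                if value and pattern.search(str(value)):
                    flagged.setdefault(col, set()).add(kind)
    return flagged


def main():
    # Imported here so prescan_rows() is usable without the SDK installed.
    from coreplexml import CorePlexMLClient

    client = CorePlexMLClient(base_url="http://localhost:8888", api_key="YOUR_KEY")
    with open("data.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    print("Columns flagged locally:", prescan_rows(rows))
    # The server-side scan and transformations would follow here;
    # method names such as client.privacy.scan(...) are assumed.


if __name__ == "__main__":
    main()
```

A local pre-scan like this only catches obvious patterns; the server-side scan in the example script remains the authoritative check.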

05 – Synthetic Data Generation

Train a SynthGen model (CTGAN/CopulaGAN/TVAE/Gaussian Copula) on a dataset version, wait for training, and generate synthetic rows with optional reproducibility seed.

python 05_synthetic_data.py --base-url https://your-domain.com \
    --api-key YOUR_KEY --project-id PROJ_UUID --version-id VER_UUID \
    --model-type ctgan --epochs 300 --num-rows 5000 --seed 42
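
After 05_synthetic_data.py generates rows, a quick local sanity check can catch schema problems before the synthetic data is used downstream. The helper below is plain Python for illustration, not part of the SDK: it verifies that generated rows keep the source columns and that categorical columns contain no values unseen in the real data.

```python
def check_synthetic(real_rows, synthetic_rows, categorical_columns):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    real_cols = set(real_rows[0].keys())

    # Every synthetic row should carry exactly the source columns.
    for i, row in enumerate(synthetic_rows):
        if set(row.keys()) != real_cols:
            problems.append(f"row {i}: column mismatch")

    # Categorical columns should not invent values absent from the real data.
    for col in categorical_columns:
        seen = {r[col] for r in real_rows}
        novel = {r[col] for r in synthetic_rows if col in r} - seen
        if novel:
            problems.append(f"column {col}: unseen values {sorted(novel)}")

    return problems
```

Note this only checks structure; evaluating statistical fidelity (distributions, correlations) is the generator's job and is better read from the training metrics the server reports.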

06 – Model Monitoring

Deploy a model, promote to production, and check for data drift.

python 06_model_monitoring.py --base-url https://your-domain.com \
    --api-key YOUR_KEY --project-id PROJ_UUID --model-id MODEL_UUID
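
The drift check in 06_model_monitoring.py runs server-side. As background on what such a check computes, here is a local sketch of the population stability index (PSI), one common drift statistic. This is plain Python for illustration, not SDK code, and the binning and smoothing choices are one convention among several.

```python
import math


def psi(expected, actual, bins=10):
    """Population stability index between two numeric samples.

    Values are bucketed on the expected sample's range. A common rule of
    thumb reads PSI < 0.1 as stable and PSI > 0.25 as significant drift.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant sample

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            idx = max(idx, 0)  # clamp values below the expected range
            counts[idx] += 1
        # Smooth empty buckets so the log term stays finite.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A production deployment would rely on the server's drift reports rather than recomputing this client-side, but the formula shows what the numbers mean.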

07 – What-If Analysis

Create a studio session with baseline inputs, define counterfactual scenarios, run predictions, and compare results side by side.

python 07_what_if_analysis.py --base-url https://your-domain.com \
    --api-key YOUR_KEY --project-id PROJ_UUID --deployment-id DEP_UUID
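
The counterfactual step in 07_what_if_analysis.py amounts to merging per-scenario overrides onto a baseline input. A minimal sketch of that merge (plain Python, not an SDK call; each resulting dict would then be sent through the deployment's predict(), as in the quick start):

```python
def build_scenarios(baseline, scenarios):
    """Produce one full input dict per scenario.

    `scenarios` maps a scenario name to only the fields that change;
    every other field keeps its baseline value. The baseline is not mutated.
    """
    return {name: {**baseline, **overrides} for name, overrides in scenarios.items()}
```

Keeping scenarios as sparse overrides makes the comparison table easy to read: anything not listed in a scenario is, by construction, identical to the baseline.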

08 – Report Generation

Generate experiment and deployment reports with AI insights, wait for PDF generation, and download the results.

python 08_reports.py --base-url https://your-domain.com \
    --api-key YOUR_KEY --project-id PROJ_UUID \
    --experiment-id EXP_UUID --output-dir ./reports
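
Report generation is asynchronous, so 08_reports.py has to poll until the PDF is ready. The experiments client exposes wait() for this; where a resource lacks a built-in wait, a generic polling helper like the sketch below does the job. The status strings here ("completed", "failed") are assumptions, not confirmed SDK values.

```python
import time


def wait_until(fetch_status, done=("completed",), failed=("failed",),
               interval=2.0, timeout=600.0):
    """Poll fetch_status() until it returns a terminal status or the timeout elapses.

    fetch_status is any zero-argument callable returning the current status
    string, e.g. lambda: client.reports.get(report_id)["status"] (hypothetical).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"report generation failed (status={status})")
        time.sleep(interval)
    raise TimeoutError("timed out waiting for report")
```

Using time.monotonic() for the deadline keeps the loop correct even if the system clock is adjusted mid-wait.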

Running the Examples

All examples follow the same pattern:

  1. Install the SDK:

    pip install coreplexml
    
  2. Set your API key and server URL. Pass them as command-line arguments, or export them as environment variables and adapt the scripts to read from os.environ.

  3. Run the script:

    python 01_quickstart.py --base-url https://ml.example.com --api-key cp_ab12cd34.your-secret-key --csv train.csv
    
  4. Each script prints progress to stdout and exits with code 0 on success or 1 on error.
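
The environment-variable option in step 2 can be sketched like this; the variable names COREPLEXML_URL and COREPLEXML_API_KEY are a suggested convention, not something the SDK reads automatically:

```python
import argparse
import os


def build_parser():
    """argparse setup that falls back to environment variables for defaults."""
    parser = argparse.ArgumentParser(description="CorePlexML example")
    parser.add_argument(
        "--base-url",
        default=os.environ.get("COREPLEXML_URL", "http://localhost:8888"),
        help="Server URL (or set COREPLEXML_URL)",
    )
    parser.add_argument(
        "--api-key",
        default=os.environ.get("COREPLEXML_API_KEY"),
        help="API key (or set COREPLEXML_API_KEY)",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    args = parser.parse_args()
    if not args.api_key:
        parser.error("--api-key is required (or set COREPLEXML_API_KEY)")
    print(f"Using {args.base_url}")
```

Because the environment lookup happens when the parser is built, an explicit command-line flag still overrides the exported value.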