============
SynthGen API
============

SynthGen generates synthetic tabular data that preserves the statistical
properties of your original dataset. It supports four model architectures:
CTGAN, CopulaGAN, TVAE, and Gaussian Copula.

All endpoints are prefixed with ``/api/synthgen``.

.. contents:: Endpoints
   :local:
   :depth: 1

----

Create Model
------------

.. code-block:: text

   POST /api/synthgen/models

Create a new SynthGen model and enqueue a ``synthgen_train`` background job.
Training starts automatically once a worker picks up the job.

**Request Body**

.. list-table::
   :header-rows: 1
   :widths: 25 10 10 55

   * - Field
     - Type
     - Required
     - Description
   * - ``project_id``
     - string
     - Yes
     - UUID of the project.
   * - ``dataset_version_id``
     - string
     - Yes
     - UUID of the dataset version to train on.
   * - ``name``
     - string
     - Yes
     - Model name.
   * - ``model_type``
     - string
     - No
     - Model architecture: ``ctgan`` (default), ``copulagan``, ``tvae``, or ``gaussian_copula``.
   * - ``config``
     - object
     - No
     - Architecture-specific hyperparameters (e.g. ``epochs``, ``batch_size``, ``embedding_dim``).

**Example**

.. code-block:: bash

   curl -X POST "$BASE_URL/api/synthgen/models" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
       "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
       "name": "Transactions CTGAN",
       "model_type": "ctgan",
       "config": {
         "epochs": 300,
         "batch_size": 500
       }
     }'

.. code-block:: python

   import time

   import requests

   BASE_URL = "http://localhost:8888"  # adjust to your deployment

   resp = requests.post(
       f"{BASE_URL}/api/synthgen/models",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
       json={
           "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
           "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
           "name": "Transactions CTGAN",
           "model_type": "ctgan",
           "config": {"epochs": 300, "batch_size": 500},
       },
   )
   resp.raise_for_status()
   model_id = resp.json()["model_id"]
   print("Model training started:", model_id)

   # Poll until the model reaches a terminal status
   while True:
       r = requests.get(
           f"{BASE_URL}/api/synthgen/models/{model_id}",
           headers={"Authorization": "Bearer YOUR_API_KEY"},
       )
       status = r.json()["model"]["status"]
       print(f"  Status: {status}")
       if status in ("ready", "failed"):
           break
       time.sleep(10)

**Response** ``201 Created``

.. code-block:: json

   {
     "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
     "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
     "job_id": "6f7a8b9c-0d1e-2345-f678-901234567890",
     "status": "pending"
   }

Model Configuration Reference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each model type accepts its own set of options in the ``config`` object:

**CTGAN (Conditional Tabular GAN)**

.. list-table::
   :header-rows: 1
   :widths: 25 10 15 50

   * - Option
     - Type
     - Default
     - Description
   * - ``epochs``
     - integer
     - 300
     - Number of training epochs.
   * - ``batch_size``
     - integer
     - 500
     - Training batch size.
   * - ``embedding_dim``
     - integer
     - 128
     - Dimensionality of data embeddings.
   * - ``generator_dim``
     - array[int]
     - [256, 256]
     - Hidden layer sizes for the generator network.
   * - ``discriminator_dim``
     - array[int]
     - [256, 256]
     - Hidden layer sizes for the discriminator network.
   * - ``pac``
     - integer
     - 10
     - PAC (packing) size. Must evenly divide ``batch_size``.

**CopulaGAN**

Accepts the same options as CTGAN. CopulaGAN additionally uses copula
functions to capture complex column correlations.

**TVAE (Tabular VAE)**

.. list-table::
   :header-rows: 1
   :widths: 25 10 15 50

   * - Option
     - Type
     - Default
     - Description
   * - ``epochs``
     - integer
     - 300
     - Number of training epochs.
   * - ``batch_size``
     - integer
     - 500
     - Training batch size.
   * - ``embedding_dim``
     - integer
     - 128
     - Dimensionality of data embeddings.

**Gaussian Copula**

The fastest model type. It uses statistical copula fitting rather than deep
learning and needs minimal configuration:

.. list-table::
   :header-rows: 1
   :widths: 25 10 15 50

   * - Option
     - Type
     - Default
     - Description
   * - ``seed``
     - integer
     - None
     - Random seed for reproducibility.

**Model Type Comparison**

.. list-table::
   :header-rows: 1
   :widths: 20 15 15 15 35

   * - Type
     - Speed
     - Quality
     - Best For
     - Limitations
   * - ``ctgan``
     - Slow
     - High
     - Mixed column types, general use
     - Requires GPU for large datasets
   * - ``copulagan``
     - Slow
     - High
     - Complex correlations
     - Higher memory usage
   * - ``tvae``
     - Medium
     - Medium-High
     - Large datasets, fast iteration
     - May miss complex correlations
   * - ``gaussian_copula``
     - Fast
     - Medium
     - Quick prototyping, baselines
     - Assumes Gaussian distributions

----

List Models
-----------

.. code-block:: text

   GET /api/synthgen/models

**Query Parameters**

.. list-table::
   :header-rows: 1
   :widths: 20 10 10 60

   * - Parameter
     - Type
     - Default
     - Description
   * - ``project_id``
     - string
     - --
     - Filter by project.
   * - ``limit``
     - integer
     - 50
     - Maximum number of items to return.
   * - ``offset``
     - integer
     - 0
     - Pagination offset.

**Example**

.. code-block:: bash

   curl "$BASE_URL/api/synthgen/models?project_id=d4e5f6a7-b8c9-0123-def4-567890123456" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "items": [
       {
         "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
         "name": "Transactions CTGAN",
         "model_type": "ctgan",
         "status": "ready",
         "dataset_name": "Transactions Q4",
         "dataset_version": 0,
         "created_at": "2026-02-25T10:00:00Z"
       }
     ],
     "total": 1,
     "limit": 50,
     "offset": 0
   }

----

Get Model Detail
----------------

.. code-block:: text

   GET /api/synthgen/models/{model_id}

Return model metadata and its associated training and generation jobs.

**Example**

.. code-block:: bash

   curl "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "model": {
       "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
       "name": "Transactions CTGAN",
       "model_type": "ctgan",
       "status": "ready",
       "config": {"epochs": 300, "batch_size": 500},
       "dataset_name": "Transactions Q4",
       "dataset_version": 0,
       "dataset_id": "e5f6a7b8-c9d0-1234-5678-90abcdef1234",
       "created_at": "2026-02-25T10:00:00Z"
     },
     "jobs": [
       {
         "id": "7a8b9c0d-1e2f-3456-7890-123456789012",
         "job_type": "train",
         "status": "completed",
         "created_at": "2026-02-25T10:00:00Z"
       }
     ]
   }

----

Generate Synthetic Data
-----------------------

.. code-block:: text

   POST /api/synthgen/models/{model_id}/generate

Generate synthetic data from a trained model by enqueuing a
``synthgen_generate`` background job. The model must be in ``ready`` status.

**Request Body**

.. list-table::
   :header-rows: 1
   :widths: 20 10 10 60

   * - Field
     - Type
     - Required
     - Description
   * - ``num_rows``
     - integer
     - No
     - Number of synthetic rows to generate (default 1000).
   * - ``seed``
     - integer
     - No
     - Random seed for reproducibility.
   * - ``conditions``
     - object
     - No
     - Conditional generation constraints (column-value pairs).

**Example**

.. code-block:: bash

   curl -X POST "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678/generate" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "num_rows": 10000,
       "seed": 42
     }'

.. code-block:: python

   import time

   import requests

   BASE_URL = "http://localhost:8888"  # adjust to your deployment

   # Generate synthetic data
   resp = requests.post(
       f"{BASE_URL}/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678/generate",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
       json={"num_rows": 10000, "seed": 42},
   )
   resp.raise_for_status()
   synthgen_job_id = resp.json()["synthgen_job_id"]

   # Poll until generation reaches a terminal status
   while True:
       r = requests.get(
           f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}",
           headers={"Authorization": "Bearer YOUR_API_KEY"},
       )
       status = r.json()["status"]
       if status in ("completed", "failed"):
           break
       time.sleep(5)

   # A failed job has no output artifact, so stop before downloading
   if status != "completed":
       raise RuntimeError(f"Generation job ended with status {status!r}")

   # Download the generated data
   dl = requests.get(
       f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}/download",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
   )
   dl.raise_for_status()
   with open("synthetic_data.csv", "wb") as f:
       f.write(dl.content)

**Response** ``200 OK``

.. code-block:: json

   {
     "job_id": "8b9c0d1e-2f3a-4567-8901-234567890123",
     "synthgen_job_id": "9c0d1e2f-3a4b-5678-9012-345678901234",
     "status": "pending"
   }

----

Delete Model
------------

.. code-block:: text

   DELETE /api/synthgen/models/{model_id}

Permanently delete a SynthGen model and its associated jobs.

**Example**

.. code-block:: bash

   curl -X DELETE "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "ok": true,
     "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678"
   }

----

List Jobs
---------

.. code-block:: text

   GET /api/synthgen/jobs

Return SynthGen jobs (training and generation) with optional filters.

**Query Parameters**

.. list-table::
   :header-rows: 1
   :widths: 20 10 10 60

   * - Parameter
     - Type
     - Default
     - Description
   * - ``project_id``
     - string
     - --
     - Filter by project.
   * - ``model_id``
     - string
     - --
     - Filter by SynthGen model.
   * - ``status``
     - string
     - --
     - Filter by status: ``pending``, ``running``, ``completed``, ``failed``.
   * - ``limit``
     - integer
     - 50
     - Maximum number of items to return.
   * - ``offset``
     - integer
     - 0
     - Pagination offset.

**Example**

.. code-block:: bash

   curl "$BASE_URL/api/synthgen/jobs?model_id=5e6f7a8b-9c0d-1234-ef56-789012345678" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "items": [
       {
         "id": "9c0d1e2f-3a4b-5678-9012-345678901234",
         "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
         "model_name": "Transactions CTGAN",
         "model_type": "ctgan",
         "job_type": "generate",
         "status": "completed",
         "config": {"num_rows": 10000, "seed": 42},
         "created_at": "2026-02-25T11:00:00Z"
       }
     ],
     "total": 1,
     "limit": 50,
     "offset": 0
   }

----

Get Job Detail
--------------

.. code-block:: text

   GET /api/synthgen/jobs/{job_id}

Return details for a specific SynthGen job.

**Example**

.. code-block:: bash

   curl "$BASE_URL/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "id": "9c0d1e2f-3a4b-5678-9012-345678901234",
     "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
     "model_name": "Transactions CTGAN",
     "model_type": "ctgan",
     "project_name": "Fraud Detection v2",
     "job_type": "generate",
     "status": "completed",
     "config": {"num_rows": 10000, "seed": 42},
     "output_artifact_id": "0d1e2f3a-4b5c-6789-0123-456789012345",
     "created_at": "2026-02-25T11:00:00Z"
   }

----

Download Synthetic Data
-----------------------

.. code-block:: text

   GET /api/synthgen/jobs/{job_id}/download

Download the output artifact from a completed generation job. Returns
``409 Conflict`` if the job output is not yet available.

**Example**

.. code-block:: bash

   curl -o synthetic_data.csv \
     "$BASE_URL/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234/download" \
     -H "Authorization: Bearer YOUR_API_KEY"

.. code-block:: python

   import requests

   BASE_URL = "http://localhost:8888"  # adjust to your deployment

   resp = requests.get(
       f"{BASE_URL}/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234/download",
       headers={"Authorization": "Bearer YOUR_API_KEY"},
   )
   resp.raise_for_status()  # e.g. 409 Conflict while the output is not ready
   with open("synthetic_data.csv", "wb") as f:
       f.write(resp.content)

**Response** ``200 OK``

Binary file download.

----

List Model Types
----------------

.. code-block:: text

   GET /api/synthgen/model-types

Return available model architectures and their descriptions.

**Example**

.. code-block:: bash

   curl "$BASE_URL/api/synthgen/model-types" \
     -H "Authorization: Bearer YOUR_API_KEY"

**Response** ``200 OK``

.. code-block:: json

   {
     "model_types": [
       {
         "id": "ctgan",
         "name": "CTGAN",
         "description": "Conditional Tabular GAN - best for general tabular data"
       },
       {
         "id": "copulagan",
         "name": "CopulaGAN",
         "description": "Copula-based GAN - good for capturing correlations"
       },
       {
         "id": "tvae",
         "name": "TVAE",
         "description": "Tabular VAE - faster training, good for large datasets"
       },
       {
         "id": "gaussian_copula",
         "name": "Gaussian Copula",
         "description": "Statistical copula model - fastest, good baseline"
       }
     ]
   }

----

Full Workflow Example
---------------------

Train a CTGAN model and generate 10,000 synthetic rows:

.. code-block:: python

   import time

   import requests

   BASE_URL = "http://localhost:8888"
   HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
   NUM_ROWS = 10000

   # 1. Train a CTGAN model
   resp = requests.post(
       f"{BASE_URL}/api/synthgen/models",
       headers=HEADERS,
       json={
           "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
           "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
           "name": "Transactions CTGAN v1",
           "model_type": "ctgan",
           "config": {"epochs": 300},
       },
   )
   resp.raise_for_status()
   model_id = resp.json()["model_id"]

   # 2. Wait for training to reach a terminal status
   while True:
       r = requests.get(f"{BASE_URL}/api/synthgen/models/{model_id}", headers=HEADERS)
       model_status = r.json()["model"]["status"]
       if model_status in ("ready", "failed"):
           break
       time.sleep(15)
   if model_status != "ready":
       raise RuntimeError(f"Training ended with status {model_status!r}")

   # 3. Generate synthetic data
   resp = requests.post(
       f"{BASE_URL}/api/synthgen/models/{model_id}/generate",
       headers=HEADERS,
       json={"num_rows": NUM_ROWS, "seed": 42},
   )
   resp.raise_for_status()
   job_id = resp.json()["synthgen_job_id"]

   # 4. Wait for generation to reach a terminal status
   while True:
       r = requests.get(f"{BASE_URL}/api/synthgen/jobs/{job_id}", headers=HEADERS)
       job_status = r.json()["status"]
       if job_status in ("completed", "failed"):
           break
       time.sleep(5)
   if job_status != "completed":
       raise RuntimeError(f"Generation ended with status {job_status!r}")

   # 5. Download the result
   resp = requests.get(
       f"{BASE_URL}/api/synthgen/jobs/{job_id}/download",
       headers=HEADERS,
   )
   resp.raise_for_status()
   with open("synthetic_transactions.csv", "wb") as f:
       f.write(resp.content)

   print(f"Generated {NUM_ROWS} synthetic rows.")

----

.. seealso::

   - :doc:`datasets` -- Uploading real datasets for training.
   - :doc:`reports` -- Generating SynthGen quality reports.
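
The polling loops in the examples above keep running until a terminal status appears, and the CTGAN ``pac`` rule (it must evenly divide ``batch_size``) can be checked before a request is sent. The following is a minimal client-side sketch under stated assumptions, not part of the SynthGen API: ``validate_ctgan_config`` and ``wait_for_status`` are hypothetical helper names, the default values come from the CTGAN option table, and the one-hour timeout is an arbitrary illustrative choice.

.. code-block:: python

   import time
   from typing import Any, Callable, Collection, Dict


   def validate_ctgan_config(config: Dict[str, Any]) -> None:
       """Raise ValueError if ``pac`` does not evenly divide ``batch_size``.

       Hypothetical helper; missing keys fall back to the documented
       CTGAN defaults (batch_size=500, pac=10).
       """
       batch_size = config.get("batch_size", 500)
       pac = config.get("pac", 10)
       if pac <= 0 or batch_size % pac != 0:
           raise ValueError(
               f"pac={pac} must evenly divide batch_size={batch_size}"
           )


   def wait_for_status(
       fetch_status: Callable[[], str],
       terminal: Collection[str],
       interval: float = 10.0,
       timeout: float = 3600.0,
   ) -> str:
       """Call ``fetch_status`` until it returns a terminal status.

       Hypothetical helper. ``fetch_status`` is any zero-argument callable
       returning the current status string. Raises TimeoutError if no
       terminal status is seen within ``timeout`` seconds.
       """
       deadline = time.monotonic() + timeout
       while True:
           status = fetch_status()
           if status in terminal:
               return status
           if time.monotonic() >= deadline:
               raise TimeoutError(f"still {status!r} after {timeout} seconds")
           time.sleep(interval)


   if __name__ == "__main__":
       # Offline demonstration: a canned status sequence stands in for
       # repeated GET /api/synthgen/models/{model_id} calls.
       validate_ctgan_config({"epochs": 300, "batch_size": 500})
       canned = iter(["pending", "running", "ready"])
       final = wait_for_status(lambda: next(canned), ("ready", "failed"), interval=0.0)
       print("Final status:", final)

Passing the status lookup as a callable keeps the helper independent of any HTTP library. In real use, ``fetch_status`` would wrap a ``GET /api/synthgen/models/{model_id}`` request and return ``r.json()["model"]["status"]``, or wrap a ``GET /api/synthgen/jobs/{job_id}`` request and return ``r.json()["status"]``.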