SynthGen API

SynthGen generates synthetic tabular data that preserves the statistical properties of your original dataset. It supports four model architectures: CTGAN, CopulaGAN, TVAE, and Gaussian Copula.

All endpoints are prefixed with /api/synthgen.

Create Model 

POST /api/synthgen/models

Create a new SynthGen model and enqueue a synthgen_train background job. Training begins automatically after the job is picked up by the worker.

Request Body

Field	Type	Required	Description
`project_id`	string	Yes	UUID of the project.
`dataset_version_id`	string	Yes	UUID of the dataset version to train on.
`name`	string	Yes	Model name.
`model_type`	string	No	Model architecture: `ctgan` (default), `copulagan`, `tvae`, or `gaussian_copula`.
`config`	object	No	Architecture-specific hyperparameters (e.g. `epochs`, `batch_size`, `embedding_dim`).
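
The request body can be assembled and sanity-checked on the client before it is sent. The sketch below is illustrative only: `build_create_payload` is a hypothetical helper, not part of SynthGen, and the allowed `model_type` values are copied from the table above. It assumes that omitting an optional field lets the server apply its default.

```python
# Allowed values taken from the model_type row of the request body table.
ALLOWED_MODEL_TYPES = {"ctgan", "copulagan", "tvae", "gaussian_copula"}


def build_create_payload(project_id, dataset_version_id, name,
                         model_type=None, config=None):
    """Hypothetical helper: assemble a Create Model request body."""
    payload = {
        "project_id": project_id,
        "dataset_version_id": dataset_version_id,
        "name": name,
    }
    # Optional fields are only included when the caller sets them.
    if model_type is not None:
        if model_type not in ALLOWED_MODEL_TYPES:
            raise ValueError(f"unknown model_type: {model_type!r}")
        payload["model_type"] = model_type
    if config:
        payload["config"] = dict(config)
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.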

Example

curl -X POST "$BASE_URL/api/synthgen/models" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "config": {
      "epochs": 300,
      "batch_size": 500
    }
  }'

import requests
import time

# BASE_URL is your server address, e.g. "http://localhost:8888" (see Full Workflow Example).

resp = requests.post(f"{BASE_URL}/api/synthgen/models", headers={
    "Authorization": "Bearer YOUR_API_KEY",
}, json={
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "config": {"epochs": 300, "batch_size": 500},
})
model_id = resp.json()["model_id"]
print("Model training started:", model_id)

# Poll until ready
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/models/{model_id}", headers={
        "Authorization": "Bearer YOUR_API_KEY",
    })
    status = r.json()["model"]["status"]
    print(f"  Status: {status}")
    if status in ("ready", "failed"):
        break
    time.sleep(10)

Response 201 Created

{
  "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "job_id": "6f7a8b9c-0d1e-2345-f678-901234567890",
  "status": "pending"
}

Model Configuration Reference

Each model type supports specific configuration options passed in the config object:

CTGAN (Conditional Tabular GAN)

Option	Type	Default	Description
`epochs`	integer	300	Number of training epochs.
`batch_size`	integer	500	Training batch size.
`embedding_dim`	integer	128	Dimensionality of data embeddings.
`generator_dim`	array[int]	[256, 256]	Hidden layer sizes for the generator network.
`discriminator_dim`	array[int]	[256, 256]	Hidden layer sizes for the discriminator network.
`pac`	integer	10	PAC (Packing) size. Must evenly divide `batch_size`.
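
Because `pac` must evenly divide `batch_size`, it can be worth checking a config locally before submitting a training job. This is a minimal sketch: `check_ctgan_config` is a hypothetical client-side helper, the defaults are taken from the table above, and the server may enforce further constraints that are not modelled here.

```python
def check_ctgan_config(config):
    """Hypothetical pre-flight check for a CTGAN/CopulaGAN config dict."""
    # Fall back to the documented defaults when a key is absent.
    batch_size = config.get("batch_size", 500)
    pac = config.get("pac", 10)
    if pac <= 0 or batch_size % pac != 0:
        raise ValueError(
            f"pac ({pac}) must evenly divide batch_size ({batch_size})"
        )
    return config
```

For example, `{"batch_size": 500, "pac": 10}` passes, while `{"batch_size": 500, "pac": 3}` raises `ValueError`.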

CopulaGAN

CopulaGAN accepts the same options as CTGAN. In addition, it uses copula functions to capture complex correlations between columns.

TVAE (Tabular VAE)

Option	Type	Default	Description
`epochs`	integer	300	Number of training epochs.
`batch_size`	integer	500	Training batch size.
`embedding_dim`	integer	128	Dimensionality of data embeddings.

Gaussian Copula

The fastest model type. It uses statistical copula fitting rather than deep learning and requires minimal configuration:

Option	Type	Default	Description
`seed`	integer	None	Random seed for reproducibility.

Model Type Comparison

Type	Speed	Quality	Best For	Limitations
`ctgan`	Slow	High	Mixed column types, general use	Requires GPU for large datasets
`copulagan`	Slow	High	Complex correlations	Higher memory usage
`tvae`	Medium	Medium-High	Large datasets, fast iteration	May miss complex correlations
`gaussian_copula`	Fast	Medium	Quick prototyping, baselines	Assumes Gaussian distributions

List Models 

GET /api/synthgen/models

Query Parameters

Parameter	Type	Default	Description
`project_id`	string	–	Filter by project.
`limit`	integer	50	Maximum number of items to return.
`offset`	integer	0	Number of items to skip before the first returned item.

Example

curl "$BASE_URL/api/synthgen/models?project_id=d4e5f6a7-b8c9-0123-def4-567890123456" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "items": [
    {
      "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
      "name": "Transactions CTGAN",
      "model_type": "ctgan",
      "status": "ready",
      "dataset_name": "Transactions Q4",
      "dataset_version": 0,
      "created_at": "2026-02-25T10:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
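
The `limit`, `offset`, and `total` fields make it possible to walk through every page of results. The generator below is a sketch under that assumption: `iter_all_items` and its `fetch_page` callback are hypothetical names, and the callback is expected to return the parsed JSON body shown above.

```python
def iter_all_items(fetch_page, limit=50):
    """Yield every item from a limit/offset paginated listing.

    fetch_page(limit, offset) must return a dict with "items" and "total"
    keys, like the List Models response body.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page or once every reported item has been seen.
        if not items or offset >= page["total"]:
            break


# Possible wiring with requests (BASE_URL and HEADERS as in the Full Workflow Example):
# def fetch_page(limit, offset):
#     return requests.get(f"{BASE_URL}/api/synthgen/models", headers=HEADERS,
#                         params={"limit": limit, "offset": offset}).json()
```

Passing the fetch function in keeps the paging logic independent of the HTTP client, so the same generator should work for any listing with the same response shape.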

Get Model Detail 

GET /api/synthgen/models/{model_id}

Return model metadata and its associated training/generation jobs.

Example

curl "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "model": {
    "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "status": "ready",
    "config": {"epochs": 300, "batch_size": 500},
    "dataset_name": "Transactions Q4",
    "dataset_version": 0,
    "dataset_id": "e5f6a7b8-c9d0-1234-5678-90abcdef1234",
    "created_at": "2026-02-25T10:00:00Z"
  },
  "jobs": [
    {
      "id": "7a8b9c0d-1e2f-3456-7890-123456789012",
      "job_type": "train",
      "status": "completed",
      "created_at": "2026-02-25T10:00:00Z"
    }
  ]
}
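
Since the detail response lists every job for the model, a client may want only the most recent job of a given type. The helper below is a sketch: `latest_job` is a hypothetical name, and it assumes `created_at` is always an ISO 8601 UTC timestamp ending in `Z`, as in the example above.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a timestamp such as "2026-02-25T10:00:00Z" into an aware datetime."""
    # Replace the trailing "Z" so that older Python versions can parse it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_job(detail, job_type):
    """Return the newest job of job_type from a model detail body, or None."""
    jobs = [j for j in detail["jobs"] if j["job_type"] == job_type]
    return max(jobs, key=lambda j: parse_ts(j["created_at"]), default=None)
```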

Generate Synthetic Data 

POST /api/synthgen/models/{model_id}/generate

Generate synthetic data from a trained model. Enqueues a synthgen_generate background job. The model must be in ready status.

Request Body

Field	Type	Required	Description
`num_rows`	integer	No	Number of synthetic rows to generate (default 1000).
`seed`	integer	No	Random seed for reproducibility.
`conditions`	object	No	Conditional generation constraints (column-value pairs).
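
As an illustration of `conditions`, the sketch below builds a request body that asks for rows matching fixed column values. The column names `country` and `is_fraud` are made up for this example, `build_generate_payload` is a hypothetical helper, and the exact matching semantics of `conditions` are assumed rather than documented here.

```python
import json


def build_generate_payload(num_rows=1000, seed=None, conditions=None):
    """Hypothetical helper: assemble a Generate request body."""
    if num_rows <= 0:
        raise ValueError("num_rows must be a positive integer")
    payload = {"num_rows": num_rows}
    if seed is not None:
        payload["seed"] = seed
    if conditions:
        # Column-value pairs; the columns must exist in the training dataset.
        payload["conditions"] = dict(conditions)
    return payload


body = build_generate_payload(
    5000, seed=42, conditions={"country": "DE", "is_fraud": True}
)
print(json.dumps(body, indent=2))
```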

Example

curl -X POST "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678/generate" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "num_rows": 10000,
    "seed": 42
  }'

# Generate synthetic data
resp = requests.post(
    f"{BASE_URL}/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678/generate",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"num_rows": 10000, "seed": 42},
)
synthgen_job_id = resp.json()["synthgen_job_id"]

# Poll until generation completes
while True:
    r = requests.get(
        f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
    status = r.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)

# Download the generated data (only if the job succeeded)
if status == "completed":
    dl = requests.get(
        f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}/download",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
    with open("synthetic_data.csv", "wb") as f:
        f.write(dl.content)

Response 200 OK

{
  "job_id": "8b9c0d1e-2f3a-4567-8901-234567890123",
  "synthgen_job_id": "9c0d1e2f-3a4b-5678-9012-345678901234",
  "status": "pending"
}
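
The polling loops in these examples run forever if a job never reaches a terminal status. A bounded variant might look like the sketch below. `wait_for_status` is a hypothetical helper, and the default interval and timeout are arbitrary choices rather than values recommended by SynthGen.

```python
import time


def wait_for_status(get_status, terminal=("completed", "failed"),
                    interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Possible wiring with requests (names as in the Python example above):
# status = wait_for_status(lambda: requests.get(
#     f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}",
#     headers={"Authorization": "Bearer YOUR_API_KEY"},
# ).json()["status"])
```

For model training, the same helper could be used with `terminal=("ready", "failed")` and a callback that reads `["model"]["status"]` from the model detail response.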

Delete Model 

DELETE /api/synthgen/models/{model_id}

Permanently delete a SynthGen model and its associated jobs.

Example

curl -X DELETE "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "ok": true,
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678"
}

List Jobs 

GET /api/synthgen/jobs

Return SynthGen jobs (training and generation) with optional filters.

Query Parameters

Parameter	Type	Default	Description
`project_id`	string	–	Filter by project.
`model_id`	string	–	Filter by SynthGen model.
`status`	string	–	Filter by status: `pending`, `running`, `completed`, `failed`.
`limit`	integer	50	Maximum number of items to return.
`offset`	integer	0	Number of items to skip before the first returned item.

Example

curl "$BASE_URL/api/synthgen/jobs?model_id=5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "items": [
    {
      "id": "9c0d1e2f-3a4b-5678-9012-345678901234",
      "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
      "model_name": "Transactions CTGAN",
      "model_type": "ctgan",
      "job_type": "generate",
      "status": "completed",
      "config": {"num_rows": 10000, "seed": 42},
      "created_at": "2026-02-25T11:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
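
The filters can be combined in a single query string. The following sketch builds such a URL with the standard library. `jobs_url` is a hypothetical helper, and the status values are the ones listed in the query parameter table.

```python
from urllib.parse import urlencode

JOB_STATUSES = {"pending", "running", "completed", "failed"}


def jobs_url(base_url, project_id=None, model_id=None, status=None,
             limit=50, offset=0):
    """Hypothetical helper: build a List Jobs URL from optional filters."""
    if status is not None and status not in JOB_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    params = {
        "project_id": project_id,
        "model_id": model_id,
        "status": status,
        "limit": limit,
        "offset": offset,
    }
    # Drop unset filters so they are not sent as the literal string "None".
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/api/synthgen/jobs?{query}"
```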

Get Job Detail 

GET /api/synthgen/jobs/{job_id}

Return details for a specific SynthGen job.

Example

curl "$BASE_URL/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "id": "9c0d1e2f-3a4b-5678-9012-345678901234",
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "model_name": "Transactions CTGAN",
  "model_type": "ctgan",
  "project_name": "Fraud Detection v2",
  "job_type": "generate",
  "status": "completed",
  "config": {"num_rows": 10000, "seed": 42},
  "output_artifact_id": "0d1e2f3a-4b5c-6789-0123-456789012345",
  "created_at": "2026-02-25T11:00:00Z"
}

Download Synthetic Data 

GET /api/synthgen/jobs/{job_id}/download

Download the output artifact from a completed generation job.

Returns 409 Conflict if the job output is not available yet.

Example

curl -o synthetic_data.csv \
  "$BASE_URL/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234/download" \
  -H "Authorization: Bearer YOUR_API_KEY"

resp = requests.get(
    f"{BASE_URL}/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234/download",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
with open("synthetic_data.csv", "wb") as f:
    f.write(resp.content)

Response 200 OK

Binary file download.
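
Because the endpoint answers 409 Conflict until the output is available, a client can retry the download a bounded number of times. The sketch below shows one way to do that. `download_when_ready` and its `fetch` callback are hypothetical, and treating every status other than 200 and 409 as fatal is an assumption of this example.

```python
import time


def download_when_ready(fetch, attempts=10, delay=5.0, sleep=time.sleep):
    """Retry fetch() while it reports 409; return the body on 200.

    fetch() must return a (status_code, body_bytes) tuple.
    """
    for attempt in range(attempts):
        status_code, body = fetch()
        if status_code == 200:
            return body
        if status_code != 409:
            raise RuntimeError(f"unexpected HTTP status {status_code}")
        # Output not available yet: wait before the next attempt.
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"output still unavailable after {attempts} attempts")


# Possible wiring with requests:
# def fetch():
#     r = requests.get(f"{BASE_URL}/api/synthgen/jobs/{job_id}/download",
#                      headers={"Authorization": "Bearer YOUR_API_KEY"})
#     return r.status_code, r.content
```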

List Model Types 

GET /api/synthgen/model-types

Return available model architectures and their descriptions.

Example

curl "$BASE_URL/api/synthgen/model-types" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "model_types": [
    {
      "id": "ctgan",
      "name": "CTGAN",
      "description": "Conditional Tabular GAN - best for general tabular data"
    },
    {
      "id": "copulagan",
      "name": "CopulaGAN",
      "description": "Copula-based GAN - good for capturing correlations"
    },
    {
      "id": "tvae",
      "name": "TVAE",
      "description": "Tabular VAE - faster training, good for large datasets"
    },
    {
      "id": "gaussian_copula",
      "name": "Gaussian Copula",
      "description": "Statistical copula model - fastest, good baseline"
    }
  ]
}
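
A client could use this response to confirm that an architecture is offered before creating a model. The helpers below are a sketch with hypothetical names. They only rely on the `model_types` list and the `id` field shown in the response above.

```python
def index_model_types(body):
    """Map each architecture id to its entry from a model-types response body."""
    return {entry["id"]: entry for entry in body["model_types"]}


def require_model_type(body, model_type):
    """Return the entry for model_type, or raise if the server does not list it."""
    available = index_model_types(body)
    if model_type not in available:
        raise ValueError(
            f"{model_type!r} is not offered; available: {sorted(available)}"
        )
    return available[model_type]
```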

Full Workflow Example 

Train a CTGAN model and generate 10,000 synthetic rows:

import requests
import time

BASE_URL = "http://localhost:8888"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Train a CTGAN model
resp = requests.post(f"{BASE_URL}/api/synthgen/models", headers=HEADERS, json={
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN v1",
    "model_type": "ctgan",
    "config": {"epochs": 300},
})
model_id = resp.json()["model_id"]

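# 2. Wait for training to complete
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/models/{model_id}", headers=HEADERS)
    status = r.json()["model"]["status"]
    if status in ("ready", "failed"):
        break
    time.sleep(15)
# Generation requires a model in ready status, so stop here if training failed.
if status == "failed":
    raise RuntimeError(f"Training failed for model {model_id}")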

# 3. Generate synthetic data
resp = requests.post(
    f"{BASE_URL}/api/synthgen/models/{model_id}/generate",
    headers=HEADERS,
    json={"num_rows": 10000, "seed": 42},
)
job_id = resp.json()["synthgen_job_id"]

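# 4. Wait for generation to complete
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/jobs/{job_id}", headers=HEADERS)
    status = r.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)
# A failed job has no output artifact to download.
if status == "failed":
    raise RuntimeError(f"Generation job {job_id} failed")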

# 5. Download the result
resp = requests.get(
    f"{BASE_URL}/api/synthgen/jobs/{job_id}/download",
    headers=HEADERS,
)
with open("synthetic_transactions.csv", "wb") as f:
    f.write(resp.content)
print("Saved 10,000 synthetic rows to synthetic_transactions.csv")