SynthGen API
SynthGen generates synthetic tabular data that preserves the statistical properties of your original dataset. It supports four model architectures: CTGAN, CopulaGAN, TVAE, and Gaussian Copula.
All endpoints are prefixed with `/api/synthgen`.
Create Model
POST /api/synthgen/models
Create a new SynthGen model and enqueue a `synthgen_train` background job. Training starts automatically once a worker picks up the job.
Request Body
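| Field | Type | Required | Description |
|---|---|---|---|
| `project_id` | string | Yes | UUID of the project. |
| `dataset_version_id` | string | Yes | UUID of the dataset version to train on. |
| `name` | string | Yes | Model name. |
| `model_type` | string | No | Model architecture: `ctgan`, `copulagan`, `tvae`, or `gaussian_copula`. |
| `config` | object | No | Architecture-specific hyperparameters (e.g. `epochs`, `batch_size`). |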
Example
```bash
curl -X POST "$BASE_URL/api/synthgen/models" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "config": {
      "epochs": 300,
      "batch_size": 500
    }
  }'
```
```python
import time

import requests

BASE_URL = "http://localhost:8888"  # replace with your server URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

resp = requests.post(f"{BASE_URL}/api/synthgen/models", headers=HEADERS, json={
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "config": {"epochs": 300, "batch_size": 500},
})
resp.raise_for_status()
model_id = resp.json()["model_id"]
print("Model training started:", model_id)

# Poll the model until training reaches a terminal status
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/models/{model_id}", headers=HEADERS)
    r.raise_for_status()
    status = r.json()["model"]["status"]
    print(f"  Status: {status}")
    if status in ("ready", "failed"):
        break
    time.sleep(10)
```
Response 201 Created
```json
{
  "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "job_id": "6f7a8b9c-0d1e-2345-f678-901234567890",
  "status": "pending"
}
```
Model Configuration Reference
Each model type supports its own configuration options, passed in the `config` object:
CTGAN (Conditional Tabular GAN)
| Option | Type | Default | Description |
|---|---|---|---|
| `epochs` | integer | 300 | Number of training epochs. |
| `batch_size` | integer | 500 | Training batch size. |
|  | integer | 128 | Dimensionality of data embeddings. |
|  | array[int] | [256, 256] | Hidden layer sizes for the generator network. |
|  | array[int] | [256, 256] | Hidden layer sizes for the discriminator network. |
|  | integer | 10 | PAC (packing) size. Must evenly divide `batch_size`. |
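To catch an invalid PAC/batch-size combination before submitting a training job, you can check the constraint client-side. The sketch below is a hypothetical helper, not part of the SynthGen API; it assumes the packing option is named `pac` (that option name is not confirmed by this reference) and merges user options over the defaults listed in the table above.

```python
# Hypothetical client-side check; not part of the SynthGen API.
# The option name "pac" is an assumption; "epochs" and "batch_size" match the examples.
CTGAN_DEFAULTS = {"epochs": 300, "batch_size": 500, "pac": 10}


def check_ctgan_config(config: dict) -> dict:
    """Merge user options over the documented defaults and validate batch_size/pac."""
    merged = {**CTGAN_DEFAULTS, **config}
    batch_size, pac = merged["batch_size"], merged["pac"]
    if batch_size <= 0 or pac <= 0:
        raise ValueError("batch_size and pac must be positive integers")
    if batch_size % pac != 0:
        raise ValueError(f"pac={pac} must evenly divide batch_size={batch_size}")
    return merged


print(check_ctgan_config({"epochs": 300, "batch_size": 500}))
```

The merged dictionary can be sent as-is in the `config` field, or you can send only the keys you changed and let the server apply its defaults.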
CopulaGAN
Accepts the same options as CTGAN. CopulaGAN additionally uses copula functions to capture complex correlations between columns.
TVAE (Tabular VAE)
| Option | Type | Default | Description |
|---|---|---|---|
| `epochs` | integer | 300 | Number of training epochs. |
| `batch_size` | integer | 500 | Training batch size. |
|  | integer | 128 | Dimensionality of data embeddings. |
Gaussian Copula
The fastest model type. It uses statistical copula fitting rather than deep learning and requires minimal configuration:
| Option | Type | Default | Description |
|---|---|---|---|
|  | integer | None | Random seed for reproducibility. |
Model Type Comparison
| Type | Speed | Quality | Best For | Limitations |
|---|---|---|---|---|
| CTGAN | Slow | High | Mixed column types, general use | Requires GPU for large datasets |
| CopulaGAN | Slow | High | Complex correlations | Higher memory usage |
| TVAE | Medium | Medium-High | Large datasets, fast iteration | May miss complex correlations |
| Gaussian Copula | Fast | Medium | Quick prototyping, baselines | Assumes a Gaussian dependence structure between columns |
List Models
GET /api/synthgen/models
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `project_id` | string | – | Filter by project. |
| `limit` | integer | 50 | Maximum number of items to return. |
| `offset` | integer | 0 | Pagination offset. |
Example
```bash
curl "$BASE_URL/api/synthgen/models?project_id=d4e5f6a7-b8c9-0123-def4-567890123456" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Response 200 OK
```json
{
  "items": [
    {
      "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
      "name": "Transactions CTGAN",
      "model_type": "ctgan",
      "status": "ready",
      "dataset_name": "Transactions Q4",
      "dataset_version": 0,
      "created_at": "2026-02-25T10:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
```
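The list response reports `total` alongside `limit` and `offset`, so a client can page through all models by advancing `offset` until it reaches `total`. A minimal sketch: `iter_models` is a hypothetical helper, and `get` is any callable with the `requests.get` calling convention (pass `requests.get` in practice), which keeps the function easy to test without a server.

```python
BASE_URL = "http://localhost:8888"  # assumption: replace with your server URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}


def iter_models(get, project_id=None, limit=50):
    """Yield every SynthGen model, following limit/offset pagination.

    `get` should behave like requests.get (e.g. pass requests.get itself).
    """
    offset = 0
    while True:
        params = {"limit": limit, "offset": offset}
        if project_id is not None:
            params["project_id"] = project_id
        resp = get(f"{BASE_URL}/api/synthgen/models", headers=HEADERS, params=params)
        resp.raise_for_status()
        page = resp.json()
        yield from page["items"]
        offset += len(page["items"])
        # Stop on an empty page or once every item has been seen
        if not page["items"] or offset >= page["total"]:
            break
```

For example, `[m["name"] for m in iter_models(requests.get, project_id="d4e5f6a7-b8c9-0123-def4-567890123456")]` collects the names of every model in a project, however many pages that takes.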
Get Model Detail
GET /api/synthgen/models/{model_id}
Return model metadata and its associated training/generation jobs.
Example
```bash
curl "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Response 200 OK
```json
{
  "model": {
    "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "status": "ready",
    "config": {"epochs": 300, "batch_size": 500},
    "dataset_name": "Transactions Q4",
    "dataset_version": 0,
    "dataset_id": "e5f6a7b8-c9d0-1234-5678-90abcdef1234",
    "created_at": "2026-02-25T10:00:00Z"
  },
  "jobs": [
    {
      "id": "7a8b9c0d-1e2f-3456-7890-123456789012",
      "job_type": "train",
      "status": "completed",
      "created_at": "2026-02-25T10:00:00Z"
    }
  ]
}
```
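The polling examples in this reference loop with `while True` until a terminal status appears. If a job is never picked up, such a loop never ends, so it is common to add a deadline. A sketch of a generic polling helper with a timeout; the helper name and its default timeout and interval are illustrative choices, not part of the API.

```python
import time


def wait_for_status(fetch_status, terminal=("ready", "failed"), timeout=3600.0,
                    interval=10.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal status or `timeout` seconds pass.

    Returns the terminal status; raises TimeoutError if the deadline is reached first.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} s")
        sleep(interval)


# Example (assumes requests, BASE_URL, HEADERS and model_id as in the examples above):
# status = wait_for_status(lambda: requests.get(
#     f"{BASE_URL}/api/synthgen/models/{model_id}", headers=HEADERS
# ).json()["model"]["status"])
```

For generation jobs, pass `terminal=("completed", "failed")` and read `status` from the job detail response instead.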
Generate Synthetic Data
POST /api/synthgen/models/{model_id}/generate
Generate synthetic data from a trained model by enqueuing a `synthgen_generate` background job. The model must have status `ready`.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| `num_rows` | integer | No | Number of synthetic rows to generate (default 1000). |
| `seed` | integer | No | Random seed for reproducibility. |
|  | object | No | Conditional generation constraints (column-value pairs). |
Example
```bash
curl -X POST "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678/generate" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "num_rows": 10000,
    "seed": 42
  }'
```
```python
import time

import requests

BASE_URL = "http://localhost:8888"  # replace with your server URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
MODEL_ID = "5e6f7a8b-9c0d-1234-ef56-789012345678"

# Start a generation job
resp = requests.post(
    f"{BASE_URL}/api/synthgen/models/{MODEL_ID}/generate",
    headers=HEADERS,
    json={"num_rows": 10000, "seed": 42},
)
resp.raise_for_status()
synthgen_job_id = resp.json()["synthgen_job_id"]

# Poll until generation reaches a terminal status
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}", headers=HEADERS)
    r.raise_for_status()
    status = r.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)

if status == "failed":
    raise RuntimeError(f"Generation job {synthgen_job_id} failed")

# Download the generated data
dl = requests.get(f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}/download", headers=HEADERS)
dl.raise_for_status()
with open("synthetic_data.csv", "wb") as f:
    f.write(dl.content)
```
Response 200 OK
```json
{
  "job_id": "8b9c0d1e-2f3a-4567-8901-234567890123",
  "synthgen_job_id": "9c0d1e2f-3a4b-5678-9012-345678901234",
  "status": "pending"
}
```
Delete Model
DELETE /api/synthgen/models/{model_id}
Permanently delete a SynthGen model and its associated jobs.
Example
```bash
curl -X DELETE "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Response 200 OK
```json
{
  "ok": true,
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678"
}
```
List Jobs
GET /api/synthgen/jobs
Return SynthGen jobs (training and generation) with optional filters.
Query Parameters
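| Parameter | Type | Default | Description |
|---|---|---|---|
| `project_id` | string | – | Filter by project. |
| `model_id` | string | – | Filter by SynthGen model. |
|  | string | – | Filter by job status (e.g. `pending`, `completed`, `failed`). |
| `limit` | integer | 50 | Maximum number of items to return. |
| `offset` | integer | 0 | Pagination offset. |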
Example
```bash
curl "$BASE_URL/api/synthgen/jobs?model_id=5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Response 200 OK
```json
{
  "items": [
    {
      "id": "9c0d1e2f-3a4b-5678-9012-345678901234",
      "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
      "model_name": "Transactions CTGAN",
      "model_type": "ctgan",
      "job_type": "generate",
      "status": "completed",
      "config": {"num_rows": 10000, "seed": 42},
      "created_at": "2026-02-25T11:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
```
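Unset filters can simply be left out of the query string. A small sketch that builds a List Jobs URL with the Python standard library; `jobs_url` is a hypothetical helper, and the name of the status filter parameter (`status` below) is an assumption, since this reference does not spell it out.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8888"  # assumption: replace with your server URL


def jobs_url(project_id=None, model_id=None, status=None, limit=50, offset=0):
    """Build a List Jobs URL, dropping filters that are not set."""
    params = {
        "project_id": project_id,
        "model_id": model_id,
        "status": status,  # assumed parameter name for the status filter
        "limit": limit,
        "offset": offset,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/api/synthgen/jobs?{query}"


print(jobs_url(model_id="5e6f7a8b-9c0d-1234-ef56-789012345678", status="completed"))
```

The resulting URL can be passed to `curl` or `requests.get` together with the `Authorization` header.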
Get Job Detail
GET /api/synthgen/jobs/{job_id}
Return details for a specific SynthGen job.
Example
```bash
curl "$BASE_URL/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Response 200 OK
```json
{
  "id": "9c0d1e2f-3a4b-5678-9012-345678901234",
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "model_name": "Transactions CTGAN",
  "model_type": "ctgan",
  "project_name": "Fraud Detection v2",
  "job_type": "generate",
  "status": "completed",
  "config": {"num_rows": 10000, "seed": 42},
  "output_artifact_id": "0d1e2f3a-4b5c-6789-0123-456789012345",
  "created_at": "2026-02-25T11:00:00Z"
}
```
Download Synthetic Data
GET /api/synthgen/jobs/{job_id}/download
Download the output artifact from a completed generation job. Returns `409 Conflict` if the job output is not available yet.
Example
```bash
curl -o synthetic_data.csv \
  "$BASE_URL/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234/download" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
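```python
import requests

BASE_URL = "http://localhost:8888"  # replace with your server URL

resp = requests.get(
    f"{BASE_URL}/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234/download",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
resp.raise_for_status()  # raises on 409 Conflict if the output is not ready yet
with open("synthetic_data.csv", "wb") as f:
    f.write(resp.content)
```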
Response 200 OK
Binary file download.
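Because the endpoint returns `409 Conflict` while the output is not yet available, a client can retry on 409 and stream the body to disk instead of holding a large file in memory. A sketch under those assumptions: `download_when_ready` is hypothetical, and `get` is any callable with the `requests.get` signature (pass `requests.get` in practice). The retry cap also bounds the wait if a job fails and never produces output.

```python
import time


def download_when_ready(get, url, path, headers=None, retries=30, interval=5.0,
                        sleep=time.sleep):
    """Stream a job artifact to `path`, retrying while the server answers 409 Conflict.

    `get` is expected to behave like requests.get and is called with stream=True.
    """
    for _ in range(retries):
        resp = get(url, headers=headers, stream=True)
        if resp.status_code == 409:  # output not available yet
            resp.close()
            sleep(interval)
            continue
        try:
            resp.raise_for_status()
            with open(path, "wb") as f:
                # Write in 64 KiB chunks so large outputs never sit fully in memory
                for chunk in resp.iter_content(chunk_size=1 << 16):
                    f.write(chunk)
        finally:
            resp.close()
        return path
    raise TimeoutError(f"artifact still unavailable after {retries} attempts")
```

For example, `download_when_ready(requests.get, f"{BASE_URL}/api/synthgen/jobs/{job_id}/download", "synthetic_data.csv", headers=HEADERS)`.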
List Model Types
GET /api/synthgen/model-types
Return available model architectures and their descriptions.
Example
```bash
curl "$BASE_URL/api/synthgen/model-types" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Response 200 OK
```json
{
  "model_types": [
    {
      "id": "ctgan",
      "name": "CTGAN",
      "description": "Conditional Tabular GAN - best for general tabular data"
    },
    {
      "id": "copulagan",
      "name": "CopulaGAN",
      "description": "Copula-based GAN - good for capturing correlations"
    },
    {
      "id": "tvae",
      "name": "TVAE",
      "description": "Tabular VAE - faster training, good for large datasets"
    },
    {
      "id": "gaussian_copula",
      "name": "Gaussian Copula",
      "description": "Statistical copula model - fastest, good baseline"
    }
  ]
}
```
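Assuming the `id` values returned here are the accepted values of `model_type` in Create Model (they match the `"ctgan"` used in the examples), a client can validate a model type against this endpoint instead of hard-coding the list. A minimal sketch operating on the parsed JSON body; both helpers are hypothetical, and the sample payload is an excerpt of the response above.

```python
def model_type_names(payload: dict) -> dict:
    """Map each model type id to its display name from a /model-types response body."""
    return {entry["id"]: entry["name"] for entry in payload["model_types"]}


def check_model_type(payload: dict, model_type: str) -> str:
    """Return model_type unchanged if the server lists it, else raise ValueError."""
    names = model_type_names(payload)
    if model_type not in names:
        raise ValueError(f"unknown model_type {model_type!r}; expected one of {sorted(names)}")
    return model_type


# In practice: payload = requests.get(f"{BASE_URL}/api/synthgen/model-types", headers=HEADERS).json()
payload = {"model_types": [
    {"id": "ctgan", "name": "CTGAN",
     "description": "Conditional Tabular GAN - best for general tabular data"},
    {"id": "tvae", "name": "TVAE",
     "description": "Tabular VAE - faster training, good for large datasets"},
]}
print(check_model_type(payload, "ctgan"))
```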
Full Workflow Example
Train a CTGAN model and generate 10,000 synthetic rows:
```python
import time

import requests

BASE_URL = "http://localhost:8888"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
NUM_ROWS = 10000

# 1. Train a CTGAN model
resp = requests.post(f"{BASE_URL}/api/synthgen/models", headers=HEADERS, json={
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN v1",
    "model_type": "ctgan",
    "config": {"epochs": 300},
})
resp.raise_for_status()
model_id = resp.json()["model_id"]

# 2. Wait for training to complete
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/models/{model_id}", headers=HEADERS)
    r.raise_for_status()
    model_status = r.json()["model"]["status"]
    if model_status in ("ready", "failed"):
        break
    time.sleep(15)
if model_status == "failed":
    raise RuntimeError(f"Training failed for model {model_id}")

# 3. Generate synthetic data
resp = requests.post(
    f"{BASE_URL}/api/synthgen/models/{model_id}/generate",
    headers=HEADERS,
    json={"num_rows": NUM_ROWS, "seed": 42},
)
resp.raise_for_status()
job_id = resp.json()["synthgen_job_id"]

# 4. Wait for generation to complete
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/jobs/{job_id}", headers=HEADERS)
    r.raise_for_status()
    job_status = r.json()["status"]
    if job_status in ("completed", "failed"):
        break
    time.sleep(5)
if job_status == "failed":
    raise RuntimeError(f"Generation job {job_id} failed")

# 5. Download the result
resp = requests.get(f"{BASE_URL}/api/synthgen/jobs/{job_id}/download", headers=HEADERS)
resp.raise_for_status()
with open("synthetic_transactions.csv", "wb") as f:
    f.write(resp.content)
print(f"Generated {NUM_ROWS} synthetic rows.")
```
See also
Datasets API: uploading real datasets for training.
Reports API: generating SynthGen quality reports.