SynthGen API

SynthGen generates synthetic tabular data that preserves the statistical properties of your original dataset. It supports four model architectures: CTGAN, CopulaGAN, TVAE, and Gaussian Copula.

All endpoints are prefixed with /api/synthgen.


Create Model

POST /api/synthgen/models

Create a new SynthGen model and enqueue a synthgen_train background job. Training begins automatically after the job is picked up by the worker.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| project_id | string | Yes | UUID of the project. |
| dataset_version_id | string | Yes | UUID of the dataset version to train on. |
| name | string | Yes | Model name. |
| model_type | string | No | Model architecture: ctgan (default), copulagan, tvae, or gaussian_copula. |
| config | object | No | Architecture-specific hyperparameters (e.g. epochs, batch_size, embedding_dim). |

Example

curl -X POST "$BASE_URL/api/synthgen/models" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "config": {
      "epochs": 300,
      "batch_size": 500
    }
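  }'

Python:

import os
import time

import requests

# Mirrors the $BASE_URL variable used by the curl example; the fallback is the
# local address used in the Full Workflow Example.
BASE_URL = os.environ.get("BASE_URL", "http://localhost:8888")

resp = requests.post(f"{BASE_URL}/api/synthgen/models", headers={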
    "Authorization": "Bearer YOUR_API_KEY",
}, json={
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "config": {"epochs": 300, "batch_size": 500},
})
model_id = resp.json()["model_id"]
print("Model training started:", model_id)

# Poll until ready
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/models/{model_id}", headers={
        "Authorization": "Bearer YOUR_API_KEY",
    })
    status = r.json()["model"]["status"]
    print(f"  Status: {status}")
    if status in ("ready", "failed"):
        break
    time.sleep(10)

Response 201 Created

{
  "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "job_id": "6f7a8b9c-0d1e-2345-f678-901234567890",
  "status": "pending"
}
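The polling loop in the example above runs until the model reaches ready or failed, with no upper bound on the wait. The sketch below shows one way to bound it. The name wait_for_status, its parameters, and the fetch_status callable are illustrative and not part of the SynthGen API; only the terminal statuses ready and failed come from this reference.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    fetch_status: Callable[[], str],
    terminal: Iterable[str] = ("ready", "failed"),
    interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it returns a terminal status or the timeout elapses.

    fetch_status is a hypothetical callable supplied by the caller, for example
    a function that GETs /api/synthgen/models/{model_id} and returns
    response.json()["model"]["status"].
    """
    terminal = set(terminal)
    waited = 0.0
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Offline demonstration with a canned sequence of statuses (no HTTP calls).
_statuses = iter(["pending", "pending", "ready"])
final = wait_for_status(lambda: next(_statuses), sleep=lambda seconds: None)
print(final)
```

Passing terminal=("completed", "failed") would let the same helper wait on a generation job instead of a model.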

Model Configuration Reference

Each model type supports specific configuration options passed in the config object:

CTGAN (Conditional Tabular GAN)

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| epochs | integer | 300 | Number of training epochs. |
| batch_size | integer | 500 | Training batch size. |
| embedding_dim | integer | 128 | Dimensionality of data embeddings. |
| generator_dim | array[int] | [256, 256] | Hidden layer sizes for the generator network. |
| discriminator_dim | array[int] | [256, 256] | Hidden layer sizes for the discriminator network. |
| pac | integer | 10 | PAC (packing) size. Must evenly divide batch_size. |
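The pac constraint is easy to break when overriding batch_size alone. The sketch below merges overrides onto the defaults listed above and checks the divisibility rule on the client before any request is sent. The function name and the rejection of unknown keys are assumptions made for illustration; this reference does not say how the server treats unknown options.

```python
# Defaults copied from the CTGAN option table in this reference.
CTGAN_DEFAULTS = {
    "epochs": 300,
    "batch_size": 500,
    "embedding_dim": 128,
    "generator_dim": [256, 256],
    "discriminator_dim": [256, 256],
    "pac": 10,
}


def resolve_ctgan_config(overrides=None):
    """Merge overrides onto the documented defaults and check the pac rule."""
    overrides = overrides or {}
    # Client-side convenience only: flag likely typos in option names.
    unknown = set(overrides) - set(CTGAN_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown CTGAN options: {sorted(unknown)}")
    config = {**CTGAN_DEFAULTS, **overrides}
    if config["batch_size"] % config["pac"] != 0:
        raise ValueError(
            f"pac={config['pac']} does not evenly divide "
            f"batch_size={config['batch_size']}"
        )
    return config


config = resolve_ctgan_config({"batch_size": 1000, "epochs": 100})
print(config["batch_size"], config["pac"])
```

The returned dictionary can be passed as the config field of the Create Model request body.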

CopulaGAN

Accepts the same options as CTGAN. It additionally uses copula functions to capture complex column correlations.

TVAE (Tabular VAE)

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| epochs | integer | 300 | Number of training epochs. |
| batch_size | integer | 500 | Training batch size. |
| embedding_dim | integer | 128 | Dimensionality of data embeddings. |

Gaussian Copula

The fastest model type. It involves no deep learning and instead fits a statistical copula, so it needs minimal configuration:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| seed | integer | None | Random seed for reproducibility. |

Model Type Comparison

| Type | Speed | Quality | Best For | Limitations |
| --- | --- | --- | --- | --- |
| ctgan | Slow | High | Mixed column types, general use | Requires GPU for large datasets |
| copulagan | Slow | High | Complex correlations | Higher memory usage |
| tvae | Medium | Medium-High | Large datasets, fast iteration | May miss complex correlations |
| gaussian_copula | Fast | Medium | Quick prototyping, baselines | Assumes Gaussian distributions |
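The comparison above can be encoded as data when a script needs to pick a model_type automatically. The sketch below is one possible heuristic, not a recommendation from SynthGen: it returns the fastest type whose quality rating meets a minimum. The numeric ranks and the function name are assumptions made for illustration.

```python
# Ordinal ranks for the labels used in the comparison table (illustrative).
SPEED = {"Slow": 1, "Medium": 2, "Fast": 3}
QUALITY = {"Medium": 1, "Medium-High": 2, "High": 3}

# (speed, quality) per model type, copied from the comparison table.
MODEL_TYPES = {
    "ctgan": ("Slow", "High"),
    "copulagan": ("Slow", "High"),
    "tvae": ("Medium", "Medium-High"),
    "gaussian_copula": ("Fast", "Medium"),
}


def fastest_with_quality(min_quality: str) -> str:
    """Return the fastest model type rated at least min_quality.

    Ties keep table order, so ctgan wins over copulagan.
    """
    candidates = [
        name
        for name, (_, quality) in MODEL_TYPES.items()
        if QUALITY[quality] >= QUALITY[min_quality]
    ]
    return max(candidates, key=lambda name: SPEED[MODEL_TYPES[name][0]])


for level in ("Medium", "Medium-High", "High"):
    print(level, "->", fastest_with_quality(level))
```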


List Models

GET /api/synthgen/models

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| project_id | string | - | Filter by project. |
| limit | integer | 50 | Max items. |
| offset | integer | 0 | Pagination offset. |

Example

curl "$BASE_URL/api/synthgen/models?project_id=d4e5f6a7-b8c9-0123-def4-567890123456" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "items": [
    {
      "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
      "name": "Transactions CTGAN",
      "model_type": "ctgan",
      "status": "ready",
      "dataset_name": "Transactions Q4",
      "dataset_version": 0,
      "created_at": "2026-02-25T10:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
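The limit, offset, and total fields in the response are enough to walk every page. The sketch below is a minimal pagination loop under that assumption; iter_items and fetch_page are illustrative names, and the demonstration uses a fake in-memory fetcher with made-up records rather than real HTTP calls.

```python
from typing import Callable, Dict, Iterator


def iter_items(fetch_page: Callable[[int, int], Dict], limit: int = 50) -> Iterator[Dict]:
    """Yield every item across pages using the limit/offset/total fields."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page or once every item has been seen.
        if not items or offset >= page["total"]:
            break


# Offline stand-in for GET /api/synthgen/models; the records are made up.
_FAKE_MODELS = [{"id": f"model-{i}", "status": "ready"} for i in range(120)]
calls = []


def fake_fetch(limit: int, offset: int) -> Dict:
    calls.append((limit, offset))
    return {
        "items": _FAKE_MODELS[offset:offset + limit],
        "total": len(_FAKE_MODELS),
        "limit": limit,
        "offset": offset,
    }


all_models = list(iter_items(fake_fetch))
print(len(all_models), "models in", len(calls), "requests")
```

Against a live server, fetch_page would wrap requests.get on /api/synthgen/models with params={"limit": limit, "offset": offset} and return the parsed JSON body.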

Get Model Detail

GET /api/synthgen/models/{model_id}

Return model metadata and its associated training/generation jobs.

Example

curl "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "model": {
    "id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
    "name": "Transactions CTGAN",
    "model_type": "ctgan",
    "status": "ready",
    "config": {"epochs": 300, "batch_size": 500},
    "dataset_name": "Transactions Q4",
    "dataset_version": 0,
    "dataset_id": "e5f6a7b8-c9d0-1234-5678-90abcdef1234",
    "created_at": "2026-02-25T10:00:00Z"
  },
  "jobs": [
    {
      "id": "7a8b9c0d-1e2f-3456-7890-123456789012",
      "job_type": "train",
      "status": "completed",
      "created_at": "2026-02-25T10:00:00Z"
    }
  ]
}
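Because the detail response bundles every training and generation job under jobs, a client often wants just the most recent job of one kind. The sketch below picks it by created_at. The helper names and the sample job IDs are hypothetical; the field names follow the response shown above.

```python
from datetime import datetime
from typing import Optional


def parse_ts(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_job(detail: dict, job_type: str) -> Optional[dict]:
    """Return the newest job of the given job_type, or None if there is none."""
    jobs = [j for j in detail.get("jobs", []) if j["job_type"] == job_type]
    return max(jobs, key=lambda j: parse_ts(j["created_at"]), default=None)


# Sample payload shaped like the response above; job IDs are placeholders.
detail = {
    "model": {"id": "5e6f7a8b-9c0d-1234-ef56-789012345678", "status": "ready"},
    "jobs": [
        {"id": "job-a", "job_type": "train", "status": "completed",
         "created_at": "2026-02-25T10:00:00Z"},
        {"id": "job-b", "job_type": "generate", "status": "failed",
         "created_at": "2026-02-25T11:00:00Z"},
        {"id": "job-c", "job_type": "generate", "status": "completed",
         "created_at": "2026-02-25T12:00:00Z"},
    ],
}

newest = latest_job(detail, "generate")
print(newest["id"], newest["status"])
```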

Generate Synthetic Data

POST /api/synthgen/models/{model_id}/generate

Generate synthetic data from a trained model. Enqueues a synthgen_generate background job. The model must be in ready status.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| num_rows | integer | No | Number of synthetic rows to generate (default 1000). |
| seed | integer | No | Random seed for reproducibility. |
| conditions | object | No | Conditional generation constraints (column-value pairs). |

Example

curl -X POST "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678/generate" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "num_rows": 10000,
    "seed": 42
  }'

Python:

# Generate synthetic data (reuses the imports and BASE_URL from the Create Model example)
resp = requests.post(
    f"{BASE_URL}/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678/generate",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"num_rows": 10000, "seed": 42},
)
synthgen_job_id = resp.json()["synthgen_job_id"]

# Poll until generation completes
while True:
    r = requests.get(
        f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
    status = r.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)

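# Stop here if generation failed; there is no output to download
if status != "completed":
    raise RuntimeError(f"Generation job ended with status {status!r}")

# Download the generated data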
dl = requests.get(
    f"{BASE_URL}/api/synthgen/jobs/{synthgen_job_id}/download",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
with open("synthetic_data.csv", "wb") as f:
    f.write(dl.content)

Response 200 OK

{
  "job_id": "8b9c0d1e-2f3a-4567-8901-234567890123",
  "synthgen_job_id": "9c0d1e2f-3a4b-5678-9012-345678901234",
  "status": "pending"
}
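The examples above omit conditions. The sketch below builds a request body that includes it, reading "column-value pairs" as a flat object mapping column names to required values. That reading, the positive-integer check on num_rows, and the column name country are assumptions made for illustration, not documented behaviour.

```python
import json


def build_generate_payload(num_rows=1000, seed=None, conditions=None):
    """Build the JSON body for POST /api/synthgen/models/{model_id}/generate."""
    # Assumed client-side check; the reference only says num_rows is an integer.
    if not isinstance(num_rows, int) or isinstance(num_rows, bool) or num_rows < 1:
        raise ValueError("num_rows must be a positive integer")
    payload = {"num_rows": num_rows}
    # Optional fields are left out entirely when unset.
    if seed is not None:
        payload["seed"] = seed
    if conditions:
        payload["conditions"] = dict(conditions)
    return payload


# "country" is a hypothetical column in the training dataset.
payload = build_generate_payload(num_rows=5000, seed=42, conditions={"country": "US"})
print(json.dumps(payload, indent=2))
```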

Delete Model

DELETE /api/synthgen/models/{model_id}

Permanently delete a SynthGen model and its associated jobs.

Example

curl -X DELETE "$BASE_URL/api/synthgen/models/5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "ok": true,
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678"
}

List Jobs

GET /api/synthgen/jobs

Return SynthGen jobs (training and generation) with optional filters.

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| project_id | string | - | Filter by project. |
| model_id | string | - | Filter by SynthGen model. |
| status | string | - | Filter by status: pending, running, completed, failed. |
| limit | integer | 50 | Max items. |
| offset | integer | 0 | Pagination offset. |

Example

curl "$BASE_URL/api/synthgen/jobs?model_id=5e6f7a8b-9c0d-1234-ef56-789012345678" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "items": [
    {
      "id": "9c0d1e2f-3a4b-5678-9012-345678901234",
      "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
      "model_name": "Transactions CTGAN",
      "model_type": "ctgan",
      "job_type": "generate",
      "status": "completed",
      "config": {"num_rows": 10000, "seed": 42},
      "created_at": "2026-02-25T11:00:00Z"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
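The filters combine freely, so it can help to assemble the query string in one place. The sketch below builds a List Jobs URL from optional filters and rejects status values outside the four listed above. The function name is illustrative, and the client-side status check is a convenience, not documented server behaviour.

```python
from urllib.parse import urlencode

# Status values listed in the query parameter table.
JOB_STATUSES = {"pending", "running", "completed", "failed"}


def jobs_url(base_url, project_id=None, model_id=None, status=None, limit=50, offset=0):
    """Return the GET /api/synthgen/jobs URL for the given filters."""
    if status is not None and status not in JOB_STATUSES:
        raise ValueError(f"status must be one of {sorted(JOB_STATUSES)}")
    params = {
        "project_id": project_id,
        "model_id": model_id,
        "status": status,
        "limit": limit,
        "offset": offset,
    }
    # Drop unset filters so they are not sent as the literal string "None".
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/api/synthgen/jobs?{query}"


url = jobs_url(
    "http://localhost:8888",
    model_id="5e6f7a8b-9c0d-1234-ef56-789012345678",
    status="failed",
)
print(url)
```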

Get Job Detail

GET /api/synthgen/jobs/{job_id}

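Return details for a specific SynthGen job. The examples in this reference pass the synthgen_job_id returned by Generate Synthetic Data as {job_id}.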

Example

curl "$BASE_URL/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "id": "9c0d1e2f-3a4b-5678-9012-345678901234",
  "model_id": "5e6f7a8b-9c0d-1234-ef56-789012345678",
  "model_name": "Transactions CTGAN",
  "model_type": "ctgan",
  "project_name": "Fraud Detection v2",
  "job_type": "generate",
  "status": "completed",
  "config": {"num_rows": 10000, "seed": 42},
  "output_artifact_id": "0d1e2f3a-4b5c-6789-0123-456789012345",
  "created_at": "2026-02-25T11:00:00Z"
}

Download Synthetic Data

GET /api/synthgen/jobs/{job_id}/download

Download the output artifact from a completed generation job.

Returns 409 Conflict if the job output is not available yet.

Example

curl -o synthetic_data.csv \
  "$BASE_URL/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234/download" \
  -H "Authorization: Bearer YOUR_API_KEY"

Python:

resp = requests.get(
    f"{BASE_URL}/api/synthgen/jobs/9c0d1e2f-3a4b-5678-9012-345678901234/download",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
with open("synthetic_data.csv", "wb") as f:
    f.write(resp.content)

Response 200 OK

Binary file download.
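Since the endpoint answers 409 Conflict until the output exists, a client can retry on 409 and write the file on 200. The sketch below shows that pattern with an injected get callable so it runs offline. download_when_ready, its parameters, and the fake responses are illustrative, and the retry count and delay are arbitrary choices.

```python
import os
import tempfile
import time
from types import SimpleNamespace


def download_when_ready(get, url, path, attempts=10, delay=5.0, sleep=time.sleep):
    """Retry while the server answers 409, then write the body to path.

    get is any callable taking a URL and returning an object with
    .status_code and .content, e.g. lambda u: requests.get(u, headers=HEADERS).
    """
    for attempt in range(attempts):
        resp = get(url)
        if resp.status_code == 200:
            with open(path, "wb") as f:
                f.write(resp.content)
            return path
        if resp.status_code != 409:
            raise RuntimeError(f"unexpected HTTP status {resp.status_code}")
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"output still unavailable after {attempts} attempts")


# Offline demonstration: two 409 responses, then a made-up CSV body.
_responses = iter([
    SimpleNamespace(status_code=409, content=b""),
    SimpleNamespace(status_code=409, content=b""),
    SimpleNamespace(status_code=200, content=b"id,amount\n1,9.99\n"),
])
out_path = os.path.join(tempfile.mkdtemp(), "synthetic_data.csv")
download_when_ready(
    lambda url: next(_responses),
    "http://localhost:8888/api/synthgen/jobs/JOB_ID/download",
    out_path,
    sleep=lambda seconds: None,
)
with open(out_path, "rb") as f:
    downloaded = f.read()
print(len(downloaded), "bytes written")
```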


List Model Types

GET /api/synthgen/model-types

Return available model architectures and their descriptions.

Example

curl "$BASE_URL/api/synthgen/model-types" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response 200 OK

{
  "model_types": [
    {
      "id": "ctgan",
      "name": "CTGAN",
      "description": "Conditional Tabular GAN - best for general tabular data"
    },
    {
      "id": "copulagan",
      "name": "CopulaGAN",
      "description": "Copula-based GAN - good for capturing correlations"
    },
    {
      "id": "tvae",
      "name": "TVAE",
      "description": "Tabular VAE - faster training, good for large datasets"
    },
    {
      "id": "gaussian_copula",
      "name": "Gaussian Copula",
      "description": "Statistical copula model - fastest, good baseline"
    }
  ]
}
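One use for this listing is to check a model_type against it before calling Create Model, rather than hard-coding the four names. The sketch below does that against a payload shaped like the response above; check_model_type is a hypothetical helper, not part of the API.

```python
def check_model_type(model_type: str, listing: dict) -> str:
    """Return model_type if the listing contains it, otherwise raise ValueError."""
    known = [entry["id"] for entry in listing["model_types"]]
    if model_type not in known:
        raise ValueError(f"unknown model_type {model_type!r}; expected one of {known}")
    return model_type


# Trimmed copy of the response body shown above.
listing = {
    "model_types": [
        {"id": "ctgan", "name": "CTGAN"},
        {"id": "copulagan", "name": "CopulaGAN"},
        {"id": "tvae", "name": "TVAE"},
        {"id": "gaussian_copula", "name": "Gaussian Copula"},
    ]
}

print(check_model_type("tvae", listing))
```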

Full Workflow Example

Train a CTGAN model and generate 10,000 synthetic rows:

import requests
import time

BASE_URL = "http://localhost:8888"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Train a CTGAN model
resp = requests.post(f"{BASE_URL}/api/synthgen/models", headers=HEADERS, json={
    "project_id": "d4e5f6a7-b8c9-0123-def4-567890123456",
    "dataset_version_id": "f6a7b8c9-d0e1-2345-6789-0abcdef12345",
    "name": "Transactions CTGAN v1",
    "model_type": "ctgan",
    "config": {"epochs": 300},
})
model_id = resp.json()["model_id"]

# 2. Wait for training to complete
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/models/{model_id}", headers=HEADERS)
    status = r.json()["model"]["status"]
    if status in ("ready", "failed"):
        break
    time.sleep(15)
if status == "failed":
    raise RuntimeError("Model training failed")

# 3. Generate synthetic data
resp = requests.post(
    f"{BASE_URL}/api/synthgen/models/{model_id}/generate",
    headers=HEADERS,
    json={"num_rows": 10000, "seed": 42},
)
job_id = resp.json()["synthgen_job_id"]

# 4. Wait for generation to complete
while True:
    r = requests.get(f"{BASE_URL}/api/synthgen/jobs/{job_id}", headers=HEADERS)
    status = r.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)
if status == "failed":
    raise RuntimeError("Generation job failed")

# 5. Download the result
resp = requests.get(
    f"{BASE_URL}/api/synthgen/jobs/{job_id}/download",
    headers=HEADERS,
)
with open("synthetic_transactions.csv", "wb") as f:
    f.write(resp.content)
print("Saved 10000 synthetic rows to synthetic_transactions.csv")
