7. Privacy and SynthGen

Goal

Protect sensitive data with policy-driven transformations and create synthetic datasets for safe experimentation.

Privacy workflow

Open Privacy.
Create or select a privacy policy.
Attach the policy to the dataset or session.
Run detection and review the discovered PII entities.
Configure the transformation action for each rule (mask, redact, hash, encrypt, generalize, or suppress).
Run the transform and export the protected dataset.
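The following minimal Python sketch makes the detection and transformation steps concrete by showing what a policy does to tabular rows. It is illustrative only and does not reflect how the Privacy module is implemented. The regex patterns, the column-to-action `policy` dictionary, the salt, and every helper name (`detect_pii`, `apply_policy`, `mask`, and so on) are hypothetical. `encrypt` is left out because it requires a real cryptography library and key management.

```python
import hashlib
import re

# Hypothetical detection rules (entity type -> regex); real policies define their own.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(rows):
    """Return (row_index, column, entity_type) for every cell that matches a rule."""
    findings = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            for entity, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    findings.append((i, column, entity))
    return findings


def mask(value):
    """Replace all but the last four characters with '*' (short values are fully masked)."""
    text = str(value)
    visible = text[-4:] if len(text) > 4 else ""
    return "*" * (len(text) - len(visible)) + visible


def redact(_value):
    """Discard the value and leave a fixed placeholder."""
    return "[REDACTED]"


def hash_value(value, salt="example-salt"):
    """Salted SHA-256: the same input gives the same digest, so hashed keys stay joinable."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


def generalize_age(value, width=10):
    """Coarsen a number into a bucket label such as '30-39'."""
    low = int(value) // width * width
    return f"{low}-{low + width - 1}"


def suppress(_value):
    """Remove the value; None stands in for an empty cell."""
    return None


# encrypt is intentionally omitted: it needs a cryptography library and managed keys.
ACTIONS = {
    "mask": mask,
    "redact": redact,
    "hash": hash_value,
    "generalize": generalize_age,
    "suppress": suppress,
}


def apply_policy(rows, policy):
    """policy maps column name -> action name; unlisted columns pass through unchanged."""
    protected = []
    for row in rows:
        new_row = dict(row)  # copy so the source rows are not modified
        for column, action in policy.items():
            if column in new_row:
                new_row[column] = ACTIONS[action](new_row[column])
        protected.append(new_row)
    return protected
```

Hashing is deterministic, so hashed columns can still be joined across tables; masking keeps a recognizable suffix, which is convenient for lookups but reveals more than redaction does.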

SynthGen workflow

Open SynthGen.
Select the source dataset and a generation profile.
Configure the sample size and any optional constraints.
Start the generation job.
When the job completes, download the synthetic dataset artifact.
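To give some intuition for what sample size and constraints control, the sketch below generates rows by sampling each column independently from its observed values and rejecting candidates that violate a constraint. This is a deliberately naive baseline, not SynthGen's generation method; the function name, the constraint-as-predicate convention, and the `max_attempts` guard are assumptions.

```python
import random


def generate_synthetic(rows, sample_size, constraints=(), seed=None, max_attempts=100_000):
    """Naive generator: sample every column independently from its observed values.

    constraints: iterable of predicates (row -> bool); candidates failing any are rejected.
    seed: fix it to make a run reproducible.
    """
    if not rows:
        raise ValueError("source dataset is empty")
    rng = random.Random(seed)
    # Empirical marginal for each column: the list of values seen in the source.
    marginals = {column: [row[column] for row in rows] for column in rows[0]}
    constraints = list(constraints)

    synthetic = []
    attempts = 0
    while len(synthetic) < sample_size:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError(
                f"only {len(synthetic)} of {sample_size} rows satisfied the constraints"
            )
        candidate = {column: rng.choice(values) for column, values in marginals.items()}
        if all(check(candidate) for check in constraints):
            synthetic.append(candidate)
    return synthetic
```

Independent sampling preserves each column's marginal distribution but discards correlations between columns, and because it reuses real cell values it is not privacy-safe on its own. Trained generative models aim to capture cross-column structure, but privacy still has to be verified separately.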

Data quality and parity checks

Schema parity: column count and column names match the expected schema.
Privacy parity: original sensitive values do not appear in the transformed output.
Utility parity: basic distribution trends are preserved well enough for the intended modeling use.
Integrity: the output file is readable, and its row count matches the configured generation target.
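The parity checks above can be automated. The following sketch operates on rows represented as lists of dictionaries (for example, as produced by `csv.DictReader`). The function names, the mean-based utility test, and the 10% default tolerance are illustrative assumptions, not thresholds defined by the product.

```python
import csv
import io
import statistics


def schema_parity(source_rows, output_rows):
    """Column count and names (in order) match."""
    return list(source_rows[0]) == list(output_rows[0])


def privacy_parity(source_rows, output_rows, sensitive_columns):
    """No original sensitive value appears verbatim as a cell in the output.

    Exact-match only: a value embedded inside free text would not be caught.
    """
    originals = {str(row[c]) for row in source_rows for c in sensitive_columns}
    return not any(str(v) in originals for row in output_rows for v in row.values())


def utility_parity(source_rows, output_rows, column, tolerance=0.10):
    """The column mean stays within a relative tolerance of the source mean."""
    source_mean = statistics.mean(float(row[column]) for row in source_rows)
    output_mean = statistics.mean(float(row[column]) for row in output_rows)
    return abs(output_mean - source_mean) <= tolerance * abs(source_mean)


def integrity(csv_text, expected_rows):
    """The exported CSV parses, has a header, and holds the expected number of data rows."""
    try:
        reader = csv.DictReader(io.StringIO(csv_text))
        rows = list(reader)
    except csv.Error:
        return False
    return bool(reader.fieldnames) and len(rows) == expected_rows
```

A mean comparison is only a coarse utility signal; comparing quantiles or category frequencies per column is a natural next step if modeling depends on distribution shape.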

Functional validation checklist

Privacy detection returns non-empty findings when PII exists.
Transform run produces a downloadable artifact.
SynthGen run reaches the terminal completed state.
Generated dataset can be loaded into Datasets/Experiments modules.
Error states surface actionable diagnostics.
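If runs are driven from a script (assuming the deployment exposes job status programmatically), the completion and error-state checks reduce to polling until a terminal state. The sketch assumes a hypothetical `get_status` callable that returns a dictionary with a `state` key and an optional `message` key, and it assumes the terminal state names `completed`, `failed`, and `cancelled`; substitute whatever the deployment actually reports.

```python
import time

# Assumed terminal state names; replace with the values your deployment reports.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(get_status, timeout_s=600.0, poll_interval_s=5.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll a job until it reaches a terminal state; raise TimeoutError otherwise.

    get_status: hypothetical callable returning e.g. {"state": "running", "message": "..."}.
    clock and sleep are injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status["state"] in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job still {status['state']!r} after {timeout_s}s: {status.get('message', '')}"
            )
        sleep(poll_interval_s)


def require_completed(status):
    """Turn a non-success terminal state into an error that carries the diagnostic."""
    if status["state"] != "completed":
        detail = status.get("message") or "no diagnostic message returned"
        raise RuntimeError(f"job ended in state {status['state']!r}: {detail}")
    return status
```

Keeping "reached a terminal state" separate from "reached the completed state" means a failed job still ends the wait promptly, and its diagnostic message is surfaced in the raised error.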

Expected result

Sensitive data is protected according to the selected policy.
Synthetic data can be used for prototyping and tests.

Common errors and recovery

No PII findings when expected: verify the selected policy, its rules, and the source columns.
Transform failure: inspect the logs, then retry with a narrower rule scope.
SynthGen timeout: reduce the generation size and rerun.
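The timeout recovery step can be written as a retry loop that shrinks the sample size after each timeout. `run_generation` is a hypothetical callable that runs a generation job for a given size and raises `TimeoutError` when the job does not finish in time. The halving factor and the 1,000-row floor are arbitrary illustrative choices.

```python
def generate_with_backoff(run_generation, sample_size, min_size=1_000, shrink=0.5):
    """Retry a generation job with a smaller sample size after each timeout.

    run_generation: hypothetical callable(size) -> artifact that raises TimeoutError.
    Returns (size_used, artifact); re-raises once the size would drop below min_size.
    """
    if not 0 < shrink < 1:
        raise ValueError("shrink must be between 0 and 1")
    size = sample_size
    while True:
        try:
            return size, run_generation(size)
        except TimeoutError:
            next_size = int(size * shrink)
            if next_size < min_size:
                raise  # give up and surface the original timeout
            size = next_size
```

The function returns the size it actually used, so the integrity check can compare the output's row count against that size rather than against the original target.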

Screenshots

Privacy policy and detection workflow — Privacy module with policy execution flow.

Synthetic data generation workflow — SynthGen training/generation and artifact output view.