7. Privacy and SynthGen

Goal

Protect sensitive data with policy-driven transformations and create synthetic datasets for safe experimentation.

Privacy workflow

  1. Open Privacy.

  2. Create or select a privacy policy.

  3. Attach policy to dataset/session.

  4. Run detection and review discovered PII entities.

  5. Configure transformation action per rule (mask/redact/hash/encrypt/generalize/suppress).

  6. Execute transform and export protected dataset.
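
The transformation actions in step 5 can be illustrated with a minimal sketch. This is not the Privacy module's implementation: every function name, the POLICY mapping, the column names, and the salt below are hypothetical, chosen only to show what each action does to a value. The encrypt action is left out because the Python standard library has no symmetric cipher.

```python
import hashlib
import re

# Simple email pattern for illustration only; real PII detection is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask(value, visible=4, fill="*"):
    """Mask: keep the last `visible` characters and replace the rest."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def redact(text, pattern=EMAIL_RE, token="[REDACTED]"):
    """Redact: replace every match of `pattern` with a fixed token."""
    return pattern.sub(token, text)


def hash_value(value, salt):
    """Hash: salted SHA-256 digest, so equal inputs stay joinable."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def generalize_age(age, width=10):
    """Generalize: replace an exact age with a bucket such as 30-39."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress(_value):
    """Suppress: drop the value entirely."""
    return None


# Hypothetical policy: column name -> transformation action.
POLICY = {
    "email": redact,
    "card": mask,
    "user_id": lambda v: hash_value(v, salt="demo-salt"),
    "age": lambda v: generalize_age(int(v)),
    "ssn": suppress,
}


def apply_policy(row, policy=POLICY):
    """Apply the per-column action; columns without a rule pass through."""
    return {col: policy[col](val) if col in policy else val
            for col, val in row.items()}
```

Calling apply_policy on a row dictionary returns a new dictionary with one action applied per covered column, which mirrors the per-rule configuration in the workflow above.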

SynthGen workflow

  1. Open SynthGen.

  2. Select source dataset and generation profile.

  3. Configure sample size and optional constraints.

  4. Start generation job.

  5. Download synthetic dataset artifact on completion.

Data quality and parity checks

  1. Schema parity: column count and names are as expected.

  2. Privacy parity: original sensitive values are not exposed in the transformed output.

  3. Utility parity: basic distribution trends remain usable for the intended modeling task.

  4. Integrity: the output file is readable and its row count matches the configured generation target.
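
The four checks above could be scripted along these lines. This is a sketch under stated assumptions: the datasets are CSV text, the function names are hypothetical, the leak check only catches exact string matches, and the mean comparison is one crude stand-in for "distribution trends". None of it reflects how the product itself validates output.

```python
import csv
import io
import statistics


def read_rows(csv_text):
    """Parse CSV text into a list of dict rows (a successful parse also
    shows the file is readable)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def schema_parity(source, output):
    """Schema parity: same column names in the same order."""
    return bool(source) and bool(output) and list(source[0]) == list(output[0])


def leaked_values(source, output, sensitive_columns):
    """Privacy parity: original sensitive values that still appear
    verbatim anywhere in the output (exact matches only)."""
    originals = {row[c] for row in source for c in sensitive_columns if row.get(c)}
    seen = {v for row in output for v in row.values() if v}
    return originals & seen


def mean_shift(source, output, column):
    """Utility parity (rough): relative change in a numeric column's mean."""
    src = statistics.fmean(float(r[column]) for r in source)
    out = statistics.fmean(float(r[column]) for r in output)
    return abs(out - src) / abs(src) if src else abs(out)


def row_count_ok(output, target):
    """Integrity: row count matches the configured generation target."""
    return len(output) == target
```

A typical use is to load the source and the output with read_rows, then fail the run if leaked_values returns a non-empty set or if mean_shift exceeds a tolerance you choose for your modeling task.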

Functional validation checklist

  1. Privacy detection returns non-empty findings when PII exists.

  2. Transform run produces downloadable artifact.

  3. SynthGen run reaches terminal completed state.

  4. Generated dataset can be loaded into Datasets/Experiments modules.

  5. Error states surface actionable diagnostics.
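
Item 3 of this checklist can be automated by polling the job until it reaches a terminal state. The sketch below assumes the caller supplies a get_status function that returns the job's current state as a string; that function and the state names "completed", "failed", and "cancelled" are hypothetical, not taken from SynthGen.

```python
import time

# Assumed terminal state names; adjust to whatever the job API reports.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_terminal(get_status, timeout_s=600.0, poll_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state is seen or the timeout passes.

    `sleep` and `clock` are injectable so the loop can be tested without
    real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s}s")
        sleep(poll_s)
```

The check passes only when the returned state is "completed"; any other terminal state should be treated as a failed validation and investigated through the job's diagnostics.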

Expected result

  1. Sensitive data is protected according to selected policy.

  2. Synthetic data can be used for prototyping and tests.

Common errors and recovery

  1. No PII findings when expected: verify the selected policy, its rules, and the source columns.

  2. Transform failure: retry with a narrower rule scope and inspect the logs.

  3. SynthGen timeout: reduce the generation size and rerun.
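
The timeout recovery in item 3 can be expressed as a retry loop that shrinks the sample size after each timeout. This is a sketch only: run_job is a hypothetical callable standing in for however a generation job is started, and the halving factor and minimum size are arbitrary assumptions.

```python
def generate_with_backoff(run_job, sample_size, min_size=100, factor=0.5):
    """Call run_job(size); on TimeoutError, reduce the size and try again.

    Returns (artifact, size_used). Raises TimeoutError once the size
    would drop below min_size.
    """
    size = sample_size
    while size >= min_size:
        try:
            return run_job(size), size
        except TimeoutError:
            # Reduce the generation size before rerunning.
            size = int(size * factor)
    raise TimeoutError(
        f"generation timed out down to the minimum size of {min_size} rows")
```

Because the function returns the size that finally succeeded, the caller can record how far the target was reduced and update the row-count expectation used in the integrity check.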

Screenshots

  1. Privacy policy and detection workflow: Privacy module with policy execution flow.

  2. Synthetic data generation workflow: SynthGen training/generation and artifact output view.