==========================
Privacy Suite
==========================

The Privacy Suite provides PII detection, data masking, and compliance
validation through the API. It detects 72+ PII types and supports HIPAA,
GDPR, PCI-DSS, and CCPA compliance profiles.

.. contents:: Sections
   :local:
   :depth: 2

Overview
--------

A privacy workflow has six steps:

1. **Create a Policy** -- define the compliance profile for PII handling
2. **Add Rules** -- specify how each PII type should be treated
3. **Create a Session** -- bind the policy to a dataset for scanning
4. **Run Detection** -- scan the dataset for PII
5. **Transform** -- mask, hash, redact, or otherwise anonymize the detected PII
6. **Download** -- retrieve the anonymized dataset

----

Step 1: Create a Privacy Policy
--------------------------------

A policy defines the compliance profile and contains rules for each PII type.
The examples on this page use ``requests``. They assume that ``BASE_URL``,
``HEADERS`` (including your API credentials), ``project_id``, and
``dataset_id`` are already defined.

.. code-block:: python

    import requests

    resp = requests.post(
        f"{BASE_URL}/api/privacy/policies",
        headers=HEADERS,
        json={
            "project_id": project_id,
            "name": "HIPAA Compliance",
            "description": "PHI protection for patient data",
            "profile": "hipaa",
            "status": "active",
            "settings": {
                "auto_detect": True,
                "confidence_threshold": 0.8,
            },
        },
    )
    resp.raise_for_status()
    policy_id = resp.json()["id"]

Available compliance profiles:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Profile
     - Description
   * - ``hipaa``
     - Protected Health Information (PHI) -- medical records, SSNs, dates of birth
   * - ``gdpr``
     - EU data protection -- names, emails, IP addresses, location data
   * - ``pci_dss``
     - Payment card data -- card numbers, CVVs, cardholder names
   * - ``ccpa``
     - California consumer privacy -- broad PII categories
   * - ``custom``
     - User-defined rules only

Step 2: Add Detection Rules
----------------------------

Each rule specifies a PII type and the transformation action to apply when
that type is detected.
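Each rule is a plain JSON object, so a typo in an action name can be caught before any request is sent. The sketch below is a hypothetical client-side helper and is not part of the Privacy API. ``validate_rule`` checks a rule payload against the six actions listed in the actions table for this step. It also assumes that ``show_last`` is meaningful only for ``mask``. It does not check ``pii_type`` against the server's catalog of supported types.

.. code-block:: python

    # Hypothetical client-side check; validate_rule is NOT part of the Privacy API.
    # The action names come from the actions table on this page.
    ALLOWED_ACTIONS = ("redact", "mask", "hash", "generalize", "encrypt", "tokenize")


    def validate_rule(rule: dict) -> list:
        """Return a list of problems found in a rule payload (empty if none)."""
        problems = []

        pii_type = rule.get("pii_type")
        if not isinstance(pii_type, str) or not pii_type:
            problems.append("pii_type must be a non-empty string")

        action = rule.get("action")
        if action not in ALLOWED_ACTIONS:
            problems.append(f"unknown action: {action!r}")

        config = rule.get("config", {})
        if not isinstance(config, dict):
            problems.append("config must be an object")
        elif "show_last" in config:
            show_last = config["show_last"]
            # Assumption: show_last only applies to the mask action.
            if action != "mask":
                problems.append("show_last only applies to the mask action")
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(show_last, bool) or not isinstance(show_last, int) or show_last < 0:
                problems.append("show_last must be a non-negative integer")

        return problems


    print(validate_rule({"pii_type": "credit_card", "action": "mask",
                         "config": {"show_last": 4}}))  # []
    print(validate_rule({"pii_type": "email", "action": "scramble"}))
    # ["unknown action: 'scramble'"]

Calling ``validate_rule`` on each entry before posting it keeps a rejected rule from leaving the policy half-configured.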
The following example adds six rules, posting one rule per request to the
policy's ``rules`` endpoint:

.. code-block:: python

    rules = [
        {"pii_type": "email", "action": "hash", "is_enabled": True},
        {"pii_type": "phone_number", "action": "mask", "is_enabled": True},
        {"pii_type": "ssn", "action": "redact", "is_enabled": True},
        {"pii_type": "credit_card", "action": "mask", "is_enabled": True,
         "config": {"show_last": 4}},
        {"pii_type": "name", "action": "generalize", "is_enabled": True},
        {"pii_type": "date_of_birth", "action": "generalize", "is_enabled": True,
         "config": {"to": "year"}},
    ]

    for rule in rules:
        resp = requests.post(
            f"{BASE_URL}/api/privacy/policies/{policy_id}/rules",
            headers=HEADERS,
            json=rule,
        )
        resp.raise_for_status()

Available actions:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Action
     - Description
   * - ``redact``
     - Replace the value with ``[REDACTED]``
   * - ``mask``
     - Partial masking (e.g., ``***-**-1234``). Set ``show_last`` in the rule's ``config`` to control how many trailing characters stay visible.
   * - ``hash``
     - One-way SHA-256 hash. Identical inputs produce identical hashes, so referential integrity is preserved.
   * - ``generalize``
     - Reduce precision (e.g., age → age range, date → year)
   * - ``encrypt``
     - AES-256 encryption (reversible with the key)
   * - ``tokenize``
     - Replace with consistent random tokens

Listing Available PII Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To see which PII types are supported, query the PII type catalog:

.. code-block:: python

    resp = requests.get(f"{BASE_URL}/api/privacy/pii-types", headers=HEADERS)
    resp.raise_for_status()

    # Print the first 10 supported PII types
    for t in resp.json()["types"][:10]:
        print(f"  {t['id']}: {t['name']}")

Step 3: Create a Scan Session
------------------------------

Bind the policy to a dataset for scanning:

.. code-block:: python

    resp = requests.post(
        f"{BASE_URL}/api/privacy/sessions",
        headers=HEADERS,
        json={
            "policy_id": policy_id,
            "dataset_id": dataset_id,
            "sample_size": 5000,  # scan a sample (None = full dataset)
            "deep_scan": True,    # use NLP-based detection
        },
    )
    resp.raise_for_status()
    session_id = resp.json()["id"]

Step 4: Run Detection
---------------------

Detection runs as a background job. The client starts the job and then polls
the session until ``detection_status`` reaches ``completed`` or ``failed``.
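The polling loops in this step and in Step 5 wait indefinitely. If a bounded wait is preferable, the loop can be factored into a helper like the one sketched below. The helper rests on a few stated assumptions:

* ``wait_for_status`` is a hypothetical name.
* It treats only ``completed`` and ``failed`` as terminal states, which are the two values the examples check.
* It takes a zero-argument callable that returns the session object, so it can be exercised without a server.

A ``failed`` status is returned rather than raised, so callers still need to check it.

.. code-block:: python

    import time

    TERMINAL_STATES = ("completed", "failed")


    def wait_for_status(fetch_session, status_key, timeout=600.0, interval=5.0,
                        sleep=time.sleep, clock=time.monotonic):
        """Poll fetch_session() until session[status_key] is terminal.

        fetch_session: zero-argument callable returning the session dict.
        status_key: "detection_status" or "transform_status".
        Returns the final session dict (which may report "failed").
        Raises TimeoutError if no terminal state is seen within `timeout` seconds.
        """
        deadline = clock() + timeout
        while True:
            session = fetch_session()
            status = session.get(status_key)
            if status in TERMINAL_STATES:
                return session
            if clock() >= deadline:
                raise TimeoutError(
                    f"{status_key} still {status!r} after {timeout} seconds"
                )
            sleep(interval)


    # Demo with a stubbed session source, so no server is needed.
    states = iter(["queued", "running", "completed"])
    final = wait_for_status(
        lambda: {"detection_status": next(states)},
        "detection_status",
        interval=0,
        sleep=lambda seconds: None,
    )
    print(final)  # {'detection_status': 'completed'}

To use the helper against the live API, build the callable as follows:

1. Call ``requests.get(f"{BASE_URL}/api/privacy/sessions/{session_id}", headers=HEADERS)``.
2. Return ``r.json()["session"]``.
3. Pass ``"detection_status"`` or ``"transform_status"`` as ``status_key``.

The ``sleep`` and ``clock`` parameters exist only so the helper can be tested without real delays.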
Start detection, wait for it to finish, and view the results:

.. code-block:: python

    import time

    # Start detection (runs as a background job)
    resp = requests.post(
        f"{BASE_URL}/api/privacy/sessions/{session_id}/detect",
        headers=HEADERS,
    )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # Poll the session until detection finishes
    while True:
        r = requests.get(
            f"{BASE_URL}/api/privacy/sessions/{session_id}",
            headers=HEADERS,
        )
        r.raise_for_status()
        session = r.json()["session"]
        if session["detection_status"] in ("completed", "failed"):
            break
        time.sleep(5)

    if session["detection_status"] == "failed":
        raise RuntimeError("PII detection failed")

    # View detection results, grouped by column
    resp = requests.get(
        f"{BASE_URL}/api/privacy/sessions/{session_id}/results",
        headers=HEADERS,
    )
    resp.raise_for_status()
    results = resp.json()
    for col, detections in results.get("detections", {}).items():
        for d in detections:
            print(f"  Column '{col}': {d['pii_type']} (confidence: {d['confidence']:.2f})")

Step 5: Transform (Anonymize)
-----------------------------

Apply the policy rules to mask, redact, or hash the detected PII. Like
detection, transformation runs as a background job:

.. code-block:: python

    resp = requests.post(
        f"{BASE_URL}/api/privacy/sessions/{session_id}/transform",
        headers=HEADERS,
    )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # Wait for transformation (reuses the time import from Step 4)
    while True:
        r = requests.get(
            f"{BASE_URL}/api/privacy/sessions/{session_id}",
            headers=HEADERS,
        )
        r.raise_for_status()
        session = r.json()["session"]
        if session["transform_status"] in ("completed", "failed"):
            break
        time.sleep(5)

    if session["transform_status"] == "failed":
        raise RuntimeError("Transformation failed")

Step 6: Download Anonymized Data
---------------------------------

Once the transformation has completed, download the anonymized dataset:

.. code-block:: python

    resp = requests.get(
        f"{BASE_URL}/api/privacy/sessions/{session_id}/download",
        headers=HEADERS,
    )
    resp.raise_for_status()
    with open("anonymized_data.csv", "wb") as f:
        f.write(resp.content)

Compliance Validation
---------------------

Check whether the session's dataset meets the compliance requirements of its
policy's profile:

.. code-block:: python

    resp = requests.post(
        f"{BASE_URL}/api/privacy/sessions/{session_id}/validate",
        headers=HEADERS,
    )
    resp.raise_for_status()
    for check in resp.json()["results"]:
        status = "PASS" if check["passed"] else "FAIL"
        print(f"  [{status}] {check['rule']}: {check['message']}")

Export Privacy Report
---------------------

Generate a PDF report of the privacy scan. The optional ``ai_insights``
parameter requests AI-generated suggestions in the report.
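The report and download endpoints return raw bytes, and the examples write ``resp.content`` straight to disk. If the request fails, that can leave a JSON error body saved under a ``.pdf`` or ``.csv`` name. A small guard avoids this. In the sketch below, ``save_binary_response`` is a hypothetical helper. The ``application/pdf`` Content-Type for the report endpoint is an assumption, not something this page documents.

.. code-block:: python

    import tempfile
    from pathlib import Path
    from types import SimpleNamespace


    def save_binary_response(resp, path, expected_type=None):
        """Write resp.content to path only if the response looks like a success.

        resp: any object with .status_code, .headers and .content, such as a
        requests.Response. expected_type: optional Content-Type prefix to require,
        e.g. "application/pdf" (assumed, not documented, for the report endpoint).
        Returns the number of bytes written.
        """
        if not 200 <= resp.status_code < 300:
            raise RuntimeError(f"request failed with HTTP {resp.status_code}")
        content_type = resp.headers.get("Content-Type", "")
        if expected_type and not content_type.startswith(expected_type):
            raise RuntimeError(f"unexpected Content-Type: {content_type!r}")
        Path(path).write_bytes(resp.content)
        return len(resp.content)


    # Demo with a stand-in response object, written to a temporary directory.
    fake = SimpleNamespace(
        status_code=200,
        headers={"Content-Type": "application/pdf"},
        content=b"%PDF-1.7 stub",
    )
    with tempfile.TemporaryDirectory() as tmp:
        written = save_binary_response(fake, Path(tmp) / "privacy_report.pdf",
                                       expected_type="application/pdf")
        print(written)  # 13

With a real ``requests.Response``, a call such as ``save_binary_response(resp, "privacy_report.pdf", "application/pdf")`` can take the place of the ``open(...)`` block. Omitting ``expected_type`` skips the Content-Type check.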
Request the report and save it to disk:

.. code-block:: python

    resp = requests.get(
        f"{BASE_URL}/api/privacy/sessions/{session_id}/report",
        headers=HEADERS,
        params={"ai_insights": True},  # optional AI suggestions
    )
    resp.raise_for_status()
    with open("privacy_report.pdf", "wb") as f:
        f.write(resp.content)

SDK Example
-----------

A similar workflow using the Python SDK, here with a GDPR profile:

.. code-block:: python

    from coreplexml import CorePlexMLClient

    client = CorePlexMLClient(base_url=BASE_URL, api_key=API_KEY)

    # Create a policy and add rules
    policy = client.privacy.create_policy(
        project_id=project_id,
        name="GDPR Compliance",
        profile="gdpr",
    )
    client.privacy.add_rule(policy["id"], pii_type="email", action="hash")
    client.privacy.add_rule(policy["id"], pii_type="phone_number", action="mask")

    # Create a session, then detect and transform PII
    session = client.privacy.create_session(
        policy_id=policy["id"],
        dataset_id=dataset_id,
    )
    client.privacy.detect(session["id"])
    client.privacy.transform(session["id"])

    # Download the anonymized result
    client.privacy.download(session["id"], "anonymized.csv")

----

.. seealso::

   - :doc:`/api-reference/privacy` -- Full Privacy API reference
   - :doc:`synthetic-data` -- Generate synthetic data from anonymized datasets