==========================
Privacy Suite
==========================

The Privacy Suite provides PII detection, data masking, and compliance
validation through the API. It detects 72+ PII types and supports
HIPAA, GDPR, PCI-DSS, and CCPA compliance profiles.

.. contents:: Sections
   :local:
   :depth: 2

Overview
--------

A privacy workflow has six steps, covered in order below:

1. **Create a Policy** -- define compliance rules for PII handling
2. **Add Rules** -- specify how each PII type should be treated
3. **Create a Session** -- bind a policy to a dataset for scanning
4. **Run Detection** -- scan the dataset for PII
5. **Transform** -- mask, hash, or redact the detected PII
6. **Download** -- retrieve the anonymized dataset

----

Step 1: Create a Privacy Policy
--------------------------------

A policy defines the compliance profile and contains rules for each
PII type.

.. code-block:: python

   import requests

   # BASE_URL, HEADERS, and project_id are assumed to be defined already.
   resp = requests.post(f"{BASE_URL}/api/privacy/policies", headers=HEADERS, json={
       "project_id": project_id,
       "name": "HIPAA Compliance",
       "description": "PHI protection for patient data",
       "profile": "hipaa",
       "status": "active",
       "settings": {
           "auto_detect": True,
           "confidence_threshold": 0.8,
       },
   })
   resp.raise_for_status()  # stop early on a 4xx/5xx response
   policy_id = resp.json()["id"]

Available compliance profiles:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Profile
     - Description
   * - ``hipaa``
     - Protected Health Information (PHI) -- medical records, SSNs, DOBs
   * - ``gdpr``
     - EU data protection -- names, emails, IP addresses, location data
   * - ``pci_dss``
     - Payment card data -- card numbers, CVVs, cardholder names
   * - ``ccpa``
     - California consumer privacy -- broad PII categories
   * - ``custom``
     - User-defined rules only

Step 2: Add Detection Rules
----------------------------

Each rule specifies a PII type and the transformation action to apply
when that type is detected.

.. code-block:: python

   rules = [
       {"pii_type": "email", "action": "hash", "is_enabled": True},
       {"pii_type": "phone_number", "action": "mask", "is_enabled": True},
       {"pii_type": "ssn", "action": "redact", "is_enabled": True},
       {"pii_type": "credit_card", "action": "mask", "config": {"show_last": 4}},
       {"pii_type": "name", "action": "generalize", "is_enabled": True},
       {"pii_type": "date_of_birth", "action": "generalize", "config": {"to": "year"}},
   ]

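   # Add each rule to the policy, one request per rule
   for rule in rules:
       resp = requests.post(
           f"{BASE_URL}/api/privacy/policies/{policy_id}/rules",
           headers=HEADERS,
           json=rule,
       )
       resp.raise_for_status()  # surface a rejected rule immediately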

Available actions:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Action
     - Description
   * - ``redact``
     - Replace with ``[REDACTED]``
   * - ``mask``
     - Partial masking (e.g., ``***-**-1234``). Set ``show_last`` in ``config``
       to control how many trailing characters stay visible.
   * - ``hash``
     - One-way SHA-256 hash (preserves referential integrity)
   * - ``generalize``
     - Reduce precision (e.g., age → age range, date → year)
   * - ``encrypt``
     - AES-256 encryption (reversible with key)
   * - ``tokenize``
     - Replace with consistent random tokens
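
To make the actions above concrete, the following sketch reimplements four of
them locally in plain Python. It is an illustration only: the function names
are hypothetical, and the service's actual output (for example, how hashes are
encoded or whether they are salted) may differ.

.. code-block:: python

   import hashlib

   def redact(value: str) -> str:
       """Replace the whole value with a fixed placeholder."""
       return "[REDACTED]"

   def mask(value: str, show_last: int = 4) -> str:
       """Hide all letters and digits except the last ``show_last``.

       Separators such as dashes and spaces are kept as they are.
       """
       total = sum(ch.isalnum() for ch in value)
       keep_from = total - show_last
       seen = 0
       out = []
       for ch in value:
           if ch.isalnum():
               out.append(ch if seen >= keep_from else "*")
               seen += 1
           else:
               out.append(ch)
       return "".join(out)

   def hash_value(value: str) -> str:
       """One-way SHA-256 hash; equal inputs give equal outputs."""
       return hashlib.sha256(value.encode("utf-8")).hexdigest()

   def generalize_date(iso_date: str) -> str:
       """Reduce an ISO date (YYYY-MM-DD) to its year."""
       return iso_date[:4]

   print(mask("123-45-1234"))            # ***-**-1234
   print(generalize_date("1987-06-15"))  # 1987

Because ``hash_value`` is deterministic, the same email hashes to the same
digest in every row, which is what lets joins on a hashed column keep working.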

Listing Available PII Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   resp = requests.get(f"{BASE_URL}/api/privacy/pii-types", headers=HEADERS)
   for t in resp.json()["types"][:10]:
       print(f"  {t['id']}: {t['name']}")
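
The type list can also be used to catch misspelled ``pii_type`` values before
any rules are posted. The sketch below assumes that the ``id`` field of each
entry is the value expected in a rule's ``pii_type``; the helper name and the
sample data are hypothetical.

.. code-block:: python

   def unknown_pii_types(rules, available_types):
       """Return the pii_type values in ``rules`` that are not listed
       in ``available_types`` (a list of dicts with an ``id`` key)."""
       known = {t["id"] for t in available_types}
       return [r["pii_type"] for r in rules if r["pii_type"] not in known]

   # Made-up sample data in the shape returned by the pii-types endpoint
   sample_types = [
       {"id": "email", "name": "Email"},
       {"id": "ssn", "name": "SSN"},
   ]
   sample_rules = [
       {"pii_type": "email", "action": "hash"},
       {"pii_type": "emial", "action": "mask"},  # typo on purpose
   ]
   print(unknown_pii_types(sample_rules, sample_types))  # ['emial']

In a real script, ``available_types`` would be ``resp.json()["types"]`` from
the request above.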

Step 3: Create a Scan Session
------------------------------

Bind a policy to a dataset for scanning:

.. code-block:: python

   resp = requests.post(f"{BASE_URL}/api/privacy/sessions", headers=HEADERS, json={
       "policy_id": policy_id,
       "dataset_id": dataset_id,
       "sample_size": 5000,       # scan a sample (None = full dataset)
       "deep_scan": True,         # use NLP-based detection
   })
   session_id = resp.json()["id"]

Step 4: Run Detection
---------------------

Detection runs as a background job:

.. code-block:: python

   import time

   # Start detection
   resp = requests.post(
       f"{BASE_URL}/api/privacy/sessions/{session_id}/detect",
       headers=HEADERS,
   )
   job_id = resp.json()["job_id"]

   # Poll the session until detection finishes
   while True:
       r = requests.get(
           f"{BASE_URL}/api/privacy/sessions/{session_id}",
           headers=HEADERS,
       )
       session = r.json()["session"]
       if session["detection_status"] in ("completed", "failed"):
           break
       time.sleep(5)

   # Do not read results from a failed job
   if session["detection_status"] == "failed":
       raise RuntimeError(f"Detection job {job_id} failed")

   # View detection results
   resp = requests.get(
       f"{BASE_URL}/api/privacy/sessions/{session_id}/results",
       headers=HEADERS,
   )
   results = resp.json()
   for col, detections in results.get("detections", {}).items():
       for d in detections:
           print(f"  Column '{col}': {d['pii_type']} (confidence: {d['confidence']:.2f})")
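
The polling loop above never gives up if a job stalls. One way to bound the
wait is a small helper with a timeout, sketched below. The helper name and its
parameters are hypothetical, and it takes a callable so it does not depend on
any particular endpoint.

.. code-block:: python

   import time

   def wait_for_status(fetch_status, done=("completed", "failed"),
                       interval=5.0, timeout=600.0):
       """Call ``fetch_status()`` until it returns a value in ``done``.

       Raises TimeoutError if ``timeout`` seconds pass first.
       """
       deadline = time.monotonic() + timeout
       while True:
           status = fetch_status()
           if status in done:
               return status
           if time.monotonic() >= deadline:
               raise TimeoutError(f"still '{status}' after {timeout} seconds")
           time.sleep(interval)

   # Possible use with the session endpoint shown above:
   # status = wait_for_status(
   #     lambda: requests.get(
   #         f"{BASE_URL}/api/privacy/sessions/{session_id}",
   #         headers=HEADERS,
   #     ).json()["session"]["detection_status"]
   # )

The same helper would work for ``transform_status`` by changing the key read
inside the lambda.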

Step 5: Transform (Anonymize)
-----------------------------

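Apply the policy's rules to mask, redact, or hash the detected PII. Like
detection, the transformation runs as a background job that you poll for
completion: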

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/privacy/sessions/{session_id}/transform",
       headers=HEADERS,
   )
   job_id = resp.json()["job_id"]

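   # Wait for transformation (uses the time module imported in Step 4)
   while True:
       r = requests.get(
           f"{BASE_URL}/api/privacy/sessions/{session_id}",
           headers=HEADERS,
       )
       session = r.json()["session"]
       if session["transform_status"] in ("completed", "failed"):
           break
       time.sleep(5)

   # Do not download output from a failed job
   if session["transform_status"] == "failed":
       raise RuntimeError(f"Transform job {job_id} failed")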

Step 6: Download Anonymized Data
---------------------------------

Once the transformation has completed, download the anonymized dataset:

.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/privacy/sessions/{session_id}/download",
       headers=HEADERS,
   )
   resp.raise_for_status()  # avoid writing an error body to disk
   with open("anonymized_data.csv", "wb") as f:
       f.write(resp.content)
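
For large datasets, holding the whole response in memory may not be practical.
A possible alternative is to stream the body to disk in chunks with
``requests``. The helper below is a hypothetical sketch; it accepts any object
with an ``iter_content`` method, such as a ``requests.Response`` opened with
``stream=True``.

.. code-block:: python

   def save_stream(resp, path, chunk_size=1 << 20):
       """Write a streamed response body to ``path`` chunk by chunk.

       Returns the number of bytes written.
       """
       written = 0
       with open(path, "wb") as f:
           for chunk in resp.iter_content(chunk_size=chunk_size):
               f.write(chunk)
               written += len(chunk)
       return written

   # Possible use with the download endpoint shown above:
   # with requests.get(
   #     f"{BASE_URL}/api/privacy/sessions/{session_id}/download",
   #     headers=HEADERS,
   #     stream=True,
   # ) as resp:
   #     resp.raise_for_status()
   #     save_stream(resp, "anonymized_data.csv")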

Compliance Validation
---------------------

Check whether the dataset in a session meets its policy's compliance profile:

.. code-block:: python

   resp = requests.post(
       f"{BASE_URL}/api/privacy/sessions/{session_id}/validate",
       headers=HEADERS,
   )
   for check in resp.json()["results"]:
       status = "PASS" if check["passed"] else "FAIL"
       print(f"  [{status}] {check['rule']}: {check['message']}")
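
When validation runs inside an automated pipeline, it can help to collect only
the failing checks and act on them. The sketch below assumes the result shape
used above (``rule``, ``passed``, ``message``); the helper name and the sample
data are hypothetical.

.. code-block:: python

   def failed_checks(results):
       """Return only the checks whose ``passed`` flag is false."""
       return [check for check in results if not check["passed"]]

   # Made-up sample data in the shape of resp.json()["results"]
   sample_results = [
       {"rule": "example-rule-a", "passed": True, "message": "ok"},
       {"rule": "example-rule-b", "passed": False, "message": "unmasked value found"},
   ]

   failures = failed_checks(sample_results)
   for check in failures:
       print(f"  [FAIL] {check['rule']}: {check['message']}")
   print(f"{len(failures)} of {len(sample_results)} checks failed")

A pipeline could then stop, for example by raising an exception, whenever
``failures`` is non-empty.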

Export Privacy Report
---------------------

Generate a PDF report of the privacy scan:

.. code-block:: python

   resp = requests.get(
       f"{BASE_URL}/api/privacy/sessions/{session_id}/report",
       headers=HEADERS,
       params={"ai_insights": True},  # optional AI suggestions
   )
   with open("privacy_report.pdf", "wb") as f:
       f.write(resp.content)

SDK Example
-----------

The same workflow using the Python SDK:

.. code-block:: python

   from coreplexml import CorePlexMLClient

   client = CorePlexMLClient(base_url=BASE_URL, api_key=API_KEY)

   # Create policy with rules
   policy = client.privacy.create_policy(
       project_id=project_id,
       name="GDPR Compliance",
       profile="gdpr",
   )
   client.privacy.add_rule(policy["id"], pii_type="email", action="hash")
   client.privacy.add_rule(policy["id"], pii_type="phone_number", action="mask")

   # Create a session, detect PII, then apply the policy rules
   session = client.privacy.create_session(
       policy_id=policy["id"],
       dataset_id=dataset_id,
   )
   client.privacy.detect(session["id"])
   client.privacy.transform(session["id"])

   # Download result
   client.privacy.download(session["id"], "anonymized.csv")

----

.. seealso::

   - :doc:`/api-reference/privacy` -- Full Privacy API reference
   - :doc:`synthetic-data` -- Generate synthetic data from anonymized datasets