Privacy Suite

The Privacy Suite provides PII detection, data masking, and compliance validation through the API. It detects 72+ PII types and supports HIPAA, GDPR, PCI-DSS, and CCPA compliance profiles.

Overview

A typical privacy workflow has six steps:

  1. Create a Policy – define compliance rules for PII handling

  2. Add Rules – specify how each PII type should be treated

  3. Create a Session – bind a policy to a dataset for scanning

  4. Detect – scan the dataset for PII

  5. Transform – mask, hash, or redact detected PII per the policy rules

  6. Download – retrieve the anonymized dataset


Step 1: Create a Privacy Policy

A policy defines the compliance profile and contains rules for each PII type.

import requests  # BASE_URL and HEADERS are assumed to be defined as in the earlier setup

resp = requests.post(f"{BASE_URL}/api/privacy/policies", headers=HEADERS, json={
    "project_id": project_id,
    "name": "HIPAA Compliance",
    "description": "PHI protection for patient data",
    "profile": "hipaa",
    "status": "active",
    "settings": {
        "auto_detect": True,
        "confidence_threshold": 0.8,
    },
})
policy_id = resp.json()["id"]

Available compliance profiles:

Profile    Description
-------    -----------
hipaa      Protected Health Information (PHI) – medical records, SSNs, DOBs
gdpr       EU data protection – names, emails, IP addresses, location data
pci_dss    Payment card data – card numbers, CVVs, cardholder names
ccpa       California consumer privacy – broad PII categories
custom     User-defined rules only
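Because the profile value is a plain string, a typo would produce an invalid policy request. A minimal client-side guard can catch this early (the set below simply mirrors the table above; `validate_profile` is a sketch, not an official SDK helper):

```python
# Hypothetical client-side guard: check the profile string before
# sending the create-policy request. The set mirrors the table above.
SUPPORTED_PROFILES = {"hipaa", "gdpr", "pci_dss", "ccpa", "custom"}

def validate_profile(profile: str) -> str:
    profile = profile.lower()
    if profile not in SUPPORTED_PROFILES:
        raise ValueError(
            f"unknown profile {profile!r}; choose one of {sorted(SUPPORTED_PROFILES)}"
        )
    return profile
```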

Step 2: Add Detection Rules

Each rule specifies a PII type and the transformation action to apply when that type is detected.

rules = [
    {"pii_type": "email", "action": "hash", "is_enabled": True},
    {"pii_type": "phone_number", "action": "mask", "is_enabled": True},
    {"pii_type": "ssn", "action": "redact", "is_enabled": True},
    {"pii_type": "credit_card", "action": "mask", "is_enabled": True, "config": {"show_last": 4}},
    {"pii_type": "name", "action": "generalize", "is_enabled": True},
    {"pii_type": "date_of_birth", "action": "generalize", "is_enabled": True, "config": {"to": "year"}},
]

for rule in rules:
    requests.post(
        f"{BASE_URL}/api/privacy/policies/{policy_id}/rules",
        headers=HEADERS,
        json=rule,
    )

Available actions:

Action      Description
------      -----------
redact      Replace with [REDACTED]
mask        Partial masking (e.g., ***-**-1234); use the show_last config
hash        One-way SHA-256 hash (preserves referential integrity)
generalize  Reduce precision (e.g., age → age range, date → year)
encrypt     AES-256 encryption (reversible with key)
tokenize    Replace with consistent random tokens
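To make the actions concrete, here is an illustrative local sketch of what mask, hash, and generalize might produce. The exact output formats of the server-side transformations are assumptions for illustration, not guaranteed behavior:

```python
import hashlib

def mask(value: str, show_last: int = 4) -> str:
    # Keep the last N characters, replace the rest with '*'
    return "*" * (len(value) - show_last) + value[-show_last:]

def hash_value(value: str) -> str:
    # One-way SHA-256: identical inputs map to identical digests,
    # which preserves referential integrity across rows
    return hashlib.sha256(value.encode()).hexdigest()

def generalize_date(iso_date: str) -> str:
    # Reduce a full ISO date to just the year
    return iso_date[:4]

print(mask("123-45-6789"))            # *******6789
print(generalize_date("1984-07-12"))  # 1984
```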

Listing Available PII Types

resp = requests.get(f"{BASE_URL}/api/privacy/pii-types", headers=HEADERS)
for t in resp.json()["types"][:10]:
    print(f"  {t['id']}: {t['name']}")

Step 3: Create a Scan Session

Bind a policy to a dataset for scanning:

resp = requests.post(f"{BASE_URL}/api/privacy/sessions", headers=HEADERS, json={
    "policy_id": policy_id,
    "dataset_id": dataset_id,
    "sample_size": 5000,       # scan a sample (None = full dataset)
    "deep_scan": True,         # use NLP-based detection
})
session_id = resp.json()["id"]

Step 4: Run Detection

Detection runs as a background job:

# Start detection
resp = requests.post(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/detect",
    headers=HEADERS,
)
job_id = resp.json()["job_id"]

# Poll for results
import time
while True:
    r = requests.get(
        f"{BASE_URL}/api/privacy/sessions/{session_id}",
        headers=HEADERS,
    )
    session = r.json()["session"]
    if session["detection_status"] in ("completed", "failed"):
        break
    time.sleep(5)

# View detection results
resp = requests.get(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/results",
    headers=HEADERS,
)
results = resp.json()
for col, detections in results.get("detections", {}).items():
    for d in detections:
        print(f"  Column '{col}': {d['pii_type']} (confidence: {d['confidence']:.2f})")
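Given a results payload shaped like the response above, the per-column detections can be rolled up into a quick per-type summary (the payload below is a hypothetical example):

```python
from collections import Counter

# Hypothetical detection results, shaped like the /results response above
results = {
    "detections": {
        "email": [{"pii_type": "email", "confidence": 0.97}],
        "phone": [{"pii_type": "phone_number", "confidence": 0.91}],
        "notes": [{"pii_type": "email", "confidence": 0.84},
                  {"pii_type": "name", "confidence": 0.88}],
    }
}

# Count detections per PII type across all columns
counts = Counter(
    d["pii_type"]
    for detections in results["detections"].values()
    for d in detections
)
print(dict(counts))  # {'email': 2, 'phone_number': 1, 'name': 1}
```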

Step 5: Transform (Anonymize)

Apply the policy rules to mask/redact/hash detected PII:

resp = requests.post(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/transform",
    headers=HEADERS,
)
job_id = resp.json()["job_id"]

# Wait for transformation
while True:
    r = requests.get(
        f"{BASE_URL}/api/privacy/sessions/{session_id}",
        headers=HEADERS,
    )
    session = r.json()["session"]
    if session["transform_status"] in ("completed", "failed"):
        break
    time.sleep(5)
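Both detection and transformation poll the session in the same way, and the bare while-loops above run forever if a job stalls. A small reusable helper with a timeout keeps that logic in one place (a sketch, not part of the official client; fetch is any zero-argument callable returning the session dict):

```python
import time

def wait_for(fetch, field, timeout=600, interval=5):
    """Poll fetch() until the given status field reaches a terminal state.

    fetch    -- zero-argument callable returning a session dict
    field    -- e.g. "detection_status" or "transform_status"
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        session = fetch()
        if session[field] in ("completed", "failed"):
            return session
        time.sleep(interval)
    raise TimeoutError(f"{field} did not finish within {timeout}s")
```

Usage would look like `wait_for(lambda: requests.get(...).json()["session"], "transform_status")`.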

Step 6: Download Anonymized Data

resp = requests.get(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/download",
    headers=HEADERS,
)
with open("anonymized_data.csv", "wb") as f:
    f.write(resp.content)

Compliance Validation

Check if a dataset meets a specific compliance profile:

resp = requests.post(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/validate",
    headers=HEADERS,
)
for check in resp.json()["results"]:
    status = "PASS" if check["passed"] else "FAIL"
    print(f"  [{status}] {check['rule']}: {check['message']}")
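The check list can also be reduced to a simple pass rate for dashboards or CI gates (the checks payload below is hypothetical):

```python
# Hypothetical validation results, shaped like the /validate response above
checks = [
    {"rule": "no_plaintext_ssn", "passed": True,  "message": "ok"},
    {"rule": "emails_hashed",    "passed": True,  "message": "ok"},
    {"rule": "dob_generalized",  "passed": False, "message": "column 'dob' exposed"},
]

passed = sum(c["passed"] for c in checks)
print(f"{passed}/{len(checks)} checks passed")  # 2/3 checks passed
```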

Export Privacy Report

Generate a PDF report of the privacy scan:

resp = requests.get(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/report",
    headers=HEADERS,
    params={"ai_insights": "true"},  # optional AI suggestions; requests would serialize a Python bool as "True"
)
with open("privacy_report.pdf", "wb") as f:
    f.write(resp.content)

SDK Example

The same workflow using the Python SDK:

from coreplexml import CorePlexMLClient

client = CorePlexMLClient(base_url=BASE_URL, api_key=API_KEY)

# Create policy with rules
policy = client.privacy.create_policy(
    project_id=project_id,
    name="GDPR Compliance",
    profile="gdpr",
)
client.privacy.add_rule(policy["id"], pii_type="email", action="hash")
client.privacy.add_rule(policy["id"], pii_type="phone_number", action="mask")

# Scan
session = client.privacy.create_session(
    policy_id=policy["id"],
    dataset_id=dataset_id,
)
client.privacy.detect(session["id"])
client.privacy.transform(session["id"])

# Download result
client.privacy.download(session["id"], "anonymized.csv")

See also