# Privacy Suite

The Privacy Suite provides PII detection, data masking, and compliance validation through the API. It detects 72+ PII types and supports HIPAA, GDPR, PCI-DSS, and CCPA compliance profiles.
## Overview

A privacy workflow has four steps:

1. **Create a Policy** – define compliance rules for PII handling
2. **Add Rules** – specify how each PII type should be treated
3. **Create a Session** – bind a policy to a dataset for scanning
4. **Detect & Transform** – scan for PII, then mask, hash, or redact it
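The REST examples that follow assume some shared setup. A minimal sketch (the host URL is a placeholder, and the bearer-token header is an assumption — adjust to your deployment's auth scheme):

```python
# Assumed setup shared by all REST examples below; adjust to your deployment.
BASE_URL = "https://api.example.com"  # hypothetical host
API_KEY = "your-api-key"

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",  # assumed bearer-token scheme
    "Content-Type": "application/json",
}
```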
## Step 1: Create a Privacy Policy

A policy defines the compliance profile and contains rules for each PII type.

```python
resp = requests.post(f"{BASE_URL}/api/privacy/policies", headers=HEADERS, json={
    "project_id": project_id,
    "name": "HIPAA Compliance",
    "description": "PHI protection for patient data",
    "profile": "hipaa",
    "status": "active",
    "settings": {
        "auto_detect": True,
        "confidence_threshold": 0.8,
    },
})
policy_id = resp.json()["id"]
```
Available compliance profiles:

| Profile | Description |
|---|---|
| `hipaa` | Protected Health Information (PHI) – medical records, SSNs, DOBs |
| `gdpr` | EU data protection – names, emails, IP addresses, location data |
| `pci_dss` | Payment card data – card numbers, CVVs, cardholder names |
| `ccpa` | California consumer privacy – broad PII categories |
| `custom` | User-defined rules only |
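With the `custom` profile, the policy carries no built-in compliance checks and relies entirely on the rules you add in Step 2. A sketch of the request body (field names mirror the HIPAA example above; the id and name values are hypothetical):

```python
# Hypothetical payload for a custom-profile policy; same fields as the
# HIPAA example, but all detection behavior comes from explicit rules.
custom_policy = {
    "project_id": "proj-123",   # hypothetical project id
    "name": "Internal PII Policy",
    "profile": "custom",        # no built-in compliance checks
    "status": "active",
    "settings": {
        "auto_detect": False,   # rely solely on the rules you add
        "confidence_threshold": 0.9,
    },
}
```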
## Step 2: Add Detection Rules

Each rule specifies a PII type and the transformation action to apply when that type is detected.

```python
rules = [
    {"pii_type": "email", "action": "hash", "is_enabled": True},
    {"pii_type": "phone_number", "action": "mask", "is_enabled": True},
    {"pii_type": "ssn", "action": "redact", "is_enabled": True},
    {"pii_type": "credit_card", "action": "mask", "config": {"show_last": 4}},
    {"pii_type": "name", "action": "generalize", "is_enabled": True},
    {"pii_type": "date_of_birth", "action": "generalize", "config": {"to": "year"}},
]

for rule in rules:
    requests.post(
        f"{BASE_URL}/api/privacy/policies/{policy_id}/rules",
        headers=HEADERS,
        json=rule,
    )
```
Available actions:

| Action | Description |
|---|---|
| `redact` | Replace with a fixed redaction placeholder |
| `mask` | Partial masking (e.g., show only the last few characters) |
| `hash` | One-way SHA-256 hash (preserves referential integrity) |
| `generalize` | Reduce precision (e.g., age → age range, date → year) |
| `encrypt` | AES-256 encryption (reversible with key) |
| `tokenize` | Replace with consistent random tokens |
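To build intuition for what these actions do, here is a local, simplified sketch of `hash`, `mask`, and `generalize` applied to sample values. These helpers are illustrative only; the server's exact output formats may differ:

```python
import hashlib

def hash_value(value: str) -> str:
    # One-way SHA-256 hash; equal inputs always map to equal outputs,
    # which is what preserves referential integrity across rows.
    return hashlib.sha256(value.encode()).hexdigest()

def mask_value(value: str, show_last: int = 4) -> str:
    # Partial masking: keep only the trailing characters visible.
    return "*" * (len(value) - show_last) + value[-show_last:]

def generalize_date(date_str: str, to: str = "year") -> str:
    # Reduce precision, e.g. an ISO date "YYYY-MM-DD" down to its year.
    return date_str[:4] if to == "year" else date_str

print(mask_value("4111111111111111"))  # ************1111
print(generalize_date("1984-06-02"))   # 1984
```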
### Listing Available PII Types

```python
resp = requests.get(f"{BASE_URL}/api/privacy/pii-types", headers=HEADERS)
for t in resp.json()["types"][:10]:
    print(f"  {t['id']}: {t['name']}")
```
## Step 3: Create a Scan Session

Bind a policy to a dataset for scanning:

```python
resp = requests.post(f"{BASE_URL}/api/privacy/sessions", headers=HEADERS, json={
    "policy_id": policy_id,
    "dataset_id": dataset_id,
    "sample_size": 5000,  # scan a sample (None = full dataset)
    "deep_scan": True,    # use NLP-based detection
})
session_id = resp.json()["id"]
```
## Step 4: Run Detection

Detection runs as a background job:

```python
import time

# Start detection
resp = requests.post(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/detect",
    headers=HEADERS,
)
job_id = resp.json()["job_id"]

# Poll for results
while True:
    r = requests.get(
        f"{BASE_URL}/api/privacy/sessions/{session_id}",
        headers=HEADERS,
    )
    session = r.json()["session"]
    if session["detection_status"] in ("completed", "failed"):
        break
    time.sleep(5)

# View detection results
resp = requests.get(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/results",
    headers=HEADERS,
)
results = resp.json()
for col, detections in results.get("detections", {}).items():
    for d in detections:
        print(f"  Column '{col}': {d['pii_type']} (confidence: {d['confidence']:.2f})")
```
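It can be handy to flatten the nested `detections` mapping and keep only detections at or above your policy's confidence threshold. A small helper sketch, assuming the response shape shown in the loop above:

```python
def high_confidence_detections(results: dict, threshold: float = 0.8) -> list:
    # Flatten {column: [detections]} into (column, pii_type, confidence)
    # tuples, keeping only detections at or above the threshold.
    return [
        (col, d["pii_type"], d["confidence"])
        for col, detections in results.get("detections", {}).items()
        for d in detections
        if d["confidence"] >= threshold
    ]
```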
## Step 5: Transform (Anonymize)

Apply the policy rules to mask, redact, or hash detected PII:

```python
resp = requests.post(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/transform",
    headers=HEADERS,
)
job_id = resp.json()["job_id"]

# Wait for transformation
while True:
    r = requests.get(
        f"{BASE_URL}/api/privacy/sessions/{session_id}",
        headers=HEADERS,
    )
    session = r.json()["session"]
    if session["transform_status"] in ("completed", "failed"):
        break
    time.sleep(5)
```
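The detect and transform steps poll the same session endpoint, so the loop can be factored into one helper. A sketch that takes a status-fetching callable, making it work for either `detection_status` or `transform_status` (the timeout is an added safeguard, not part of the API):

```python
import time

def wait_for(fetch_session, status_key: str,
             interval: float = 5.0, timeout: float = 600.0) -> str:
    # Poll fetch_session() until the given status field reaches a
    # terminal state, or raise if `timeout` seconds elapse first.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        session = fetch_session()
        status = session[status_key]
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"{status_key} did not finish within {timeout}s")
```

Usage mirrors the inline loops above, e.g. `wait_for(lambda: requests.get(f"{BASE_URL}/api/privacy/sessions/{session_id}", headers=HEADERS).json()["session"], "transform_status")`.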
## Step 6: Download Anonymized Data

```python
resp = requests.get(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/download",
    headers=HEADERS,
)
with open("anonymized_data.csv", "wb") as f:
    f.write(resp.content)
```
## Compliance Validation

Check whether a dataset meets a specific compliance profile:

```python
resp = requests.post(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/validate",
    headers=HEADERS,
)
for check in resp.json()["results"]:
    status = "PASS" if check["passed"] else "FAIL"
    print(f"  [{status}] {check['rule']}: {check['message']}")
```
## Export Privacy Report

Generate a PDF report of the privacy scan:

```python
resp = requests.get(
    f"{BASE_URL}/api/privacy/sessions/{session_id}/report",
    headers=HEADERS,
    params={"ai_insights": True},  # optional AI suggestions
)
with open("privacy_report.pdf", "wb") as f:
    f.write(resp.content)
```
## SDK Example

The same workflow using the Python SDK:

```python
from coreplexml import CorePlexMLClient

client = CorePlexMLClient(base_url=BASE_URL, api_key=API_KEY)

# Create policy with rules
policy = client.privacy.create_policy(
    project_id=project_id,
    name="GDPR Compliance",
    profile="gdpr",
)
client.privacy.add_rule(policy["id"], pii_type="email", action="hash")
client.privacy.add_rule(policy["id"], pii_type="phone_number", action="mask")

# Scan
session = client.privacy.create_session(
    policy_id=policy["id"],
    dataset_id=dataset_id,
)
client.privacy.detect(session["id"])
client.privacy.transform(session["id"])

# Download result
client.privacy.download(session["id"], "anonymized.csv")
```
## See also

- Privacy Suite API – Full Privacy API reference
- Synthetic Data Generation – Generate synthetic data from anonymized datasets