Python SDK for Anonymization REST API
Anonymization SDK
Python client library for the Protegrity Anonymization server
Requirements
Running Anonymization server: this SDK is a client library and requires access to a running Anonymization server.
Installation
# Using pip (editable install from a source checkout)
pip install -e .
# Using uv (add the SDK as a project dependency, then sync the environment)
uv sync
Quick Start
from anonymization_sdk import AnonymizationClient, PrivacyModel
with AnonymizationClient(base_url="http://localhost:8000") as client:
    result = client.auto_anonymize(
        data=[
            {"patient_id": "P001", "age": 25, "zipcode": "12345", "disease": "flu"},
            {"patient_id": "P002", "age": 30, "zipcode": "23456", "disease": "cold"},
        ],
        privacy_model=PrivacyModel.K_ANONYMITY,
        k=2,
    )
    print(f"Anonymized {result.anonymization.row_count} records")
Features
- Simple API: Easy-to-use Python client for the Anonymization server
- Privacy Models: k-anonymity, l-diversity, t-closeness, differential privacy
- Auto-Detection: Automatic detection of quasi-identifiers (QI), direct identifiers (DI), and sensitive attributes (SA)
- Solution Reuse: apply_anon() re-applies a saved solution to new data batches without recomputing
- Lattice Search: Opt-in optimal generalization level search (use_lattice_search=True)
- Risk Metrics: Calculate re-identification risk
Detection
- detect_qi(): Auto-detect quasi-identifiers
- generate_config(): Auto-generate an anonymization config
Anonymization
- anonymize(): Synchronous anonymization (supports use_lattice_search, mlops_config)
- apply_anon(): Apply a saved solution to new data without recomputing
- submit_job(): Submit an async anonymization job
- get_job_status(): Check job status
- cancel_job(): Cancel a running job
- auto_anonymize(): One-step detection + anonymization
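submit_job(), get_job_status(), and cancel_job() imply a submit-then-poll workflow. The sketch below is a generic polling loop, not part of the SDK: wait_for_job is a hypothetical helper, and because the return shape of get_job_status() is not documented here, the caller supplies both the status fetcher and the "finished" test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for_job(
    get_status: Callable[[], T],
    is_done: Callable[[T], bool],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> T:
    """Hypothetical helper: poll get_status() until is_done(status) is true.

    Raises TimeoutError if the job has not finished within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        # Wait before asking the server again, to avoid hammering it.
        time.sleep(poll_interval)
```

With the SDK this might be wired up as wait_for_job(lambda: client.get_job_status(job_id), is_done=...), where passing job_id positionally and the completion check are assumptions to confirm against the docstrings; on TimeoutError you could call cancel_job() for the same job.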
Risk & Validation
- calculate_risk(): Calculate re-identification risk (supports mlops_config)
- validate(): Validate privacy guarantees
- measure(): Measure anonymization quality
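Re-identification risk is commonly measured per record as 1 divided by the size of the record's equivalence class (the records sharing its quasi-identifier values), the so-called prosecutor model. The sketch below computes that measure locally so the numbers have a concrete meaning; it is an assumption that calculate_risk() uses this exact definition, and reidentification_risk is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def reidentification_risk(records: List[Dict[str, Any]], qis: Sequence[str]) -> Dict[str, float]:
    """Prosecutor-model risk: each record's risk is 1 / size of its QI equivalence class."""
    if not records:
        raise ValueError("records must be non-empty")
    keys = [tuple(r[q] for q in qis) for r in records]
    class_sizes = Counter(keys)
    per_record = [1.0 / class_sizes[key] for key in keys]
    return {
        "max_risk": max(per_record),                        # worst-case record
        "avg_risk": sum(per_record) / len(per_record),      # equals (#classes) / (#records)
        "frac_unique": sum(1 for p in per_record if p == 1.0) / len(per_record),
    }
```

A record that is unique on its quasi-identifiers has risk 1.0, which is exactly what k-anonymity with k >= 2 rules out.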
Differential Privacy
- dp_compute(): Compute a DP-protected aggregate (mean, sum, variance, histogram)
- dp_stream_update(): Feed a batch into a streaming session (creates the session on the first call)
- dp_stream_delete(): Close and delete a streaming session
- dp_stream_list_sessions(): List active streaming sessions
- dp_budget_create() / dp_budget_status() / dp_budget_delete(): Create, query, and delete a privacy budget
- dp_advise_composition(): Advise on the epsilon/delta budget for a composition plan
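To illustrate what a DP-protected mean and a privacy budget involve, here is a textbook Laplace mechanism for a bounded mean, plus a budget tracker that uses basic sequential composition (the epsilons of successive queries add up). This is a local sketch, not the server's mechanism: dp_mean and EpsilonBudget are hypothetical names, and the sketch assumes the record count n is public and that neighboring datasets differ by replacing one record.

```python
import random
from typing import List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values: List[float], lower: float, upper: float, epsilon: float,
            rng: Optional[random.Random] = None) -> float:
    """epsilon-DP mean of values clipped to [lower, upper], treating n as public."""
    if epsilon <= 0 or not values or upper <= lower:
        raise ValueError("need epsilon > 0, non-empty values, and upper > lower")
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Replacing one record changes the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


class EpsilonBudget:
    """Tracks epsilon spending under basic sequential composition."""

    def __init__(self, total: float) -> None:
        self.total = total
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        # Refuse any query that would push total spending past the budget.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    @property
    def remaining(self) -> float:
        return self.total - self.spent
```

Smaller epsilon means larger noise (scale = sensitivity / epsilon) and stronger privacy; advanced composition theorems, which are likely what a composition advisor would consider alongside delta, give tighter totals than simple addition.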
Privacy Models
| Model | Protection Level | Use Case | Key Feature |
|---|---|---|---|
| k-anonymity | Basic | Hide in groups of k | Each record indistinguishable from k-1 others |
| l-diversity | Enhanced | Diverse sensitive values | Prevents homogeneity attacks |
| t-closeness | Advanced | Distribution matching | Prevents skewness attacks |
| Differential Privacy | Mathematical | Aggregate queries, streaming | Provable ε-privacy guarantees via calibrated noise |
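The first two rows of the table can be checked locally on already-generalized data. This minimal sketch groups records into equivalence classes by their quasi-identifier values, then tests k-anonymity (every class has at least k records) and distinct l-diversity (every class has at least l distinct sensitive values, the simplest l-diversity variant). The function names are hypothetical and this is not the server's validate() implementation.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def equivalence_classes(records: List[Record], qis: Sequence[str]) -> Dict[Tuple[Any, ...], List[Record]]:
    """Group records by their combination of quasi-identifier values."""
    groups: Dict[Tuple[Any, ...], List[Record]] = defaultdict(list)
    for record in records:
        groups[tuple(record[q] for q in qis)].append(record)
    return groups


def is_k_anonymous(records: List[Record], qis: Sequence[str], k: int) -> bool:
    # Each record must be indistinguishable from at least k - 1 others.
    return all(len(group) >= k for group in equivalence_classes(records, qis).values())


def is_l_diverse(records: List[Record], qis: Sequence[str], sensitive: str, l_value: int) -> bool:
    # Distinct l-diversity: each class must contain at least l_value different sensitive values.
    return all(
        len({r[sensitive] for r in group}) >= l_value
        for group in equivalence_classes(records, qis).values()
    )
```

A dataset can be 2-anonymous yet fail 2-diversity when one class shares a single disease value, which is the homogeneity attack that l-diversity prevents.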
Example Use Cases
- Healthcare: Anonymize patient records for research (to help meet HIPAA requirements)
- Finance: Share transaction data for analysis (to help meet PCI DSS requirements)
- Marketing: Publish customer analytics datasets (to help meet GDPR requirements)
- Research: Share study data with collaborators (e.g., under IRB-approved protocols)
Lattice Search
By default the server applies level 1 of every configured hierarchy. Pass use_lattice_search=True to search for the shallowest combination of generalization levels that satisfies k-anonymity, which yields lower information loss:
result = client.anonymize(
    data=data,
    privacy_model="k-anonymity",
    k=10,
    max_suppression=0.05,
    attributes=[...],
    use_lattice_search=True,
    lattice_strategy="basic",  # basic | with_importance | with_deviation | full
)
The generalization_levels field in the result shows the actual levels chosen per QI.
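The idea behind a lattice search can be sketched as follows: every quasi-identifier has a hierarchy of generalization levels, each combination of levels is a node in a lattice, and the search visits nodes from the lowest total level upward, returning the first one where at most max_suppression of the records fall into classes smaller than k. This is a simplified full-domain sketch under stated assumptions, not the server's algorithm; the hierarchies (decade bands for age, trailing-digit masking for zip codes) and all function names are hypothetical, and the alternative lattice_strategy values are not modeled.

```python
from collections import Counter
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical generalizer type: (value, level) -> generalized value.
Generalizer = Callable[[Any, int], Any]


def generalize_age(age: int, level: int) -> Any:
    if level == 0:
        return age                      # exact value
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"       # decade band
    return "*"                          # fully suppressed


def generalize_zip(zipcode: str, level: int) -> str:
    # Level i masks the last i digits.
    return zipcode if level == 0 else zipcode[:-level] + "*" * level


def lattice_search(records: List[Dict[str, Any]],
                   hierarchies: Dict[str, Tuple[Generalizer, int]],
                   k: int, max_suppression: float) -> Optional[Dict[str, int]]:
    """Return the shallowest level per QI satisfying k-anonymity, or None."""
    qis = list(hierarchies)
    level_ranges = [range(hierarchies[q][1] + 1) for q in qis]
    # Visit lattice nodes from shallowest (smallest sum of levels) to deepest.
    for levels in sorted(product(*level_ranges), key=sum):
        keys = [tuple(hierarchies[q][0](r[q], lvl) for q, lvl in zip(qis, levels))
                for r in records]
        class_sizes = Counter(keys)
        # Records in classes smaller than k would have to be suppressed.
        suppressed = sum(1 for key in keys if class_sizes[key] < k)
        if suppressed <= max_suppression * len(records):
            return dict(zip(qis, levels))
    return None
```

On four records with ages 25, 27, 31, 36 and zip codes 12345, 12346, 12351, 12358, k=2 stops at one level each (decade bands plus one masked digit) rather than the deeper levels a fixed default might apply; the returned mapping plays the same role as the generalization_levels field described above.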
Solution Reuse (apply_anon)
Re-apply the exact same solution to new data batches without recomputing hierarchies:
# Step 1: anonymize training batch → solution stored server-side
anon = client.anonymize(data=training_data, privacy_model="k-anonymity", k=5, ...)
job_id = anon.job_id
# Step 2: apply to any new batch instantly
apply_result = client.apply_anon(job_id=job_id, data=new_batch)
print(f"Applied: {apply_result.row_count} rows | Suppressed: {apply_result.suppressed_count}")
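When new data arrives in several batches, the same stored solution can be applied to each one in turn. This sketch relies only on what the example above shows (apply_anon(job_id=..., data=...) and the row_count and suppressed_count fields); apply_to_batches is a hypothetical helper, and it is duck-typed on the client so it can be exercised without a server.

```python
from typing import Any, Dict, Iterable, List, Tuple


def apply_to_batches(client: Any, job_id: Any,
                     batches: Iterable[List[Dict[str, Any]]]) -> Tuple[int, int]:
    """Hypothetical helper: re-apply one stored solution to many batches.

    Returns (total rows applied, total rows suppressed) across all batches.
    """
    total_rows = 0
    total_suppressed = 0
    for batch in batches:
        # Same job_id every time, so every batch is generalized identically.
        result = client.apply_anon(job_id=job_id, data=batch)
        total_rows += result.row_count
        total_suppressed += result.suppressed_count
    return total_rows, total_suppressed
```

Because every batch reuses one solution, records from different batches stay comparable, and the totals make it easy to spot a batch whose suppression rate drifts upward.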
MLOps Integration
Pass mlops_config when creating the client to enable MLOps tracking for calls that support it, such as anonymize() and calculate_risk():
client = AnonymizationClient(
    base_url="http://localhost:8000",
    mlops_config={
        "postgres_dsn": "postgresql://mlopsuser:mlopspsw@localhost:5432/mlopsdb",
        "experiment_prefix": "my-project",
        "model_name": "patient-records",
        "auto_promote": True,
        "promotion_metric": "combined_loss",
        "promotion_direction": "lower_better",
    },
)
result = client.anonymize(data=data, privacy_model="k-anonymity", k=5, ...)
models = client.list_models()
Documentation
API reference is available via docstrings in src/anonymization_sdk/.
Refer to the online documentation.
Support
- Issues: Report bugs and request features via issue tracker
- Email: info@protegrity.com
License
MIT License
Download files
Built Distribution
File details
Details for the file protegrity_anonymization_sdk-2.0.0-py3-none-any.whl.
File metadata
- Download URL: protegrity_anonymization_sdk-2.0.0-py3-none-any.whl
- Upload date:
- Size: 28.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff7972fe580a22076e638d833ff9f3d845f3b27a47c7999b22993e7cc2472a21 |
| MD5 | 99a83838fa3bc34f24d4864d7ad81c4e |
| BLAKE2b-256 | 6cad90fc8093e9c39234e79d14820c264cd2aaa028dba4058d29d2d11a983a6f |