Python SDK for Anonymization REST API

Anonymization SDK

Python client library for the Protegrity Anonymization server

Requirements

Anonymization server: the SDK needs a running Anonymization server to connect to.

Installation

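# Using pip (editable install, run from a local checkout of the source)
pip install -e .

# Using uv (installs the project and its dependencies, run from a local checkout)
uv sync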

Quick Start

from anonymization_sdk import AnonymizationClient, PrivacyModel

with AnonymizationClient(base_url="http://localhost:8000") as client:
    result = client.auto_anonymize(
        data=[
            {"patient_id": "P001", "age": 25, "zipcode": "12345", "disease": "flu"},
            {"patient_id": "P002", "age": 30, "zipcode": "23456", "disease": "cold"},
        ],
        privacy_model=PrivacyModel.K_ANONYMITY,
        k=2,
    )
    print(f"Anonymized {result.anonymization.row_count} records")

Features

  • Simple API: Easy-to-use Python client for the Anonymization server
  • Privacy Models: k-anonymity, l-diversity, t-closeness, differential privacy
  • Auto-Detection: Automatic detection of quasi-identifiers (QI), direct identifiers (DI), and sensitive attributes (SA)
  • Solution Reuse: apply_anon() re-applies a saved solution to new data batches without recomputing
  • Lattice Search: Opt-in optimal generalization level search (use_lattice_search=True)
  • Risk Metrics: Calculate re-identification risk

Detection

  • detect_qi() — Auto-detect quasi-identifiers
  • generate_config() — Auto-generate anonymization config

Anonymization

  • anonymize() — Synchronous anonymization (supports use_lattice_search, mlops_config)
  • apply_anon() — Apply a saved solution to new data without recomputing
  • submit_job() — Submit async anonymization job
  • get_job_status() — Check job status
  • cancel_job() — Cancel running job
  • auto_anonymize() — One-step detection + anonymization
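
The asynchronous methods above imply a submit-then-poll workflow. The following is a minimal sketch of such a polling loop, not the SDK's own implementation: wait_for_job, its parameters, and the status names in TERMINAL_STATES are all hypothetical, and the status source is passed in as a plain callable so that the example runs without a server.

```python
import time

# Assumed status names; check the values your server actually reports.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(get_status, job_id, poll_interval=1.0, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_id) until a terminal state or until timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(poll_interval)


# Stand-in for a real status call: reports three states in order.
fake_statuses = iter(["pending", "running", "completed"])
final = wait_for_job(lambda _job_id: next(fake_statuses), "job-1",
                     sleep=lambda _seconds: None)
print(final)
```

With a real client, get_status would be a small wrapper around get_job_status() that returns the status string; the exact shape of that method's return value is not shown in this README, so the wrapper is left to the reader.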

Risk & Validation

  • calculate_risk() — Calculate re-identification risk (supports mlops_config)
  • validate() — Validate privacy guarantees
  • measure() — Measure anonymization quality
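
To show what a re-identification risk figure can mean, here is a minimal sketch of one common definition (the prosecutor model, in which a record's risk is 1 divided by the size of its equivalence class). It is independent of the SDK: the function name, the returned keys, and the sample rows are illustrative assumptions, and the server's calculate_risk() may use different metrics.

```python
from collections import Counter


def reidentification_risk(rows, quasi_identifiers):
    """Return max and average prosecutor risk over all records."""
    if not rows:
        raise ValueError("rows must not be empty")
    keys = [tuple(row[qi] for qi in quasi_identifiers) for row in rows]
    class_sizes = Counter(keys)
    # Each record can be singled out with probability 1 / (size of its class).
    per_record = [1.0 / class_sizes[key] for key in keys]
    return {
        "max_risk": max(per_record),
        "avg_risk": sum(per_record) / len(per_record),
    }


# Illustrative generalized records (not real data).
rows = [
    {"age": "20-29", "zipcode": "123**"},
    {"age": "20-29", "zipcode": "123**"},
    {"age": "20-29", "zipcode": "123**"},
    {"age": "30-39", "zipcode": "234**"},
]
risk = reidentification_risk(rows, ["age", "zipcode"])
print(risk)
```

In this sample the single record in the second class is unique on its quasi-identifiers, so the maximum risk is 1.0 even though the average is lower.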

Differential Privacy

  • dp_compute() — Compute DP-protected aggregate (mean, sum, variance, histogram)
  • dp_stream_update() — Feed a batch into a streaming session (creates the session on first call)
  • dp_stream_delete() — Close and delete a streaming session
  • dp_stream_list_sessions() — List active streaming sessions
  • dp_budget_create() / dp_budget_status() / dp_budget_delete() — Create, inspect, and delete a privacy budget
  • dp_advise_composition() — Advise on the epsilon/delta budget for a composition plan
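
As background for the aggregate methods above, the sketch below shows the textbook Laplace mechanism applied to a bounded mean. It is not the server's implementation: dp_mean and its parameters are hypothetical names, and the sensitivity formula assumes the number of records is public and that neighboring datasets differ in one record's value.

```python
import random


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Return an epsilon-DP estimate of the mean of values clipped to [lower, upper]."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must not be empty")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    rng = rng or random.Random()
    # Clipping bounds the influence any single record can have.
    clipped = [min(max(v, lower), upper) for v in values]
    # Changing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / len(clipped) + noise


ages = [25, 30, 35, 40]
print(dp_mean(ages, lower=0, upper=100, epsilon=1.0))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy. Under basic sequential composition, the epsilons of repeated queries on the same data add up, which is why a budget needs tracking.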

Privacy Models

Model                | Protection Level | Use Case                     | Key Feature
k-anonymity          | Basic            | Hide in groups of k          | Each record indistinguishable from k-1 others
l-diversity          | Enhanced         | Diverse sensitive values     | Prevents homogeneity attacks
t-closeness          | Advanced         | Distribution matching        | Prevents skewness attacks
Differential Privacy | Mathematical     | Aggregate queries, streaming | Provable ε-privacy guarantees via calibrated noise
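
To make the k-anonymity and l-diversity rows concrete, here is a minimal sketch that checks both properties on a small, already-generalized table. It does not use the SDK: the helper names and the sample records are illustrative assumptions.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[qi] for qi in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class holds at least k rows."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every class contains at least l distinct sensitive values."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len({row[sensitive] for row in group}) >= l
               for group in groups.values())


# Illustrative generalized records (not real data).
rows = [
    {"age": "20-29", "zipcode": "123**", "disease": "flu"},
    {"age": "20-29", "zipcode": "123**", "disease": "flu"},
    {"age": "30-39", "zipcode": "234**", "disease": "cold"},
    {"age": "30-39", "zipcode": "234**", "disease": "asthma"},
]
qi = ["age", "zipcode"]
print(is_k_anonymous(rows, qi, k=2))           # True
print(is_l_diverse(rows, qi, "disease", l=2))  # False
```

The table is 2-anonymous but not 2-diverse: everyone in the first class has the same disease, so knowing someone is in that class reveals their diagnosis. This is the homogeneity attack that l-diversity guards against.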

Example Use Cases

  • Healthcare: Anonymize patient records for research (HIPAA compliance)
  • Finance: Share transaction data for analysis (PCI DSS compliance)
  • Marketing: Publish customer analytics datasets (GDPR compliance)
  • Research: Share study data with collaborators (IRB approval)

Lattice Search

By default, the server applies level 1 of every configured hierarchy. Pass use_lattice_search=True to search instead for the shallowest combination of generalization levels that satisfies k-anonymity, which lowers information loss:

result = client.anonymize(
    data=data,
    privacy_model="k-anonymity",
    k=10,
    max_suppression=0.05,
    attributes=[...],
    use_lattice_search=True,
    lattice_strategy="basic",  # basic | with_importance | with_deviation | full
)

The generalization_levels field in the result shows the actual levels chosen per QI.
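
The idea of searching for the shallowest level combination can be sketched in a few lines. The example below is a simplified, client-side illustration under stated assumptions, not the server's algorithm: the HIERARCHIES table, the helper names, and the bottom-up visiting order are hypothetical, and the lattice strategies listed above are not modeled.

```python
from collections import Counter
from itertools import product

# Hypothetical hierarchies: list index = generalization level, 0 = original value.
HIERARCHIES = {
    "age": [
        lambda v: str(v),
        lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}",
        lambda v: "*",
    ],
    "zipcode": [
        lambda v: v,
        lambda v: v[:3] + "**",
        lambda v: "*****",
    ],
}


def suppressed_fraction(rows, levels, k):
    """Fraction of rows that fall into equivalence classes smaller than k."""
    keys = [tuple(HIERARCHIES[qi][lvl](row[qi]) for qi, lvl in levels.items())
            for row in rows]
    sizes = Counter(keys)
    return sum(1 for key in keys if sizes[key] < k) / len(rows)


def lattice_search(rows, k, max_suppression=0.0):
    """Return the first level combination, by total level, that meets k."""
    qis = list(HIERARCHIES)
    ranges = [range(len(HIERARCHIES[qi])) for qi in qis]
    # Visit lattice nodes from least to most generalized.
    for combo in sorted(product(*ranges), key=sum):
        levels = dict(zip(qis, combo))
        if suppressed_fraction(rows, levels, k) <= max_suppression:
            return levels
    return None  # no combination satisfies the constraints


rows = [
    {"age": 25, "zipcode": "12345"},
    {"age": 27, "zipcode": "12399"},
    {"age": 31, "zipcode": "23456"},
    {"age": 38, "zipcode": "23401"},
]
print(lattice_search(rows, k=2))  # {'age': 1, 'zipcode': 1}
```

Ordering nodes by the sum of their levels is one simple proxy for information loss. A real search would typically prune the lattice and score nodes with a proper loss metric rather than enumerate every combination.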

Solution Reuse (apply_anon)

Re-apply the exact same solution to new data batches without recomputing hierarchies:

# Step 1: anonymize training batch → solution stored server-side
anon = client.anonymize(data=training_data, privacy_model="k-anonymity", k=5, ...)
job_id = anon.job_id

# Step 2: apply to any new batch instantly
apply_result = client.apply_anon(job_id=job_id, data=new_batch)
print(f"Applied: {apply_result.row_count} rows | Suppressed: {apply_result.suppressed_count}")

MLOps Configuration

Pass mlops_config when constructing the client:

client = AnonymizationClient(
    base_url="http://localhost:8000",
    mlops_config={
        "postgres_dsn": "postgresql://mlopsuser:mlopspsw@localhost:5432/mlopsdb",
        "experiment_prefix": "my-project",
        "model_name": "patient-records",
        "auto_promote": True,
        "promotion_metric": "combined_loss",
        "promotion_direction": "lower_better",
    },
)
result = client.anonymize(data=data, privacy_model="k-anonymity", k=5, ...)
models = client.list_models()

Documentation

The API reference is available as docstrings in src/anonymization_sdk/. See also the online documentation.

Support

License

MIT License

Built Distribution

protegrity_anonymization_sdk-2.0.0-py3-none-any.whl (28.3 kB)

Hashes for protegrity_anonymization_sdk-2.0.0-py3-none-any.whl

Algorithm   | Hash digest
SHA256      | ff7972fe580a22076e638d833ff9f3d845f3b27a47c7999b22993e7cc2472a21
MD5         | 99a83838fa3bc34f24d4864d7ad81c4e
BLAKE2b-256 | 6cad90fc8093e9c39234e79d14820c264cd2aaa028dba4058d29d2d11a983a6f
