Python SDK for the Expunct API
Expunct Python SDK
Privacy infrastructure for modern applications. Detect and redact PII, secrets, and sensitive data before it reaches AI, logs, or external APIs.
Installation
pip install expunct
Get your API key at expunct.ai — free tier includes 1M tokens/month, no credit card required.
Quick Start
from expunct import Expunct
client = Expunct(api_key="your-api-key")
redacted = client.sanitize_text("Alice Johnson's email is alice@example.com and SSN is 219-09-9999.")
print(redacted)
# Output: PERSON_1's email is EMAIL_ADDRESS_1 and SSN is US_SSN_1.
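The sample output suggests that placeholders follow an ENTITY_TYPE_N shape, although the README does not spell the format out. Assuming that shape holds, a small helper such as the hypothetical `count_placeholders` below could tally which entity types were replaced in a redacted string. It is a sketch that uses only the standard library and does not call the SDK.

```python
import re
from collections import Counter

# Assumed placeholder shape, inferred from the sample output above:
# an upper-case entity type, an underscore, then a running number.
PLACEHOLDER = re.compile(r"\b([A-Z][A-Z_]*)_(\d+)\b")


def count_placeholders(redacted: str) -> Counter:
    """Count distinct placeholders per entity type in redacted text."""
    # De-duplicate first so PERSON_1 mentioned twice is counted once.
    seen = {match.group(0) for match in PLACEHOLDER.finditer(redacted)}
    return Counter(PLACEHOLDER.fullmatch(token).group(1) for token in seen)


sample = "PERSON_1's email is EMAIL_ADDRESS_1 and SSN is US_SSN_1."
print(count_placeholders(sample))
```

Text that already contains tokens of the same shape before redaction would be counted too, so treat the result as a rough check rather than an audit.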
Usage
Text redaction (sync)
from expunct import Expunct
client = Expunct(api_key="your-api-key")
redacted = client.sanitize_text("Call Bob at 415-555-0100 or bob@example.com")
print(redacted)
# Call PERSON_1 at PHONE_NUMBER_1 or EMAIL_ADDRESS_1
Text redaction (async)
import asyncio
from expunct import AsyncExpunct
async def main():
    async with AsyncExpunct(api_key="your-api-key") as client:
        redacted = await client.sanitize_text("Call Bob at 415-555-0100 or bob@example.com")
        print(redacted)
asyncio.run(main())
File redaction (PDF, DOCX, images, audio)
from expunct import Expunct
client = Expunct(api_key="your-api-key")
# Pass a file path — returns redacted bytes
redacted_bytes = client.sanitize_file("contract.pdf")
# Save directly to disk
client.sanitize_file("contract.pdf", dest="contract_redacted.pdf")
# Pass a file-like object
with open("invoice.docx", "rb") as f:
    redacted_bytes = client.sanitize_file(f)
URI redaction (cloud storage)
Submit a file hosted in cloud storage (S3, GCS, Azure Blob) for redaction. The optional output_uri controls where the redacted file is written; if omitted the result is available via jobs.download().
from expunct import Expunct
client = Expunct(api_key="your-api-key")
job = client.sanitize_uri(
    "s3://my-bucket/reports/q1.pdf",
    output_uri="s3://my-bucket/reports/q1_redacted.pdf",
)
print(job.status) # "completed"
print(job.findings_count) # number of PII items found
Batch URI redaction
Enqueue multiple files in one call via the lower-level redact.batch() method, then poll the batch status:
from expunct import Expunct
client = Expunct(api_key="your-api-key")
batch = client.redact.batch(
    input_uris=[
        "s3://my-bucket/docs/file1.pdf",
        "s3://my-bucket/docs/file2.pdf",
    ],
    language="en",
)
print(batch.id, batch.total_jobs)
# Poll progress
status = client.batch.get(batch.id)
print(status.completed_jobs, status.failed_jobs)
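The snippet above checks the batch once. A polling loop could be sketched as follows, assuming a batch is finished when completed_jobs plus failed_jobs reaches total_jobs (the README does not state this rule). The helper name `wait_for_batch` and its `interval` and `timeout` parameters are hypothetical and not part of the SDK; the status source is passed in as a callable so the loop can be tried without network access.

```python
import time
from typing import Callable


def wait_for_batch(
    fetch_status: Callable[[], object],
    total_jobs: int,
    interval: float = 2.0,
    timeout: float = 300.0,
):
    """Poll until completed + failed jobs reach total_jobs, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        # Assumption: every job ends up either completed or failed.
        if status.completed_jobs + status.failed_jobs >= total_jobs:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still running after {timeout}s")
        time.sleep(interval)


# With the SDK this might be called as:
# final = wait_for_batch(lambda: client.batch.get(batch.id), batch.total_jobs)
```

Using time.monotonic() keeps the deadline stable even if the system clock is adjusted while the loop is waiting.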
Environment variable
Set EXPUNCT_API_KEY to avoid passing the key in code. The client reads it automatically when no api_key argument is provided — or you can read it yourself:
import os
from expunct import Expunct
client = Expunct(api_key=os.environ["EXPUNCT_API_KEY"])
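Indexing os.environ directly raises a bare KeyError when the variable is missing. If a clearer failure is preferred at startup, a small wrapper could look like the sketch below; `load_api_key` is a hypothetical helper, not something the SDK provides.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return EXPUNCT_API_KEY, or raise with a readable message."""
    key = env.get("EXPUNCT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "EXPUNCT_API_KEY is not set; export it before starting the app"
        )
    return key


# Possible usage with the SDK:
# client = Expunct(api_key=load_api_key())
```

Accepting the environment as a parameter makes the helper easy to exercise in tests with a plain dict.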
Custom policy
Policies let you control which entity types are detected, the redaction method, confidence thresholds, and more. Create a policy once and reference it by ID on every job.
from expunct import Expunct, PolicyCreate
client = Expunct(api_key="your-api-key")
# Create a policy that only redacts PII and uses pseudonymization
policy = client.policies.create(PolicyCreate(
    name="pii-only-pseudonymize",
    pii_categories=["PII"],
    redaction_method="pseudonymization",
    confidence_threshold=0.7,
))
# Use the policy when uploading a file
job = client.redact.file("report.pdf", policy_id=policy.id)
completed = client.wait_for_job(job.id)
redacted_bytes = client.jobs.download(completed.id)
Inspecting findings
Every completed job exposes the PII entities that were found:
from expunct import Expunct
client = Expunct(api_key="your-api-key")
redacted_bytes = client.sanitize_file("form.pdf")
# Re-fetch job detail to inspect findings
jobs = client.jobs.list(page=1, page_size=1)
detail = client.jobs.get(jobs.jobs[0].id)
for finding in detail.findings:
    print(finding.entity_type, finding.confidence, finding.entity_value)
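Printing every finding also prints the sensitive value itself. For logs or dashboards, a per-type summary may be safer. The sketch below assumes findings expose the three attributes used in the loop above; the `Finding` dataclass is a stand-in for the SDK's own objects, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Finding:
    """Stand-in for SDK finding objects (hypothetical; fields from the loop above)."""
    entity_type: str
    confidence: float
    entity_value: str


def summarize(findings, min_confidence=0.0):
    """Group findings by entity type, keeping those at or above min_confidence."""
    grouped = defaultdict(list)
    for f in findings:
        if f.confidence >= min_confidence:
            grouped[f.entity_type].append(f.confidence)
    # entity_value is deliberately left out so the summary is safe to log.
    return {
        entity_type: {"count": len(scores), "min_confidence": min(scores)}
        for entity_type, scores in grouped.items()
    }


demo = [
    Finding("PERSON", 0.95, "Alice Johnson"),
    Finding("EMAIL_ADDRESS", 0.99, "alice@example.com"),
    Finding("PERSON", 0.55, "Al"),
]
print(summarize(demo, min_confidence=0.6))
```

With real data, `summarize(detail.findings)` should work as long as the finding objects carry the same attribute names.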
Error handling
from expunct import Expunct, AuthenticationError, RateLimitError, PollingTimeoutError
client = Expunct(api_key="your-api-key")
try:
    redacted = client.sanitize_text("Alice, SSN 219-09-9999")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except PollingTimeoutError as e:
    print(f"Job {e.job_id} timed out after {e.timeout}s")
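The client already retries transient errors on its own (see max_retries), so RateLimitError only surfaces once those retries are exhausted. An application that wants to wait and try again at a higher level could use something like the sketch below. `call_with_retry` is a hypothetical helper; it relies only on the retry_after attribute shown above and falls back to a fixed delay when that attribute is missing.

```python
import time


def call_with_retry(fn, retry_on, max_attempts=3, default_delay=1.0, sleep=time.sleep):
    """Call fn(); on retry_on, wait for the exception's retry_after and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle it
            # Fall back to default_delay when the exception carries no hint.
            delay = getattr(exc, "retry_after", None) or default_delay
            sleep(delay)


# Possible usage with the SDK:
# redacted = call_with_retry(lambda: client.sanitize_text(text), RateLimitError)
```

The `sleep` parameter exists so tests can record delays instead of actually waiting.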
Context manager (sync)
from expunct import Expunct
with Expunct(api_key="your-api-key") as client:
    redacted = client.sanitize_text("John Smith, DOB 01/01/1980")
Client reference
Expunct / AsyncExpunct
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | Your Expunct API key |
| base_url | str | https://api.expunct.ai | Override for self-hosted or staging |
| tenant_id | str \| None | None | Multi-tenant isolation header |
| timeout | float | 30.0 | Per-request timeout in seconds |
| max_retries | int | 3 | Automatic retries on transient errors |
Convenience methods
| Method | Returns | Description |
|---|---|---|
| sanitize_text(text, *, language) | str | Redact text in one call (upload → poll → decode) |
| sanitize_file(file, *, language, dest) | bytes | Upload a file, poll, return redacted bytes |
| sanitize_uri(input_uri, *, language, output_uri) | JobDetailResponse | Submit a URI, poll, return completed job |
| wait_for_job(job_id, *, interval, timeout) | JobDetailResponse | Poll a job until it completes or times out |
Resource methods
client.redact
| Method | Returns | Description |
|---|---|---|
| redact.file(file, *, config, language, policy_id) | JobResponse | Upload a file and enqueue a redaction job |
| redact.uri(input_uri, *, output_uri, config, language, metadata) | JobResponse | Submit a cloud URI for redaction |
| redact.batch(input_uris, *, config, language, metadata) | BatchJobResponse | Submit multiple URIs as a batch |
client.jobs
| Method | Returns | Description |
|---|---|---|
| jobs.list(*, page, page_size, status) | JobListResponse | List jobs with optional status filter |
| jobs.get(job_id) | JobDetailResponse | Get job detail including findings |
| jobs.report(job_id) | dict | Get full structured report for a job |
| jobs.download(job_id, *, dest) | bytes | Download redacted output; optionally save to dest |
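Because jobs.list is paginated, walking the full job history takes a loop. The generator below is a sketch under two assumptions the README does not confirm: pages are numbered from 1, and a page shorter than page_size marks the end. It reads the jobs attribute seen in the Inspecting findings example; `iter_jobs` itself is a hypothetical helper, not an SDK method.

```python
def iter_jobs(list_page, page_size=50):
    """Yield jobs page by page until a short or empty page is returned."""
    page = 1
    while True:
        response = list_page(page=page, page_size=page_size)
        jobs = list(response.jobs)
        yield from jobs
        # Assumption: a page with fewer than page_size items is the last one.
        if len(jobs) < page_size:
            return
        page += 1


# Possible usage with the SDK:
# for job in iter_jobs(client.jobs.list):
#     print(job.id)
```

If the response carries a total count or a next-page marker, using that instead would avoid the extra request made when the last page happens to be exactly full.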
client.policies
| Method | Returns | Description |
|---|---|---|
| policies.list() | list[PolicyResponse] | List all policies |
| policies.create(policy) | PolicyResponse | Create a new policy |
| policies.get(policy_id) | PolicyResponse | Fetch a policy by ID |
| policies.update(policy_id, policy) | PolicyResponse | Update a policy |
| policies.delete(policy_id) | None | Delete a policy |
client.batch
| Method | Returns | Description |
|---|---|---|
| batch.get(batch_id) | BatchJobResponse | Get status of a batch job |
client.api_keys
| Method | Returns | Description |
|---|---|---|
| api_keys.list() | list[ApiKeyResponse] | List API keys for your account |
| api_keys.create(key) | ApiKeyCreateResponse | Create a new API key |
| api_keys.revoke(key_id) | dict | Revoke an API key |
client.audit
| Method | Returns | Description |
|---|---|---|
| audit.list(*, page, page_size, event_type) | AuditListResponse | List audit log entries |
Detected Entity Types
Expunct detects the following entity types by default (all categories enabled):
PII (Personally Identifiable Information)
| Type | Example |
|---|---|
| PERSON | John Smith |
| EMAIL_ADDRESS | john@example.com |
| PHONE_NUMBER | 415-555-0100 |
| LOCATION | San Francisco, CA |
| DATE_TIME | January 1, 1990 |
| NRP | American, French (nationalities, religions, political groups) |
| ORGANIZATION | Acme Corp |
| URL | https://example.com |
| IP_ADDRESS | 192.168.1.1 |
| US_DRIVER_LICENSE | D1234567 |
| US_PASSPORT | 123456789 |
| US_ITIN | 900-70-0000 |
PCI (Payment Card Industry)
| Type | Example |
|---|---|
| CREDIT_CARD | 4111 1111 1111 1111 |
| US_BANK_NUMBER | 123456789 |
| IBAN_CODE | GB29NWBK60161331926819 |
| CRYPTO | 1BoatSLRHtKNngkdXEeobR76b53LETtpyT |
| CVV | 123 |
| EXPIRY_DATE | 12/26 |
| CARD_HOLDER_NAME | J. Smith |
| PIN_NUMBER | 1234 |
| ACCOUNT_NUMBER | 000123456789 |
PHI (Protected Health Information)
| Type | Example |
|---|---|
| US_SSN | 219-09-9999 |
| MEDICAL_LICENSE | A1234567 |
You can restrict detection to specific types using a RedactConfig or by setting pii_types on a policy:
from expunct import Expunct, RedactConfig
client = Expunct(api_key="your-api-key")
config = RedactConfig(
    pii_types=["PERSON", "EMAIL_ADDRESS", "US_SSN"],
    redaction_method="blur",
    confidence_threshold=0.6,
)
job = client.redact.file("document.pdf", config=config.model_dump())
Exceptions
| Exception | Raised when |
|---|---|
| AuthenticationError | API key is invalid or expired (401/403) |
| NotFoundError | Job or resource not found (404) |
| ValidationError | Request payload is invalid (422) |
| RateLimitError | Rate limit exceeded after retries (429) |
| PollingTimeoutError | wait_for_job exceeded the timeout |
| ApiError | Base class for all SDK errors |
License
MIT