Python SDK for Expunct privacy APIs — PII redaction plus beta document-intelligence workflows for enabled tenants.

Expunct Python SDK

Privacy infrastructure for modern applications. Detect and redact PII, secrets, and sensitive data before it reaches AI, logs, or external APIs.

Installation

pip install expunct

Get your API key at expunct.ai — free tier includes 1M tokens/month, no credit card required.

Quick Start

from expunct import Expunct

client = Expunct(api_key="your-api-key")
redacted = client.sanitize_text("Alice Johnson's email is alice@example.com and SSN is 219-09-9999.")
print(redacted)
# Output: PERSON_1's email is EMAIL_ADDRESS_1 and SSN is US_SSN_1.

Usage

Text redaction (sync)

from expunct import Expunct

client = Expunct(api_key="your-api-key")

redacted = client.sanitize_text("Call Bob at 415-555-0100 or bob@example.com")
print(redacted)
# Call PERSON_1 at PHONE_NUMBER_1 or EMAIL_ADDRESS_1

Text redaction (async)

import asyncio
from expunct import AsyncExpunct

async def main():
    async with AsyncExpunct(api_key="your-api-key") as client:
        redacted = await client.sanitize_text("Call Bob at 415-555-0100 or bob@example.com")
        print(redacted)

asyncio.run(main())

File redaction (PDF, DOCX, images, audio)

from expunct import Expunct

client = Expunct(api_key="your-api-key")

# Pass a file path — returns redacted bytes
redacted_bytes = client.sanitize_file("contract.pdf")

# Save directly to disk
client.sanitize_file("contract.pdf", dest="contract_redacted.pdf")

# Pass a file-like object
with open("invoice.docx", "rb") as f:
    redacted_bytes = client.sanitize_file(f)

URI redaction (cloud storage)

Submit a file hosted in cloud storage (S3, GCS, Azure Blob) for redaction. The optional output_uri controls where the redacted file is written; if it is omitted, download the result with jobs.download() instead.

from expunct import Expunct

client = Expunct(api_key="your-api-key")

job = client.sanitize_uri(
    "s3://my-bucket/reports/q1.pdf",
    output_uri="s3://my-bucket/reports/q1_redacted.pdf",
)
print(job.status)           # "completed"
print(job.findings_count)   # number of PII items found

Batch URI redaction

Enqueue multiple files in one call via the lower-level redact.batch() method, then poll the batch status:

from expunct import Expunct

client = Expunct(api_key="your-api-key")

batch = client.redact.batch(
    input_uris=[
        "s3://my-bucket/docs/file1.pdf",
        "s3://my-bucket/docs/file2.pdf",
    ],
    language="en",
)
print(batch.id, batch.total_jobs)

# Poll progress
status = client.batch.get(batch.id)
print(status.completed_jobs, status.failed_jobs)
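The counters above are enough to tell when a batch has finished. The sketch below assumes, without confirmation from the SDK docs, that every job is eventually counted in either completed_jobs or failed_jobs, so a batch is done once their sum reaches total_jobs. `batch_progress` and `BatchProgress` are hypothetical helpers, not part of the expunct package.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BatchProgress:
    """Snapshot of a batch's progress (hypothetical helper type)."""

    done: int  # completed + failed
    failed: int
    total: int

    @property
    def finished(self) -> bool:
        # Assumption: every job ends up counted as either completed or failed
        return self.done >= self.total

    @property
    def fraction(self) -> float:
        return 1.0 if self.total == 0 else self.done / self.total


def batch_progress(status: Any) -> BatchProgress:
    """Summarize any object exposing total_jobs, completed_jobs and failed_jobs."""
    return BatchProgress(
        done=status.completed_jobs + status.failed_jobs,
        failed=status.failed_jobs,
        total=status.total_jobs,
    )
```

Call it on the result of client.batch.get(batch.id) inside your polling loop, sleeping between calls until `finished` is true, then check `failed` to decide whether any inputs need resubmitting.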

Environment variable

Set EXPUNCT_API_KEY to avoid hard-coding the key in code:

import os
from expunct import Expunct

client = Expunct(api_key=os.environ["EXPUNCT_API_KEY"])
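Indexing os.environ directly raises a bare KeyError when the variable is missing. A small wrapper can fail with a message that says what to fix. `load_api_key` is a hypothetical helper, not part of the expunct package.

```python
import os


def load_api_key(var_name: str = "EXPUNCT_API_KEY") -> str:
    """Read the API key from the environment (hypothetical helper).

    Surrounding whitespace is stripped; an unset or blank variable raises
    RuntimeError with a hint instead of a bare KeyError.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before creating the client, "
            f"e.g. export {var_name}=your-api-key"
        )
    return key
```

Usage: `client = Expunct(api_key=load_api_key())`.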

Custom policy

Policies let you control which entity types are detected, the redaction method, confidence thresholds, and more. Create a policy once and reference it by ID on every job.

from expunct import Expunct, PolicyCreate

client = Expunct(api_key="your-api-key")

# Create a policy that only redacts PII and uses pseudonymization
policy = client.policies.create(PolicyCreate(
    name="pii-only-pseudonymize",
    pii_categories=["PII"],
    redaction_method="pseudonymization",
    confidence_threshold=0.7,
))

# Use the policy when uploading a file
job = client.redact.file("report.pdf", policy_id=policy.id)
completed = client.wait_for_job(job.id)
redacted_bytes = client.jobs.download(completed.id)
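Because a policy is meant to be created once and reused by ID, scripts that run repeatedly can look the policy up by name before creating it. This sketch assumes PolicyResponse exposes a `name` attribute matching the name passed to PolicyCreate, which the reference tables do not confirm; `get_or_create` is a hypothetical helper. It does not protect against two processes creating the same policy at the same moment.

```python
from typing import Any, Callable, Iterable, TypeVar

P = TypeVar("P")


def get_or_create(
    existing: Iterable[P],
    name: str,
    create: Callable[[], P],
    key: Callable[[Any], str] = lambda item: item.name,
) -> P:
    """Return the first item whose key equals `name`, else call `create()`.

    Hypothetical helper: the default key assumes items have a `.name`.
    """
    for item in existing:
        if key(item) == name:
            return item
    return create()
```

For example, `policy = get_or_create(client.policies.list(), "pii-only-pseudonymize", lambda: client.policies.create(PolicyCreate(name="pii-only-pseudonymize", ...)))`, with the same PolicyCreate arguments as above.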

Inspecting findings

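Every completed job exposes the PII entities that were found. Submitting the file with the lower-level redact.file() method keeps the job ID, so you can look up that exact job's findings:

from expunct import Expunct

client = Expunct(api_key="your-api-key")

# Upload the file and wait for this specific job to finish
job = client.redact.file("form.pdf")
completed = client.wait_for_job(job.id)
redacted_bytes = client.jobs.download(completed.id)

# Fetch the job detail (including findings) by ID, rather than assuming
# the most recent job in jobs.list() is the one just submitted
detail = client.jobs.get(completed.id)

for finding in detail.findings:
    print(finding.entity_type, finding.confidence, finding.entity_value)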
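Printing entity_value writes the raw sensitive value to your output, which is often exactly what you are trying to keep out of logs. A safer pattern is to log only counts per entity type. The sketch below uses a stand-in `Finding` dataclass with two of the attributes shown above so it runs without the SDK; `summarize_findings` is a hypothetical helper, and with the SDK you would pass detail.findings directly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class Finding:
    """Stand-in for an SDK finding; only the attributes used below."""

    entity_type: str
    confidence: float


def summarize_findings(findings: Iterable[Any], min_confidence: float = 0.0) -> Dict[str, int]:
    """Count findings per entity type, ignoring those below min_confidence.

    Only entity types and counts are returned, never entity_value,
    so the result is safe to write to logs.
    """
    counts = Counter(
        f.entity_type for f in findings if f.confidence >= min_confidence
    )
    # most_common() orders entity types by count, highest first
    return dict(counts.most_common())
```

Usage: `print(summarize_findings(detail.findings, min_confidence=0.7))`.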

Error handling

from expunct import Expunct, AuthenticationError, RateLimitError, PollingTimeoutError

client = Expunct(api_key="your-api-key")

try:
    redacted = client.sanitize_text("Alice, SSN 219-09-9999")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited — retry after {e.retry_after}s")
except PollingTimeoutError as e:
    print(f"Job {e.job_id} timed out after {e.timeout}s")
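The client already retries transient errors (max_retries, default 3), and RateLimitError is raised only after those retries are exhausted. If a long-running script should keep going anyway, an application-level retry can wait for the retry_after value before trying again. A minimal sketch: `call_with_retry` is a hypothetical helper, and the exception type is passed in so the block runs without the SDK installed.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Type[BaseException],
    attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), trying up to `attempts` times in total when it raises `retry_on`.

    Before each retry, waits exc.retry_after seconds when the exception has a
    non-None retry_after attribute, otherwise default_delay seconds.
    The final failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts:
                raise
            delay = getattr(exc, "retry_after", None)
            sleep(default_delay if delay is None else delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example: `call_with_retry(lambda: client.sanitize_text(text), RateLimitError, attempts=5)`. Injecting `sleep` keeps the helper testable without real waits.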

Context manager (sync)

from expunct import Expunct

with Expunct(api_key="your-api-key") as client:
    redacted = client.sanitize_text("John Smith, DOB 01/01/1980")

Client reference

`Expunct` / `AsyncExpunct`

Parameter	Type	Default	Description
`api_key`	`str`	required	Your Expunct API key
`base_url`	`str`	`https://api.expunct.ai`	Override for self-hosted or staging
`tenant_id`	`str | None`	`None`	Multi-tenant isolation header
`timeout`	`float`	`30.0`	Per-request timeout in seconds
`max_retries`	`int`	`3`	Automatic retries on transient errors

Convenience methods

Method	Returns	Description
`sanitize_text(text, *, language)`	`str`	Redact text in one call (upload → poll → decode)
`sanitize_file(file, *, language, dest)`	`bytes`	Upload a file, poll, return redacted bytes
`sanitize_uri(input_uri, *, language, output_uri)`	`JobDetailResponse`	Submit a URI, poll, return completed job
`wait_for_job(job_id, *, interval, timeout)`	`JobDetailResponse`	Poll a job until it completes or times out
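wait_for_job (and wait_for_document_job, described under Document Intelligence) polls until the job finishes or the timeout passes. The loop below is a sketch of that general pattern, showing what interval and timeout control; it is not the SDK's implementation. Treating "failed" as a terminal status is an assumption, and `poll_until_done` and `PollTimeout` are hypothetical names. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterable


class PollTimeout(Exception):
    """Raised when a job is still running after `timeout` seconds."""


def poll_until_done(
    fetch_status: Callable[[], str],
    *,
    interval: float,
    timeout: float,
    terminal: Iterable[str] = ("completed", "failed"),
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status(), sleeping `interval` seconds between calls,
    until it returns a terminal status; return that status.

    Raises PollTimeout once `timeout` seconds have passed without one.
    """
    done = frozenset(terminal)
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in done:
            return status
        if clock() >= deadline:
            raise PollTimeout(f"status still {status!r} after {timeout}s")
        sleep(interval)
```

To poll a job yourself: `poll_until_done(lambda: client.jobs.get(job_id).status, interval=2.0, timeout=120.0)`.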

Resource methods

`client.redact`

Method	Returns	Description
`redact.file(file, *, config, language, policy_id)`	`JobResponse`	Upload a file and enqueue a redaction job
`redact.uri(input_uri, *, output_uri, config, language, metadata)`	`JobResponse`	Submit a cloud URI for redaction
`redact.batch(input_uris, *, config, language, metadata)`	`BatchJobResponse`	Submit multiple URIs as a batch

`client.jobs`

Method	Returns	Description
`jobs.list(*, page, page_size, status)`	`JobListResponse`	List jobs with optional status filter
`jobs.get(job_id)`	`JobDetailResponse`	Get job detail including findings
`jobs.report(job_id)`	`dict`	Get full structured report for a job
`jobs.download(job_id, *, dest)`	`bytes`	Download redacted output; optionally save to `dest`
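jobs.list() returns one page at a time. To walk every job, a generator can request successive pages until a short page comes back. Two assumptions, neither confirmed by the table above: pages are numbered from 1, and a page with fewer than page_size items is the last one. `iter_pages` is a hypothetical helper, and its default page size of 50 is arbitrary.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_pages(
    fetch_page: Callable[[int, int], Sequence[T]],
    page_size: int = 50,
) -> Iterator[T]:
    """Yield every item from a 1-indexed paginated listing.

    Assumes a page shorter than page_size is the last page; when the total
    is an exact multiple of page_size, one extra (empty) page is fetched.
    """
    if page_size < 1:
        raise ValueError("page_size must be positive")
    page = 1
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            return
        page += 1
```

Usage: `for job in iter_pages(lambda page, size: client.jobs.list(page=page, page_size=size).jobs): print(job.id)`.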

`client.policies`

Method	Returns	Description
`policies.list()`	`list[PolicyResponse]`	List all policies
`policies.create(policy)`	`PolicyResponse`	Create a new policy
`policies.get(policy_id)`	`PolicyResponse`	Fetch a policy by ID
`policies.update(policy_id, policy)`	`PolicyResponse`	Update a policy
`policies.delete(policy_id)`	`None`	Delete a policy

`client.batch`

Method	Returns	Description
`batch.get(batch_id)`	`BatchJobResponse`	Get status of a batch job

`client.api_keys`

Method	Returns	Description
`api_keys.list()`	`list[ApiKeyResponse]`	List API keys for your account
`api_keys.create(key)`	`ApiKeyCreateResponse`	Create a new API key
`api_keys.revoke(key_id)`	`dict`	Revoke an API key

`client.audit`

Method	Returns	Description
`audit.list(*, page, page_size, event_type)`	`AuditListResponse`	List audit log entries

Detected Entity Types

Expunct detects the following entity types by default (all categories enabled):

PII (Personally Identifiable Information)

Type	Example
`PERSON`	John Smith
`EMAIL_ADDRESS`	john@example.com
`PHONE_NUMBER`	415-555-0100
`LOCATION`	San Francisco, CA
`DATE_TIME`	January 1, 1990
`NRP`	American, French (nationalities, religions, political groups)
`ORGANIZATION`	Acme Corp
`URL`	https://example.com
`IP_ADDRESS`	192.168.1.1
`US_DRIVER_LICENSE`	D1234567
`US_PASSPORT`	123456789
`US_ITIN`	900-70-0000

PCI (Payment Card Industry)

Type	Example
`CREDIT_CARD`	4111 1111 1111 1111
`US_BANK_NUMBER`	123456789
`IBAN_CODE`	GB29NWBK60161331926819
`CRYPTO`	1BoatSLRHtKNngkdXEeobR76b53LETtpyT
`CVV`	123
`EXPIRY_DATE`	12/26
`CARD_HOLDER_NAME`	J. Smith
`PIN_NUMBER`	1234
`ACCOUNT_NUMBER`	000123456789

PHI (Protected Health Information)

Type	Example
`US_SSN`	219-09-9999
`MEDICAL_LICENSE`	A1234567

You can restrict detection to specific types using a RedactConfig or by setting pii_types on a policy:

from expunct import Expunct, RedactConfig

client = Expunct(api_key="your-api-key")

config = RedactConfig(
    pii_types=["PERSON", "EMAIL_ADDRESS", "US_SSN"],
    redaction_method="blur",
    confidence_threshold=0.6,
)
job = client.redact.file("document.pdf", config=config.model_dump())
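A typo such as "EMAIL" instead of "EMAIL_ADDRESS" in pii_types is easy to miss. The sketch below checks requested types against the entity tables above before a job is submitted. The set simply mirrors this README; the API remains the authority on what it accepts, so treat an unknown name as a prompt to double-check rather than proof it is unsupported. `check_pii_types` and `DOCUMENTED_ENTITY_TYPES` are hypothetical names.

```python
from typing import Iterable, List

# Entity types listed in the PII, PCI and PHI tables of this README
DOCUMENTED_ENTITY_TYPES = frozenset({
    # PII
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "LOCATION", "DATE_TIME", "NRP",
    "ORGANIZATION", "URL", "IP_ADDRESS", "US_DRIVER_LICENSE", "US_PASSPORT",
    "US_ITIN",
    # PCI
    "CREDIT_CARD", "US_BANK_NUMBER", "IBAN_CODE", "CRYPTO", "CVV",
    "EXPIRY_DATE", "CARD_HOLDER_NAME", "PIN_NUMBER", "ACCOUNT_NUMBER",
    # PHI
    "US_SSN", "MEDICAL_LICENSE",
})


def check_pii_types(pii_types: Iterable[str]) -> List[str]:
    """Return pii_types as a list, or raise ValueError naming unknown types."""
    requested = list(pii_types)
    unknown = sorted(set(requested) - DOCUMENTED_ENTITY_TYPES)
    if unknown:
        raise ValueError(f"Unrecognized entity types: {', '.join(unknown)}")
    return requested
```

Usage: `RedactConfig(pii_types=check_pii_types(["PERSON", "EMAIL_ADDRESS", "US_SSN"]), ...)`.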

Exceptions

Exception	Raised when
`AuthenticationError`	API key is invalid or expired (401/403)
`NotFoundError`	Job or resource not found (404)
`ValidationError`	Request payload is invalid (422)
`RateLimitError`	Rate limit exceeded after retries (429)
`PollingTimeoutError`	`wait_for_job` exceeded the timeout
`ApiError`	Base class for all SDK errors

Document Intelligence

Parse and extract structured data from PDFs and DOCX files.

Document Intelligence is currently in beta. The parse, extract, and safe_parse workflows are available only to enabled tenants on supported paid plans; requests return HTTP 403 until the backend feature flags are turned on for your tenant.

Parse a document

from expunct import Expunct

client = Expunct(api_key="your-api-key")

# Submit for parsing
job = client.documents.parse("contract.pdf", language="en")

# Poll until complete
completed = client.wait_for_document_job(job.id)

# Inspect produced artifacts (canonical_document, markdown_render, chunks_v1)
for artifact in completed.artifacts:
    print(artifact.artifact_kind, artifact.id)

# Fetch artifact metadata, then retrieve its JSON payload
canonical = next(
    artifact for artifact in completed.artifacts if artifact.artifact_kind == "canonical_document"
)
metadata = client.documents.get_artifact(canonical.id)
content = client.documents.get_artifact_content(metadata.id)
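A bare next() over completed.artifacts raises an unhelpful StopIteration when the expected artifact kind is missing (and a RuntimeError if that happens inside a generator). A small lookup helper can instead name the kinds the job actually produced. The `Artifact` dataclass is a stand-in with the two attributes used above so the block runs without the SDK; `find_artifact` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class Artifact:
    """Stand-in for an SDK artifact; only the attributes used in the examples."""

    artifact_kind: str
    id: str


def find_artifact(artifacts: Iterable[Any], kind: str) -> Any:
    """Return the first artifact of the given kind.

    Raises LookupError listing the kinds that were actually produced.
    """
    items = list(artifacts)
    for artifact in items:
        if artifact.artifact_kind == kind:
            return artifact
    available = ", ".join(sorted({a.artifact_kind for a in items})) or "none"
    raise LookupError(f"no {kind!r} artifact; job produced: {available}")
```

Usage: `canonical = find_artifact(completed.artifacts, "canonical_document")`.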

Extract structured fields

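Provide a JSON Schema to extract specific fields from a document. The example below extracts from a file; to avoid re-parsing a document you have already parsed, pass the ID of an existing parse artifact as parse_artifact_id instead of file: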

from expunct import Expunct

client = Expunct(api_key="your-api-key")

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor_name": {"type": "string"},
    },
}

# Extract from a file directly
job = client.documents.extract(file="invoice.pdf", schema=schema)
completed = client.wait_for_document_job(job.id)

result = next(
    artifact for artifact in completed.artifacts if artifact.artifact_kind == "extraction_result"
)
data = client.documents.get_artifact_content(result.id)
print(data)
# {'invoice_number': 'INV-1042', 'total_amount': 3150.0, 'vendor_name': 'Acme Corp'}
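Extraction output is worth checking before it flows into downstream systems. The sketch below compares each returned value against the top-level property types declared in the schema. It is deliberately minimal and covers only the "type" keyword; for full validation, the third-party jsonschema package's `jsonschema.validate(instance, schema)` implements the whole specification. `type_mismatches` is a hypothetical helper.

```python
from typing import Any, Dict, List

# JSON Schema "type" keywords mapped to the Python types json.loads produces
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def type_mismatches(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """List fields in `data` whose values do not match the top-level property
    types declared in `schema`.

    Fields absent from `data` are skipped, since the example schema has no
    "required" list. Nested schemas and other keywords are not checked.
    """
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue
        value = data[name]
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        if expected is None:
            continue  # missing or unsupported "type"; not checked here
        # bool is a subclass of int in Python, but JSON booleans are not numbers
        bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return problems
```

Usage: `problems = type_mismatches(data, schema)`, then handle or log any non-empty result before using the fields.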

Safe-parse (parse + PII redaction in one step)

from expunct import Expunct

client = Expunct(api_key="your-api-key")

# Parse and sanitize in a single workflow
job = client.documents.safe_parse("patient_notes.pdf", language="en")
completed = client.wait_for_document_job(job.id)

# Artifacts include the PII-sanitized canonical document, markdown, and chunks
for artifact in completed.artifacts:
    print(artifact.artifact_kind, artifact.id)

Async document intelligence

import asyncio
from expunct import AsyncExpunct

async def main():
    async with AsyncExpunct(api_key="your-api-key") as client:
        job = await client.documents.parse("report.pdf")
        completed = await client.wait_for_document_job(job.id)
        canonical = next(
            artifact
            for artifact in completed.artifacts
            if artifact.artifact_kind == "canonical_document"
        )
        content = await client.documents.get_artifact_content(canonical.id)
        print(content)

asyncio.run(main())
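When parsing many files with AsyncExpunct, asyncio.gather can run the calls concurrently, and a semaphore keeps the number of in-flight jobs bounded so you are less likely to trigger RateLimitError. A sketch: `run_bounded` is a hypothetical helper, and its default limit of 4 is arbitrary, not an Expunct quota.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    factories: Iterable[Callable[[], Awaitable[T]]],
    limit: int = 4,
) -> List[T]:
    """Await each zero-argument coroutine factory, at most `limit` at a time.

    Results come back in input order, because asyncio.gather preserves it.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await factory()

    return list(await asyncio.gather(*(guarded(f) for f in factories)))
```

Inside `async with AsyncExpunct(...) as client:`, pass one factory per file, for example `[lambda p=p: parse_and_wait(client, p) for p in paths]`, where `parse_and_wait` is your own coroutine that calls documents.parse() and then wait_for_document_job(). The `p=p` default binds each path when the lambda is created; without it, every lambda would see the last path.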

`client.documents` reference

Method	Returns	Description
`documents.parse(file, *, config, language)`	`DocumentJobResponse`	Submit a PDF/DOCX for parsing
`documents.extract(*, file, parse_artifact_id, schema, template_id, config, language)`	`DocumentJobResponse`	Extract fields from a file or parse artifact
`documents.safe_parse(file, *, config, policy_id, language)`	`DocumentJobResponse`	Parse + redact PII in one step
`documents.get_job(job_id)`	`DocumentJobDetailResponse`	Poll a document job
`documents.get_artifact(artifact_id)`	`ArtifactResponse`	Retrieve artifact metadata
`documents.get_artifact_content(artifact_id)`	`dict`	Retrieve artifact content as JSON
`wait_for_document_job(job_id, *, interval, timeout)`	`DocumentJobDetailResponse`	Poll until complete or timeout

License

MIT

Release history

Version	Release date
0.2.0 (this version)	May 25, 2026
0.1.1	Mar 17, 2026
0.1.0	Mar 16, 2026

Download files

Source Distribution

expunct-0.2.0.tar.gz (22.1 kB)

Uploaded May 25, 2026 Source

Built Distribution

expunct-0.2.0-py3-none-any.whl (22.9 kB)

Uploaded May 25, 2026 Python 3

File details

Details for the file expunct-0.2.0.tar.gz.

File metadata

Download URL: expunct-0.2.0.tar.gz
Upload date: May 25, 2026
Size: 22.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for expunct-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`56aec9a79cbdd7015002cf54e57d62a89ee6fc073c37642e200b4bfc4c02fbd5`
MD5	`4c404c3534d6c788da3fea51179fd232`
BLAKE2b-256	`c56072562551a2f2def2bed954b511d5e1fd51b611c6c9f34a8b3dff3872359e`

File details

Details for the file expunct-0.2.0-py3-none-any.whl.

File metadata

Download URL: expunct-0.2.0-py3-none-any.whl
Upload date: May 25, 2026
Size: 22.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for expunct-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`60e99fc76fbfb61613d3257295802461ee02b89c6f448b09a4b2052a6010cc75`
MD5	`26fbcf0bd2eabc8c697e27c631c5c825`
BLAKE2b-256	`df6b25fcac6ec783761183d77d7042e6ad2962cdbba52b4c9463069b199b237d`
