CLI for the Expunct PII redaction API — redact, detect, and manage sensitive data from the command line.

Expunct CLI

Privacy infrastructure for modern applications.

Redact PII, secrets, and sensitive data from text, logs, and files — before it reaches AI, analytics, or external APIs.


🚀 Quick Start

pip install expunct-cli
export EXPUNCT_API_KEY=your_api_key
expunct redact --text "John Smith email john@gmail.com"

Output:

PERSON_1 email EMAIL_1

✨ Why Expunct?

Modern applications constantly handle sensitive data:

  • AI prompts sent to LLMs
  • application logs
  • customer support tickets
  • analytics pipelines

Expunct helps you:

  • 🔒 Detect PII (emails, phone numbers, names, etc.)
  • 🧠 Detect secrets (API keys, tokens, credentials)
  • 🧩 Apply policies (redact, mask, pseudonymize)
  • ⚡ Sanitize data before it leaves your system

🧪 Examples

Redact sensitive data

expunct redact --text "Contact me at john@gmail.com"

Detect entities without modifying text

expunct detect --text "My email is john@gmail.com"

Process a file

expunct redact logs.txt

Redact binary files (PDF, DOCX, images, video, audio)

expunct redact report.pdf --output redacted_report.pdf

Use with pipes (great for scripts & agents)

cat logs.txt | expunct redact
echo "My SSN is 123-45-6789" | expunct detect

Cloud URI redaction

expunct redact --uri "gs://my-bucket/file.txt"
expunct redact --uri "s3://my-bucket/file.txt" --no-wait

JSON output (for scripting & AI agents)

expunct redact --text "My SSN is 123-45-6789" --json
expunct detect --text "Jane Doe" --json | jq '.findings[] | .entity_type'
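The jq filter above can be extended into a small tally of what was detected. This is a minimal sketch, assuming jq is installed and that `expunct detect --json` output has the `.findings[].entity_type` shape used above. `count_entity_types` is a hypothetical helper name, not part of the CLI.

```shell
#!/usr/bin/env bash
# count_entity_types: read `expunct detect --json` output on stdin and
# print one "ENTITY_TYPE COUNT" line per detected type, sorted by type.
# Hypothetical helper; relies only on the .findings[].entity_type path
# used elsewhere in this README.
count_entity_types() {
  jq -r '.findings[].entity_type' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

Possible usage: `expunct detect --text "Jane Doe" --json | count_entity_types`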

🧱 How It Works

Your App / Logs / Files
           ↓
        Expunct
           ↓
   Clean, safe data
           ↓
AI / APIs / Analytics

Expunct acts as a privacy layer that removes sensitive data before it leaves your system.


📖 Commands

expunct redact

Redact PII from text, files, or URIs.

expunct redact --text "Call me at 555-1234"
expunct redact notes.txt
expunct redact scan.pdf -o redacted_scan.pdf
expunct redact --uri "gs://my-bucket/file.txt"
cat file.txt | expunct redact

Flag                Description
--text, -t          Inline text to redact
--uri, -u           Cloud URI to redact
--output, -o        Output file path (required for binary formats)
--language, -l      Language code (default: en)
--policy-id, -p     Policy ID to apply
--json              Raw JSON output
--wait/--no-wait    Wait for URI jobs (default: --wait)
--timeout           Wait timeout in seconds (default: 300)
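
Because an output path is required for binary formats, looping over a folder needs an explicit output name per file. Below is a minimal sketch for PDFs, using only the `redact FILE -o OUTPUT` form shown above. The helper name `redact_pdfs` and the `redacted_` filename prefix are illustrative choices, not CLI behavior.

```shell
#!/usr/bin/env bash
# redact_pdfs DIR: redact every *.pdf directly inside DIR, writing each
# result next to its source as redacted_<name>.pdf.
# Hypothetical helper; stops at the first file that fails.
redact_pdfs() {
  local dir="$1" file name
  for file in "$dir"/*.pdf; do
    [ -e "$file" ] || continue            # glob matched nothing
    name="$(basename "$file")"
    case "$name" in
      redacted_*) continue ;;             # skip outputs from earlier runs
    esac
    expunct redact "$file" -o "$dir/redacted_$name" || return 1
  done
}
```

Possible usage: `redact_pdfs ./reports`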

expunct detect

Detect PII entities without redacting. Shows entity type, value, confidence, and location.

expunct detect --text "My name is Jane Doe and my SSN is 123-45-6789"
expunct detect document.txt
expunct detect --uri "gs://bucket/file.txt"
echo "test@email.com" | expunct detect

expunct jobs

Manage redaction jobs.

expunct jobs list
expunct jobs list --status completed --page 2
expunct jobs get JOB_ID
expunct jobs download JOB_ID -o output.pdf
expunct jobs wait JOB_ID --timeout 600
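
For jobs submitted with `--no-wait`, the wait and download subcommands above are typically used together. This is a minimal sketch, using only the documented `jobs wait` and `jobs download` invocations. `fetch_job` is a hypothetical helper name, and the 600-second default is simply copied from the example above.

```shell
#!/usr/bin/env bash
# fetch_job JOB_ID OUTPUT [TIMEOUT]: block until the job finishes, then
# download its result. Returns non-zero without downloading if the
# wait step fails or times out.
fetch_job() {
  local job_id="$1" out="$2" timeout="${3:-600}"
  expunct jobs wait "$job_id" --timeout "$timeout" || return 1
  expunct jobs download "$job_id" -o "$out"
}
```

Possible usage: `fetch_job JOB_ID output.pdf`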

expunct policies

Manage redaction policies.

expunct policies list
expunct policies create --name "strict" --confidence-threshold 0.9
expunct policies get POLICY_ID
expunct policies update POLICY_ID --name "updated-name"
expunct policies delete POLICY_ID --yes

expunct audit

View audit logs.

expunct audit list
expunct audit list --event-type redaction --page-size 50
expunct audit list --json

Document Intelligence commands (beta)

parse, extract, and safe-parse call the Expunct Document Intelligence API, which is currently in beta and feature-flag gated. Your tenant must be enabled before these commands will succeed; calls from disabled tenants return a 403. PDF and DOCX are the supported formats during beta.

expunct parse

Parse a PDF or DOCX into a canonical document structure (text, tables, headings, layout).

expunct parse report.pdf
expunct parse contract.docx --language en
expunct parse report.pdf --no-wait          # submit and return job ID immediately
expunct parse report.pdf --json             # raw JSON output

Flag                Description
--language, -l      Language code (default: en)
--wait/--no-wait    Wait for job to complete (default: --wait)
--timeout           Wait timeout in seconds (default: 300)
--json              Raw JSON output

expunct extract

Extract structured fields from a parsed document using a JSON Schema or built-in template.

# From an existing parse artifact (preferred — avoids re-parsing)
expunct extract art-abc123 --schema invoice_schema.json

# Parse and extract in one step
expunct extract --file report.pdf --template invoice

# Inline schema
expunct extract art-abc123 --schema-json '{"type":"object","properties":{"total":{"type":"number"}}}'

Flag                Description
PARSE_ARTIFACT_ID   Canonical document artifact ID from a completed parse job
--file, -f          PDF or DOCX to parse and extract in one step (mutually exclusive with artifact ID)
--schema, -s        Path to JSON Schema file defining fields to extract
--schema-json       Inline JSON Schema string
--template, -t      Built-in template ID (e.g. invoice)
--language, -l      Language code (default: en)
--wait/--no-wait    Wait for job to complete (default: --wait)
--timeout           Wait timeout in seconds (default: 300)
--json              Raw JSON output
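
For anything beyond a one-field schema, a schema file is easier to maintain than an inline string. The sketch below writes a small JSON Schema to a temporary file and passes it with `--schema`. The field names (`invoice_number`, `total`) and the helper name `extract_invoice_fields` are made up for illustration. Which JSON Schema keywords the API honors is not stated here, so treat `required` as an assumption.

```shell
#!/usr/bin/env bash
# extract_invoice_fields ARTIFACT_ID: extract two illustrative fields
# from an existing parse artifact and print the raw JSON result.
# Hypothetical helper; the schema contents are an example only.
extract_invoice_fields() {
  local artifact_id="$1" schema status=0
  schema="$(mktemp)" || return 1
  cat > "$schema" <<'JSON'
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "total": { "type": "number" }
  },
  "required": ["total"]
}
JSON
  expunct extract "$artifact_id" --schema "$schema" --json || status=$?
  rm -f "$schema"                         # clean up the temp schema
  return "$status"
}
```

Possible usage: `extract_invoice_fields art-abc123`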

expunct safe-parse

Parse a PDF or DOCX and sanitize PII in a single workflow. Produces a sanitized canonical document, sanitized markdown, and sanitized chunk artifacts suitable for AI ingestion (RAG, prompts, embeddings).

expunct safe-parse contract.pdf
expunct safe-parse report.docx --policy-id strict
expunct safe-parse contract.pdf --no-wait        # submit and return job ID immediately
expunct safe-parse contract.pdf --json           # raw JSON output

Flag                Description
--policy-id, -p     Redaction policy ID applied during sanitization
--language, -l      Language code (default: en)
--wait/--no-wait    Wait for job to complete (default: --wait)
--timeout           Wait timeout in seconds (default: 300)
--json              Raw JSON output

expunct config

Manage CLI configuration.

expunct config set api_key YOUR_API_KEY
expunct config set base_url https://api.expunct.ai
expunct config get base_url
expunct config show
expunct config path

🔑 Authentication

Option A: Environment variable (recommended for CI/scripts)

export EXPUNCT_API_KEY=your_api_key

Option B: Config file

expunct config set api_key YOUR_API_KEY

Stored in ~/.expunct/config.json:

{
  "api_key": "your_api_key",
  "base_url": "https://api.expunct.ai",
  "tenant_id": "your-tenant-id"
}

Variable             Description
EXPUNCT_API_KEY      API key (overrides config file)
EXPUNCT_BASE_URL     API base URL (overrides config file)
EXPUNCT_TENANT_ID    Tenant ID (overrides config file)
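
The precedence described above (environment variable first, then config file) can be checked before a script runs. This is a minimal sketch that reports where the API key would come from without printing it. `expunct_key_source` is a hypothetical helper, and the grep check is a rough text match on the config file shown above rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# expunct_key_source [CONFIG_PATH]: print "env", "config", or "none"
# depending on where an API key appears to be available.
# Hypothetical helper; never echoes the key itself.
expunct_key_source() {
  local config="${1:-$HOME/.expunct/config.json}"
  if [ -n "${EXPUNCT_API_KEY:-}" ]; then
    echo "env"                            # env var overrides the file
  elif [ -f "$config" ] && grep -q '"api_key"' "$config"; then
    echo "config"
  else
    echo "none"
  fi
}
```

Possible usage in CI: `[ "$(expunct_key_source)" != "none" ] || { echo "no API key configured" >&2; exit 1; }`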

⚙️ Output Modes

# Default: human-readable with rich formatting
expunct redact --text "My email is john@gmail.com"

# JSON: machine-readable for piping and scripting
expunct redact --text "My email is john@gmail.com" --json

🤖 Agent-Friendly Design

The CLI is designed to work well with AI agents and automation:

  • Deterministic output with the --json flag
  • Stdin piping for chaining commands
  • Non-interactive — no prompts in --json mode
  • Exit codes — 0 for success, 1 for errors

# Pipe redacted text
expunct redact --text "My phone is 555-0100" --json | jq .redacted

# Chain with other tools
cat logs.txt | expunct redact | grep "ERROR"

# Use in scripts
expunct jobs list --json | jq '.jobs[].id'
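
Stdin piping and exit codes combine naturally into a fail-closed wrapper, so that nothing is forwarded downstream when redaction fails. This is a minimal sketch, relying only on `expunct redact` reading stdin and the 0/1 exit codes listed above. `redact_or_drop` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# redact_or_drop: pass stdin through `expunct redact` and print the
# result only if the command succeeds. The whole input is buffered, so
# partial output is never emitted when redaction fails.
redact_or_drop() {
  local out
  if out="$(expunct redact)"; then
    printf '%s\n' "$out"
  else
    echo "redaction failed; input withheld" >&2
    return 1
  fi
}
```

Possible usage: `cat logs.txt | redact_or_drop | grep "ERROR"`. In bash, adding `set -o pipefail` lets the surrounding script see a failure from the wrapper.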

🖥️ Platform Support

The CLI is pure Python and works on macOS, Linux, and Windows:

# All platforms via pip
pip install expunct-cli

# macOS via Homebrew (coming soon)
brew install expunct/tap/expunct-cli

🔒 Built for Developers

Expunct builds on proven detection tools such as Microsoft Presidio and adds:

  • policy control
  • hosted API
  • scalable processing
  • multi-format support (text, PDF, DOCX, images, video, audio)

📚 Documentation

👉 https://docs.expunct.ai

🌐 Platform

👉 https://expunct.ai


💡 Roadmap

  • Pseudonymization (reversible identity masking)
  • Secret detection expansion
  • Batch processing via CLI
  • Directory processing
  • Streaming mode
  • Self-hosted / VPC deployment

🤝 Contributing

Contributions welcome. Feel free to open issues or PRs.


📄 License

MIT
