Make data safe before feeding it to AI

RedactAI

Strip PII from text, files, and pipelines before it reaches your AI.

Three ways to use it

  • Hosted API — point your client at https://api.redactai.dev. No install.
  • Python library + CLI (pip install redactai). Run locally, in CI, or in your own service.
  • MCP server — expose RedactAI as tools for Claude Desktop or other AI agents.

Install

pip install redactai
python -m spacy download en_core_web_sm

Quick Start

Clean a file from the CLI:

redactai clean data.csv -o data.clean.csv

Clean text in Python:

from redactai import clean

safe = clean("Call John Smith at 555-0123")
# "Call Marcia Wells at 555-8912"  (faker replacements by default)

Scan for PII in CI:

redactai scan ./data --ci  # exits 1 if PII detected

Call the hosted API directly:

curl -X POST https://api.redactai.dev/api/v1/anonymize \
  -H "X-API-Key: $REDACTAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Email john@acme.com about the deal", "profile": "llm_guardrail"}'

Ask for an API key from the maintainer (private beta).
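
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library. The URL, header names, and JSON fields are taken from the curl example; the helper name build_anonymize_request and the placeholder key are illustrative, and the response format is not documented here, so the sketch only builds the request.

```python
import json
import urllib.request

API_URL = "https://api.redactai.dev/api/v1/anonymize"


def build_anonymize_request(text: str, profile: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps({"text": text, "profile": profile}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


# "YOUR_KEY" is a placeholder; a real key comes from the maintainer.
req = build_anonymize_request("Email john@acme.com about the deal", "llm_guardrail", "YOUR_KEY")

# Sending needs a real key and network access, so it is left commented out:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.read().decode("utf-8"))
```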

CLI Commands

Command                       Description
redactai clean [PATH]         Anonymize a file, folder, or stdin
redactai scan PATH            Detect PII and report findings
redactai analyze              Analyze text or file and return entity details
redactai watch PATH           Watch a folder and clean files on change
redactai init                 Generate a .redactai.yml config file
redactai entities             List supported PII entity types
redactai profiles             List built-in profiles
redactai profiles show NAME   Show profile details
redactai mcp                  Start MCP tool server for AI agents
redactai server start         Start the local API daemon
redactai server stop          Stop the daemon
redactai server status        Show daemon status
redactai server restart       Restart the daemon
redactai login                Authenticate with a remote API
redactai logout               Remove stored credentials
redactai whoami               Show current auth status

Python API

from redactai import clean, scan

clean(text, *, profile, threshold, language, entities, operators) -> str

# Use a built-in profile
clean("Email me at john@acme.com", profile="llm_guardrail")
# "Email me at <EMAIL_ADDRESS>"

# Override operator for a specific entity
clean("Call 555-0123", operators={"PHONE_NUMBER": {"type": "mask", "masking_char": "*", "chars_to_mask": 6}})
# "Call ***-****"

scan(text, *, threshold, language, entities) -> list[dict]

hits = scan("My SSN is 123-45-6789")
# [{"entity_type": "US_SSN", "start": 10, "end": 21, "score": 0.85, "text": "123-45-6789"}]
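
Because each hit carries start and end offsets, the result of scan can drive custom redaction. The sketch below assumes only the dict shape shown above (entity_type, start, end) and that spans do not overlap; redact_spans is a hypothetical helper, not part of the library.

```python
def redact_spans(text: str, hits: list[dict]) -> str:
    """Replace each detected span with an <ENTITY_TYPE> placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for hit in sorted(hits, key=lambda h: h["start"], reverse=True):
        text = text[: hit["start"]] + f"<{hit['entity_type']}>" + text[hit["end"]:]
    return text


# Hit copied from the scan example above.
hits = [{"entity_type": "US_SSN", "start": 10, "end": 21, "score": 0.85, "text": "123-45-6789"}]
print(redact_spans("My SSN is 123-45-6789", hits))  # My SSN is <US_SSN>
```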

Other exports

from redactai import entities, profiles, profile_detail

entities()          # ["CREDIT_CARD", "EMAIL_ADDRESS", "PERSON", ...]
profiles()          # [{"id": "llm_guardrail@1", "name": "llm_guardrail", ...}, ...]
profile_detail("llm_guardrail")  # full config including operators

Profiles

Profile                     Description                                             Threshold
llm_guardrail               Redact all PII before sending to LLMs                   0.3
app_logs_safe               Mask PII in logs, keep structure for debugging          0.7
analytics_pseudonymized     Replace PII with consistent fakes for analytics         0.5
customer_support_shareable  Redact sensitive PII, keep names/locations for context  0.5
strict_compliance_export    Maximum redaction for GDPR/HIPAA/CCPA compliance        0.3
dev_demo_readable           Replace PII with realistic Faker data for demos         0.5
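
To make the threshold column concrete, here is a small sketch of how a threshold could act on detection scores. It assumes the threshold is a minimum confidence and that a hit is kept when its score is greater than or equal to it; the library's exact comparison is not stated here. The keep_hits helper and the sample hits are illustrative.

```python
# Thresholds copied from the profile table above.
PROFILE_THRESHOLDS = {
    "llm_guardrail": 0.3,
    "app_logs_safe": 0.7,
    "analytics_pseudonymized": 0.5,
    "customer_support_shareable": 0.5,
    "strict_compliance_export": 0.3,
    "dev_demo_readable": 0.5,
}


def keep_hits(hits: list[dict], profile: str) -> list[dict]:
    """Keep hits whose score meets the profile threshold (assumed: score >= threshold)."""
    threshold = PROFILE_THRESHOLDS[profile]
    return [hit for hit in hits if hit["score"] >= threshold]


sample = [
    {"entity_type": "PERSON", "score": 0.4},
    {"entity_type": "US_SSN", "score": 0.85},
]
print(len(keep_hits(sample, "llm_guardrail")))  # 2: a low threshold keeps weaker matches
print(len(keep_hits(sample, "app_logs_safe")))  # 1: a high threshold keeps only strong matches
```

Under this reading, a lower threshold trades more false positives for fewer missed entities, which fits the guardrail and compliance profiles.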

CI/CD

Exit code

redactai scan ./data --ci  # exit 0 = clean, exit 1 = PII found
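
A build script can branch on that exit code. The sketch below wraps any command with subprocess and maps 0 to clean and 1 to PII found, as documented above. Treating every other code as an error is an assumption, and the two stand-in commands only simulate the exit codes so the example runs without redactai installed.

```python
import subprocess
import sys


def pii_found(command: list[str]) -> bool:
    """Run a scan command and interpret its exit code (0 = clean, 1 = PII found)."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return False
    if result.returncode == 1:
        return True
    # Assumption: any other exit code means the scan itself failed.
    raise RuntimeError(f"scan failed with exit code {result.returncode}: {result.stderr}")


# Stand-ins that simulate the two documented exit codes.
clean_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
dirty_cmd = [sys.executable, "-c", "raise SystemExit(1)"]
print(pii_found(clean_cmd), pii_found(dirty_cmd))  # False True

# With redactai installed, the real call would be:
# pii_found(["redactai", "scan", "./data", "--ci"])
```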

GitHub Actions

- uses: actions/setup-python@v6
  with:
    python-version: "3.12"
- run: pip install redactai && python -m spacy download en_core_web_sm
- run: redactai scan ./data --ci

Hosted API in CI

Skip the install entirely and call the hosted API:

- name: Scan for PII
  run: |
    curl -f -X POST https://api.redactai.dev/api/v1/analyze \
      -H "X-API-Key: ${{ secrets.REDACTAI_API_KEY }}" \
      -H "Content-Type: application/json" \
      --data-binary @data.json

Config (.redactai.yml)

Generate a starter config:

redactai init

Minimal example:

profile: llm_guardrail
threshold: 0.4
entities:
  - PERSON
  - EMAIL_ADDRESS
  - CREDIT_CARD

operators:
  PERSON:
    type: faker
    locale: en_US
  EMAIL_ADDRESS:
    type: redact

allow_list:
  - "Acme Corp"

files:
  include:
    - "**/*.csv"
    - "**/*.txt"
  exclude:
    - "**/node_modules/**"
  output_dir: ./clean
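
The include and exclude lists in the files section can be read as a two-step filter: a path must match at least one include pattern and no exclude pattern. The sketch below illustrates that reading with fnmatch. It assumes a leading "**/" may also match zero directories; how RedactAI itself resolves these globs is not documented here, and select_files is a hypothetical helper.

```python
from fnmatch import fnmatchcase

INCLUDE = ["**/*.csv", "**/*.txt"]
EXCLUDE = ["**/node_modules/**"]


def matches(path: str, pattern: str) -> bool:
    """Match a POSIX-style relative path against a glob pattern."""
    if fnmatchcase(path, pattern):
        return True
    # Assumption: a leading "**/" may also match zero directories.
    return pattern.startswith("**/") and fnmatchcase(path, pattern[3:])


def select_files(paths: list[str]) -> list[str]:
    """Keep paths that match any include pattern and no exclude pattern."""
    return [
        p
        for p in paths
        if any(matches(p, inc) for inc in INCLUDE)
        and not any(matches(p, exc) for exc in EXCLUDE)
    ]


candidates = ["data/a.csv", "notes.txt", "web/node_modules/pkg/x.csv", "img.png"]
print(select_files(candidates))  # ['data/a.csv', 'notes.txt']
```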

Hooks

Three hook layers (shell hooks, Python plugins, and webhooks) fire on four events: pre_scan, on_pii_detected, post_clean, and on_error.

Shell hooks (in .redactai.yml)

hooks:
  post_clean:
    - shell: "echo 'Cleaned {{file}} -> {{output_file}}'"
  on_pii_detected:
    - shell: "notify-send 'PII found: {{entity_count}} entities in {{file}}'"
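
The {{file}} style placeholders suggest simple template substitution before the command runs. Here is a minimal sketch of that idea; the render_hook helper and the choice to leave unknown placeholders untouched are assumptions, not the library's documented behavior.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_hook(template: str, context: dict[str, object]) -> str:
    """Fill {{name}} placeholders from context, leaving unknown names as they are."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(context[key]) if key in context else match.group(0)

    return _PLACEHOLDER.sub(substitute, template)


command = render_hook(
    "echo 'Cleaned {{file}} -> {{output_file}}'",
    {"file": "data.csv", "output_file": "clean/data.csv"},
)
print(command)  # echo 'Cleaned data.csv -> clean/data.csv'
```

File names that contain quotes or shell metacharacters would need escaping before such a command is handed to a shell.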

Python plugins

from redactai.hooks import on_pii_detected, HookEvent

@on_pii_detected
def alert(event: HookEvent):
    print(f"Found {event.entity_count} entities in {event.file}")

Webhooks

hooks:
  on_pii_detected:
    - url: "https://hooks.slack.com/services/..."

MCP Server

Expose RedactAI as tools for Claude Desktop or other AI agents:

pip install redactai[mcp]
redactai mcp  # starts stdio transport

Add to Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "redactai": {
      "command": "redactai",
      "args": ["mcp"]
    }
  }
}
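
If claude_desktop_config.json already lists other servers, the entry has to be merged rather than pasted over the file. A minimal sketch of that merge follows; add_redactai_server is a hypothetical helper and "other" is a made-up existing server used only for the demonstration.

```python
import copy
import json


def add_redactai_server(config: dict) -> dict:
    """Return a copy of the config with the redactai MCP entry added."""
    updated = copy.deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["redactai"] = {"command": "redactai", "args": ["mcp"]}
    return updated


# "other" stands in for a server that is already configured.
existing = {"mcpServers": {"other": {"command": "other-tool", "args": []}}}
merged = add_redactai_server(existing)
print(json.dumps(merged, indent=2))
```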

API Server

Hosted (recommended for most users)

https://api.redactai.dev

Requests are authenticated with the X-API-Key header; request a key from the maintainer. The service is SSL-terminated, runs in eu-central-1, and persists data in Supabase Postgres.

Endpoints:

  • POST /api/v1/analyze — detect PII entities
  • POST /api/v1/anonymize — anonymize text
  • POST /api/v1/anonymize/file — upload a file (txt, csv, pdf, docx)
  • POST /api/v1/anonymize/batch — batch multiple files
  • GET /api/v1/profiles — list built-in anonymization profiles
  • GET /api/v1/health — liveness check
  • GET /api/v1/health/ready — readiness + loaded entity types
  • Interactive docs: https://api.redactai.dev/docs

Self-hosted (local or your own infra)

# Start as background daemon (auto-starts on first CLI call)
redactai server start

# Or run in foreground
redactai server start --foreground

# Manage
redactai server status
redactai server restart
redactai server stop

The daemon exposes the same REST API at http://localhost:8000. Configure via .redactai.yml, env vars (API_KEYS, REDACTAI_DATABASE_URL), or Docker.

Persistence

Tokens and the audit log are stored in Postgres when REDACTAI_DATABASE_URL is set. Without it, both fall back to in-process memory (convenient for local dev and tests, but lost on restart and not safe across multiple instances).

# Supabase pooled connection (recommended for FastAPI)
export REDACTAI_DATABASE_URL="postgresql://postgres.PROJECT:PASSWORD@REGION.pooler.supabase.com:6543/postgres"

Schema is applied automatically on startup via CREATE TABLE IF NOT EXISTS. Raw tokens are never stored: only their SHA-256 hash and a prefix made of the first 12 characters are kept. Revoked tokens are retained with a revoked_at timestamp for audit purposes.
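
The storage rule for tokens can be illustrated with the standard library. This sketch assumes the prefix is the first 12 characters of the raw token and the hash is a hex SHA-256 digest; the record field names and both helper functions are illustrative and do not reflect the actual schema.

```python
import hashlib
import hmac
import secrets


def make_token_record(raw_token: str) -> dict:
    """Build what would be persisted: a prefix and a hash, never the raw token."""
    return {
        "prefix": raw_token[:12],
        "sha256": hashlib.sha256(raw_token.encode("utf-8")).hexdigest(),
        "revoked_at": None,  # set to a timestamp on revocation instead of deleting
    }


def verify_token(raw_token: str, record: dict) -> bool:
    """Check a presented token against a stored record using a constant-time compare."""
    candidate = hashlib.sha256(raw_token.encode("utf-8")).hexdigest()
    return record["revoked_at"] is None and hmac.compare_digest(candidate, record["sha256"])


token = secrets.token_urlsafe(32)
record = make_token_record(token)
print(verify_token(token, record))    # True
print(verify_token("wrong", record))  # False
```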

Testing

The default test suite runs entirely in-memory (no Postgres required). Live-database smoke tests are marked @pytest.mark.postgres and auto-skip unless REDACTAI_DATABASE_URL is set:

# Default — in-memory only
pytest

# Include Postgres smoke tests (requires live DB)
set -a; source .env; set +a
pytest -m postgres

File Types

Supported: .txt, .csv, .pdf, .docx, .png, .jpg, .jpeg, .bmp, .tiff, .json

Image Redaction

Redact PII from images using OCR:

# Single image
redactai redact-image screenshot.png -o screenshot.redacted.png

# Batch directory
redactai redact-image ./screenshots -o ./screenshots.redacted

# Custom fill color
redactai redact-image photo.jpg --fill "255,192,203"
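
The --fill value in the last command reads as a comma-separated color. The sketch below parses such a string under the assumption that it is an R,G,B triple with each channel in 0 to 255; parse_fill is a hypothetical helper and the accepted formats of the real flag may differ.

```python
def parse_fill(value: str) -> tuple[int, int, int]:
    """Parse "R,G,B" into a tuple of ints, validating the 0-255 range."""
    parts = [int(part.strip()) for part in value.split(",")]
    if len(parts) != 3 or any(not 0 <= channel <= 255 for channel in parts):
        raise ValueError(f"expected three comma-separated values in 0-255, got {value!r}")
    return (parts[0], parts[1], parts[2])


print(parse_fill("255,192,203"))  # (255, 192, 203)
```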

Structured Data

Anonymize PII in CSV and JSON files with column-aware detection:

# CSV
redactai structured data.csv -o data.clean.csv

# JSON
redactai structured data.json -o data.clean.json

# Custom strategy
redactai structured data.csv --strategy highest_confidence

Pseudonymization

Consistent fake↔real mappings across files and sessions: the same input always produces the same output.

# Pseudonymize with deterministic seed
redactai pseudonymize data.txt --seed "project-alpha" --store mappings.json

# Restore originals
redactai pseudonymize data.pseudonymized.txt --restore --store mappings.json

# Show mapping stats
redactai pseudonymize data.txt --show-mapping --seed "project-alpha"
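
One common way to get seed-based determinism is a keyed hash. The sketch below derives a fake value from HMAC-SHA256 of the input under the seed and records the reverse mapping so originals can be restored. This is an illustration of the general technique, not RedactAI's algorithm; the name list, the output format, and the in-memory store are assumptions.

```python
import hashlib
import hmac

# Illustrative pool of replacement names.
FAKE_NAMES = ["Marcia Wells", "Owen Hart", "Priya Nair", "Tomas Lind"]


def pseudonym(value: str, seed: str) -> str:
    """Derive a stable fake for value; the same seed and value always give the same result."""
    digest = hmac.new(seed.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).digest()
    name = FAKE_NAMES[int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)]
    # A short digest suffix keeps different inputs that pick the same name apart.
    return f"{name} #{digest.hex()[:8]}"


def pseudonymize(values: list[str], seed: str, store: dict[str, str]) -> list[str]:
    """Replace each value and remember fake -> real in store."""
    result = []
    for value in values:
        fake = pseudonym(value, seed)
        store[fake] = value
        result.append(fake)
    return result


def restore(values: list[str], store: dict[str, str]) -> list[str]:
    """Map fakes back to originals; unknown values pass through unchanged."""
    return [store.get(value, value) for value in values]


store: dict[str, str] = {}
fakes = pseudonymize(["John Smith", "Ada Park", "John Smith"], "project-alpha", store)
print(fakes[0] == fakes[2])   # True
print(restore(fakes, store))  # ['John Smith', 'Ada Park', 'John Smith']
```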

Multi-Language Support

20+ languages with dedicated spaCy models:

# List all supported languages
redactai languages

# Use a specific language
redactai clean document.txt --language de   # German
redactai clean document.txt --language ja   # Japanese
redactai clean document.txt --language zh   # Chinese

PDF Annotation

Highlight PII in PDFs without destroying the original. Perfect for legal review and audit trails.

# Annotate with highlights
redactai annotate-pdf document.pdf -o document.annotated.pdf

# Use underline instead of highlight
redactai annotate-pdf document.pdf --type underline --color "0.0,0.0,1.0"

# Generate a PII report (JSON, CSV, or text)
redactai annotate-pdf document.pdf --report --report-format json

Evaluation

Benchmark detection quality against ground truth labels. Critical for audit evidence.

# Run evaluation against ground truth
redactai evaluate ground_truth.json -o report --format both

# Custom threshold and entities
redactai evaluate ground_truth.json --threshold 0.5 --entities PERSON,EMAIL_ADDRESS

Ground truth format (ground_truth.json):

[
  {
    "text": "My name is John Smith and email is john@example.com",
    "entities": [
      {"entity_type": "PERSON", "start": 11, "end": 21},
      {"entity_type": "EMAIL_ADDRESS", "start": 35, "end": 51}
    ]
  }
]
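
Given labels in this shape, detection quality is usually summarized as precision, recall, and F1. The sketch below scores predictions by exact match on (entity_type, start, end). Whether redactai evaluate uses exact or partial span matching is not stated here, so exact match is an assumption; the sample text and the false-positive prediction are made up for the demonstration.

```python
def score(gold: list[dict], predicted: list[dict]) -> dict[str, float]:
    """Exact-match precision, recall, and F1 over (entity_type, start, end) spans."""
    gold_set = {(e["entity_type"], e["start"], e["end"]) for e in gold}
    pred_set = {(e["entity_type"], e["start"], e["end"]) for e in predicted}
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


text = "Contact Jane Doe at jane@example.org"
gold = [
    {"entity_type": "PERSON", "start": 8, "end": 16},
    {"entity_type": "EMAIL_ADDRESS", "start": 20, "end": 36},
]
# One correct hit, one missed email, one spurious detection.
predicted = [
    {"entity_type": "PERSON", "start": 8, "end": 16},
    {"entity_type": "PHONE_NUMBER", "start": 0, "end": 7},
]
print(score(gold, predicted))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```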

License

Apache 2.0 — see LICENSE.

Decision Trace

Explain exactly why each PII entity was detected — perfect for audit compliance and debugging.

# Show detailed decision trace
redactai trace --text "My name is John Smith and email is john@example.com"

# Trace from file
redactai trace --file document.txt --format json -o trace.json

# Show as markdown
redactai trace --file document.txt --format markdown

Streaming Processing

Real-time PII masking for logs, telemetry, and data pipelines.

# Process log file
redactai stream app.log -o app.masked.log

# Process stdin (pipe)
tail -f /var/log/app.log | redactai stream

# Custom entities
redactai stream app.log --entities EMAIL_ADDRESS,IP_ADDRESS
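
The streaming idea is to mask each line as it arrives instead of loading the whole file. The sketch below shows that shape with a generator. The two regular expressions are deliberately simple stand-ins for real detectors, and the <ENTITY_TYPE> placeholder format is an assumption.

```python
import re
from typing import Iterable, Iterator

# Simplistic illustrative patterns, not the library's detectors.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def mask_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with matches replaced, one line at a time."""
    for line in lines:
        for entity, pattern in PATTERNS.items():
            line = pattern.sub(f"<{entity}>", line)
        yield line


log = ["login ok user=ann@example.com ip=10.0.0.7\n", "heartbeat\n"]
for masked in mask_lines(log):
    print(masked, end="")

# To filter a pipe, iterate over sys.stdin instead of the list:
# import sys
# for masked in mask_lines(sys.stdin):
#     sys.stdout.write(masked)
```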
