Drop-in redaction proxy for the Anthropic API — anonymize prompts, deanonymize responses, log the redacted form for compliance.

claude-anonymizer

Drop-in redaction for the Anthropic API (and OpenAI, Gemini, anything that talks HTTPS). Anonymize prompts before they leave your perimeter, deanonymize responses before they reach the user, and keep a tamper-evident audit trail for compliance review.

user                proxy                       upstream LLM
 |  "fix mts auth"  →  |                           |
 |                     |  "fix Acme auth"      →   |
 |                     |    (audit JSONL appended) |
 |                     | ←  "Acme uses OAuth"      |
 |                     |    (audit JSONL appended) |
 |  "mts uses OAuth" ← |                           |

Two surfaces:

  1. Library — wrap any Python callable that talks to an LLM and the redaction round-trip happens inline.
  2. System proxy — TLS-intercepting HTTPS proxy daemon you point your CLI tools at (HTTPS_PROXY=http://127.0.0.1:8080). Works with Claude Code, OpenCode, Codex CLI, Gemini CLI, plain curl, etc.

The base package is pure stdlib at runtime; the proxy daemon opts in to cryptography, mitmproxy, and pyyaml via the [proxy] extra.


Install

# Library only (pure stdlib, no extras)
pip install claude-anonymizer

# Library + proxy daemon
pip install 'claude-anonymizer[proxy]'

# Development
pip install -e '.[dev,proxy]'

For the full zero → Claude-Code-via-proxy walkthrough (install CA, start daemon, configure HTTPS_PROXY, verify redaction, inspect, uninstall), see INSTALL.md. For a one-shot bash bootstrap that automates every step, run ./install.sh.


Library quick start

Wrap an existing function

from dataclasses import dataclass

from claude_anonymizer import Anonymizer, wrap_callable

anon = Anonymizer()  # defaults: mts | MTS | МТС → Acme

@dataclass
class SomeResult:
    output: str  # the attribute the wrapper deanonymizes

def call_claude(prompt: str) -> SomeResult:
    # your existing function; must return an object with an `.output: str` attr
    ...

safe_call = wrap_callable(call_claude, anonymizer=anon)
result = safe_call("я из компании mts")  # "I am from the company mts"
# result.output is deanonymized; logs show the anonymized round-trip

The wrapper handles sync and async callables, and result objects that are dataclasses (frozen or mutable) or plain classes. If the prompt contains no sensitive tokens, the original result object is returned unchanged, so an identity check (returned is original) still passes.
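The round-trip described above can be sketched with the standard library alone. This is a minimal illustration of the idea, not the library's implementation: `Result`, `substitute`, `wrap`, and `fake_llm` are hypothetical names, and only the synchronous, frozen-dataclass case is covered.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Result:
    output: str


# Hypothetical one-to-one mapping, for illustration only.
FORWARD = {"mts": "Acme"}
REVERSE = {placeholder: original for original, placeholder in FORWARD.items()}


def substitute(text: str, table: dict[str, str]) -> str:
    """Apply every exact-string substitution in the table."""
    for needle, replacement in table.items():
        text = text.replace(needle, replacement)
    return text


def wrap(fn: Callable[[str], Result]) -> Callable[[str], Result]:
    def wrapper(prompt: str) -> Result:
        safe_prompt = substitute(prompt, FORWARD)
        result = fn(safe_prompt)
        if safe_prompt == prompt:
            # Nothing was redacted: hand back the very same object.
            return result
        # A frozen dataclass cannot be mutated, so build a modified copy.
        return replace(result, output=substitute(result.output, REVERSE))

    return wrapper


seen_upstream: list[str] = []


def fake_llm(prompt: str) -> Result:
    """Stand-in for the real API call; records what it was sent."""
    seen_upstream.append(prompt)
    return Result(output=f"echo: {prompt}")


safe_call = wrap(fake_llm)
print(safe_call("fix mts auth").output)  # echo: fix mts auth
print(seen_upstream)                     # ['fix Acme auth']
```

The caller sees the original token while the stand-in upstream only ever receives the placeholder.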

Run the claude CLI through it

from claude_anonymizer import AnonymizingClaudeRunner

runner = AnonymizingClaudeRunner(model="claude-opus-4-7")
# prompt: "I am from the company mts, what is it called?"
result = runner.run_sync("я из компании mts, как называется?")
print(result.output)            # "...МТС..."

Custom mappings + canonical form

from claude_anonymizer import Anonymizer

anon = Anonymizer(
    company_mappings={
        "mts": "Acme",
        "МТС": "Acme",
        "MTS": "Acme",
        "Internal-Project-Aurora": "Project-Y",
    },
    canonical_form="МТС",     # always restore to Russian uppercase
)

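canonical_form collapses every original variant onto a single user-facing string when deanonymizing. This is useful when several inputs map to one placeholder upstream, because the reverse mapping is then ambiguous: a response containing Acme cannot tell the proxy whether the user originally wrote mts, MTS, or МТС.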
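The ambiguity is easy to see in a small stand-alone sketch. The `anonymize` and `deanonymize` helpers below are hypothetical and use plain `str.replace`; they illustrate the many-to-one problem rather than reproduce the `Anonymizer` class.

```python
# Three spellings share one placeholder, as in the example mapping above.
mappings = {"mts": "Acme", "MTS": "Acme", "МТС": "Acme"}


def anonymize(text: str) -> str:
    for original, placeholder in mappings.items():
        text = text.replace(original, placeholder)
    return text


def deanonymize(text: str, canonical_form: str) -> str:
    # The placeholder no longer records which spelling it replaced,
    # so every occurrence is restored to the single canonical form.
    for placeholder in set(mappings.values()):
        text = text.replace(placeholder, canonical_form)
    return text


upstream = anonymize("mts and MTS share one backend")
print(upstream)                      # Acme and Acme share one backend
print(deanonymize(upstream, "МТС"))  # МТС and МТС share one backend
```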


Proxy daemon

The system proxy intercepts HTTPS via a generated root CA, redacts outbound JSON bodies, restores inbound responses (buffered and streamed SSE), and writes a tamper-evident JSONL audit log.

One-shot install

pip install 'claude-anonymizer[proxy]'
anonymizer-proxy install-ca                    # generates CA, installs into OS trust store
anonymizer-proxy run                           # listens on 127.0.0.1:8080

Point your tool at the proxy:

export HTTPS_PROXY=http://127.0.0.1:8080
export SSL_CERT_FILE="$HOME/.compliance-proxy/ca/cert.pem"

That's it. The first run writes a starter ~/.compliance-proxy/config.yaml you can customize.
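When a tool is launched from a script rather than an interactive shell, the same two variables can be set for that child process only. The sketch below is one way to do that with `subprocess`; `proxied_env` is a hypothetical helper, and the child command is a stand-in that just prints the variable it received.

```python
import os
import subprocess
import sys
from pathlib import Path


def proxied_env(proxy_url: str = "http://127.0.0.1:8080") -> dict[str, str]:
    """Copy the current environment and add the two proxy variables."""
    env = dict(os.environ)
    env["HTTPS_PROXY"] = proxy_url
    env["SSL_CERT_FILE"] = str(Path.home() / ".compliance-proxy" / "ca" / "cert.pem")
    return env


# Stand-in child process; replace the argv with the real tool's command line.
completed = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['HTTPS_PROXY'])"],
    env=proxied_env(),
    capture_output=True,
    text=True,
    check=True,
)
print(completed.stdout.strip())  # http://127.0.0.1:8080
```

Whether a given tool honours `SSL_CERT_FILE` depends on its TLS stack, so that part should be verified per tool.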

Subcommands

| Command | What it does |
| --- | --- |
| anonymizer-proxy install-ca [--dry-run] [--force] [--name-constraints HOSTS] | Generate root CA + register with OS trust store |
| anonymizer-proxy uninstall-ca [--keep-files] | Unregister + optionally delete the keypair |
| anonymizer-proxy run [--config PATH] [--host HOST] [--port N] [--health-port N] [--fail-mode strict\|pass-through] | Start the proxy daemon |
| anonymizer-proxy reload [--sock PATH] | Hot-reload config via UNIX socket (also accepts SIGHUP) |
| anonymizer-proxy status [--config PATH] [--json] | Show config + audit-log rollup + CA state |
| anonymizer-proxy analyze [--audit PATH] [--config PATH] [--top N] [--json] [--include-redacted] | Surface PII-shaped tokens the detector chain missed (audit-log discovery) |

Observability

The proxy exposes two HTTP endpoints on the health port (default 8081):

curl -s http://127.0.0.1:8081/healthz                 # → {"status": "ok"}
curl -s http://127.0.0.1:8081/metrics                 # Prometheus exposition

Metric families:

  • compliance_proxy_redacted_total{category="..."} — counter, per category
  • compliance_proxy_redaction_latency_seconds_* — histogram (phase = redact)
  • compliance_proxy_active_flows — gauge
  • compliance_proxy_failures_total{reason="..."} — counter
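A scrape of `/metrics` can be summarised without a Prometheus server. The sketch below parses a hand-written sample in the text exposition format; the label values (`company`, `email`, `timeout`) and the numbers are invented for illustration, and `redactions_by_category` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; real label values and counts will differ.
SAMPLE = '''\
# TYPE compliance_proxy_redacted_total counter
compliance_proxy_redacted_total{category="company"} 12
compliance_proxy_redacted_total{category="email"} 3
compliance_proxy_active_flows 2
compliance_proxy_failures_total{reason="timeout"} 1
'''

# name, optional {labels}, then the sample value
LINE = re.compile(r'^(?P<name>\w+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')


def redactions_by_category(text: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # HELP / TYPE comments
        match = LINE.match(line.strip())
        if not match or match["name"] != "compliance_proxy_redacted_total":
            continue
        labels = dict(re.findall(r'(\w+)="([^"]*)"', match["labels"] or ""))
        totals[labels.get("category", "")] += float(match["value"])
    return dict(totals)


print(redactions_by_category(SAMPLE))  # {'company': 12.0, 'email': 3.0}
```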

Configuration

Full reference: docs/CONFIG.md. Minimal ~/.compliance-proxy/config.yaml:

listen:
  host: 127.0.0.1
  port: 8080
upstreams:
  - host: api.anthropic.com
  - host: api.openai.com
  - host: generativelanguage.googleapis.com
detectors:
  static_mapper:
    enabled: true
    mappings:
      mts: Acme
      MTS: Acme
      МТС: Acme
    canonical_form: МТС
  regex_matcher:
    enabled: true
    patterns: {}   # empty = all built-in Tier 1/2/3 defaults
audit:
  path: ~/.compliance-proxy/audit.jsonl
  rotation: daily
  retention_days: 90
policy:
  fail_mode: strict

A failed reload (broken YAML, unknown keys, bad enum value) logs ERROR and keeps the previously-loaded config — in-flight connections are never dropped.
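The "validate first, swap only on success" behaviour can be sketched as follows. This is an assumption about the general pattern, not the daemon's code: `ConfigHolder`, `validate`, and `ConfigError` are hypothetical, the allowed keys are taken from the minimal config above, and the input is an already-parsed dict so the example needs no YAML library.

```python
import logging

log = logging.getLogger("reload_sketch")

# Top-level keys from the minimal config above; fail modes from --fail-mode.
ALLOWED_KEYS = {"listen", "upstreams", "detectors", "audit", "policy"}
FAIL_MODES = {"strict", "pass-through"}


class ConfigError(ValueError):
    pass


def validate(candidate: dict) -> dict:
    unknown = set(candidate) - ALLOWED_KEYS
    if unknown:
        raise ConfigError(f"unknown keys: {sorted(unknown)}")
    # Defaulting to "strict" when the key is absent is an assumption.
    fail_mode = (candidate.get("policy") or {}).get("fail_mode", "strict")
    if fail_mode not in FAIL_MODES:
        raise ConfigError(f"bad fail_mode: {fail_mode!r}")
    return candidate


class ConfigHolder:
    def __init__(self, initial: dict) -> None:
        self.current = validate(initial)

    def reload(self, candidate: dict) -> bool:
        try:
            validated = validate(candidate)
        except ConfigError as exc:
            log.error("reload rejected, keeping previous config: %s", exc)
            return False
        # Swap the reference only after validation has fully succeeded.
        self.current = validated
        return True


holder = ConfigHolder({"policy": {"fail_mode": "strict"}})
print(holder.reload({"policy": {"fail_mode": "lenient"}}))  # False
print(holder.current["policy"]["fail_mode"])                # strict
```

Because the live reference is replaced in a single assignment, code that already holds the old config object keeps working with a consistent snapshot.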

Audit log

Every completed request lands as exactly one line in ~/.compliance-proxy/audit.YYYY-MM-DD.jsonl with:

  • request.match_counts — per-category counts only; never the original tokens
  • request.redacted_preview / response.raw_preview — first 200 bytes (post-redaction / pre-restore)
  • prev_hash + entry_hash — SHA-256 chain across records; tampering breaks the chain

Verify chain integrity offline:

from pathlib import Path
from claude_anonymizer.proxy_server.audit import AuditWriter

# Path does not expand "~" on its own, hence expanduser()
AuditWriter.verify_chain(Path("~/.compliance-proxy/audit.2026-05-18.jsonl").expanduser())
# True | False
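To see why editing a record breaks the chain, here is a minimal stand-alone hash chain over JSONL lines. The exact bytes that the proxy hashes are not specified here, so the scheme below (SHA-256 over the previous hash plus a canonical JSON payload, with an all-zero genesis value) is an assumption, and `append`, `verify`, and `entry_hash` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record


def entry_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(lines: list[str], payload: dict) -> None:
    prev = json.loads(lines[-1])["entry_hash"] if lines else GENESIS
    record = {
        "payload": payload,
        "prev_hash": prev,
        "entry_hash": entry_hash(prev, payload),
    }
    lines.append(json.dumps(record, sort_keys=True))


def verify(lines: list[str]) -> bool:
    prev = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev_hash"] != prev:
            return False  # a record was removed, reordered, or inserted
        if record["entry_hash"] != entry_hash(prev, record["payload"]):
            return False  # the payload was edited after the fact
        prev = record["entry_hash"]
    return True


audit: list[str] = []
append(audit, {"match_counts": {"company": 1}})
append(audit, {"match_counts": {"email": 2}})
print(verify(audit))  # True

tampered = [audit[0].replace('"company": 1', '"company": 9'), audit[1]]
print(verify(tampered))  # False
```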

Files older than retention_days are deleted at file granularity (never line-by-line) on startup and after each rotation.
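File-granularity retention can be sketched by reading the date out of each file name. The helper names are hypothetical, and whether the boundary day itself is kept is an assumption (here a file is removed only when its date is strictly older than the cutoff).

```python
import re
import tempfile
from datetime import date, timedelta
from pathlib import Path

NAME = re.compile(r"^audit\.(\d{4})-(\d{2})-(\d{2})\.jsonl$")


def expired_files(directory: Path, retention_days: int, today: date) -> list[Path]:
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for path in sorted(directory.iterdir()):
        match = NAME.match(path.name)
        if not match:
            continue  # not an audit file, leave it alone
        try:
            file_day = date(*map(int, match.groups()))
        except ValueError:
            continue  # e.g. audit.2026-13-40.jsonl
        if file_day < cutoff:
            expired.append(path)
    return expired


def prune(directory: Path, retention_days: int, today: date) -> list[Path]:
    removed = expired_files(directory, retention_days, today)
    for path in removed:
        path.unlink()  # whole file at once, never individual lines
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("audit.2026-01-01.jsonl", "audit.2026-05-18.jsonl", "notes.txt"):
        (root / name).write_text("")
    removed = prune(root, retention_days=90, today=date(2026, 5, 19))
    print([p.name for p in removed])               # ['audit.2026-01-01.jsonl']
    print(sorted(p.name for p in root.iterdir()))  # ['audit.2026-05-18.jsonl', 'notes.txt']
```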

Deploying as a service

User-mode templates ship in deploy/:

  • deploy/launchd/com.compliance-proxy.plist — macOS ~/Library/LaunchAgents/
  • deploy/systemd/compliance-proxy.service — Linux ~/.config/systemd/user/

See deploy/README.md for per-OS install and the HTTPS_PROXY client setup.

Built-in detector tiers

| Tier | Detector | Patterns / behaviour |
| --- | --- | --- |
| 1 (ПДн, personal data) | regex_matcher | MSISDN, passport, SNILS, INN, bank card (Luhn-validated), RS account, email |
| 2 (КТ, trade secrets) | regex_matcher | Bearer token, JWT, API key (sk/pk/ghp/glpat/xox), password-in-URL, AWS access key, TUZ service account |
| 3 (infra) | regex_matcher | *.mts-corp.ru, *.mts.ru, 10.* / 11.* IPs, Jira codes (EORD/CLBIZPL/EP/EINVY) |
| company | static_mapper | Exact-string substitution from YAML map |
| PII opt-in | pii.RussianNameDetector | Two/three-token Cyrillic name heuristic (disabled by default; ~12% FP rate; deny-list for known false-positives) |
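The "Luhn-validated" note on the bank-card pattern refers to the standard Luhn checksum, which filters out most random 16-digit strings that merely look like card numbers. A plain implementation of that checksum is shown below; `luhn_valid` is a hypothetical name and is not taken from the package.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True  (widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```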

Add your own by implementing the Detector protocol — name, category, scan(text) -> list[Match].
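A custom detector might look like the sketch below. Only the three members named above (`name`, `category`, `scan`) come from this README; the fields of `Match`, the local `Detector` protocol definition, and the `TicketDetector` example are assumptions made so the snippet runs on its own. The real types should be imported from the package instead.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Match:
    # Hypothetical shape; check the package for the real Match fields.
    start: int
    end: int
    category: str
    text: str


class Detector(Protocol):
    name: str
    category: str

    def scan(self, text: str) -> list[Match]: ...


class TicketDetector:
    """Flags identifiers such as TICKET-42 (an invented pattern)."""

    name = "ticket"
    category = "infra"
    _pattern = re.compile(r"\bTICKET-\d+\b")

    def scan(self, text: str) -> list[Match]:
        return [
            Match(m.start(), m.end(), self.category, m.group())
            for m in self._pattern.finditer(text)
        ]


detector: Detector = TicketDetector()  # structural typing, no inheritance needed
found = detector.scan("see TICKET-42 and TICKET-7")
print([m.text for m in found])  # ['TICKET-42', 'TICKET-7']
```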

Streaming (SSE)

Anthropic and OpenAI stream tokens via text/event-stream. The proxy detects this in responseheaders and installs a per-flow rolling-buffer rewriter — placeholders that straddle chunk boundaries are restored without buffering the full response. Algorithm: ARCHITECTURE.md §3.2.
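The hold-back idea behind a rolling buffer can be sketched on plain text chunks. This is not the algorithm from ARCHITECTURE.md §3.2 and it ignores SSE framing (`data:` lines, JSON payloads); `RollingRestorer` is a hypothetical class that only shows how a placeholder split across two chunks can be restored while keeping at most a few trailing characters in memory.

```python
import re


class RollingRestorer:
    def __init__(self, restore: dict[str, str]) -> None:
        if not restore:
            raise ValueError("at least one placeholder is required")
        self._restore = restore
        # Longest first, so overlapping placeholders match greedily.
        ordered = sorted(restore, key=len, reverse=True)
        self._pattern = re.compile("|".join(map(re.escape, ordered)))
        self._max_hold = max(map(len, restore)) - 1
        self._buf = ""

    def _holdback(self, rest: str) -> int:
        """Length of the longest tail that could still grow into a placeholder."""
        for k in range(min(len(rest), self._max_hold), 0, -1):
            tail = rest[-k:]
            if any(p.startswith(tail) for p in self._restore):
                return k
        return 0

    def feed(self, chunk: str) -> str:
        self._buf += chunk
        out, pos = [], 0
        for m in self._pattern.finditer(self._buf):
            out.append(self._buf[pos:m.start()])
            out.append(self._restore[m.group()])
            pos = m.end()
        rest = self._buf[pos:]
        keep = self._holdback(rest)
        out.append(rest[: len(rest) - keep])
        self._buf = rest[len(rest) - keep:]  # carry only the ambiguous tail
        return "".join(out)

    def flush(self) -> str:
        """Emit whatever is still held once the stream ends."""
        tail, self._buf = self._buf, ""
        return tail


restorer = RollingRestorer({"Acme": "mts"})
chunks = ["Ac", "me uses OA", "uth; Acm", "e rocks"]
restored = "".join(restorer.feed(c) for c in chunks) + restorer.flush()
print(restored)  # mts uses OAuth; mts rocks
```

The held tail is never longer than the longest placeholder minus one character, which is what keeps memory use independent of the response size.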


Logging contract

The library emits these four INFO lines on every call — they are the GDPR audit artefact and wording is stable:

| Log message (claude_anonymizer.proxy) | What it proves |
| --- | --- |
| prompt anonymized: N -> M byte(s) | The transform ran. |
| anonymized prompt sent to API: … | Exact bytes that left the perimeter (first 200). |
| anonymized response from API: … | Exact bytes that came back (first 200, pre-restore). |
| response deanonymized: N -> M byte(s) | The restore ran. |

Together, the two … sent to API / … from API lines prove the wire never carried the canonical form.
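One way to check that claim automatically is to attach a handler to the `claude_anonymizer.proxy` logger and scan the two wire lines for sensitive tokens. In the sketch below the four records are emitted by hand to simulate a call, and `Collector` and `wire_leaks` are hypothetical helpers.

```python
import logging


class Collector(logging.Handler):
    """Keeps formatted log messages in memory for later inspection."""

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def wire_leaks(messages: list[str], sensitive: list[str]) -> list[str]:
    """Return the wire-level log lines that still contain a sensitive token."""
    wire = [m for m in messages if "sent to API" in m or "from API" in m]
    return [m for m in wire if any(token in m for token in sensitive)]


logger = logging.getLogger("claude_anonymizer.proxy")
logger.setLevel(logging.INFO)
collector = Collector()
logger.addHandler(collector)

# Simulated records in the documented wording; a real run emits these itself.
logger.info("prompt anonymized: 12 -> 13 byte(s)")
logger.info("anonymized prompt sent to API: fix Acme auth")
logger.info("anonymized response from API: Acme uses OAuth")
logger.info("response deanonymized: 15 -> 14 byte(s)")

print(wire_leaks(collector.messages, ["mts", "MTS", "МТС"]))  # []
```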


Performance

Local benchmark on the reference dev laptop (M-series Mac, Python 3.10):

| Prompt size | p50 | p95 | p99 | Target |
| --- | --- | --- | --- | --- |
| 128 KB (~32k tokens), full detector chain | 41 ms | 43 ms | 44 ms | ≤ 50 ms |

python bench/redactor_bench.py --iters 200
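For readers who want to compute the same percentile columns for their own workload, the sketch below times a callable and reports p50/p95/p99 with the standard library. It is not `bench/redactor_bench.py`: the workload is a plain `str.replace` over a roughly 128 KB string, and `time_calls` and `percentiles` are hypothetical helpers.

```python
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], iters: int) -> list[float]:
    """Run fn repeatedly and return per-call wall time in milliseconds."""
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def percentiles(samples_ms: list[float]) -> dict[str, float]:
    # n=100 yields 99 cut points; index 49 is the 50th percentile, and so on.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


payload = "fix mts auth " * 10_000  # about 128 KB of text
samples = time_calls(lambda: payload.replace("mts", "Acme"), iters=50)
print({name: round(value, 3) for name, value in percentiles(samples).items()})
```

Absolute numbers from this stand-in say nothing about the detector chain; only the measurement method carries over.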

Tests

pytest -q                                          # full suite
pytest tests/proxy_server/test_audit.py            # one area
ruff check .                                       # lint
ruff format --check .                              # format

The proxy tests do not spawn the real claude CLI — they wire up a fake shell script as --claude-bin and assert argv shape, env discovery, and the full anonymize / spawn / deanonymize cycle. Streaming tests use synthesised Anthropic/OpenAI SSE fixtures.

CI matrix runs lint → tests (3.10, 3.11, 3.12) → bench → package build on every push and PR. See .github/workflows/ci.yml.

To run the same lint + format gates locally before every commit:

pip install pre-commit
pre-commit install      # one-time per clone
pre-commit run --all-files   # ad-hoc on the whole tree

The hooks pin the same ruff version as CI so a green pre-commit run will not be re-flagged in CI.


Documentation

| Doc | Audience |
| --- | --- |
| docs/ARCHITECTURE.md | Engineering: design decisions, threat model, streaming algorithm |
| docs/CONFIG.md | Operators: every config.yaml key with validation rules |
| docs/PRD.md | Product: problem statement, success metrics, scope |
| docs/IMPLEMENTATION_PLAN.md | Engineering: phase-by-phase delivery plan |
| docs/VERIFICATION_PLAN.md | QA: test pyramid, CI gates, manual checklist |
| deploy/README.md | Operators: launchd / systemd install |
| docs/PYPI_RELEASE.md | Maintainers: PyPI trusted-publisher setup + release workflow |

History

Originally extracted from whilly-orchestrator (JIRA-EORD-9843) and refactored to be orchestrator-agnostic. The proxy daemon was added in Phases 0–4 as documented in docs/IMPLEMENTATION_PLAN.md. See CHANGELOG.md for the per-release feature list.

Contributing

See CONTRIBUTING.md for the dev setup, the local gates contributors must run before pushing, and the architecture decisions that are load-bearing across versions.

Security

Please do not open a public issue for security problems. Follow the disclosure policy in SECURITY.md.

License

MIT.
