Drop-in redaction proxy for the Anthropic API — anonymize prompts, deanonymize responses, log the redacted form for compliance.
claude-anonymizer
Drop-in redaction for the Anthropic API (and OpenAI, Gemini, anything that talks HTTPS). Anonymize prompts before they leave your perimeter, deanonymize responses before they reach the user, and keep a tamper-evident audit trail for compliance review.
user                  proxy                     upstream LLM
 |  "fix mts auth" →    |                            |
 |                      |  "fix Acme auth" →         |
 |                      |  (audit JSONL appended)    |
 |                      |  ← "Acme uses OAuth"       |
 |                      |  (audit JSONL appended)    |
 |  "mts uses OAuth" ←  |                            |
Two surfaces:
- Library: wrap any Python callable that talks to an LLM and the redaction round-trip happens inline.
- System proxy: a TLS-intercepting HTTPS proxy daemon you point your CLI tools at (HTTPS_PROXY=http://127.0.0.1:8080). Works with Claude Code, OpenCode, Codex CLI, Gemini CLI, plain curl, etc.
The base package is pure stdlib at runtime; the proxy daemon
opts in to cryptography, mitmproxy, and pyyaml via the [proxy] extra.
Install
# Library only (pure stdlib, no extras)
pip install claude-anonymizer
# Library + proxy daemon
pip install 'claude-anonymizer[proxy]'
# Development
pip install -e '.[dev,proxy]'
For the full zero → Claude-Code-via-proxy walkthrough (install CA,
start daemon, configure HTTPS_PROXY, verify redaction, inspect, uninstall),
see INSTALL.md. For a one-shot bash bootstrap that
automates every step, run ./install.sh.
Library quick start
Wrap an existing function
from claude_anonymizer import Anonymizer, wrap_callable

anon = Anonymizer()  # defaults: mts | MTS | МТС → Acme

def call_claude(prompt: str) -> "SomeResult":
    # your existing function; must return an object with a `.output: str` attr
    ...

safe_call = wrap_callable(call_claude, anonymizer=anon)
result = safe_call("я из компании mts")  # Russian for "I'm from the company mts"
# result.output is deanonymized; logs show the anonymized round-trip
The wrapper handles sync and async callables, dataclasses (frozen or
mutable), and plain classes. If the prompt contains no sensitive tokens,
the original result object is returned unchanged, so an identity check (`returned is original`) still holds.
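The behaviour described above can be sketched in a few lines. This is not the library's implementation: `wrap`, `MAPPING`, `Result`, `echo`, and `echo_async` are hypothetical names chosen for illustration, and the sketch only covers dataclass results with an `output` field.

```python
import asyncio
import functools
import inspect
from dataclasses import dataclass, replace

MAPPING = {"mts": "Acme"}  # hypothetical original -> placeholder map


def _anonymize(text: str) -> str:
    for original, placeholder in MAPPING.items():
        text = text.replace(original, placeholder)
    return text


def _restore(text: str) -> str:
    for original, placeholder in MAPPING.items():
        text = text.replace(placeholder, original)
    return text


def wrap(fn):
    """Wrap a sync or async callable that returns a dataclass with `.output`."""

    def finish(prompt, safe_prompt, result):
        if safe_prompt == prompt:
            return result  # nothing was redacted: return the very same object
        # dataclasses.replace builds a new instance, so frozen dataclasses work too
        return replace(result, output=_restore(result.output))

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(prompt: str):
            safe = _anonymize(prompt)
            return finish(prompt, safe, await fn(safe))
        return async_wrapper

    @functools.wraps(fn)
    def wrapper(prompt: str):
        safe = _anonymize(prompt)
        return finish(prompt, safe, fn(safe))
    return wrapper


@dataclass(frozen=True)
class Result:
    output: str


def echo(prompt: str) -> Result:
    return Result(output=f"upstream saw: {prompt}")


async def echo_async(prompt: str) -> Result:
    return echo(prompt)


FIXED = Result(output="no secrets here")

restored = wrap(echo)("fix mts auth")
restored_async = asyncio.run(wrap(echo_async)("fix mts auth"))
same = wrap(lambda prompt: FIXED)("hello")
```

Here `echo` stands in for the upstream call: it receives "fix Acme auth", and the caller still sees "mts" in `restored.output`. The last line shows the identity guarantee: with nothing to redact, `same is FIXED`.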
Run the claude CLI through it
from claude_anonymizer import AnonymizingClaudeRunner
runner = AnonymizingClaudeRunner(model="claude-opus-4-7")
result = runner.run_sync("я из компании mts, как называется?")  # Russian for "I'm from the company mts, what is it called?"
print(result.output) # "...МТС..."
Custom mappings + canonical form
from claude_anonymizer import Anonymizer
anon = Anonymizer(
    company_mappings={
        "mts": "Acme",
        "МТС": "Acme",
        "MTS": "Acme",
        "Internal-Project-Aurora": "Project-Y",
    },
    canonical_form="МТС",  # always restore to Russian uppercase
)
canonical_form collapses every original variant onto a single
user-facing string when deanonymizing — useful when multiple inputs
map to one placeholder upstream.
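To make the many-to-one problem concrete, here is a minimal sketch of the idea using plain functions. The `anonymize` and `deanonymize` helpers are hypothetical and are not the library's API; they only illustrate why a canonical form is needed once three spellings have collapsed into one placeholder.

```python
import re


def anonymize(text: str, mappings: dict[str, str]) -> str:
    # Longest keys first, so a longer original is never partially replaced.
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda m: mappings[m.group(0)], text)


def deanonymize(text: str, placeholder: str, canonical_form: str) -> str:
    # The placeholder no longer records which variant produced it,
    # so every occurrence is restored to the single canonical string.
    return text.replace(placeholder, canonical_form)


mappings = {"mts": "Acme", "MTS": "Acme", "МТС": "Acme"}
outbound = anonymize("mts and MTS are the same company", mappings)
inbound = deanonymize("Acme uses OAuth", "Acme", "МТС")
```

After this runs, `outbound` contains only "Acme", and `inbound` reads "МТС uses OAuth" regardless of which spelling the user originally typed.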
Proxy daemon
The system proxy intercepts HTTPS via a generated root CA, redacts outbound JSON bodies, restores inbound responses (buffered and streamed SSE), and writes a tamper-evident JSONL audit log.
One-shot install
pip install 'claude-anonymizer[proxy]'
anonymizer-proxy install-ca # generates CA, installs into OS trust store
anonymizer-proxy run # listens on 127.0.0.1:8080
Point your tool at the proxy:
export HTTPS_PROXY=http://127.0.0.1:8080
export SSL_CERT_FILE=$HOME/.compliance-proxy/ca/cert.pem
That's it. The first run writes a starter ~/.compliance-proxy/config.yaml
you can customize.
Subcommands
| Command | What it does |
|---|---|
| `anonymizer-proxy install-ca [--dry-run] [--force] [--name-constraints HOSTS]` | Generate root CA + register with OS trust store |
| `anonymizer-proxy uninstall-ca [--keep-files]` | Unregister + optionally delete the keypair |
| `anonymizer-proxy run [--config PATH] [--host HOST] [--port N] [--health-port N] [--fail-mode strict\|pass-through]` | Start the proxy daemon |
| `anonymizer-proxy reload [--sock PATH]` | Hot-reload config via UNIX socket (also accepts SIGHUP) |
| `anonymizer-proxy status [--config PATH] [--json]` | Show config + audit-log rollup + CA state |
| `anonymizer-proxy analyze [--audit PATH] [--config PATH] [--top N] [--json] [--include-redacted]` | Surface PII-shaped tokens the detector chain missed (audit-log discovery) |
Observability
The proxy exposes two HTTP endpoints on the health port (default 8081):
curl -s http://127.0.0.1:8081/healthz # → {"status": "ok"}
curl -s http://127.0.0.1:8081/metrics # Prometheus exposition
Metric families:
- compliance_proxy_redacted_total{category="..."}: counter, per category
- compliance_proxy_redaction_latency_seconds_*: histogram (phase=redact)
- compliance_proxy_active_flows: gauge
- compliance_proxy_failures_total{reason="..."}: counter
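For quick checks without a Prometheus server, the exposition text can be parsed by hand. The sketch below is a deliberately small parser, not a full implementation of the exposition format: it ignores timestamps and assumes label values contain no spaces. `parse_family` is a hypothetical helper, and the sample text and its values are invented for illustration.

```python
def parse_family(exposition: str, family: str) -> dict[str, float]:
    """Return {label-set: value} for every sample of one metric family."""
    samples: dict[str, float] = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines, HELP and TYPE
            continue
        name_and_labels, _, value = line.rpartition(" ")
        name, _, labels = name_and_labels.partition("{")
        if name == family:
            samples[labels.rstrip("}")] = float(value)
    return samples


# Made-up sample output; real category names and counts will differ.
SAMPLE = """\
# TYPE compliance_proxy_redacted_total counter
compliance_proxy_redacted_total{category="company"} 12
compliance_proxy_redacted_total{category="email"} 3
compliance_proxy_active_flows 1
"""

redacted = parse_family(SAMPLE, "compliance_proxy_redacted_total")
```

In practice the `SAMPLE` string would be replaced by the body returned from the `/metrics` endpoint.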
Configuration
Full reference: docs/CONFIG.md. Minimal ~/.compliance-proxy/config.yaml:
listen:
  host: 127.0.0.1
  port: 8080
upstreams:
  - host: api.anthropic.com
  - host: api.openai.com
  - host: generativelanguage.googleapis.com
detectors:
  static_mapper:
    enabled: true
    mappings:
      mts: Acme
      MTS: Acme
      МТС: Acme
    canonical_form: МТС
  regex_matcher:
    enabled: true
    patterns: {}  # empty = all built-in Tier 1/2/3 defaults
audit:
  path: ~/.compliance-proxy/audit.jsonl
  rotation: daily
  retention_days: 90
policy:
  fail_mode: strict
A failed reload (broken YAML, unknown keys, bad enum value) logs ERROR and keeps the previously-loaded config — in-flight connections are never dropped.
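The "validate first, swap only on success" behaviour can be sketched as follows. This is an illustration of the pattern, not the daemon's code: `ConfigHolder` is a hypothetical class, JSON stands in for YAML so the example stays stdlib-only, and the allowed keys are simply the top-level keys of the minimal config above.

```python
import json
import logging

log = logging.getLogger("reload_sketch")

ALLOWED_KEYS = {"listen", "upstreams", "detectors", "audit", "policy"}
FAIL_MODES = {"strict", "pass-through"}


class ConfigHolder:
    """Holds the active config; a rejected reload leaves it untouched."""

    def __init__(self, initial: dict) -> None:
        self.current = initial

    def reload(self, raw: str) -> bool:
        try:
            candidate = json.loads(raw)  # stand-in for the YAML parser
            if not isinstance(candidate, dict):
                raise ValueError("top level must be a mapping")
            unknown = set(candidate) - ALLOWED_KEYS
            if unknown:
                raise ValueError(f"unknown keys: {sorted(unknown)}")
            policy = candidate.get("policy", {})
            if not isinstance(policy, dict):
                raise ValueError("policy must be a mapping")
            mode = policy.get("fail_mode", "strict")
            if mode not in FAIL_MODES:
                raise ValueError(f"bad fail_mode: {mode!r}")
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            log.error("reload rejected, keeping previous config: %s", exc)
            return False
        self.current = candidate  # single assignment: swap only after validation
        return True


holder = ConfigHolder({"policy": {"fail_mode": "strict"}})
rejected = [
    holder.reload("{not json"),
    holder.reload('{"bogus": 1}'),
    holder.reload('{"policy": {"fail_mode": "maybe"}}'),
]
unchanged = dict(holder.current)
accepted = holder.reload('{"policy": {"fail_mode": "pass-through"}}')
```

Because the new config replaces the old one in a single assignment after all checks pass, code that reads `holder.current` never observes a half-validated state.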
Audit log
Every completed request lands as exactly one line in
~/.compliance-proxy/audit.YYYY-MM-DD.jsonl with:
- request.match_counts: per-category counts only; never the original tokens
- request.redacted_preview / response.raw_preview: first 200 bytes (post-redaction / pre-restore)
- prev_hash + entry_hash: SHA-256 chain across records; tampering breaks the chain
Verify chain integrity offline:
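from pathlib import Path

from claude_anonymizer.proxy_server.audit import AuditWriter

# expanduser() is needed because Path does not expand "~" on its own
AuditWriter.verify_chain(Path("~/.compliance-proxy/audit.2026-05-18.jsonl").expanduser())
# True | False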
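For readers unfamiliar with hash chains, the following sketch shows why editing one record invalidates the log. It is not the proxy's on-disk format: the canonical JSON serialization, the all-zero genesis value, and the `append` / `verify` helpers are assumptions made for this example. Only the field names `prev_hash` and `entry_hash` come from the description above.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed seed for the first record


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical serialization so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(lines: list[str], payload: dict) -> None:
    prev = json.loads(lines[-1])["entry_hash"] if lines else GENESIS
    record = {**payload, "prev_hash": prev, "entry_hash": entry_hash(prev, payload)}
    lines.append(json.dumps(record, sort_keys=True))


def verify(lines: list[str]) -> bool:
    prev = GENESIS
    for line in lines:
        record = json.loads(line)
        payload = {k: v for k, v in record.items() if k not in ("prev_hash", "entry_hash")}
        if record["prev_hash"] != prev or record["entry_hash"] != entry_hash(prev, payload):
            return False
        prev = record["entry_hash"]
    return True


log_lines: list[str] = []
append(log_lines, {"request": {"match_counts": {"company": 1}}})
append(log_lines, {"request": {"match_counts": {"email": 2}}})
intact = verify(log_lines)

# Change one count in the first record without recomputing its hash.
tampered = [log_lines[0].replace('"company": 1', '"company": 9'), log_lines[1]]
broken = verify(tampered)
```

Each record commits to its predecessor's hash, so an attacker who edits a line must recompute every later `entry_hash` as well. That rewrite becomes detectable as soon as the latest hash is stored somewhere the attacker cannot reach.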
Files older than retention_days are deleted at file granularity (never
line-by-line) on startup and after each rotation.
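File-granularity retention can be sketched as below. The sketch assumes the age of a file is taken from the date in its name (matching the audit.YYYY-MM-DD.jsonl pattern); whether the proxy uses the filename or the modification time is not stated here, and `prune` is a hypothetical helper.

```python
import re
import tempfile
from datetime import date, timedelta
from pathlib import Path

AUDIT_NAME = re.compile(r"^audit\.(\d{4}-\d{2}-\d{2})\.jsonl$")


def prune(directory: Path, retention_days: int, today: date) -> list[str]:
    """Delete whole audit files dated before the cutoff; never edit a file."""
    cutoff = today - timedelta(days=retention_days)
    removed = []
    for path in sorted(directory.iterdir()):
        match = AUDIT_NAME.match(path.name)
        if match and date.fromisoformat(match.group(1)) < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("audit.2026-01-01.jsonl", "audit.2026-05-18.jsonl", "notes.txt"):
        (root / name).write_text("{}\n")
    removed = prune(root, retention_days=90, today=date(2026, 5, 18))
    remaining = sorted(p.name for p in root.iterdir())
```

Deleting whole files keeps each remaining file's hash chain intact, which would not be the case if individual lines were removed.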
Deploying as a service
User-mode templates ship in deploy/:
- deploy/launchd/com.compliance-proxy.plist: macOS, installs to ~/Library/LaunchAgents/
- deploy/systemd/compliance-proxy.service: Linux, installs to ~/.config/systemd/user/
See deploy/README.md for per-OS install and the HTTPS_PROXY client setup.
Built-in detector tiers
| Tier | Detector | Patterns / behaviour |
|---|---|---|
| 1 (ПДн, personal data) | regex_matcher | MSISDN, passport, SNILS, INN, bank card (Luhn-validated), RS account, email |
| 2 (КТ, trade secret) | regex_matcher | Bearer token, JWT, API key (sk/pk/ghp/glpat/xox), password-in-URL, AWS access key, TUZ service account |
| 3 (infra) | regex_matcher | `*.mts-corp.ru`, `*.mts.ru`, `10.*` / `11.*` IPs, Jira codes (EORD/CLBIZPL/EP/EINVY) |
| company | static_mapper | Exact-string substitution from YAML map |
| PII opt-in | pii.RussianNameDetector | Two/three-token Cyrillic name heuristic (disabled by default; ~12% false-positive rate; deny-list for known false positives) |
Add your own by implementing the Detector
protocol — name, category, scan(text) -> list[Match].
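A custom detector might look roughly like the sketch below. The `Match` dataclass and `Detector` protocol defined here are local stand-ins with assumed fields; the real types live in the package and may differ, so check them before copying this. `CardDetector` and `luhn_ok` are hypothetical names. The example pairs a 16-digit regex with a Luhn checksum, the same kind of validation the Tier 1 bank card pattern is described as using.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Match:  # assumed shape, for illustration only
    start: int
    end: int
    category: str


class Detector(Protocol):  # mirrors the three members named above
    name: str
    category: str

    def scan(self, text: str) -> list[Match]: ...


def luhn_ok(digits: str) -> bool:
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


class CardDetector:
    name = "card_sketch"
    category = "bank_card"
    _pattern = re.compile(r"\b\d{16}\b")

    def scan(self, text: str) -> list[Match]:
        # Keep only candidates whose checksum passes, to cut false positives.
        return [
            Match(m.start(), m.end(), self.category)
            for m in self._pattern.finditer(text)
            if luhn_ok(m.group(0))
        ]


detector: Detector = CardDetector()
hits = detector.scan("valid 4111111111111111, invalid 4111111111111112")
```

Only the first number passes the checksum, so `hits` holds a single match covering it.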
Streaming (SSE)
Anthropic and OpenAI stream tokens via text/event-stream. The proxy
detects this in responseheaders and installs a per-flow rolling-buffer
rewriter — placeholders that straddle chunk boundaries are restored
without buffering the full response. Algorithm: ARCHITECTURE.md §3.2.
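The core idea of a rolling-buffer rewriter can be shown on plain text chunks. This sketch is a simplification and not the algorithm from ARCHITECTURE.md: it ignores SSE event framing and JSON escaping, assumes a non-empty placeholder map, and `RollingRestorer` is a hypothetical name. It holds back only the tail of the stream that could still turn into a placeholder.

```python
class RollingRestorer:
    """Restore placeholders in a chunked stream without buffering all of it."""

    def __init__(self, placeholder_to_original: dict[str, str]) -> None:
        self._map = placeholder_to_original  # assumed non-empty
        self._pending = ""

    def _held_back(self, text: str) -> int:
        # Length of the longest suffix that is a proper prefix of a placeholder.
        longest = max(len(p) for p in self._map) - 1
        for size in range(min(longest, len(text)), 0, -1):
            tail = text[-size:]
            if any(p.startswith(tail) for p in self._map):
                return size
        return 0

    def feed(self, chunk: str) -> str:
        text = self._pending + chunk
        for placeholder, original in self._map.items():
            text = text.replace(placeholder, original)
        keep = self._held_back(text)
        split = len(text) - keep
        self._pending = text[split:]  # may still grow into a placeholder
        return text[:split]           # safe to forward now

    def flush(self) -> str:
        out, self._pending = self._pending, ""
        return out


restorer = RollingRestorer({"Acme": "mts"})
chunks = ["Ac", "me uses OA", "uth; A", "cme rocks"]
out = "".join(restorer.feed(chunk) for chunk in chunks) + restorer.flush()
```

The held-back tail is never longer than the longest placeholder minus one character, so latency stays bounded no matter how long the response is. Calling `flush()` at end of stream releases whatever is still pending.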
Logging contract
The library emits these four INFO lines on every call — they are the GDPR audit artefact and wording is stable:
| Log message (claude_anonymizer.proxy) | What it proves |
|---|---|
| `prompt anonymized: N -> M byte(s)` | The transform ran. |
| `anonymized prompt sent to API: …` | Exact bytes that left the perimeter (first 200). |
| `anonymized response from API: …` | Exact bytes that came back (first 200, pre-restore). |
| `response deanonymized: N -> M byte(s)` | The restore ran. |
Together, the two … sent to API / … from API lines prove the wire
never carried the canonical form.
Performance
Local benchmark on the reference dev laptop (M-series Mac, Python 3.10):
| Prompt size | p50 | p95 | p99 | Target |
|---|---|---|---|---|
| 128 KB (~32k tokens), full detector chain | 41 ms | 43 ms | 44 ms | ≤ 50 ms |
python bench/redactor_bench.py --iters 200
Tests
pytest -q # full suite
pytest tests/proxy_server/test_audit.py # one area
ruff check . # lint
ruff format --check . # format
The proxy tests do not spawn the real claude CLI — they wire up
a fake shell script as --claude-bin and assert argv shape, env
discovery, and the full anonymize / spawn / deanonymize cycle. Streaming
tests use synthesised Anthropic/OpenAI SSE fixtures.
CI matrix runs lint → tests (3.10, 3.11, 3.12) → bench → package build
on every push and PR. See .github/workflows/ci.yml.
To run the same lint + format gates locally before every commit:
pip install pre-commit
pre-commit install # one-time per clone
pre-commit run --all-files # ad-hoc on the whole tree
The hooks pin the same ruff version as CI so a green pre-commit run
will not be re-flagged in CI.
Documentation
| Doc | Audience |
|---|---|
| docs/ARCHITECTURE.md | Engineering — design decisions, threat model, streaming algorithm |
| docs/CONFIG.md | Operators — every config.yaml key with validation rules |
| docs/PRD.md | Product — problem statement, success metrics, scope |
| docs/IMPLEMENTATION_PLAN.md | Engineering — phase-by-phase delivery plan |
| docs/VERIFICATION_PLAN.md | QA — test pyramid, CI gates, manual checklist |
| deploy/README.md | Operators — launchd / systemd install |
| docs/PYPI_RELEASE.md | Maintainers — PyPI trusted-publisher setup + release workflow |
History
Originally extracted from whilly-orchestrator (JIRA-EORD-9843) and refactored to be orchestrator-agnostic. The proxy daemon was added in Phases 0–4 as documented in docs/IMPLEMENTATION_PLAN.md. See CHANGELOG.md for the per-release feature list.
Contributing
See CONTRIBUTING.md for the dev setup, the local gates contributors must run before pushing, and the architecture decisions that are load-bearing across versions.
Security
Please do not open a public issue for security problems. Follow the disclosure policy in SECURITY.md.
License
MIT.