Drop-in redaction proxy for the Anthropic API — anonymize prompts, deanonymize responses, log the redacted form for compliance.

claude-anonymizer

Drop-in redaction for the Anthropic API (and OpenAI, Gemini, anything that talks HTTPS). Anonymize prompts before they leave your perimeter, deanonymize responses before they reach the user, and keep a tamper-evident audit trail for compliance review.

user                proxy                       upstream LLM
 |  "fix mts auth"  →  |                           |
 |                     |  "fix Acme auth"      →   |
 |                     |    (audit JSONL appended) |
 |                     | ←  "Acme uses OAuth"      |
 |                     |    (audit JSONL appended) |
 |  "mts uses OAuth" ← |                           |

Two surfaces:

Library — wrap any Python callable that talks to an LLM and the redaction round-trip happens inline.
System proxy — TLS-intercepting HTTPS proxy daemon you point your CLI tools at (HTTPS_PROXY=http://127.0.0.1:8080). Works with Claude Code, OpenCode, Codex CLI, Gemini CLI, plain curl, etc.

The base package is pure stdlib at runtime; the proxy daemon opts in to cryptography, mitmproxy, and pyyaml via the [proxy] extra.

Install

# Library only (pure stdlib, no extras)
pip install claude-anonymizer

# Library + proxy daemon
pip install 'claude-anonymizer[proxy]'

# Development
pip install -e '.[dev,proxy]'

For the full zero → Claude-Code-via-proxy walkthrough (install CA, start daemon, configure HTTPS_PROXY, verify redaction, inspect, uninstall), see INSTALL.md. For a one-shot bash bootstrap that automates every step, run ./install.sh.

Library quick start

Wrap an existing function

from dataclasses import dataclass

from claude_anonymizer import Anonymizer, wrap_callable

anon = Anonymizer()  # defaults: mts | MTS | МТС → Acme

@dataclass
class SomeResult:
    output: str  # example result type; any object with `.output: str` works

def call_claude(prompt: str) -> SomeResult:
    # your existing function; must return an object with a `.output: str` attr
    ...

safe_call = wrap_callable(call_claude, anonymizer=anon)
result = safe_call("я из компании mts")  # "I'm from the company mts"
# result.output is deanonymized; logs show the anonymized round-trip

The wrapper handles sync and async callables, dataclasses (frozen or mutable), and plain classes. If the prompt contains no sensitive tokens, the original result object is returned unchanged, so an `is` identity check against it still holds.
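
The round trip described above can be sketched without the library. This is a minimal illustration, not the package's implementation: `MAPPINGS`, `anonymize`, `deanonymize`, `Result`, `wrap`, and `fake_llm` are all hypothetical names, and only the sync, frozen-dataclass case is covered.

```python
import dataclasses
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for the library's mapping; not the real Anonymizer.
MAPPINGS = {"mts": "Acme"}


def anonymize(text: str) -> str:
    for original, placeholder in MAPPINGS.items():
        text = text.replace(original, placeholder)
    return text


def deanonymize(text: str) -> str:
    for original, placeholder in MAPPINGS.items():
        text = text.replace(placeholder, original)
    return text


@dataclass(frozen=True)
class Result:
    output: str


def wrap(fn: Callable[[str], Result]) -> Callable[[str], Result]:
    def wrapper(prompt: str) -> Result:
        safe_prompt = anonymize(prompt)
        result = fn(safe_prompt)
        if safe_prompt == prompt:
            # Nothing was redacted, so hand back the very same object.
            return result
        # Frozen dataclasses cannot be mutated; build a modified copy instead.
        return dataclasses.replace(result, output=deanonymize(result.output))

    return wrapper


def fake_llm(prompt: str) -> Result:
    # Echoes the prompt so the round trip is visible without a network call.
    return Result(output=f"seen: {prompt}")


safe = wrap(fake_llm)
print(safe("fix mts auth").output)
```

The early return is what keeps identity intact for prompts with nothing to redact; the copy via `dataclasses.replace` is one way a wrapper could support frozen result types.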

Run the `claude` CLI through it

from claude_anonymizer import AnonymizingClaudeRunner

runner = AnonymizingClaudeRunner(model="claude-opus-4-7")
result = runner.run_sync("я из компании mts, как называется?")  # "I'm from the company mts, what is it called?"
print(result.output)            # "...МТС..."

Custom mappings + canonical form

from claude_anonymizer import Anonymizer

anon = Anonymizer(
    company_mappings={
        "mts": "Acme",
        "МТС": "Acme",
        "MTS": "Acme",
        "Internal-Project-Aurora": "Project-Y",
    },
    canonical_form="МТС",     # always restore to Russian uppercase
)

canonical_form collapses every original variant onto a single user-facing string when deanonymizing — useful when multiple inputs map to one placeholder upstream.
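
To make the many-to-one behaviour concrete, here is a small sketch under stated assumptions: `CanonicalMapper` is a hypothetical class, it handles a single placeholder only, and it uses plain substring replacement rather than whatever matching the library actually performs.

```python
class CanonicalMapper:
    """Hypothetical illustration: several variants, one placeholder, one restore form."""

    def __init__(self, variants: list[str], placeholder: str, canonical_form: str) -> None:
        # Longest variant first, so a short variant never clips a longer one.
        self.variants = sorted(variants, key=len, reverse=True)
        self.placeholder = placeholder
        self.canonical_form = canonical_form

    def anonymize(self, text: str) -> str:
        for variant in self.variants:
            text = text.replace(variant, self.placeholder)
        return text

    def deanonymize(self, text: str) -> str:
        # The placeholder does not record which variant it replaced,
        # so every occurrence is restored to the single canonical string.
        return text.replace(self.placeholder, self.canonical_form)


mapper = CanonicalMapper(["mts", "MTS", "МТС"], placeholder="Acme", canonical_form="МТС")
upstream = mapper.anonymize("mts, MTS and МТС")
print(upstream)
print(mapper.deanonymize(upstream))
```

Once three spellings share one placeholder, the original spelling cannot be recovered from the placeholder alone, which is why a single canonical string is needed on the way back.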

Proxy daemon

The system proxy intercepts HTTPS via a generated root CA, redacts outbound JSON bodies, restores inbound responses (buffered and streamed SSE), and writes a tamper-evident JSONL audit log.

One-shot install

pip install 'claude-anonymizer[proxy]'
anonymizer-proxy install-ca                    # generates CA, installs into OS trust store
anonymizer-proxy run                           # listens on 127.0.0.1:8080

Point your tool at the proxy:

export HTTPS_PROXY=http://127.0.0.1:8080
export SSL_CERT_FILE=$HOME/.compliance-proxy/ca/cert.pem

That's it. The first run writes a starter ~/.compliance-proxy/config.yaml you can customize.

Subcommands

Command	What it does
`anonymizer-proxy install-ca [--dry-run] [--force] [--name-constraints HOSTS]`	Generate root CA + register with OS trust store
`anonymizer-proxy uninstall-ca [--keep-files]`	Unregister + optionally delete the keypair
`anonymizer-proxy run [--config PATH] [--host HOST] [--port N] [--health-port N] [--fail-mode strict|pass-through]`	Start the proxy daemon
`anonymizer-proxy reload [--sock PATH]`	Hot-reload config via UNIX socket (also accepts SIGHUP)
`anonymizer-proxy status [--config PATH] [--json]`	Show config + audit-log rollup + CA state
`anonymizer-proxy analyze [--audit PATH] [--config PATH] [--top N] [--json] [--include-redacted]`	Surface PII-shaped tokens the detector chain missed (audit-log discovery)

Observability

The proxy exposes two HTTP endpoints on the health port (default 8081):

curl -s http://127.0.0.1:8081/healthz                 # → {"status": "ok"}
curl -s http://127.0.0.1:8081/metrics                 # Prometheus exposition

Metric families:

compliance_proxy_redacted_total{category="..."} — counter, per category
compliance_proxy_redaction_latency_seconds_* — histogram (phase = redact)
compliance_proxy_active_flows — gauge
compliance_proxy_failures_total{reason="..."} — counter
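
For a quick look at these families without a Prometheus server, the exposition text can be parsed by hand. The sketch below is an assumption-laden illustration: `SAMPLE` is made-up output in the documented metric names, and the parser ignores timestamps, escaped quotes, and histogram semantics.

```python
import re
from collections import defaultdict

# Hypothetical sample of what /metrics might return; the values are invented.
SAMPLE = """\
# TYPE compliance_proxy_redacted_total counter
compliance_proxy_redacted_total{category="company"} 12
compliance_proxy_redacted_total{category="email"} 3
compliance_proxy_active_flows 2
compliance_proxy_failures_total{reason="timeout"} 1
"""

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> dict[str, dict[tuple, float]]:
    """Map metric name -> {sorted label pairs -> value}."""
    families: dict[str, dict[tuple, float]] = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue
        labels = tuple(sorted(LABEL.findall(match["labels"] or "")))
        families[match["name"]][labels] = float(match["value"])
    return dict(families)


metrics = parse_metrics(SAMPLE)
total_redacted = sum(metrics["compliance_proxy_redacted_total"].values())
print(total_redacted)
```

In practice the text would come from `curl -s http://127.0.0.1:8081/metrics`; a proper Prometheus client parser is the safer choice for anything beyond ad-hoc inspection.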

Configuration

Full reference: docs/CONFIG.md. Minimal ~/.compliance-proxy/config.yaml:

listen:
  host: 127.0.0.1
  port: 8080
upstreams:
  - host: api.anthropic.com
  - host: api.openai.com
  - host: generativelanguage.googleapis.com
detectors:
  static_mapper:
    enabled: true
    mappings:
      mts: Acme
      MTS: Acme
      МТС: Acme
    canonical_form: МТС
  regex_matcher:
    enabled: true
    patterns: {}   # empty = all built-in Tier 1/2/3 defaults
audit:
  path: ~/.compliance-proxy/audit.jsonl
  rotation: daily
  retention_days: 90
policy:
  fail_mode: strict

A failed reload (broken YAML, unknown keys, bad enum value) logs ERROR and keeps the previously-loaded config — in-flight connections are never dropped.
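
The keep-the-old-config behaviour is a validate-then-swap pattern. A minimal sketch follows, assuming the top-level keys and `fail_mode` values shown in this README; `validate`, `ConfigHolder`, and the default of `strict` are hypothetical, and YAML parsing is left out so the example stays stdlib-only.

```python
import logging

log = logging.getLogger("reload_sketch")

# Taken from the sample config above; the real validator may accept more.
ALLOWED_KEYS = {"listen", "upstreams", "detectors", "audit", "policy"}
FAIL_MODES = {"strict", "pass-through"}


def validate(candidate: dict) -> dict:
    unknown = set(candidate) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    fail_mode = candidate.get("policy", {}).get("fail_mode", "strict")
    if fail_mode not in FAIL_MODES:
        raise ValueError(f"bad fail_mode: {fail_mode!r}")
    return candidate


class ConfigHolder:
    def __init__(self, initial: dict) -> None:
        self.current = validate(initial)

    def reload(self, candidate: dict) -> bool:
        try:
            validated = validate(candidate)
        except ValueError as exc:
            # The live config is untouched; only the attempt is logged.
            log.error("reload rejected, keeping previous config: %s", exc)
            return False
        self.current = validated  # swap only after validation succeeded
        return True


holder = ConfigHolder({"policy": {"fail_mode": "strict"}})
holder.reload({"policy": {"fail_mode": "lenient"}})  # rejected
print(holder.current)
```

Because the swap is a single assignment after validation, code reading `holder.current` sees either the old config or the new one, never a half-applied mix.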

Audit log

Every completed request lands as exactly one line in ~/.compliance-proxy/audit.YYYY-MM-DD.jsonl with:

request.match_counts — per-category counts only; never the original tokens
request.redacted_preview / response.raw_preview — first 200 bytes (post-redaction / pre-restore)
prev_hash + entry_hash — SHA-256 chain across records; tampering breaks the chain

Verify chain integrity offline:

from pathlib import Path
from claude_anonymizer.proxy_server.audit import AuditWriter

# Path does not expand "~" on its own, hence expanduser()
AuditWriter.verify_chain(Path("~/.compliance-proxy/audit.2026-05-18.jsonl").expanduser())
# True | False
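
How a `prev_hash` / `entry_hash` chain makes tampering detectable can be shown in a few lines. This is a sketch of the general technique, not the `AuditWriter` format: the all-zero genesis value, the JSON canonicalisation, and the helper names `entry_hash`, `append`, and `verify` are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first record


def entry_hash(record: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(chain: list[dict], record: dict) -> None:
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    chain.append({**record, "prev_hash": prev, "entry_hash": entry_hash(record, prev)})


def verify(chain: list[dict]) -> bool:
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k not in ("prev_hash", "entry_hash")}
        if entry["prev_hash"] != prev or entry["entry_hash"] != entry_hash(body, prev):
            return False
        prev = entry["entry_hash"]
    return True


chain: list[dict] = []
append(chain, {"id": 1, "match_counts": {"company": 2}})
append(chain, {"id": 2, "match_counts": {}})
print(verify(chain))
chain[0]["id"] = 99  # simulate an edit to an old record
print(verify(chain))
```

Each hash covers the previous hash, so editing or removing any earlier record invalidates every record after it unless the whole tail is recomputed.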

Files older than retention_days are deleted at file granularity (never line-by-line) on startup and after each rotation.
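
File-granularity retention can be sketched as follows. The filename pattern comes from the `audit.YYYY-MM-DD.jsonl` naming above; the function name `prune`, the strict less-than cutoff, and taking the date from the filename rather than file metadata are assumptions.

```python
import re
from datetime import date, timedelta
from pathlib import Path

NAME = re.compile(r"^audit\.(\d{4})-(\d{2})-(\d{2})\.jsonl$")


def prune(directory: Path, retention_days: int, today: date) -> list[Path]:
    """Delete rotated audit files dated before the cutoff; return what was removed."""
    cutoff = today - timedelta(days=retention_days)
    removed = []
    for path in sorted(directory.iterdir()):
        match = NAME.match(path.name)
        if match is None:
            continue  # not a rotated audit file; leave it alone
        try:
            file_date = date(*map(int, match.groups()))
        except ValueError:
            continue  # looks like a date but is not a valid one
        if file_date < cutoff:
            path.unlink()  # the whole file goes; no record is edited in place
            removed.append(path)
    return removed
```

Deleting whole files leaves every surviving file's hash chain intact, which line-by-line trimming would not.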

Deploying as a service

User-mode templates ship in deploy/:

deploy/launchd/com.compliance-proxy.plist — macOS ~/Library/LaunchAgents/
deploy/systemd/compliance-proxy.service — Linux ~/.config/systemd/user/

See deploy/README.md for per-OS install and the HTTPS_PROXY client setup.

Built-in detector tiers

Tier	Detector	Patterns / behaviour
1 (ПДн, personal data)	`regex_matcher`	MSISDN, passport, SNILS, INN, bank card (Luhn-validated), RS (settlement) account, email
2 (КТ, trade secret)	`regex_matcher`	Bearer token, JWT, API key (sk/pk/ghp/glpat/xox), password-in-URL, AWS access key, TUZ service account
3 (infra)	`regex_matcher`	`.mts-corp.ru`, `.mts.ru`, `10.` / `11.` IPs, Jira codes (EORD/CLBIZPL/EP/EINVY)
company	`static_mapper`	Exact-string substitution from YAML map
PII opt-in	`pii.RussianNameDetector`	Two/three-token Cyrillic name heuristic (disabled by default; ~12% FP rate; deny-list for known false-positives)

Add your own by implementing the Detector protocol — name, category, scan(text) -> list[Match].
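
A custom detector might look like the sketch below. Only the three protocol members (`name`, `category`, `scan`) come from this README; the `Match` fields, the `CardDetector` class, and the 16-digit pattern are hypothetical, and the Luhn check is shown as an example of validating a candidate before reporting it.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Match:
    # Hypothetical shape; the real Match type may carry different fields.
    start: int
    end: int
    category: str
    text: str


class Detector(Protocol):
    name: str
    category: str

    def scan(self, text: str) -> list[Match]: ...


def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


class CardDetector:
    name = "card_sketch"
    category = "bank_card"
    _pattern = re.compile(r"\b\d{16}\b")

    def scan(self, text: str) -> list[Match]:
        return [
            Match(m.start(), m.end(), self.category, m.group())
            for m in self._pattern.finditer(text)
            if luhn_ok(m.group())  # drop digit runs that fail the checksum
        ]


detector: Detector = CardDetector()  # structural typing: no inheritance needed
print(detector.scan("pay 4111111111111111 or 4111111111111112"))
```

Because `Detector` is a `Protocol`, any class with matching attributes satisfies it; how a detector is registered with the proxy is not covered here.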

Streaming (SSE)

Anthropic and OpenAI stream tokens via text/event-stream. The proxy detects this in responseheaders and installs a per-flow rolling-buffer rewriter — placeholders that straddle chunk boundaries are restored without buffering the full response. Algorithm: ARCHITECTURE.md §3.2.
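
One way to restore placeholders that straddle chunk boundaries is to hold back only the tail that could still turn into a placeholder. The sketch below illustrates that idea for a single placeholder; it is not the algorithm from ARCHITECTURE.md §3.2, and `StreamRestorer` is a hypothetical name.

```python
class StreamRestorer:
    """Restores one placeholder across arbitrary chunk boundaries."""

    def __init__(self, placeholder: str, original: str) -> None:
        self.placeholder = placeholder  # must be non-empty
        self.original = original
        self.pending = ""  # raw tail that might be the start of a placeholder

    def _held_back(self, text: str) -> int:
        # Length of the longest suffix of `text` that is a proper prefix
        # of the placeholder; that much must wait for the next chunk.
        for size in range(min(len(self.placeholder) - 1, len(text)), 0, -1):
            if text.endswith(self.placeholder[:size]):
                return size
        return 0

    def feed(self, chunk: str) -> str:
        # Split on complete placeholders first, then examine only the text
        # after the last complete one for a possible partial match.
        parts = (self.pending + chunk).split(self.placeholder)
        tail = parts[-1]
        keep = self._held_back(tail)
        self.pending = tail[len(tail) - keep:]
        parts[-1] = tail[:len(tail) - keep]
        return self.original.join(parts)

    def flush(self) -> str:
        # End of stream: whatever is pending never became a placeholder.
        tail, self.pending = self.pending, ""
        return tail


restorer = StreamRestorer("Acme", "mts")
chunks = ["Ac", "me uses OA", "uth, Ac"]
restored = "".join(restorer.feed(c) for c in chunks) + restorer.flush()
print(restored)
```

At most `len(placeholder) - 1` characters are ever buffered, so memory stays bounded however long the response is. Supporting several placeholders would need the hold-back check to consider each of them.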

Logging contract

The library emits these four INFO lines on every call. They are the GDPR audit artefact, and their wording is stable:

Log message (`claude_anonymizer.proxy`)	What it proves
`prompt anonymized: N -> M byte(s)`	The transform ran.
`anonymized prompt sent to API: …`	Exact bytes that left the perimeter (first 200).
`anonymized response from API: …`	Exact bytes that came back (first 200, pre-restore).
`response deanonymized: N -> M byte(s)`	The restore ran.

Together, the two … sent to API / … from API lines prove the wire never carried the canonical form.
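
Because the wording is stable, the two wire-level lines can be checked mechanically. The sketch below attaches a handler to the `claude_anonymizer.proxy` logger and flags any wire line containing an original token. `WireAudit` and `SENSITIVE` are hypothetical, the log lines are simulated here rather than produced by the library, and the substring check is deliberately naive.

```python
import logging

SENSITIVE = ("mts", "MTS", "МТС")  # originals that must never reach the wire


class WireAudit(logging.Handler):
    """Collects the two wire-level lines and records any that leak an original."""

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.seen = 0
        self.leaks: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        if "sent to API:" in message or "from API:" in message:
            self.seen += 1
            if any(token in message for token in SENSITIVE):
                self.leaks.append(message)


logger = logging.getLogger("claude_anonymizer.proxy")
logger.setLevel(logging.INFO)
audit = WireAudit()
logger.addHandler(audit)

# Simulated lines in the documented wording; a real run would emit them itself.
logger.info("anonymized prompt sent to API: %s", "fix Acme auth")
logger.info("anonymized response from API: %s", "Acme uses OAuth")
print(audit.seen, audit.leaks)
```

A check like this could run in a test suite or a log pipeline; word-boundary matching would cut false positives from tokens that appear inside longer words.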

Performance

Local benchmark on the reference dev laptop (M-series Mac, Python 3.10):

Prompt size	p50	p95	p99	Target
128 KB (~32k tokens), full detector chain	41 ms	43 ms	44 ms	≤ 50 ms

python bench/redactor_bench.py --iters 200
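
The p50/p95/p99 columns can be reproduced for any callable with a few lines of stdlib code. This is a sketch of the measurement only, not `bench/redactor_bench.py`: the `bench` and `percentiles` helpers are hypothetical, and the workload is a single `str.replace` over roughly 128 KB rather than the full detector chain.

```python
import statistics
import time
from typing import Callable


def percentiles(samples_ms: list[float]) -> dict[str, float]:
    # quantiles(n=100) returns 99 cut points; index i-1 is the i-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def bench(fn: Callable[[str], object], payload: str, iters: int = 200) -> dict[str, float]:
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    return percentiles(samples)


# Stand-in workload, not the real redactor.
result = bench(lambda text: text.replace("mts", "Acme"), "fix mts auth " * 10_000)
print(result)
```

Numbers from this stand-in are not comparable with the table above; swapping in the real redaction call is what would make them so.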

Tests

pytest -q                                          # full suite
pytest tests/proxy_server/test_audit.py            # one area
ruff check .                                       # lint
ruff format --check .                              # format

The proxy tests do not spawn the real claude CLI — they wire up a fake shell script as --claude-bin and assert argv shape, env discovery, and the full anonymize / spawn / deanonymize cycle. Streaming tests use synthesised Anthropic/OpenAI SSE fixtures.

CI matrix runs lint → tests (3.10, 3.11, 3.12) → bench → package build on every push and PR. See .github/workflows/ci.yml.

To run the same lint + format gates locally before every commit:

pip install pre-commit
pre-commit install      # one-time per clone
pre-commit run --all-files   # ad-hoc on the whole tree

The hooks pin the same ruff version as CI so a green pre-commit run will not be re-flagged in CI.

Documentation

Doc	Audience
docs/ARCHITECTURE.md	Engineering — design decisions, threat model, streaming algorithm
docs/CONFIG.md	Operators — every `config.yaml` key with validation rules
docs/PRD.md	Product — problem statement, success metrics, scope
docs/IMPLEMENTATION_PLAN.md	Engineering — phase-by-phase delivery plan
docs/VERIFICATION_PLAN.md	QA — test pyramid, CI gates, manual checklist
deploy/README.md	Operators — launchd / systemd install
docs/PYPI_RELEASE.md	Maintainers — PyPI trusted-publisher setup + release workflow

History

Originally extracted from whilly-orchestrator (JIRA-EORD-9843) and refactored to be orchestrator-agnostic. The proxy daemon was added in Phases 0–4 as documented in docs/IMPLEMENTATION_PLAN.md. See CHANGELOG.md for the per-release feature list.

Contributing

See CONTRIBUTING.md for the dev setup, the local gates contributors must run before pushing, and the architecture decisions that are load-bearing across versions.

Security

Please do not open a public issue for security problems. Follow the disclosure policy in SECURITY.md.

License

MIT.

Project details

This version

0.2.1

May 18, 2026

Download files

Source Distribution

claude_anonymizer-0.2.1.tar.gz (79.1 kB view details)

Uploaded May 18, 2026 Source

Built Distribution

claude_anonymizer-0.2.1-py3-none-any.whl (86.4 kB view details)

Uploaded May 18, 2026 Python 3

File details

Details for the file claude_anonymizer-0.2.1.tar.gz.

File metadata

Download URL: claude_anonymizer-0.2.1.tar.gz
Upload date: May 18, 2026
Size: 79.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for claude_anonymizer-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`4df2df1b3956292dada82a52d7e81c1227cda3a1ebd2322d584cc93119f42167`
MD5	`30d810c34e5141b5858403dc0b862d04`
BLAKE2b-256	`ccaff501b68f55c25f87633fba897abf55881501ebac3c293fae0a854b257f97`

Provenance

The following attestation bundles were made for claude_anonymizer-0.2.1.tar.gz:

Publisher: release.yml on mshegolev/claude-anonymizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claude_anonymizer-0.2.1.tar.gz
- Subject digest: 4df2df1b3956292dada82a52d7e81c1227cda3a1ebd2322d584cc93119f42167
- Sigstore transparency entry: 1568643879
- Sigstore integration time: May 18, 2026
Source repository:
- Permalink: mshegolev/claude-anonymizer@640997181fe37105ddd101bb2907e18764674ee3
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/mshegolev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@640997181fe37105ddd101bb2907e18764674ee3
- Trigger Event: push

File details

Details for the file claude_anonymizer-0.2.1-py3-none-any.whl.

File metadata

Download URL: claude_anonymizer-0.2.1-py3-none-any.whl
Upload date: May 18, 2026
Size: 86.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for claude_anonymizer-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2564d653abc9d53b60983318d854cb8db76ae72423f45e7c2070a781b638e57e`
MD5	`873b8cb24904562f3911cb6adb5927e3`
BLAKE2b-256	`2a9bbb4022fbbfab6bd958157d39fc30c4c9e2b5de1a0a36243c90c43153f084`

Provenance

The following attestation bundles were made for claude_anonymizer-0.2.1-py3-none-any.whl:

Publisher: release.yml on mshegolev/claude-anonymizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claude_anonymizer-0.2.1-py3-none-any.whl
- Subject digest: 2564d653abc9d53b60983318d854cb8db76ae72423f45e7c2070a781b638e57e
- Sigstore transparency entry: 1568643895
- Sigstore integration time: May 18, 2026
Source repository:
- Permalink: mshegolev/claude-anonymizer@640997181fe37105ddd101bb2907e18764674ee3
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/mshegolev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@640997181fe37105ddd101bb2907e18764674ee3
- Trigger Event: push
