Local PII firewall for AI CLI tools. Tokenize before it leaves your machine.

pii-guard

Local PII firewall for AI coding tools. Tokenize before it leaves your machine.

When you ask any AI tool — Claude Code, Cursor, Aider, Codex, Continue.dev — to analyse data, raw PII travels to their servers. pii-guard intercepts it first: replaces real values with consistent tokens ([AADHAAR_1], [EMAIL_2]), lets the AI work on the safe version, and reverses it when you're done. The mapping key never leaves your machine.

Works with every AI tool

| Tool | How |
|---|---|
| Claude Code | PostToolUse hooks: automatic, zero-touch per file read |
| Cursor | Set `OPENAI_BASE_URL=http://localhost:8111/openai/v1` |
| Aider | Set `OPENAI_API_BASE=http://localhost:8111/openai/v1` |
| OpenAI Codex CLI | Set `OPENAI_BASE_URL=http://localhost:8111/openai/v1` |
| Continue.dev | Set `apiBase` in `~/.continue/config.json` |
| Any OpenAI-SDK app | Set `OPENAI_BASE_URL`; no code changes |
| Any Anthropic-SDK app | Set `ANTHROPIC_BASE_URL`; no code changes |
| Any tool, any LLM | Manually: `pii-guard tokenize file.csv` before sharing |

Integration guides: integrations/

How it works — three modes

┌─────────────────────────────────────────────────────────────────────┐
│  Mode 1 · CLI  (any tool, manual)                                   │
│  pii-guard tokenize file.csv → safe file → AI analyses → detokenize │
├─────────────────────────────────────────────────────────────────────┤
│  Mode 2 · Claude Code hooks  (automatic, zero-touch)                │
│  pii-guard install-hooks → hooks fire on every Read + Bash output   │
│  Claude never sees raw PII in the session                           │
├─────────────────────────────────────────────────────────────────────┤
│  Mode 3 · API proxy  (any OpenAI/Anthropic-compatible tool)         │
│  pii-guard proxy → sits between your tool and the upstream API      │
│  One env var. Zero code changes. Works with Cursor, Aider, Codex,  │
│  Continue.dev, LangChain, and any SDK that respects base URL vars.  │
└─────────────────────────────────────────────────────────────────────┘

All three modes use the same tokenization engine and session format. john@acme.com is always [EMAIL_1] within a session, regardless of which mode captured it.

Install

pip install piiwall            # core (plain text, CSV)
pip install 'piiwall[rich]'    # + PDF, Word (.docx), Excel (.xlsx)

The PyPI distribution is named piiwall; it installs the pii-guard command and the pii_guard Python package used in the examples below.

Mode 1 — CLI (tool-agnostic, manual)

Works with any AI tool. Tokenize a file first, share the safe version, detokenize results when done.

# Scan — see what PII exists (exits 1 if found)
pii-guard scan customers.csv --show-values

# Tokenize — create customers.safe.csv with tokens
pii-guard tokenize customers.csv -p dpdp

# Analyse customers.safe.csv with whatever AI tool you use
# Then restore real values
pii-guard detokenize result.txt --session ~/.pii-guard/sessions/pii-guard-<timestamp>.json
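
Because the scan command exits with status 1 when it finds PII, it can be driven from a script. The following is a minimal sketch, not part of pii-guard: the helper name `files_with_pii` and the injectable `run` parameter are hypothetical, and it assumes that any exit status other than 1 means "nothing found". Only the `scan <file> -p <preset>` form shown above is used.

```python
import subprocess
from typing import Callable, Iterable, List


def files_with_pii(
    paths: Iterable[str],
    preset: str = "dpdp",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Return the paths for which `pii-guard scan` exits with status 1."""
    flagged = []
    for path in paths:
        # Only the command form shown in this README: scan <file> -p <preset>.
        result = run(
            ["pii-guard", "scan", path, "-p", preset],
            capture_output=True,
            text=True,
        )
        if result.returncode == 1:  # documented: exits 1 if PII is found
            flagged.append(path)
    return flagged
```

Calling `files_with_pii(["customers.csv", "notes.txt"])` would run one scan per file and return the subset that still needs tokenizing before it is shared.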

Supported file formats

| Format | Scan | Tokenize | Notes |
|---|---|---|---|
| Plain text, CSV, JSON | ✓ | ✓ | Core, no extra deps |
| PDF (`.pdf`) | ✓ | ✓ | Output as `.safe.txt`; requires `piiwall[rich]` |
| Word (`.docx`) | ✓ | ✓ | Format preserved, paragraphs and tables tokenized in-place; requires `piiwall[rich]` |
| Excel (`.xlsx`) | ✓ | ✓ | Format preserved, all string cells tokenized in-place; requires `piiwall[rich]` |

pip install 'piiwall[rich]'                 # install format support
pii-guard scan report.docx -p dpdp            # scan a Word doc
pii-guard tokenize customer_data.xlsx -p dpdp # tokenize an Excel sheet → customer_data.safe.xlsx
pii-guard scan employees.pdf -p hipaa         # scan a PDF

Session stats

pii-guard stats ~/.pii-guard/sessions/pii-guard-<timestamp>.json

Session:  pii-guard-20240115-103000.json
Total tokens: 12

  Type                    Count
  ---------------------- ------
  EMAIL                       4
  AADHAAR                     3
  MOBILE_IN                   3
  PAN                         2

Export session as CSV (for Excel / VLOOKUP)

pii-guard export-session ~/.pii-guard/sessions/pii-guard-<timestamp>.json

Output (pii-guard-<timestamp>_mapping.csv):

token,pii_type,original_value
[EMAIL_1],EMAIL,john@acme.com
[EMAIL_2],EMAIL,jane@acme.com
[AADHAAR_1],AADHAAR,2345 6789 0123
[PAN_1],PAN,ABCDE1234F
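
The exported mapping can also be applied outside a spreadsheet. Here is a small sketch that reads the three columns shown above and restores tokens in a piece of text. The helper names `load_mapping` and `restore`, and the token regex, are assumptions for illustration; this is not how pii-guard itself detokenizes.

```python
import csv
import io
import re
from typing import Dict, TextIO


def load_mapping(handle: TextIO) -> Dict[str, str]:
    """Read token -> original_value pairs from an exported mapping CSV."""
    reader = csv.DictReader(handle)
    return {row["token"]: row["original_value"] for row in reader}


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Replace every known token in text; unknown tokens are left as-is."""
    # Assumed token shape: [TYPE_N], with TYPE in capitals and underscores.
    token_re = re.compile(r"\[[A-Z_]+_\d+\]")
    return token_re.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


# Sample rows copied from the export shown above.
sample = io.StringIO(
    "token,pii_type,original_value\n"
    "[EMAIL_1],EMAIL,john@acme.com\n"
    "[AADHAAR_1],AADHAAR,2345 6789 0123\n"
)
mapping = load_mapping(sample)
print(restore("Send the report to [EMAIL_1] and [EMAIL_9].", mapping))
```

For a real export, pass `open(path, newline="")` instead of the in-memory sample. Tokens missing from the mapping, such as `[EMAIL_9]` here, pass through unchanged.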

Presets

| Preset | Covers |
|---|---|
| `dpdp` | 🇮🇳 Aadhaar, PAN, Voter ID, Passport, IFSC, GSTIN, UPI VPA, mobile, PIN code |
| `gdpr` | 🇪🇺 IBAN, BIC/SWIFT, VAT, EU phone, MAC address, GPS coordinates |
| `hipaa` | 🇺🇸 SSN, NPI, DEA, MRN, health plan IDs, US phone, US dates |
| `pci` | 💳 Visa, Mastercard, Amex, Discover, Rupay, CVV, card expiry |

pii-guard tokenize file.csv -p dpdp -p pci   # combine presets
pii-guard config show-patterns dpdp           # inspect patterns in a preset

Mode 2 — Claude Code hooks (automatic, zero-touch)

One command installs hooks that fire on every file Claude reads and every bash command output:

pip install piiwall
pii-guard install-hooks --global

This writes two PostToolUse hooks into ~/.claude/settings.json. From then on, any PII the active presets detect is tokenized before Claude sees it; see Limitations for what regex-based detection misses.

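Add the behavioral layer (tells Claude to proactively offer tokenization). Note that cp overwrites an existing ~/.claude/CLAUDE.md, so merge the contents by hand if you already have one: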

cp integrations/CLAUDE.md ~/.claude/CLAUDE.md

What the hooks do

Claude calls Read("customers.csv")
        ↓
post_read.py intercepts the tool response
        ↓
Scans for PII → finds 20 instances
        ↓
Replaces with tokens, saves session key → ~/.pii-guard/sessions/claude-<session-id>.json
        ↓
Claude sees [EMAIL_1], [AADHAAR_1] — never the real values

All Read and Bash calls in one Claude Code session share one session file. One detokenize pass restores everything.

Restore after Claude session

pii-guard detokenize result.txt --session ~/.pii-guard/sessions/claude-<session-id>.json
# or export as CSV
pii-guard export-session ~/.pii-guard/sessions/claude-<session-id>.json

Control via environment variables

export PII_GUARD_PRESETS=dpdp,pci   # comma-separated presets (default: dpdp)
export PII_GUARD_ENABLED=0          # disable hooks without removing them
export PII_GUARD_MAX_CHARS=200000   # cap bash output scan size (default: 200000)
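
To make the defaults concrete, here is one way a script could interpret these three variables. It is a sketch under stated assumptions, not pii-guard's own parsing: `HookSettings` and `read_settings` are hypothetical names, and it assumes that only the literal value `0` disables the hooks.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class HookSettings:
    enabled: bool
    presets: List[str]
    max_chars: int


def read_settings(env: Mapping[str, str] = os.environ) -> HookSettings:
    """Interpret the PII_GUARD_* variables using the documented defaults."""
    raw_presets = env.get("PII_GUARD_PRESETS", "dpdp")
    presets = [p.strip() for p in raw_presets.split(",") if p.strip()]
    return HookSettings(
        # Assumption: only the literal "0" turns the hooks off.
        enabled=env.get("PII_GUARD_ENABLED", "1") != "0",
        presets=presets,
        max_chars=int(env.get("PII_GUARD_MAX_CHARS", "200000")),
    )


print(read_settings({"PII_GUARD_PRESETS": "dpdp,pci", "PII_GUARD_ENABLED": "0"}))
```

Passing a plain dict, as in the last line, makes it easy to check a combination of settings without touching the real environment.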

Mode 3 — API proxy (Cursor, Aider, Codex, Continue.dev, any SDK)

The proxy sits between your tool and the upstream API. It tokenizes every outgoing prompt and detokenizes every response. Your tool and your code are unchanged.

pii-guard proxy --port 8111 --preset dpdp

Set the base URL in your tool

# Anthropic SDK / Claude Code / any Anthropic-compatible tool
export ANTHROPIC_BASE_URL=http://localhost:8111

# OpenAI SDK / Cursor / Aider / Codex CLI / Continue.dev / LangChain
export OPENAI_BASE_URL=http://localhost:8111/openai/v1

Your existing code works unchanged:

import anthropic

# With ANTHROPIC_BASE_URL set, the client routes through pii-guard automatically
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": "Analyse rajesh@gmail.com, Aadhaar 2345 6789 0123"}],
)
# Anthropic receives: "Analyse [EMAIL_1], Aadhaar [AADHAAR_1]"
# Your app receives the reply with any [EMAIL_1] / [AADHAAR_1] tokens restored to the real values
print(response.content[0].text)

What the proxy does

Your tool sends prompt with real PII
        ↓
pii-guard proxy on localhost:8111
        ↓
Tokenizes PII → [EMAIL_1], [AADHAAR_1], [PAN_1]
        ↓
Forwards to api.anthropic.com or api.openai.com
        ↓
Gets response with tokens
        ↓
Detokenizes → real values restored
        ↓
Your tool receives response with real values

Anthropic and OpenAI see only tokens in place of every value the configured patterns detect.
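
The flow above can be sketched in a few lines with a stand-in for the upstream API. Everything here is illustrative: `proxy_round_trip`, `fake_upstream`, and the simplified email regex are assumptions, and no network call is made. The real proxy handles HTTP, several PII types, and streaming.

```python
import re
from typing import Callable, Dict, List

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def proxy_round_trip(prompt: str, upstream: Callable[[str], str]) -> str:
    """Tokenize the prompt, call upstream, then restore tokens in the reply."""
    token_for: Dict[str, str] = {}  # real value -> token
    value_for: Dict[str, str] = {}  # token -> real value

    def to_token(match: re.Match) -> str:
        value = match.group(0)
        if value not in token_for:
            token = f"[EMAIL_{len(token_for) + 1}]"
            token_for[value] = token
            value_for[token] = value
        return token_for[value]

    safe_prompt = EMAIL_RE.sub(to_token, prompt)
    reply = upstream(safe_prompt)  # upstream receives the tokenized prompt
    for token, value in value_for.items():
        reply = reply.replace(token, value)
    return reply


seen: List[str] = []


def fake_upstream(text: str) -> str:
    """Stand-in for the real API: records what it received and echoes it."""
    seen.append(text)
    return f"Summary of: {text}"


print(proxy_round_trip("Analyse rajesh@gmail.com", fake_upstream))
print(seen[0])
```

The first print shows what the calling tool gets back, with the address restored; the second shows what the stand-in upstream actually received.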

Proxy options

pii-guard proxy --port 8111                        # default port
pii-guard proxy --preset dpdp,pci                  # multiple presets
pii-guard proxy --pattern "CUST_ID:CUST-\d{6}"    # custom pattern
pii-guard proxy --session session.json             # resume existing session
pii-guard proxy --quiet                            # suppress per-request logs

Restore after proxy session

pii-guard export-session ~/.pii-guard/sessions/<session-id>.json
pii-guard detokenize output.txt --session ~/.pii-guard/sessions/<session-id>.json

Per-tool guides

Per-tool setup notes are in integrations/.

Custom patterns

Persistent — `~/.pii-guard/config.yaml`

Loaded automatically by the CLI, hooks, and proxy:

custom_patterns:
  CUSTOMER_ID: 'CUST-\d{6}'
  EMPLOYEE_ID: 'EMP\d{5}'
  INTERNAL_REF: 'INT-[A-Z]{3}-\d{4}'

mkdir -p ~/.pii-guard
cp config/pii-guard.example.yaml ~/.pii-guard/config.yaml

Inline — `--pattern` / `-P` flag

pii-guard scan file.csv -P "CUSTOMER_ID:CUST-\d{6}" --show-values
pii-guard tokenize file.csv -P "CUSTOMER_ID:CUST-\d{6}" -P "EMPLOYEE_ID:EMP\d{5}"
pii-guard tokenize data.csv -p dpdp -p pci -P "ACCOUNT_REF:ACC-\d{8}"

CUST-123456 becomes [CUSTOMER_ID_1], fully reversible.
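
Before adding a pattern to the config, it can help to try it against a few sample values. The check below is a sketch that assumes pii-guard compiles patterns with Python's `re` module, which this README does not state; `matching_patterns` is a hypothetical helper, and the patterns are the ones from the config example above.

```python
import re
from typing import List

# Patterns copied from the config example above.
custom_patterns = {
    "CUSTOMER_ID": r"CUST-\d{6}",
    "EMPLOYEE_ID": r"EMP\d{5}",
    "INTERNAL_REF": r"INT-[A-Z]{3}-\d{4}",
}


def matching_patterns(text: str) -> List[str]:
    """Return the names of all patterns that match the whole string."""
    return [name for name, rx in custom_patterns.items() if re.fullmatch(rx, text)]


for sample in ["CUST-123456", "CUST-12345", "EMP00042", "INT-ABC-2024", "int-abc-2024"]:
    print(f"{sample!r}: {matching_patterns(sample) or 'no match'}")
```

`re.fullmatch` is stricter than a search inside running text. If a pattern is applied by searching, `EMP\d{5}` would also match the first eight characters of `EMP000421`, so consider adding `\b` anchors when that matters.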

Use from Python

from pii_guard.presets import load_presets
from pii_guard.scanner.engine import Scanner
from pii_guard.scanner.patterns import BASE_PATTERNS
from pii_guard.tokenizer.engine import tokenize
from pii_guard.tokenizer.session import Session

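patterns = {**BASE_PATTERNS, **load_presets(["dpdp"])}  # base patterns plus the dpdp preset
scanner = Scanner(patterns)
session = Session.new()

raw_text = "Contact john@acme.com, PAN ABCDE1234F"  # sample input; replace with your own text
safe_text, matches = tokenize(raw_text, scanner, session)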
session.save()

print(f"Tokenized {len(matches)} PII instances.")
print(f"Session key: {session.path}")

How tokenization works

Same value → same token within a session. Different values → different tokens. Fully reversible.

john@acme.com   →  [EMAIL_1]     (always, within this session)
jane@acme.com   →  [EMAIL_2]
john@acme.com   →  [EMAIL_1]     ← same input, same token
2345 6789 0123  →  [AADHAAR_1]

Session key stays in ~/.pii-guard/sessions/. Never sent anywhere.
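
The rule "same value, same token" comes down to a two-way lookup plus a counter per PII type. The class below is a minimal sketch of that idea, not pii-guard's engine: `TokenMap` is a hypothetical name and the two regexes are simplified stand-ins for the real presets.

```python
import re
from collections import defaultdict
from typing import Dict


class TokenMap:
    """Keeps value <-> token pairs so a value always maps to the same token."""

    def __init__(self, patterns: Dict[str, re.Pattern]) -> None:
        self.patterns = patterns
        self.token_for: Dict[str, str] = {}  # real value -> token
        self.value_for: Dict[str, str] = {}  # token -> real value
        self.counters: Dict[str, int] = defaultdict(int)

    def _token(self, pii_type: str, value: str) -> str:
        if value not in self.token_for:
            self.counters[pii_type] += 1
            token = f"[{pii_type}_{self.counters[pii_type]}]"
            self.token_for[value] = token
            self.value_for[token] = value
        return self.token_for[value]

    def tokenize(self, text: str) -> str:
        # One pass per PII type; each match is swapped for its token.
        for pii_type, pattern in self.patterns.items():
            text = pattern.sub(lambda m, t=pii_type: self._token(t, m.group(0)), text)
        return text

    def detokenize(self, text: str) -> str:
        for token, value in self.value_for.items():
            text = text.replace(token, value)
        return text


token_map = TokenMap({
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "AADHAAR": re.compile(r"\b\d{4} \d{4} \d{4}\b"),
})
original = "john@acme.com, jane@acme.com, john@acme.com, 2345 6789 0123"
safe = token_map.tokenize(original)
print(safe)  # [EMAIL_1], [EMAIL_2], [EMAIL_1], [AADHAAR_1]
print(token_map.detokenize(safe) == original)  # True
```

Persisting `value_for` to disk is what a session file amounts to in this sketch: without it, the tokens cannot be reversed.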

Limitations

Regex-based detection — structured formats (Aadhaar, PAN, IBAN, SSN) have near-zero false negatives. Free-form PII (names, addresses in prose) is not detected; combine with a dedicated NER model if needed.
DOCX formatting in PII-containing paragraphs — when a PII value spans multiple runs in a Word document (e.g., bold text adjacent to the value), the paragraph is collapsed to a single run after tokenization. Paragraphs with no PII are untouched.
Same-session tokens only — tokens from one session cannot be detokenized with a different session key. Keep the session file for as long as you need to reverse.
Streaming responses — the proxy detokenizes SSE streams line-by-line. A token that spans two SSE chunks will not be restored; rare but possible with large token strings.
Proxy is localhost-only — binds to 127.0.0.1. Not designed to be network-exposed. Treat the session key file as a secret.
No key management — session files are plain JSON on disk. Encrypt or delete when no longer needed.
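
The streaming limitation above can be worked around downstream by holding back any trailing text that might be the start of a token until the next chunk arrives. This is a sketch of that buffering idea on plain text chunks, not a patch for the proxy: `detokenize_stream` and both regexes are assumptions, and SSE framing is ignored.

```python
import re
from typing import Dict, Iterable, Iterator

TOKEN_RE = re.compile(r"\[[A-Z_]+_\d+\]")
# A trailing "[" followed only by characters that may appear inside a token.
PARTIAL_RE = re.compile(r"\[[A-Z0-9_]*$")


def detokenize_stream(chunks: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield detokenized text, holding back a possibly incomplete trailing token."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        partial = PARTIAL_RE.search(pending)
        cut = partial.start() if partial else len(pending)
        ready, pending = pending[:cut], pending[cut:]
        if ready:
            yield TOKEN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), ready)
    if pending:  # stream ended mid-token: emit the remainder unchanged
        yield pending


mapping = {"[EMAIL_1]": "john@acme.com"}
print("".join(detokenize_stream(["Mail [EMA", "IL_1] today"], mapping)))
```

The cost is a small delay: text after an unmatched `[` is not emitted until a later chunk shows whether it completes a token.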

CI/CD integration

GitHub Actions

Copy integrations/github-actions/pii-scan.yml into .github/workflows/ to fail PRs that introduce raw PII in CSV, JSON, TXT, or log files:

cp integrations/github-actions/pii-scan.yml .github/workflows/pii-scan.yml

pre-commit hook

Add to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/sunnypuli/pii-guard
    rev: main
    hooks:
      - id: pii-guard-scan
        args: [--preset, dpdp]

Then install hooks with pre-commit install. Commits that include files with detectable PII will be blocked.

Audit log

Every scan and tokenize run appends a line to ~/.pii-guard/audit.log:

2024-01-15T10:30:00  tokenize     customers.csv                   total=12  AADHAAR:3 EMAIL:4 MOBILE_IN:3 PAN:2
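
Since each entry is a single whitespace-separated line, the log is easy to post-process. The parser below is a sketch that assumes the layout of the example line (timestamp, action, file name, `total=N`, then `TYPE:count` pairs) and that file names contain no spaces; `AuditEntry` and `parse_audit_line` are hypothetical names.

```python
from typing import Dict, NamedTuple


class AuditEntry(NamedTuple):
    timestamp: str
    action: str
    filename: str
    total: int
    counts: Dict[str, int]


def parse_audit_line(line: str) -> AuditEntry:
    """Split one audit line into fields (layout assumed from the example)."""
    parts = line.split()
    timestamp, action, filename = parts[0], parts[1], parts[2]
    total = 0
    counts: Dict[str, int] = {}
    for field in parts[3:]:
        if field.startswith("total="):
            total = int(field[len("total="):])
        elif ":" in field:
            pii_type, _, count = field.rpartition(":")
            counts[pii_type] = int(count)
    return AuditEntry(timestamp, action, filename, total, counts)


entry = parse_audit_line(
    "2024-01-15T10:30:00  tokenize     customers.csv                   "
    "total=12  AADHAAR:3 EMAIL:4 MOBILE_IN:3 PAN:2"
)
print(entry.action, entry.filename, entry.total, entry.counts)
```

A quick consistency check on parsed entries is `sum(entry.counts.values()) == entry.total`.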

Docker (proxy)

docker build -t pii-guard .
docker run -p 127.0.0.1:8111:8111 pii-guard --preset dpdp,pci   # publish on the host's loopback interface only

Then set ANTHROPIC_BASE_URL=http://localhost:8111 or OPENAI_BASE_URL=http://localhost:8111/openai/v1.

Contributing

Contributions welcome — especially:

New preset patterns (country-specific IDs, sector-specific formats)
False positive reports with reproducible examples
IDE and tool integrations

git clone https://github.com/sunnypuli/pii-guard
cd pii-guard
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"
pytest

Pattern PRs should include a test in tests/test_presets.py covering at least one valid and one invalid example.

License

MIT

Version 0.1.0, released May 4, 2026
