Arabic/GCC PII detection, tokenization, and streaming-interception gateway.

arabic-pii-py · `apii`

pip install apii

Keep Arabic & GCC personal data off the LLM — without changing how you work. Names, IBANs, national IDs, phones, emails, addresses get swapped for reversible tokens on your machine, before anything reaches Claude / GPT. You keep seeing the real values. The model never does.

Everything runs locally. No cloud, no account, no data leaves your laptop (the only network call is a one-time, optional model download you can replace).

Why this exists

If you work with GCC customer data — banks, telcos, government, clinics — you often legally cannot send that PII to a US-hosted LLM. apii lets you use Claude / GPT on that data anyway: the personal data stays on your machine, the model only ever sees placeholders like EMAIL_C7E2…, and the real values are restored locally for you.

🇸🇦 Built for Arabic & the GCC — Saudi / Emirati / Qatari / Kuwaiti / Bahraini / Omani shapes, IBAN ISO-7064 (MOD-97), national-ID checksums, and on-device Arabic + English NER for names & organizations.
💻 100% local — no service to run, no API key required, nothing uploaded.
🪶 Lightweight — pure Python, no PyTorch; NER runs as int8 ONNX.
🔁 Reversible & stable — the same value always maps to the same token, and only your secret can turn it back.
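The IBAN check in the list above is the standard ISO 7064 MOD-97-10 test. Here is a minimal sketch in plain Python. It is not apii's recognizer, which may also apply country-specific lengths and formats. It only illustrates the checksum itself: move the first four characters to the end, map letters to numbers, and require the result mod 97 to equal 1.

```python
import string


def iban_is_valid(iban: str) -> bool:
    """ISO 7064 MOD-97-10 check used by every IBAN (spaces allowed)."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move country code + check digits (first four chars) to the end.
    rearranged = compact[4:] + compact[:4]
    # Letters become two-digit numbers: A=10, B=11, ..., Z=35.
    numeric = "".join(
        str(string.ascii_uppercase.index(ch) + 10) if ch.isalpha() else ch
        for ch in rearranged
    )
    return int(numeric) % 97 == 1
```

For example, `iban_is_valid("SA0380000000608010167519")` (the IBAN used in the library example further down) returns `True`. Changing any single digit makes it return `False`. This is why a checksum cuts false positives far better than a digit-count regex alone.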

⭐ The headline: transparent PII protection inside Claude Code

This is the part most tools can't do. With one command, apii wires two hooks into Claude Code. A third piece, `apii watch`, shows you the real values in a separate pane:

piece	what happens	result
redact-on-read	when Claude reads a file, PII in it is tokenized before the model sees it	Claude only ever sees `EMAIL_…`, `IBAN_…`, `PERSON_…`
restore-on-write	when Claude writes/edits a file, the tokens are turned back into real values before the bytes hit disk	your files come out correct, the chat stays tokens
`apii watch`	a side pane restores Claude's tokenized replies locally	you read the real values; Anthropic still only got tokens

It's non-blocking — you work normally; Claude just never receives the PII.

```shell
# in the project where you keep customer data:
apii install-claude-hook          # one time: wires both hooks

# open a fresh Claude Code session there (it loads the hook at startup)
claude

# and, in another terminal pane IN THE SAME PROJECT FOLDER, watch real values:
apii watch          # follows this folder's session; --once dumps it so far
```

Now ask Claude to work on a file with PII. In the chat you'll see tokens; in the apii watch pane you'll see the real data; and your customer's information never leaves your machine.

🤖 Or just hand it to your coding agent

Don't want to wire anything up? Give your agent the skill and it learns to use apii on its own, redacting PII before it reads a file or calls a model. `skills/apii/SKILL.md` follows the portable Agent Skills format, so the same file works in Claude Code, Codex, Cursor, and any tool that reads it.

```shell
# Claude Code, every project:
cp -r skills/apii ~/.claude/skills/     # then just talk to it (auto-loads), or run /apii
# or only this project:
cp -r skills/apii .claude/skills/
# any other agent: paste skills/apii/SKILL.md into its context or skills folder.
```

Which path do I pick? They stack — use as many as you like:

path	what it is	best when
Skill `skills/apii/`	teaches the agent to deliberately redact / restore	the agent drives; any tool; zero setup
Hook `apii install-claude-hook`	automatic redact-on-read + restore-on-write	you want it invisible & enforced — Claude Code only
Proxy `apii serve`	a hard transport boundary — the provider literally can't see PII	you don't control the client, or want it provider-wide

My take: skill + hook is the sweet spot for daily Claude Code work — the hook guarantees protection even when the agent forgets, the skill makes the agent smart about using it. Reach for the proxy when the guarantee has to hold at the wire, for clients you don't control.

Install

```shell
pip install "apii[all]"     # the whole tool: CLI + NER + proxy + documents
```

Requires Python 3.10+. To use apii as a tool (the CLI, the Claude Code hook, the proxy), install `apii[all]`. Embedding it as a library instead? Stay lean and add only the extras you use:

install	what you get
`pip install apii`	core detection (regex + checksums) + the `apii` CLI
`pip install "apii[ner]"`	+ on-device PERSON / ORGANIZATION (names & orgs)
`pip install "apii[cli]"`	+ encrypted-vault persistence (`--vault`)
`pip install "apii[proxy]"`	+ the streaming `apii serve` gateway
`pip install "apii[documents]"`	+ PDF text (docx / xlsx / csv / json are built in)
`pip install "apii[all]"`	everything above

NER models (names & organizations) auto-download once (~210 MB, int8 ONNX) from Hugging Face and cache under ~/.cache/huggingface. Without them, every structured kind (email, phone, IBAN, ID, CR, VAT, address) still works — only PERSON / ORGANIZATION need the models. Point at your own copy any time with APII_NER_MODEL / APII_NER_EN_MODEL, or change the source repo with APII_NER_HF_REPO.
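One way to picture how these overrides combine is as a small resolver. This is a hypothetical sketch. `NerSource` and `resolve_ner_source` are not part of apii. The precedence (your own model directory first, then the Hugging Face repo, with `APII_NER_NO_DOWNLOAD` turning off fetching) is an assumption read from this README and the environment-variable table below, not from apii's source.

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_HF_REPO = "aajil-labs-sa/arabic-pii-ner"  # default from the env-var table


@dataclass(frozen=True)
class NerSource:
    location: str          # a local directory or a Hugging Face repo id
    is_local_dir: bool     # True when the user pointed at their own copy
    allow_download: bool   # False when fetching is disabled


def resolve_ner_source(env: Mapping[str, str] = os.environ, english: bool = False) -> NerSource:
    """Hypothetical precedence: own model dir > Hugging Face repo (cache, then download)."""
    local_dir = env.get("APII_NER_EN_MODEL" if english else "APII_NER_MODEL")
    if local_dir:
        # Your own copy always wins; nothing is fetched.
        return NerSource(local_dir, is_local_dir=True, allow_download=False)
    repo = env.get("APII_NER_HF_REPO") or DEFAULT_HF_REPO
    # Assumption: any non-empty APII_NER_NO_DOWNLOAD value means "local cache only".
    offline = bool(env.get("APII_NER_NO_DOWNLOAD"))
    return NerSource(repo, is_local_dir=False, allow_download=not offline)
```

If a resolver like this finds no usable model, only PERSON / ORGANIZATION would be skipped. The structured kinds do not depend on it.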

To hack on it instead, clone the repo and pip install -e ".[all]" — see Make it your own below.

Ways to use it — pick what fits

1. Claude Code (above) — the transparent, zero-friction path.

2. CLI — text, files, and folders

```shell
# free text or a .txt file → tokens (mapping saved to a vault), then restore:
echo "call 0501234567, email omar@aajil.sa" | apii redact --vault demo.vault
apii restore answer.txt --vault demo.vault   # the model's tokens → real values

apii detect notes.txt                        # audit only: detections as JSON

# whole folders, format-aware (csv / json / docx / xlsx / pdf→txt):
apii scan-dir ./statements --ext csv --out audit.jsonl
apii redact-dir ./statements --out-dir ./masked --ext csv --vault s.vault
```

apii redact <file> reads the file as text. For documents (pdf/docx/xlsx/json) use redact-dir or the UI — they preserve layout.
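To make the audit idea concrete, here is a toy detector for the two kinds in the `echo` example above. It is only a sketch. The regexes are simplified assumptions: Saudi mobiles written as `05XXXXXXXX` or `+966` / `00966` followed by `5XXXXXXXX`, and plain ASCII emails. They are far cruder than apii's recognizers. The `Detection` record is likewise hypothetical and not the JSON schema that `apii detect` emits.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass
class Detection:
    kind: str
    start: int
    end: int
    value: str


PATTERNS = {
    # Saudi mobile; the digit lookarounds stop matches inside longer numbers.
    "PHONE": re.compile(r"(?<!\d)(?:\+966|00966|0)5\d{8}(?!\d)"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def detect(text: str) -> list[Detection]:
    """Return every match of every pattern, ordered by position in the text."""
    found = [
        Detection(kind, m.start(), m.end(), m.group())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda d: d.start)


if __name__ == "__main__":
    sample = "call 0501234567, email omar@aajil.sa"
    print(json.dumps([asdict(d) for d in detect(sample)], ensure_ascii=False))
```

Character offsets, rather than just values, are what let a redactor replace each span in place without disturbing the surrounding text.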

3. Local UI — paste-in / paste-out (+ file upload)

```shell
apii ui    # opens http://127.0.0.1:8765; paste text or drop a CSV/Excel file,
           # take the tokens to any LLM, paste the reply back to restore.
```

4. As a library — embed it in your own app

```python
from apii.anonymizer import Anonymizer

def send_to_llm(prompt: str) -> str:  # hypothetical stand-in for your own LLM client
    return f"Noted: {prompt}"

a = Anonymizer(secret="your-secret", tenant="acme")
r = a.anonymize("Email omar@aajil.sa, IBAN SA0380000000608010167519")
model_reply = send_to_llm(r.text)        # the model sees only tokens
print(a.deanonymize(model_reply))        # real values restored locally
```

5. Drop-in proxy — one local endpoint, every provider 🔌

Run one gateway and point any LLM client at it. apii tokenizes each request, sends only tokens upstream, and restores the (streamed) reply — the client never changes, the provider never sees PII, and your own API key just passes through (apii never stores it).

```shell
pip install "apii[proxy]"
apii serve                 # → http://127.0.0.1:8720   (--host / --port to change)
```

One port speaks three wire formats, so the same gateway fronts OpenAI, Anthropic, Codex, and anything OpenAI-compatible — OpenRouter, LiteLLM, Together, vLLM, …:

your client	point it at	upstream env
OpenAI SDK / chat apps	`base_url = http://127.0.0.1:8720/v1`	`APII_OPENAI_BASE` (default `api.openai.com`)
Codex CLI (Responses API)	a custom model-provider → `:8720/v1`, `wire_api = "responses"`	`APII_OPENAI_BASE`
Claude Code / Anthropic SDK	`ANTHROPIC_BASE_URL = http://127.0.0.1:8720`	`APII_ANTHROPIC_BASE` (default `api.anthropic.com`)
OpenRouter · LiteLLM · any OpenAI-compatible	the OpenAI `base_url` above	set `APII_OPENAI_BASE` to that provider
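One way to picture "one port, three wire formats" is a router keyed on the request path. This sketch is hypothetical. The three paths come from the `apii serve` entry in the command reference. The function name, the full default URLs, and the rule that an OpenAI-style base already ends in `/v1` (as in the OpenRouter example) are assumptions, not apii's code.

```python
from typing import Mapping

ANTHROPIC_PATHS = {"/v1/messages"}
OPENAI_PATHS = {"/v1/chat/completions", "/v1/responses"}

# Assumed full defaults; the table above only names the hosts.
DEFAULT_ANTHROPIC_BASE = "https://api.anthropic.com"
DEFAULT_OPENAI_BASE = "https://api.openai.com/v1"


def upstream_url(path: str, env: Mapping[str, str]) -> str:
    """Map an incoming gateway path to the upstream URL it would be forwarded to."""
    if path in ANTHROPIC_PATHS:
        base = env.get("APII_ANTHROPIC_BASE", DEFAULT_ANTHROPIC_BASE)
        return base.rstrip("/") + path
    if path in OPENAI_PATHS:
        base = env.get("APII_OPENAI_BASE", DEFAULT_OPENAI_BASE)
        # Assumption: OpenAI-compatible bases already include /v1,
        # so drop it from the incoming path to avoid ".../v1/v1/...".
        return base.rstrip("/") + path.removeprefix("/v1")
    raise ValueError(f"unsupported path: {path}")
```

With `APII_OPENAI_BASE=https://openrouter.ai/api/v1`, a client's `/v1/chat/completions` call would be forwarded to `https://openrouter.ai/api/v1/chat/completions`. The client never needs to know which provider sits behind the port.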

```shell
# e.g. route OpenAI-style traffic through OpenRouter, PII-safe:
APII_OPENAI_BASE=https://openrouter.ai/api/v1 apii serve
# your app: base_url = http://127.0.0.1:8720/v1  + your OpenRouter key, as usual
```

```toml
# e.g. point the real Codex CLI at apii: ~/.codex/config.toml
model = "gpt-4o-mini"           # any model your upstream serves
model_provider = "apii"

[model_providers.apii]
base_url = "http://127.0.0.1:8720/v1"
wire_api = "responses"
env_key  = "OPENAI_API_KEY"     # OPENROUTER_API_KEY if APII_OPENAI_BASE → OpenRouter
```

Every route is verified end-to-end — streaming and non-streaming — against a live provider, including the real Codex CLI: the provider gets only tokens, your client gets the real values back.
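Restoring a streamed reply has one subtlety. A token can be split across two chunks, so a naive per-chunk replace would miss it. The hedged sketch below shows one way to handle that: hold back any trailing run of word characters, since it might be an incomplete token, until the next chunk or the end of the stream. The token shape (`KIND_` plus eight uppercase hex characters) and the `StreamRestorer` class are assumptions for illustration, not apii's token format or API.

```python
import re
from typing import Mapping

# Assumed token shape: KIND (letters, may contain "_") + "_" + 8 uppercase hex chars.
TOKEN = re.compile(r"\b[A-Z]+(?:_[A-Z]+)*_[0-9A-F]{8}\b")
# A token is made only of word characters, so a trailing run of them may be incomplete.
TRAILING_WORD = re.compile(r"\w+\Z")


class StreamRestorer:
    """Hypothetical helper: restore tokens in a reply that arrives in chunks."""

    def __init__(self, vault: Mapping[str, str]) -> None:
        self.vault = vault
        self.pending = ""  # held-back tail that might be the start of a token

    def _restore(self, text: str) -> str:
        # Unknown tokens are left as-is rather than guessed.
        return TOKEN.sub(lambda m: self.vault.get(m.group(), m.group()), text)

    def feed(self, chunk: str) -> str:
        buf = self.pending + chunk
        tail = TRAILING_WORD.search(buf)
        cut = tail.start() if tail else len(buf)
        self.pending = buf[cut:]
        return self._restore(buf[:cut])

    def flush(self) -> str:
        """Call once the stream ends to emit whatever is still held back."""
        out = self._restore(self.pending)
        self.pending = ""
        return out
```

The hold-back only delays text that ends mid-word. It is released as soon as a space or punctuation mark arrives, so the reader sees the reply at almost the same pace as the raw stream.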

What it detects

kind	how
`EMAIL`	format
`PHONE`	GCC country codes, Saudi 05X, intl shapes
`IBAN`	ISO-7064 MOD-97 checksum (all 6 GCC countries)
`TAX_NUMBER`	15-digit Saudi / GCC VAT
`COMMERCIAL_REGISTRATION`	10-digit CR, label-cued
`NATIONAL_ID`	UAE-784 / Saudi-Iqama / GCC
`PERSON`	on-device NER (no name lists, no regex)
`ORGANIZATION`	on-device NER
`ADDRESS`	PO-box / street regex + NER locations
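For the `NATIONAL_ID` row, publicly documented validation rules use the Luhn (mod-10) algorithm. Ten-digit Saudi national IDs and Iqamas start with `1` (citizen) or `2` (resident). Fifteen-digit UAE Emirates IDs start with `784` and are commonly validated the same way. The sketch below implements those public rules as an illustration. apii's own checksum recognizers may apply different or additional checks.

```python
def luhn_ok(digits: str) -> bool:
    """Luhn mod-10 check: double every second digit, counting from the right."""
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9      # same as adding the two digits of d
        total += d
    return total % 10 == 0


def looks_like_saudi_id(s: str) -> bool:
    """10 digits, leading 1 (citizen) or 2 (Iqama), valid Luhn check digit."""
    return len(s) == 10 and s[0] in "12" and luhn_ok(s)


def looks_like_emirates_id(s: str) -> bool:
    """784-YYYY-NNNNNNN-C with or without dashes, valid Luhn check digit."""
    compact = s.replace("-", "")
    return len(compact) == 15 and compact.startswith("784") and luhn_ok(compact)
```

A check digit rejects about nine out of ten random digit strings of the right length. This is what lets a recognizer separate an ID from an arbitrary 10- or 15-digit number without needing a label next to it.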

Quality is measured against a 1,340-span corpus of real, publicly-sourced values (tests/eval/) — pytest tests/python -q runs it.
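How `tests/eval/` scores the corpus is not described here. One common convention for span-level PII evaluation is exact-match precision, recall, and F1 over `(kind, start, end)` triples. The sketch below shows that convention only. The function name and the exact-match rule are assumptions, and apii's evaluation may use partial-overlap or per-kind scoring instead.

```python
from typing import Iterable, Tuple

Span = Tuple[str, int, int]  # (kind, start, end) character offsets


def span_scores(gold: Iterable[Span], predicted: Iterable[Span]) -> dict[str, float]:
    """Exact-match span precision / recall / F1: kind and both offsets must agree."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Under exact matching, an email span that is off by one character counts as both a false positive and a false negative. That makes the metric strict, but it is also what matters for redaction, where a missed character can leak part of a value.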

Command reference

command	what it does
`apii redact [file]`	Anonymize text (stdin or a text file) → stdout; save the token↔value map to `--vault`.
`apii restore [file] --vault V`	Reverse it: tokens → real values, using the vault.
`apii detect [file]`	Audit mode — list detections as JSON, redact nothing.
`apii scan-dir DIR --out F`	Detect across a folder; write per-file JSONL summaries + totals.
`apii redact-dir DIR --out-dir D`	Redact every matching file (format-aware) into `--out-dir`, merging records into one `--vault`.
`apii ui`	Local paste-in / paste-out web UI + file upload (`127.0.0.1:8765`).
`apii serve`	Local anonymizing LLM proxy — `/v1/messages`, `/v1/chat/completions`, `/v1/responses` (needs `[proxy]`).
`apii watch`	Side-viewer: tail the current folder's Claude session, restoring tokens for your screen. `--once` dumps the session so far.
`apii install-claude-hook`	Wire redact-on-read + restore-on-write into Claude Code in one command (`--global` for all projects).
`apii hook`	The per-event hook itself (stdin event JSON → response JSON); used by the installed hooks.
`apii daemon`	Long-lived local hook daemon (`POST /hook`) — avoids a process spawn per event.
`apii hook-client`	Thin bridge that relays a hook event to a running `daemon`.

Common flags: --secret (or $APII_SECRET), --tenant, --vault, --policy strict|balanced|audit, --no-ner. Run apii <cmd> --help for the rest.

Environment variables

var	purpose
`APII_SECRET`	Vault HMAC / encryption key. Falls back to the managed `~/.apii/secret` (auto-created, `chmod 600`).
`APII_HOME`	Config + vault directory (default `~/.apii`).
`APII_POLICY`	Default policy: `strict` (default) / `balanced` / `audit`.
`APII_NER_CASE_AUG`	Lowercase-name recovery: `auto` (default — fires on fully-lowercase input) / `always` (mixed-case too) / `off`.
`APII_NER_THRESHOLD`	NER minimum confidence (default `0.85`).
`APII_NER_MODEL` / `APII_NER_EN_MODEL`	Use your own local Arabic / English ONNX model dirs (override the auto-download).
`APII_NER_HF_REPO`	Hugging Face repo to fetch models from (default `aajil-labs-sa/arabic-pii-ner`).
`APII_NER_NO_DOWNLOAD`	Set to disable the model auto-download (fully offline).
`APII_ANTHROPIC_BASE` / `APII_OPENAI_BASE`	Upstream targets for `apii serve`.
`APII_SUPPRESS_PHRASES`	Path to a phrase file of structural vocabulary to never tokenize.
`APII_GEO_GAZETTEER`	Path to an optional geo gazetteer for address detection.
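Two of these variables shape the NER pass. Below is a hedged sketch of how `APII_NER_THRESHOLD` and `APII_NER_CASE_AUG` could be interpreted, using only the defaults and modes documented in the table. The function names are hypothetical, and treating unknown modes as `auto` is an assumption.

```python
import os
from typing import Mapping


def ner_threshold(env: Mapping[str, str] = os.environ) -> float:
    """Minimum NER confidence; the table documents a default of 0.85."""
    return float(env.get("APII_NER_THRESHOLD", "0.85"))


def keep_entity(score: float, env: Mapping[str, str] = os.environ) -> bool:
    """Keep a PERSON / ORGANIZATION prediction only if it clears the threshold."""
    return score >= ner_threshold(env)


def should_case_augment(text: str, env: Mapping[str, str] = os.environ) -> bool:
    """Decide whether to run lowercase-name recovery on `text`.

    off    -> never
    always -> on any input, mixed-case included
    auto   -> only on fully-lowercase input (unknown values fall back to auto here)
    """
    mode = env.get("APII_NER_CASE_AUG", "auto").lower()
    if mode == "off":
        return False
    if mode == "always":
        return True
    # str.islower() needs at least one cased letter and no uppercase ones,
    # so purely Arabic text (which has no case) never triggers "auto".
    return text.islower()
```

The `auto` default targets a known weakness of cased English NER models: they rely heavily on capitalization, so "call omar tomorrow" is easy to miss while "Call Omar tomorrow" is not.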

How it stays private (the model)

Two separate boundaries — that's the whole trick:

Privacy boundary = what the LLM receives → only tokens, always.
Display boundary = what you see → real values, because it's your data on your machine.

The bridge is a local, encrypted vault (~/.apii/default.vault, ChaCha20) plus a secret (~/.apii/secret, chmod 600). Tokens are HMAC-SHA256(secret, value): deterministic and one-way. Without your secret, nobody can even test guesses (such as enumerating phone numbers) against a token. The real values come back only from the vault, which your secret decrypts. Restoration is applied at the last mile (your screen, your files) and never re-enters the model's context.
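The token scheme can be sketched with the standard library alone. This illustrates the idea described above and is not apii's implementation. The `KIND_` prefix matches the placeholders shown earlier. Three details are assumptions: truncating the HMAC to 8 hex characters, mixing the tenant and kind into the HMAC input, and using an in-memory dict in place of the encrypted vault.

```python
import hashlib
import hmac


def make_token(secret: bytes, kind: str, value: str, tenant: str = "default") -> str:
    """Deterministic placeholder: same (secret, tenant, kind, value) -> same token."""
    msg = f"{tenant}\x00{kind}\x00{value}".encode("utf-8")
    digest = hmac.new(secret, msg, hashlib.sha256).hexdigest().upper()
    return f"{kind}_{digest[:8]}"  # truncation length is an assumption


class ToyVault:
    """Stand-in for the encrypted vault: remembers token -> value for restoration."""

    def __init__(self, secret: bytes, tenant: str = "default") -> None:
        self.secret, self.tenant = secret, tenant
        self.values: dict[str, str] = {}

    def tokenize(self, kind: str, value: str) -> str:
        token = make_token(self.secret, kind, value, self.tenant)
        self.values[token] = value
        return token

    def restore(self, text: str) -> str:
        # HMAC cannot be inverted: restoring is a vault lookup, never a computation.
        for token, value in self.values.items():
            text = text.replace(token, value)
        return text
```

Determinism is what keeps a conversation coherent: the same email always becomes the same token, so the model can still tell that two mentions refer to one customer without ever seeing who it is.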

Make it your own

This is a normal, self-contained Python package — it's yours to run, change, and ship privately. You never have to publish it anywhere or run a server.

Customize detection: the recognizers live in apii/recognizers/ — edit a regex, tune a checksum, add a country shape.
Swap the NER models: point APII_NER_MODEL / APII_NER_EN_MODEL at your own ONNX models, or set APII_NER_HF_REPO to your own Hugging Face repo.
Change token formats, policy, vault location (APII_HOME), tenants, etc.
Stay fully offline: clone, pip install -e ., bring the NER models locally — no cloud, no PyPI, no service, ever.

It's built to be forked and made internal. Keep it private; it's yours.

NER models & credit

The bundled models are int8-ONNX redistributions of two open models — please keep crediting the original authors:

Arabic: hatmimoha/arabic-ner by Hatim Mohamed (built on asafaya/bert-base-arabic by Ali Safaya).
English: dslim/bert-base-NER by David S. Lim (MIT, CoNLL-2003).

Hosted (quantized) at aajil-labs-sa/arabic-pii-ner with full provenance + SHAs.

License

© Aajil Labs. Dual-licensed — your choice of MIT or Apache-2.0 (see LICENSE-MIT and LICENSE-APACHE).

You may use, modify, and redistribute this software (including privately and commercially) under either license. You must keep the copyright and license notices in copies and substantial portions. The bundled NER models are redistributed under their original authors' terms — credit them as above.

This is yours to build on — just respect the license.

Release history

0.1.3 · May 31, 2026
0.1.2 · May 31, 2026
0.1.1 (this version) · May 30, 2026
0.1.0 · May 30, 2026
0.1.0rc1 (pre-release) · May 30, 2026

Download files

Source Distribution

apii-0.1.1.tar.gz (267.2 kB)

Uploaded May 30, 2026 Source

Built Distribution

apii-0.1.1-py3-none-any.whl (122.7 kB)

Uploaded May 30, 2026 Python 3

File details

Details for the file apii-0.1.1.tar.gz.

File metadata

Download URL: apii-0.1.1.tar.gz
Upload date: May 30, 2026
Size: 267.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for apii-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`9754a9adc707f8e77e5335f16916dd36fbcecc1847b572139843994a4c467ecb`
MD5	`945689518ba6791541d31a9900aaa8a7`
BLAKE2b-256	`127202cf9ef515a46260227d33f53d2477f22e5763506b32f5dc66981c8fa030`

File details

Details for the file apii-0.1.1-py3-none-any.whl.

File metadata

Download URL: apii-0.1.1-py3-none-any.whl
Upload date: May 30, 2026
Size: 122.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for apii-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`74b194d27ce809e37d1bba99fb04dd3f2889205669c3a43608a30b1b4c8a8a20`
MD5	`685fd618de243581c34c67c4f2d69a1f`
BLAKE2b-256	`5c66b5a5db0e9a338f6063467b62db43a46a5740205d50685a19508994bfd4a1`
