Automatic PII masking for OpenAI and Anthropic SDKs
Armos
PII never reaches your LLM. One line of code.
Armos wraps the OpenAI and Anthropic SDKs to automatically detect and mask personally identifiable information (PII) before it leaves your server, then restore the real values in the response. Apart from one new import, the only change to your application code is wrapping the client you already create.
The problem
Every time your application calls an LLM, it sends raw text to a third-party server. If a user's message contains their name, Aadhaar number, email address, PAN, or credit card number, that data leaves your infrastructure.
This matters for:
- Healthcare apps — patient names, dates of birth, medical IDs
- Fintech apps — PAN, Aadhaar, bank details
- Customer support tools — names, emails, phone numbers, addresses
- Any app where users type free text that gets sent to OpenAI or Anthropic
Most teams know this is a risk. Few have time to build a proper masking layer before shipping. Armos is that layer, pre-built.
How it works
Detection runs entirely on your machine. Presidio + spaCy analyse the text locally. No data is sent to any Armos server — there is no Armos server. The vault (token ↔ real value map) lives in your process memory, or optionally in your own Redis instance.
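The round trip described above can be sketched in a few lines. This is a toy model, not Armos's implementation: a single email regex stands in for the Presidio + spaCy analysis, a plain dict stands in for the vault, and `mask`, `demask` and `fake_llm` are hypothetical names. It shows the property that matters: the provider call only ever sees tokens, and the real value is put back afterwards from local state.

```python
import re

# Toy detector (assumption): one email regex stands in for Presidio + spaCy.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(text: str, vault: dict[str, str]) -> str:
    """Replace each detected email with a token and record token -> value locally."""
    def _replace(match: re.Match) -> str:
        # Toy tokens are simply numbered; this is not Armos's real token scheme.
        token = f"[PII:EMAIL:{len(vault):08x}]"
        vault[token] = match.group(0)
        return token

    return EMAIL_RE.sub(_replace, text)


def demask(text: str, vault: dict[str, str]) -> str:
    """Swap every known token in the text back to its real value."""
    for token, real in vault.items():
        text = text.replace(token, real)
    return text


def fake_llm(prompt: str) -> str:
    """Stand-in for the provider call: it echoes the last word of the prompt."""
    return f"Noted. I will reply to {prompt.split()[-1]}."


vault: dict[str, str] = {}
prompt = mask("Please email john@hospital.com", vault)
print(prompt)                           # Please email [PII:EMAIL:00000000]
print(demask(fake_llm(prompt), vault))  # Noted. I will reply to john@hospital.com.
```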
Quickstart
Install
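```shell
pip install armos
```
For Redis-backed persistence across requests (quote the extra so shells such as zsh do not expand the brackets):
```shell
pip install "armos[redis]"
```
Before first use, download the spaCy language model:
```shell
python -m spacy download en_core_web_lg
```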
OpenAI
```python
# Before
from openai import OpenAI

client = OpenAI()

# After: one import added, the client wrapped in ArmosOpenAI
from openai import OpenAI
from armos import ArmosOpenAI

client = ArmosOpenAI(OpenAI())

# Everything else is identical
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Summarise the case for patient John Smith, Aadhaar 2345 6789 0123",
    }],
)

# Real values are restored in the response automatically
print(response.choices[0].message.content)
```
Anthropic
```python
from anthropic import Anthropic
from armos import ArmosAnthropic

client = ArmosAnthropic(Anthropic())

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Patient John Smith, DOB 12/04/1982, PAN ABCDE1234F",
    }],
)

print(message.content[0].text)  # real values restored
```
With Redis (persistent vault across requests)
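```python
from anthropic import Anthropic
from openai import OpenAI
from armos import ArmosAnthropic, ArmosOpenAI

# Token mappings survive across processes and requests
client = ArmosOpenAI(OpenAI(), store="redis://localhost:6379")
client = ArmosAnthropic(Anthropic(), store="redis://localhost:6379")

# Custom vault TTL in seconds (default: 24 hours); 3600 = 1 hour
client = ArmosOpenAI(OpenAI(), store="redis://localhost:6379", vault_ttl=3600)
```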
Standalone (any LLM or framework)
```python
from armos import Armos

guard = Armos()

result = guard.mask("Patient John Smith, Aadhaar 2345 6789 0123, email john@hospital.com")
print(result.text)
# → "Patient [PII:NAME:a1b2c3d4], Aadhaar [PII:AADHAAR:b2c3d4e5], email [PII:EMAIL:e5f6g7h8]"
print(result.has_pii)  # True

restored = guard.demask(result.text)
print(restored)
# → "Patient John Smith, Aadhaar 2345 6789 0123, email john@hospital.com"
```
What gets detected
| Entity | Token | Example |
|---|---|---|
| Person name | `[PII:NAME:…]` | John Smith |
| Email address | `[PII:EMAIL:…]` | john@hospital.com |
| Phone number | `[PII:PHONE:…]` | +91 98765 43210 |
| Aadhaar number | `[PII:AADHAAR:…]` | 2345 6789 0123 |
| PAN card | `[PII:PAN:…]` | ABCDE1234F |
| Credit / debit card | `[PII:CARD:…]` | 4111 1111 1111 1111 |
| IP address | `[PII:IP:…]` | 192.168.1.100 |
| API keys & secrets | `[PII:APIKEY:…]` | sk-abc123… / AKIA… / ghp_… |
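Several rows of this table have well-defined formats. As a rough illustration (assumed patterns, not the recognizers Armos ships), three of them can be matched with regular expressions, with a Luhn checksum to cut false positives on card numbers. The names `detect`, `luhn_ok` and the `*_RE` patterns are hypothetical.

```python
import re

# Assumed patterns for illustration; Armos's own recognizers may differ.
# PAN: five letters, four digits, one letter.
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
# Aadhaar: 12 digits starting with 2-9, optionally grouped 4-4-4 by spaces.
# The lookarounds stop it from matching 12 digits inside a longer card number.
AADHAAR_RE = re.compile(r"(?<![0-9] )\b[2-9][0-9]{3} ?[0-9]{4} ?[0-9]{4}\b(?! ?[0-9])")
# Card: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:[0-9][ -]?){12,18}[0-9]\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text: str) -> list[tuple[str, str]]:
    """Return (entity, matched text) pairs for PAN, Aadhaar and card numbers."""
    hits = [("PAN", m.group()) for m in PAN_RE.finditer(text)]
    hits += [("AADHAAR", m.group()) for m in AADHAAR_RE.finditer(text)]
    hits += [("CARD", m.group()) for m in CARD_RE.finditer(text) if luhn_ok(m.group())]
    return hits


print(detect("PAN ABCDE1234F, Aadhaar 2345 6789 0123, card 4111 1111 1111 1111"))
# [('PAN', 'ABCDE1234F'), ('AADHAAR', '2345 6789 0123'), ('CARD', '4111 1111 1111 1111')]
```

A pattern match alone is weak evidence: plenty of 12-digit numbers are not Aadhaar numbers, so a real detector would also validate checksums (Aadhaar numbers carry a Verhoeff check digit) and weigh surrounding context words.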
Token design
Tokens are deterministic and normalisation-aware:
```text
"john smith"  → [PII:NAME:a1b2c3d4]   ← stored: "john smith"
"John Smith"  → [PII:NAME:a1b2c3d4]   ← same token, vault unchanged
"JOHN SMITH"  → [PII:NAME:a1b2c3d4]   ← same token, vault unchanged
```
All casing variants of the same name map to one token. The LLM sees one consistent entity across a conversation — not three different people. De-masking restores the first-seen value.
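One way to get deterministic, normalisation-aware tokens is to hash a normalised form of the value and keep only the first value seen for each token. The sketch below assumes a keyed HMAC-SHA256 truncated to 8 hex characters; Armos does not document how it derives its token suffixes, so `SECRET_KEY`, `normalise`, `make_token` and `FirstSeenVault` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-deployment secret; not an Armos setting.
SECRET_KEY = b"replace-with-a-per-deployment-secret"


def normalise(value: str) -> str:
    """Case-fold and collapse whitespace so casing variants compare equal."""
    return " ".join(value.casefold().split())


def make_token(entity: str, value: str) -> str:
    """Deterministic token: keyed hash of entity type plus normalised value."""
    msg = f"{entity}:{normalise(value)}".encode()
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"[PII:{entity}:{digest[:8]}]"


class FirstSeenVault:
    """Token -> real value map that never overwrites an existing entry."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def tokenise(self, entity: str, value: str) -> str:
        token = make_token(entity, value)
        self._values.setdefault(token, value)  # first-seen value wins
        return token

    def restore(self, token: str) -> str:
        return self._values[token]


vault = FirstSeenVault()
tokens = {vault.tokenise("NAME", v) for v in ("john smith", "John Smith", "JOHN SMITH")}
print(len(tokens))                  # 1
print(vault.restore(tokens.pop()))  # john smith
```

The key matters: an unkeyed hash of a low-entropy value such as a 10-character PAN can be recovered by brute force from the token alone. Truncating to 8 hex characters (32 bits) also means collisions become plausible once a vault holds tens of thousands of values of one entity type.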
Vault options
| Option | How to enable | Use when |
|---|---|---|
| In-memory (default) | `Armos()` | Single request or single process |
| Redis | `Armos(store="redis://…")` | Multi-turn conversations, multiple workers, or across requests |

The in-memory vault needs no configuration and is the default. The Redis vault persists token mappings, so a token created in request 1 can still be de-masked in request 5.
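The difference between the two vault options is mostly where the map lives and how long entries last. The sketch below is a hypothetical in-memory vault with a per-entry TTL and first-seen semantics; `TTLVault` and its methods are illustrative names, not Armos's API. A Redis-backed equivalent could map `set_if_absent` onto Redis's atomic `SET key value NX EX <seconds>` and `get` onto `GET`, which is what would let several workers share one vault. Whether Armos does exactly this is an assumption.

```python
import time
from typing import Callable, Optional


class TTLVault:
    """In-memory token -> value map; entries expire ttl seconds after being set."""

    def __init__(self, ttl: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def set_if_absent(self, token: str, value: str) -> bool:
        """Store value unless a live entry exists (cf. Redis SET ... NX EX)."""
        now = self._clock()
        current = self._entries.get(token)
        if current is not None and current[1] > now:
            return False
        self._entries[token] = (value, now + self._ttl)
        return True

    def get(self, token: str) -> Optional[str]:
        """Return the live value for token, or None if missing or expired."""
        entry = self._entries.get(token)
        if entry is None or entry[1] <= self._clock():
            self._entries.pop(token, None)
            return None
        return entry[0]


# Drive the clock by hand to show expiry without sleeping.
fake_now = [0.0]
vault = TTLVault(ttl=3600, clock=lambda: fake_now[0])
vault.set_if_absent("[PII:NAME:a1b2c3d4]", "John Smith")
fake_now[0] = 1800.0
print(vault.get("[PII:NAME:a1b2c3d4]"))  # John Smith
fake_now[0] = 3600.0
print(vault.get("[PII:NAME:a1b2c3d4]"))  # None
```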
v1 limitations
- Streaming not supported: `stream=True` passes through without masking (planned for v1.1).
- Async clients not supported: `AsyncOpenAI` and `AsyncAnthropic` pass through without masking (planned for v1.1).
- OpenAI Responses API not intercepted: `client.responses.create()` passes through (planned for v1.1).
- Embeddings not masked: `client.embeddings.create()` sends text as-is (planned for v1.1).
- Indian name accuracy: `en_core_web_lg` is trained on English text, so Indian names have lower recall than Western names. Fine-tuning is planned for v2.
- Casing, first-seen wins: de-masking always restores the first-seen casing of an entity. Use consistent casing in your prompts for exact restoration.
- Token length: `[PII:NAME:a1b2c3d4]` is 19 characters versus `John` (4 characters). Near context-window limits, this expansion may push a prompt over the limit. This is rare in practice.
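One reason streaming is harder to support than it looks: when a response arrives in chunks, a token such as `[PII:NAME:a1b2c3d4]` can be split across two chunks, so it cannot be replaced chunk by chunk. A stream-aware de-masker has to hold back a possible partial token until it is complete. The sketch below illustrates that idea under stated assumptions; it is not how Armos v1.1 will implement streaming, `demask_stream`, `TOKEN_RE` and `MAX_TOKEN_LEN` are hypothetical names, and the pattern assumes the `[PII:TYPE:8 hex chars]` token shape used throughout this page.

```python
import re
from typing import Iterable, Iterator

# Assumes tokens look like [PII:TYPE:xxxxxxxx] with 8 lowercase hex characters.
TOKEN_RE = re.compile(r"\[PII:[A-Z]+:[0-9a-f]{8}\]")
# Longer than any token in the table, e.g. "[PII:AADHAAR:xxxxxxxx]" is 22 chars.
MAX_TOKEN_LEN = 32


def demask_stream(chunks: Iterable[str], vault: dict[str, str]) -> Iterator[str]:
    """Yield de-masked text, holding back a trailing fragment that may be half a token."""
    pending = ""
    for chunk in chunks:
        # Restore every complete token seen so far; unknown tokens are left alone.
        pending = TOKEN_RE.sub(lambda m: vault.get(m.group(0), m.group(0)), pending + chunk)
        cut = pending.rfind("[")
        tail = pending[cut:] if cut != -1 else ""
        if tail and "]" not in tail and len(tail) < MAX_TOKEN_LEN:
            ready, pending = pending[:cut], tail  # wait for the rest of the token
        else:
            ready, pending = pending, ""
        if ready:
            yield ready
    if pending:
        yield pending  # stream ended mid-fragment: emit it unchanged


vault = {"[PII:NAME:a1b2c3d4]": "John Smith"}
chunks = ["Hello [PII:NA", "ME:a1b2", "c3d4], welcome back."]
print("".join(demask_stream(chunks, vault)))  # Hello John Smith, welcome back.
```

Holding back only short, unclosed `[...` fragments keeps latency low for ordinary text while guaranteeing that no half-token ever reaches the caller.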
Contributing
Armos is open source and MIT licensed. Issues and pull requests welcome.
```shell
git clone https://github.com/armos-ai/armos
cd armos
pip install -e ".[dev,all]"
python -m spacy download en_core_web_lg
pytest tests/ -v
```
License
MIT
Download files
File details
Details for the file armos-0.1.0.tar.gz.
File metadata
- Download URL: armos-0.1.0.tar.gz
- Upload date:
- Size: 491.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a2add55c7aaa84c5dba472b875e6bd168905dd787973ff2e8ad36fedac114a6a` |
| MD5 | `26fa94a5523065a74ad0de10215186cc` |
| BLAKE2b-256 | `134bbf4d1e77c79097a463051fad01ef279daff9e99065d39df7e339d66a59b4` |
File details
Details for the file armos-0.1.0-py3-none-any.whl.
File metadata
- Download URL: armos-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `428959033fcea5dfd4908888ca2e08a017b04b3448ec05e083ad56ce4c06e78f` |
| MD5 | `ed112bf956861876f36148ed9f188037` |
| BLAKE2b-256 | `1e84ac08178613cdb33bab28aa88818189a5aec1f62df0d7c3e81fdb8be82d7c` |