PII Vault
Presidio-compatible PII detection, anonymization, and reversible tokenization.
Multi-language implementations sharing a common specification. Detect 40+ PII entity types, anonymize with multiple strategies (replace, mask, hash, redact), and reversibly tokenize with a persistent vault.
Install
# Rust
cargo add pii-vault
# TypeScript / JavaScript
npm install pii-vault
Features
- 29 built-in recognizers covering 17 countries (US, UK, CN, IN, AU, DE, IT, ES, KR, SG, FI, SE, PL, JP, FR, CA, BR)
- Presidio-aligned regex patterns for core entity types (email, credit card, IP, crypto)
- Shared spec: Recognizer patterns defined as JSON, consumed by all language implementations
- Vault: Deterministic, reversible tokenization with collision handling and context disambiguation
- Multiple anonymization strategies: Replace, Mask, Hash, Redact, Vault
- Luhn validation for credit cards, checksum validation for Chinese ID cards
- Context-aware scoring: Boost detection confidence when context words appear nearby
- Zero runtime dependencies beyond regex and JSON parsing
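To make the context-aware scoring bullet concrete, here is a minimal sketch of one way such a boost could work. The function name `boosted_score`, the `CONTEXT_BOOST` value of 0.35, and the byte-window approach are all assumptions for illustration, not pii-vault's actual API or constants.

```rust
/// Hypothetical boost added when a context word is found (assumed value).
const CONTEXT_BOOST: f64 = 0.35;

/// Return `base` raised by `CONTEXT_BOOST` (capped at 1.0) when any of
/// `context_words` appears within `window` bytes before or after the match.
/// `start` and `end` are assumed to be valid byte offsets on char boundaries.
pub fn boosted_score(
    text: &str,
    start: usize,
    end: usize,
    base: f64,
    context_words: &[&str],
    window: usize,
) -> f64 {
    // Widen the window outward until it sits on UTF-8 char boundaries.
    let mut lo = start.saturating_sub(window);
    while !text.is_char_boundary(lo) {
        lo -= 1;
    }
    let mut hi = end.saturating_add(window).min(text.len());
    while !text.is_char_boundary(hi) {
        hi += 1;
    }

    // ASCII lowercasing keeps byte offsets stable; the match itself is excluded.
    let before = text[lo..start].to_ascii_lowercase();
    let after = text[end..hi].to_ascii_lowercase();

    let hit = context_words.iter().any(|w| {
        let w = w.to_ascii_lowercase();
        before.contains(&w) || after.contains(&w)
    });

    if hit {
        (base + CONTEXT_BOOST).min(1.0)
    } else {
        base
    }
}
```

With this sketch, a candidate SSN in "My SSN is 123-45-6789" would score higher than the same digits in "Order 123-45-6789 shipped", because only the first has the word "ssn" nearby.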
Quick Start
Rust
[dependencies]
pii-vault = "0.1"
use pii_vault::{Analyzer, Anonymizer, Operator, Vault, load_recognizers_from_dir};
use std::collections::HashMap;
use std::path::Path;
// Load recognizers from spec/
let recognizers = load_recognizers_from_dir(Path::new("spec/recognizers"));
let analyzer = Analyzer::new(recognizers);
// Analyze text
let text = "Email alice@company.com, SSN 123-45-6789";
let result = analyzer.analyze(text, &[], 0.0);
// Anonymize with vault (reversible)
let mut vault = Vault::new();
let mut ops = HashMap::new();
ops.insert("EMAIL_ADDRESS".to_string(), Operator::Vault);
ops.insert("US_SSN".to_string(), Operator::Vault);
let anon = Anonymizer::anonymize(text, &result.entities, &ops, &Operator::default(), Some(&mut vault));
println!("{}", anon.text);
// "Email [EMAIL_ADDRESS:a1b2], SSN [US_SSN:c3d4]"
// Restore original
let restored = vault.detokenize(&anon.text);
assert_eq!(restored, text);
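The round trip above relies on the vault mapping each value to a stable token and back. The following is a rough, in-memory sketch of that idea, not the crate's implementation: `SketchVault`, the FNV-1a hash, the 4-hex-digit token suffix, and the linear probing on collision are all assumptions, and persistence is left out entirely.

```rust
use std::collections::HashMap;

/// FNV-1a 32-bit hash, used here only to derive a short token suffix.
fn fnv1a32(data: &[u8]) -> u32 {
    let mut h: u32 = 0x811c9dc5;
    for &b in data {
        h ^= b as u32;
        h = h.wrapping_mul(0x01000193);
    }
    h
}

/// Hypothetical in-memory vault: token <-> original value.
#[derive(Default)]
pub struct SketchVault {
    token_to_value: HashMap<String, String>,
    value_to_token: HashMap<(String, String), String>,
}

impl SketchVault {
    /// Return the token for (entity, value), creating one if needed.
    /// The same pair always yields the same token within this vault.
    pub fn tokenize(&mut self, entity: &str, value: &str) -> String {
        let key = (entity.to_string(), value.to_string());
        if let Some(existing) = self.value_to_token.get(&key) {
            return existing.clone();
        }
        let hash = fnv1a32(value.as_bytes());
        // Collision handling: probe the next suffix until a free one is found.
        // This sketch assumes fewer than 65536 values per entity type.
        let mut probe: u32 = 0;
        let token = loop {
            let candidate = format!("[{}:{:04x}]", entity, hash.wrapping_add(probe) & 0xffff);
            if !self.token_to_value.contains_key(&candidate) {
                break candidate;
            }
            probe += 1;
        };
        self.token_to_value.insert(token.clone(), value.to_string());
        self.value_to_token.insert(key, token.clone());
        token
    }

    /// Replace every known token in `text` with its original value.
    pub fn detokenize(&self, text: &str) -> String {
        let mut out = text.to_string();
        for (token, value) in &self.token_to_value {
            out = out.replace(token.as_str(), value);
        }
        out
    }
}
```

Because the token suffix is derived from the value, repeated occurrences of the same email or SSN map to the same token, which keeps anonymized text internally consistent.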
TypeScript
npm install pii-vault
import { Analyzer, Anonymizer, RegexRecognizer, Vault } from 'pii-vault';
import * as fs from 'fs';
// Load recognizers from spec/
const specDir = './spec/recognizers';
const recognizers = fs.readdirSync(specDir)
  .filter(f => f.endsWith('.json'))
  .map(f => new RegexRecognizer(JSON.parse(fs.readFileSync(`${specDir}/${f}`, 'utf-8'))));
const analyzer = new Analyzer(recognizers);
// Analyze
const text = 'Email alice@company.com, SSN 123-45-6789';
const result = analyzer.analyze(text);
// Anonymize with vault
const vault = new Vault();
const ops = { EMAIL_ADDRESS: { type: 'vault' }, US_SSN: { type: 'vault' } };
const anon = Anonymizer.anonymize(text, result.entities, ops, { type: 'replace' }, vault);
// Restore
const restored = vault.detokenize(anon.text);
Architecture
pii-vault/
├── spec/ # Shared specification (language-agnostic)
│ ├── entities.json # 45 entity type definitions
│ ├── recognizers/ # 29 regex recognizer definitions (JSON)
│ └── test-cases/ # Cross-language test cases
├── rust/ # Rust implementation → crates.io: pii-vault
├── typescript/ # TypeScript implementation → npm: pii-vault
├── go/ # Go implementation (planned)
├── java/ # Java implementation (planned)
├── haskell/ # Haskell implementation (planned)
└── wasm/ # WASM from Rust (planned)
The spec/recognizers/*.json files are the single source of truth. All language implementations load these patterns at runtime or compile time.
Supported Entity Types
Generic (all languages)
EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, CRYPTO, IP_ADDRESS, MAC_ADDRESS, IBAN_CODE, URL, UUID
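For CREDIT_CARD, a regex match alone produces many false positives, which is why the feature list mentions Luhn validation. Below is a standalone sketch of the standard Luhn check; the function name `luhn_valid` and the accepted length range of 12 to 19 digits are assumptions, not taken from the pii-vault source.

```rust
/// Standard Luhn checksum over a candidate card number.
/// Spaces and hyphens are ignored; any other non-digit rejects the input.
pub fn luhn_valid(candidate: &str) -> bool {
    let digits: Vec<u32> = candidate
        .chars()
        .filter(|c| !matches!(c, ' ' | '-'))
        .map(|c| c.to_digit(10))
        .collect::<Option<Vec<u32>>>()
        .unwrap_or_default();

    // Assumed plausible card-number length range.
    if !(12..=19).contains(&digits.len()) {
        return false;
    }

    // Walking from the rightmost digit, double every second digit and
    // subtract 9 when the doubled value exceeds 9.
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();

    sum % 10 == 0
}
```

The well-known test number 4111 1111 1111 1111 passes this check, while changing its last digit to 2 makes it fail.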
Country-Specific
| Country | Entities |
|---|---|
| US | SSN, ITIN, Passport, Driver License, Bank Routing |
| UK | NHS, NINO |
| China | ID Card (18-digit), Phone, Passport, Bank Card |
| India | Aadhaar, PAN, Passport |
| Australia | TFN, Medicare, ABN |
| Germany | Steuer-ID |
| Italy | Fiscal Code |
| Spain | NIE, NIF |
| Korea | RRN |
| Singapore | NRIC |
| Finland | Personal ID |
| Sweden | Personal Number |
| Poland | PESEL |
| Japan | My Number, Passport |
| France | NIR |
| Canada | SIN |
| Brazil | CPF |
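Several of these identifiers carry a check digit that can be verified after the regex matches. As one example, the 18-digit Chinese ID card uses the ISO 7064 MOD 11-2 scheme defined in GB 11643-1999. The sketch below shows that check in isolation; the function name `cn_id_checksum_valid` is hypothetical, and the real recognizer may validate additional fields such as the region code or birth date.

```rust
/// Verify the final check character of an 18-character Chinese ID number.
pub fn cn_id_checksum_valid(id: &str) -> bool {
    // Per-position weights and the check-character table from GB 11643-1999.
    const WEIGHTS: [u32; 17] = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
    const CHECK_CHARS: [char; 11] = ['1', '0', 'X', '9', '8', '7', '6', '5', '4', '3', '2'];

    let chars: Vec<char> = id.chars().collect();
    if chars.len() != 18 {
        return false;
    }

    // Weighted sum of the first 17 digits.
    let mut sum: u32 = 0;
    for (c, w) in chars[..17].iter().zip(WEIGHTS.iter()) {
        match c.to_digit(10) {
            Some(d) => sum += d * w,
            None => return false,
        }
    }

    // The 18th character must match the table entry for sum mod 11.
    chars[17].to_ascii_uppercase() == CHECK_CHARS[(sum % 11) as usize]
}
```

For the commonly cited sample number 11010519491231002X, the weighted sum is 167, 167 mod 11 is 2, and the table entry at index 2 is X, so the check passes.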
Anonymization Strategies
| Strategy | Description | Reversible |
|---|---|---|
| Replace | Replace with <ENTITY_TYPE> or custom string | No |
| Mask | Partially mask characters (e.g., ****1111) | No |
| Hash | FNV hash of original value | No |
| Redact | Remove entirely | No |
| Vault | Deterministic token [ENTITY:xxxx] with persistent mapping | Yes |
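The four non-reversible strategies can be illustrated with a few small functions. This is a sketch under stated assumptions rather than the crate's operator code: the function names are hypothetical, the hash is assumed to be 64-bit FNV-1a rendered as hex (the table only says "FNV"), and the mask is assumed to keep the trailing characters visible.

```rust
/// Replace: a placeholder such as <EMAIL_ADDRESS>.
pub fn replace_op(entity_type: &str) -> String {
    format!("<{}>", entity_type)
}

/// Mask: hide everything except the last `keep_last` characters.
pub fn mask_op(value: &str, keep_last: usize, mask_char: char) -> String {
    let chars: Vec<char> = value.chars().collect();
    let visible_from = chars.len().saturating_sub(keep_last);
    chars
        .iter()
        .enumerate()
        .map(|(i, &c)| if i < visible_from { mask_char } else { c })
        .collect()
}

/// Hash: 64-bit FNV-1a of the value, as 16 lowercase hex digits.
pub fn hash_op(value: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in value.as_bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{:016x}", h)
}

/// Splice `replacement` over the byte range start..end of `text`.
/// Redact is the special case where `replacement` is empty.
pub fn splice(text: &str, start: usize, end: usize, replacement: &str) -> String {
    format!("{}{}{}", &text[..start], replacement, &text[end..])
}
```

None of these outputs can be mapped back to the original value without extra state, which is what separates them from the Vault strategy.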
Contributing
Add a new recognizer:
- Create spec/recognizers/your_entity.json following the existing format
- Add test cases to spec/test-cases/
- Run tests in both Rust and TypeScript to verify
License
MIT