PII Vault

Presidio-compatible PII detection, anonymization, and reversible tokenization.

PII Vault is a set of multi-language implementations that share a common specification. It detects 40+ PII entity types, anonymizes them with multiple strategies (replace, mask, hash, redact), and reversibly tokenizes them with a persistent vault.

Install

# Rust
cargo add pii-vault

# TypeScript / JavaScript
npm install pii-vault

Features

  • 29 built-in recognizers covering 17 countries (US, UK, CN, IN, AU, DE, IT, ES, KR, SG, FI, SE, PL, JP, FR, CA, BR)
  • Presidio-aligned regex patterns for core entity types (email, credit card, IP, crypto)
  • Shared spec: Recognizer patterns defined as JSON, consumed by all language implementations
  • Vault: Deterministic, reversible tokenization with collision handling and context disambiguation
  • Multiple anonymization strategies: Replace, Mask, Hash, Redact, Vault
  • Luhn validation for credit cards, checksum validation for Chinese ID cards
  • Context-aware scoring: Boost detection confidence when context words appear nearby
  • Zero runtime dependencies beyond regex and JSON parsing
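
The two checksum validations named in the list can be sketched as follows. This is a minimal illustration of the standard algorithms (Luhn for card numbers, and the ISO 7064 MOD 11-2 check character used by 18-digit Chinese ID numbers), not pii-vault's own code; the function names `luhnValid` and `cnIdValid` are hypothetical.

```typescript
// Luhn checksum: the kind of validation a credit-card recognizer can run
// after a regex match to discard random digit strings.
function luhnValid(candidate: string): boolean {
  // Keep digits only, so spaces and dashes between groups are tolerated.
  const digits = candidate.replace(/\D/g, '');
  if (digits.length < 12 || digits.length > 19) return false;

  let sum = 0;
  let doubleIt = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// 18-digit Chinese ID: weighted sum of the first 17 digits modulo 11
// selects the expected check character.
function cnIdValid(id: string): boolean {
  if (!/^\d{17}[\dXx]$/.test(id)) return false;
  const weights = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
  const checkChars = '10X98765432';
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    sum += (id.charCodeAt(i) - 48) * weights[i];
  }
  return checkChars[sum % 11] === id[17].toUpperCase();
}
```

A checksum of this kind only filters out strings that cannot be valid; a passing value is still just a candidate, which is why it is typically combined with a confidence score.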

Quick Start

Rust

# Cargo.toml
[dependencies]
pii-vault = "0.1"

use pii_vault::{Analyzer, Anonymizer, Operator, Vault, load_recognizers_from_dir};
use std::collections::HashMap;
use std::path::Path;

// Load recognizers from spec/
let recognizers = load_recognizers_from_dir(Path::new("spec/recognizers"));
let analyzer = Analyzer::new(recognizers);

// Analyze text
let text = "Email alice@company.com, SSN 123-45-6789";
let result = analyzer.analyze(text, &[], 0.0);

// Anonymize with vault (reversible)
let mut vault = Vault::new();
let mut ops = HashMap::new();
ops.insert("EMAIL_ADDRESS".to_string(), Operator::Vault);
ops.insert("US_SSN".to_string(), Operator::Vault);

let anon = Anonymizer::anonymize(text, &result.entities, &ops, &Operator::default(), Some(&mut vault));
println!("{}", anon.text);
// "Email [EMAIL_ADDRESS:a1b2], SSN [US_SSN:c3d4]"

// Restore original
let restored = vault.detokenize(&anon.text);
assert_eq!(restored, text);

TypeScript

npm install pii-vault

import { Analyzer, Anonymizer, RegexRecognizer, Vault } from 'pii-vault';
import * as fs from 'fs';

// Load recognizers from spec/
const specDir = './spec/recognizers';
const recognizers = fs.readdirSync(specDir)
  .filter(f => f.endsWith('.json'))
  .map(f => new RegexRecognizer(JSON.parse(fs.readFileSync(`${specDir}/${f}`, 'utf-8'))));

const analyzer = new Analyzer(recognizers);

// Analyze
const text = 'Email alice@company.com, SSN 123-45-6789';
const result = analyzer.analyze(text);

// Anonymize with vault
const vault = new Vault();
const ops = { EMAIL_ADDRESS: { type: 'vault' }, US_SSN: { type: 'vault' } };
const anon = Anonymizer.anonymize(text, result.entities, ops, { type: 'replace' }, vault);

// Restore
const restored = vault.detokenize(anon.text);
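
To make the round trip above easier to picture, here is a minimal stand-in for a deterministic, reversible token store. It is a sketch under stated assumptions, not the library's `Vault`: the class name `MiniVault`, the use of FNV-1a for the token suffix, and the salt-based collision probing are all hypothetical choices made for illustration.

```typescript
// 32-bit FNV-1a over UTF-16 code units (equals byte-wise FNV-1a for ASCII).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical stand-in for illustration; not the pii-vault Vault class.
class MiniVault {
  private tokenToValue = new Map<string, string>();
  private valueToToken = new Map<string, string>();

  tokenize(entityType: string, value: string): string {
    const key = `${entityType}\u0000${value}`;
    const known = this.valueToToken.get(key);
    if (known !== undefined) return known; // same input, same token

    // Collision handling: re-hash with a salt until the token is unused.
    for (let salt = 0; salt < 65536; salt++) {
      const hex = (fnv1a32(`${key}\u0000${salt}`) & 0xffff).toString(16);
      const token = `[${entityType}:${('0000' + hex).slice(-4)}]`;
      if (!this.tokenToValue.has(token)) {
        this.tokenToValue.set(token, value);
        this.valueToToken.set(key, token);
        return token;
      }
    }
    throw new Error('token space exhausted for ' + entityType);
  }

  detokenize(text: string): string {
    // Unknown tokens are left untouched.
    return text.replace(/\[[A-Z0-9_]+:[0-9a-f]{4}\]/g, (token) => {
      const original = this.tokenToValue.get(token);
      return original === undefined ? token : original;
    });
  }
}
```

In this sketch, persistence would amount to serializing the token-to-value map; how the real vault stores its mapping is not described in this README.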

Architecture

pii-vault/
├── spec/                     # Shared specification (language-agnostic)
│   ├── entities.json         # 45 entity type definitions
│   ├── recognizers/          # 29 regex recognizer definitions (JSON)
│   └── test-cases/           # Cross-language test cases
├── rust/                     # Rust implementation → crates.io: pii-vault
├── typescript/               # TypeScript implementation → npm: pii-vault
├── go/                       # Go implementation (planned)
├── java/                     # Java implementation (planned)
├── haskell/                  # Haskell implementation (planned)
└── wasm/                     # WASM from Rust (planned)

The spec/recognizers/*.json files are the single source of truth. All language implementations load these patterns at runtime or compile time.
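
As a rough picture of how a JSON-defined pattern can drive detection with context-aware scoring, consider the sketch below. The field names in `RecognizerSpec`, the 40-character window, and the 0.25 boost are assumptions made for illustration; the actual schema is whatever the files in spec/recognizers/ define.

```typescript
// Hypothetical shape of one recognizer definition after JSON.parse.
interface RecognizerSpec {
  entityType: string;
  pattern: string;   // regex source
  score: number;     // base confidence
  context: string[]; // words that raise confidence when found nearby
}

interface Detection {
  entityType: string;
  start: number;
  end: number;
  score: number;
}

function detect(
  spec: RecognizerSpec,
  text: string,
  windowSize = 40, // characters inspected on each side of a match
  boost = 0.25,    // added to the base score when a context word is near
): Detection[] {
  const regex = new RegExp(spec.pattern, 'g');
  const found: Detection[] = [];
  let m: RegExpExecArray | null;
  while ((m = regex.exec(text)) !== null) {
    if (m[0].length === 0) {
      regex.lastIndex++; // avoid looping forever on empty matches
      continue;
    }
    const start = m.index;
    const end = start + m[0].length;
    const nearby = text
      .slice(Math.max(0, start - windowSize), Math.min(text.length, end + windowSize))
      .toLowerCase();
    const hasContext = spec.context.some((w) => nearby.indexOf(w.toLowerCase()) !== -1);
    const score = hasContext ? Math.min(1, spec.score + boost) : spec.score;
    found.push({ entityType: spec.entityType, start, end, score });
  }
  return found;
}

// Example definition, written inline instead of being read from disk.
const ssnSpec: RecognizerSpec = {
  entityType: 'US_SSN',
  pattern: '\\b\\d{3}-\\d{2}-\\d{4}\\b',
  score: 0.5,
  context: ['ssn', 'social security'],
};

const withContext = detect(ssnSpec, 'SSN 123-45-6789');       // score 0.75
const withoutContext = detect(ssnSpec, 'Order 123-45-6789');  // score 0.5
```

Keeping the pattern, base score, and context words in data is what lets every language implementation behave the same way from one definition.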

Supported Entity Types

Generic (all languages)

EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, CRYPTO, IP_ADDRESS, MAC_ADDRESS, IBAN_CODE, URL, UUID

Country-Specific

| Country | Entities |
|---|---|
| US | SSN, ITIN, Passport, Driver License, Bank Routing |
| UK | NHS, NINO |
| China | ID Card (18-digit), Phone, Passport, Bank Card |
| India | Aadhaar, PAN, Passport |
| Australia | TFN, Medicare, ABN |
| Germany | Steuer-ID |
| Italy | Fiscal Code |
| Spain | NIE, NIF |
| Korea | RRN |
| Singapore | NRIC |
| Finland | Personal ID |
| Sweden | Personal Number |
| Poland | PESEL |
| Japan | My Number, Passport |
| France | NIR |
| Canada | SIN |
| Brazil | CPF |

Anonymization Strategies

| Strategy | Description | Reversible |
|---|---|---|
| Replace | Replace with <ENTITY_TYPE> or custom string | No |
| Mask | Partially mask characters (e.g., ****1111) | No |
| Hash | FNV hash of original value | No |
| Redact | Remove entirely | No |
| Vault | Deterministic token [ENTITY:xxxx] with persistent mapping | Yes |
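
The four non-reversible strategies can be sketched in a few lines each. This is an illustration, not the library's operator implementation: the option fields `replacement`, `keepLast`, and `maskChar` are hypothetical, and 32-bit FNV-1a is assumed because the variant is not stated here.

```typescript
// Illustrative option shapes; only the `type` values mirror this README.
type Strategy =
  | { type: 'replace'; replacement?: string }
  | { type: 'mask'; keepLast: number; maskChar?: string }
  | { type: 'hash' }
  | { type: 'redact' };

// 32-bit FNV-1a as 8 hex digits. Hashes UTF-16 code units, which matches
// byte-wise FNV-1a for ASCII input only.
function fnv1a32Hex(value: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('00000000' + hash.toString(16)).slice(-8);
}

function applyStrategy(entityType: string, value: string, strategy: Strategy): string {
  switch (strategy.type) {
    case 'replace':
      // Default placeholder is the entity type in angle brackets.
      return strategy.replacement !== undefined ? strategy.replacement : `<${entityType}>`;
    case 'mask': {
      // Keep the last N characters visible and mask the rest.
      const keep = Math.max(0, Math.min(strategy.keepLast, value.length));
      const maskChar = strategy.maskChar !== undefined ? strategy.maskChar : '*';
      return maskChar.repeat(value.length - keep) + value.slice(value.length - keep);
    }
    case 'hash':
      return fnv1a32Hex(value);
    case 'redact':
      return '';
  }
}
```

Note that a short non-cryptographic hash such as FNV is one-way only in the weak sense: low-entropy values like phone numbers can be recovered by hashing every candidate, so it should not be treated as a security boundary.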

Contributing

Add a new recognizer:

  1. Create spec/recognizers/your_entity.json following the existing format
  2. Add test cases to spec/test-cases/
  3. Run tests in both Rust and TypeScript to verify

License

MIT
