PII Vault

Presidio-compatible PII detection, anonymization, and reversible tokenization. The Python SDK is planned; use the Rust or TypeScript SDK today.

Multi-language implementations sharing a common specification. Detect 40+ PII entity types, anonymize with multiple strategies (replace, mask, hash, redact), and reversibly tokenize with a persistent vault.

Install

# Rust
cargo add pii-vault

# TypeScript / JavaScript
npm install pii-vault

# Python (SDK planned; use the Rust or TypeScript SDK today)
pip install pii-vault

# Go (implementation planned)
go get github.com/Jiansen/pii-vault/go

Features

  • 29 built-in recognizers covering 17 countries (US, UK, CN, IN, AU, DE, IT, ES, KR, SG, FI, SE, PL, JP, FR, CA, BR)
  • Presidio-aligned regex patterns for core entity types (email, credit card, IP, crypto)
  • Shared spec: Recognizer patterns defined as JSON, consumed by all language implementations
  • Vault: Deterministic, reversible tokenization with collision handling and context disambiguation
  • Multiple anonymization strategies: Replace, Mask, Hash, Redact, Vault
  • Luhn validation for credit cards, checksum validation for Chinese ID cards
  • Context-aware scoring: Boost detection confidence when context words appear nearby
  • Zero runtime dependencies beyond regex and JSON parsing
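
The Luhn validation mentioned above can be illustrated with a minimal, self-contained sketch. The function name `luhnValid` and the 12 to 19 digit length bound are assumptions made for this example; they are not taken from the pii-vault source, which may apply the check differently.

```typescript
// Luhn checksum sketch. `luhnValid` is an illustrative name, not the pii-vault API.
function luhnValid(candidate: string): boolean {
  // Strip the spaces and dashes that commonly separate card digit groups.
  const digits = candidate.replace(/[\s-]/g, "");
  // Assumed length bound for card-like numbers.
  if (!/^\d{12,19}$/.test(digits)) return false;

  let sum = 0;
  let doubleIt = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}
```

A regex match that fails this check is most likely a random digit run rather than a card number, so a recognizer can drop it or lower its score.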

Quick Start

Rust

# Cargo.toml
[dependencies]
pii-vault = "0.1"

use pii_vault::{Analyzer, Anonymizer, Operator, Vault, load_recognizers_from_dir};
use std::collections::HashMap;
use std::path::Path;

// Load recognizers from spec/
let recognizers = load_recognizers_from_dir(Path::new("spec/recognizers"));
let analyzer = Analyzer::new(recognizers);

// Analyze text
let text = "Email alice@company.com, SSN 123-45-6789";
let result = analyzer.analyze(text, &[], 0.0);

// Anonymize with vault (reversible)
let mut vault = Vault::new();
let mut ops = HashMap::new();
ops.insert("EMAIL_ADDRESS".to_string(), Operator::Vault);
ops.insert("US_SSN".to_string(), Operator::Vault);

let anon = Anonymizer::anonymize(text, &result.entities, &ops, &Operator::default(), Some(&mut vault));
println!("{}", anon.text);
// "Email [EMAIL_ADDRESS:a1b2], SSN [US_SSN:c3d4]"

// Restore original
let restored = vault.detokenize(&anon.text);
assert_eq!(restored, text);

TypeScript

npm install pii-vault

import { Analyzer, Anonymizer, RegexRecognizer, Vault } from 'pii-vault';
import * as fs from 'fs';

// Load recognizers from spec/
const specDir = './spec/recognizers';
const recognizers = fs.readdirSync(specDir)
  .filter(f => f.endsWith('.json'))
  .map(f => new RegexRecognizer(JSON.parse(fs.readFileSync(`${specDir}/${f}`, 'utf-8'))));

const analyzer = new Analyzer(recognizers);

// Analyze
const text = 'Email alice@company.com, SSN 123-45-6789';
const result = analyzer.analyze(text);

// Anonymize with vault
const vault = new Vault();
const ops = { EMAIL_ADDRESS: { type: 'vault' }, US_SSN: { type: 'vault' } };
const anon = Anonymizer.anonymize(text, result.entities, ops, { type: 'replace' }, vault);

// Restore
const restored = vault.detokenize(anon.text);
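
To make the tokenize/detokenize round trip concrete, here is a rough sketch of how a vault of this kind could work. `SketchVault` and `fnv1a32` are hypothetical names, and deriving the 4-hex-digit id from an FNV-1a hash with a retry counter on collision is an assumption for illustration; the actual `Vault` class may derive and store tokens differently, and context disambiguation is not modeled here.

```typescript
// Illustrative only: `SketchVault` is a hypothetical stand-in, not the pii-vault Vault class.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i); // hashes UTF-16 code units, not UTF-8 bytes
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime in 32 bits
  }
  return hash >>> 0;
}

class SketchVault {
  private tokenToValue = new Map<string, string>();
  private valueToToken = new Map<string, string>();

  tokenize(entityType: string, value: string): string {
    const key = `${entityType}|${value}`;
    const known = this.valueToToken.get(key);
    if (known !== undefined) return known; // same input yields the same token

    // Derive a short id from the hash; on collision, re-hash with a counter.
    for (let attempt = 0; ; attempt++) {
      const id = ("000" + (fnv1a32(`${key}#${attempt}`) & 0xffff).toString(16)).slice(-4);
      const token = `[${entityType}:${id}]`;
      if (!this.tokenToValue.has(token)) {
        this.tokenToValue.set(token, value);
        this.valueToToken.set(key, token);
        return token;
      }
    }
  }

  detokenize(text: string): string {
    // Unknown tokens are left untouched.
    return text.replace(/\[[A-Z0-9_]+:[0-9a-f]{4}\]/g, (tok) => this.tokenToValue.get(tok) ?? tok);
  }
}
```

Keeping both directions of the mapping is what makes the strategy reversible: the forward map gives stable tokens for repeated values, and the reverse map restores the original text.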

Architecture

pii-vault/
├── spec/                     # Shared specification (language-agnostic)
│   ├── entities.json         # 45 entity type definitions
│   ├── recognizers/          # 29 regex recognizer definitions (JSON)
│   └── test-cases/           # Cross-language test cases
├── rust/                     # Rust implementation → crates.io: pii-vault
├── typescript/               # TypeScript implementation → npm: pii-vault
├── go/                       # Go implementation (planned)
├── java/                     # Java implementation (planned)
├── haskell/                  # Haskell implementation (planned)
└── wasm/                     # WASM from Rust (planned)

The spec/recognizers/*.json files are the single source of truth. All language implementations load these patterns at runtime or compile time.
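
As an illustration of how a JSON-defined recognizer might be consumed, the sketch below parses a definition, runs its patterns, and boosts the score when a context word appears shortly before a match. The field names (`entity_type`, `patterns`, `context`), the 30-character window, and the 0.35 boost are all assumptions for this example; consult the files in spec/recognizers/ for the real schema.

```typescript
// Hypothetical recognizer shape; the real spec/recognizers/*.json schema may differ.
interface RecognizerSpec {
  entity_type: string;
  patterns: { regex: string; score: number }[];
  context: string[];
}

interface SketchMatch {
  entityType: string;
  start: number;
  end: number;
  score: number;
}

function runRecognizer(spec: RecognizerSpec, input: string, windowSize = 30, boost = 0.35): SketchMatch[] {
  const found: SketchMatch[] = [];
  for (const p of spec.patterns) {
    const re = new RegExp(p.regex, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(input)) !== null) {
      if (m[0].length === 0) { re.lastIndex++; continue; } // guard against empty matches
      const start = m.index;
      const end = start + m[0].length;
      // Assumed heuristic: look for context words in a window before the match.
      const before = input.slice(Math.max(0, start - windowSize), start).toLowerCase();
      const hasContext = spec.context.some((w) => before.indexOf(w.toLowerCase()) !== -1);
      found.push({ entityType: spec.entity_type, start, end, score: Math.min(1, p.score + (hasContext ? boost : 0)) });
    }
  }
  return found;
}

// A made-up definition in the assumed shape, parsed the way a loader would parse a file.
const ssnSpec = JSON.parse(
  '{"entity_type":"US_SSN","patterns":[{"regex":"[0-9]{3}-[0-9]{2}-[0-9]{4}","score":0.5}],"context":["ssn","social security"]}'
) as RecognizerSpec;
```

Because the definition is plain data, the same file can drive the Rust and TypeScript implementations without duplicating pattern logic in each language.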

Supported Entity Types

Generic (all languages)

EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, CRYPTO, IP_ADDRESS, MAC_ADDRESS, IBAN_CODE, URL, UUID

Country-Specific

| Country | Entities |
|---|---|
| US | SSN, ITIN, Passport, Driver License, Bank Routing |
| UK | NHS, NINO |
| China | ID Card (18-digit), Phone, Passport, Bank Card |
| India | Aadhaar, PAN, Passport |
| Australia | TFN, Medicare, ABN |
| Germany | Steuer-ID |
| Italy | Fiscal Code |
| Spain | NIE, NIF |
| Korea | RRN |
| Singapore | NRIC |
| Finland | Personal ID |
| Sweden | Personal Number |
| Poland | PESEL |
| Japan | My Number, Passport |
| France | NIR |
| Canada | SIN |
| Brazil | CPF |
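
For the 18-digit Chinese ID card, the final character is a check character computed from the first 17 digits using the weighted MOD 11 scheme defined in GB 11643-1999. The sketch below shows that calculation; `cnIdChecksumValid` is an illustrative name, and the library's own validator may perform additional checks (for example on the embedded birth date) that are not shown here.

```typescript
// Check-character validation for 18-digit resident ID numbers (GB 11643-1999).
// `cnIdChecksumValid` is an illustrative name, not the pii-vault API.
const CN_ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
const CN_ID_CHECK_CHARS = "10X98765432"; // indexed by (weighted sum mod 11)

function cnIdChecksumValid(id: string): boolean {
  // 17 digits followed by a digit or X.
  if (!/^\d{17}[\dXx]$/.test(id)) return false;
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    sum += (id.charCodeAt(i) - 48) * CN_ID_WEIGHTS[i];
  }
  return CN_ID_CHECK_CHARS.charAt(sum % 11) === id.charAt(17).toUpperCase();
}
```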

Anonymization Strategies

| Strategy | Description | Reversible |
|---|---|---|
| Replace | Replace with <ENTITY_TYPE> or custom string | No |
| Mask | Partially mask characters (e.g., ****1111) | No |
| Hash | FNV hash of original value | No |
| Redact | Remove entirely | No |
| Vault | Deterministic token [ENTITY:xxxx] with persistent mapping | Yes |
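
The four non-reversible strategies can be sketched in a few lines each. Everything below is hypothetical: the helper names, the choice to keep the last four characters when masking, and the use of 32-bit FNV-1a for the hash are assumptions for illustration (the table only says "FNV hash"), not the behavior of the library's `Operator` type.

```typescript
// Hypothetical helpers, not the pii-vault Operator API.
type IrreversibleStrategy = "replace" | "mask" | "hash" | "redact";

// Assumption: 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex digits.
function fnv1a32Hex(value: string): string {
  let hash = 0x811c9dc5; // offset basis
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Assumption: keep the trailing characters visible and mask the rest.
function maskValue(value: string, visibleSuffix = 4, maskChar = "*"): string {
  const keep = Math.min(visibleSuffix, value.length);
  return maskChar.repeat(value.length - keep) + value.slice(value.length - keep);
}

function transformValue(value: string, entityType: string, strategy: IrreversibleStrategy): string {
  switch (strategy) {
    case "replace": return `<${entityType}>`;
    case "mask": return maskValue(value);
    case "hash": return fnv1a32Hex(value);
    case "redact": return "";
  }
}

// Apply one strategy to the span [start, end) of the input text.
function applyStrategy(input: string, start: number, end: number, entityType: string, strategy: IrreversibleStrategy): string {
  return input.slice(0, start) + transformValue(input.slice(start, end), entityType, strategy) + input.slice(end);
}
```

When several entities are anonymized in one text, applying the spans from right to left keeps the earlier offsets valid as the text changes length.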

Contributing

Add a new recognizer:

  1. Create spec/recognizers/your_entity.json following the existing format
  2. Add test cases to spec/test-cases/
  3. Run tests in both Rust and TypeScript to verify
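
Before running the full Rust and TypeScript suites, it can help to sanity-check a new pattern against a handful of positive and negative cases. The sketch below does that with a made-up test-case shape (`text` plus the substrings expected to match); the real files under spec/test-cases/ may be structured differently, and `failingCases` is a hypothetical helper.

```typescript
// Hypothetical shape: the real files under spec/test-cases/ may be structured differently.
interface SpecTestCase {
  text: string;
  expected: string[]; // substrings the pattern should match, in order of appearance
}

// Return the cases where the pattern's matches differ from the expectation.
function failingCases(pattern: string, cases: SpecTestCase[]): SpecTestCase[] {
  return cases.filter((c) => {
    const found: string[] = c.text.match(new RegExp(pattern, "g")) ?? [];
    return found.length !== c.expected.length || found.some((v, i) => v !== c.expected[i]);
  });
}

const ssnCases: SpecTestCase[] = [
  { text: "SSN 123-45-6789", expected: ["123-45-6789"] }, // positive case
  { text: "order 12345", expected: [] },                  // negative case
];
```

An empty result from `failingCases` means the pattern agrees with every case; including negative cases guards against patterns that are too permissive.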

License

MIT
