Production-grade PII detection, masking, and reversible anonymization library backed by fine-tuned DeBERTa models.
anonypii
Production-grade PII detection, masking, and reversible anonymization for Python.
Backed by two fine-tuned DeBERTa-v3-base models from the PIIBench research:
| Model | HuggingFace | Full-test F1 | Wins |
|---|---|---|---|
| piibench-deberta-base | Pritesh-2711/piibench-deberta-base | 0.6455 | 54/82 entity types |
| piibench-deberta-sch | Pritesh-2711/piibench-deberta-sch | 0.5894 | 28/82 entity types (HTTP_COOKIE, DATE_TIME, ...) |
Both models cover all 82 entity types across 10 coarse categories (CREDENTIAL, FINANCIAL_ID, CONTACT, NETWORK, LOCATION, PERSON_GROUP, ORG_ROLE, TEMPORAL, MISC, FINANCIAL_NER).
Installation
# Core library only (regex detector, no model)
pip install anonypii
# With model support (torch, transformers, huggingface-hub)
pip install anonypii[model]
# With model support + auto-download of both models
pip install anonypii[models]
# With pandas DataFrame support
pip install anonypii[pandas]
# Everything
pip install anonypii[all]
Download models manually
anonypii download all # both models
anonypii download piibench-deberta-base # recommended model only
anonypii download piibench-deberta-sch # SC+H only
Or at runtime:
from anonypii.detectors.model import ModelPIIDetector
detector = ModelPIIDetector(model="piibench-deberta-base", download=True)
Quick start
Irreversible masking
from anonypii import Anonymizer
anon = Anonymizer(model="piibench-deberta-base", download=True)
anon.mask("My email is john@example.com")
# "My email is <EMAIL>"
anon.mask("SSN: 123-45-6789 and card 4111-1111-1111-1111")
# "SSN: <SSN> and card <CREDIT_CARD>"
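To illustrate what tag masking amounts to once entities have been located, here is a minimal standalone sketch that replaces each detected span with `<LABEL>`. The `Span` type and `mask_spans` function are hypothetical names for this example and are not part of anonypii's API; spans are assumed not to overlap.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A detected entity: half-open character range plus its label (hypothetical type)."""
    start: int
    end: int
    label: str


def mask_spans(text: str, spans: list[Span]) -> str:
    """Replace each span with <LABEL>, assuming the spans do not overlap."""
    # Work from the rightmost span so earlier offsets stay valid after each edit.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"<{span.label}>" + text[span.end:]
    return text


text = "My email is john@example.com"
start = text.index("john@example.com")
masked = mask_spans(text, [Span(start, len(text), "EMAIL")])
print(masked)
```

Editing from right to left avoids recomputing offsets, since a replacement tag is usually a different length than the text it hides.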
Reversible anonymization
from anonypii import Anonymizer
anon = Anonymizer(model="piibench-deberta-base", download=True)
result = anon.anonymize("My email is john@example.com")
print(result.text) # "My email is {{EMAIL_001}}"
print(result.restore()) # "My email is john@example.com"
Stateful reversible anonymizer
from anonypii import ReversibleAnonymizer
ra = ReversibleAnonymizer(model="piibench-deberta-base", download=True)
r = ra.anonymize("Contact alice@corp.com or call 555-123-4567")
print(r.text) # "Contact {{EMAIL_001}} or call {{PHONE_001}}"
print(ra.restore(r.text)) # original text restored
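The idea behind stateful, reversible anonymization can be sketched without the library: keep a mapping from placeholder tokens to original values, hand out numbered tokens per entity type, and substitute them back on restore. The `TokenVault` class below is a hypothetical stand-in written for this example; how anonypii numbers or stores its tokens internally is an assumption here.

```python
import re
from collections import defaultdict


class TokenVault:
    """Hypothetical in-memory store mapping placeholder tokens to original values."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[tuple[str, str], str] = {}
        self._counters: defaultdict[str, int] = defaultdict(int)

    def token_for(self, label: str, value: str) -> str:
        """Return the existing token for (label, value), or mint the next one."""
        key = (label, value)
        if key not in self._value_to_token:
            self._counters[label] += 1
            token = "{{" + f"{label}_{self._counters[label]:03d}" + "}}"
            self._value_to_token[key] = token
            self._token_to_value[token] = value
        return self._value_to_token[key]

    def restore(self, text: str) -> str:
        """Swap every known token in text back to its original value."""
        pattern = re.compile(r"\{\{[A-Z_]+_\d{3,}\}\}")
        # Unknown tokens are left untouched rather than raising.
        return pattern.sub(
            lambda m: self._token_to_value.get(m.group(0), m.group(0)), text
        )


vault = TokenVault()
original = "Contact alice@corp.com or call 555-123-4567"
anonymized = original.replace(
    "alice@corp.com", vault.token_for("EMAIL", "alice@corp.com")
).replace("555-123-4567", vault.token_for("PHONE", "555-123-4567"))
print(anonymized)
print(vault.restore(anonymized))
```

Reusing the same token for a repeated value keeps anonymized text consistent across calls, which is the main reason to hold state between them.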
Using the regex detector (no download needed)
from anonypii import Anonymizer
from anonypii.detectors.regex import RegexPIIDetector
anon = Anonymizer(detector=RegexPIIDetector())
print(anon.mask("john@example.com / 123-45-6789"))
# "<EMAIL> / <SSN>"
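For readers curious how a regex-only detector can work without a model, the following standalone sketch scans text with one pattern per entity type and returns labeled spans. The patterns and the `detect` function are assumptions made for this example; they are deliberately simple and are not the patterns anonypii ships.

```python
import re

# Deliberately simple patterns for illustration only; production detectors
# typically add stricter validation on top of pattern matching.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, label) for every pattern match, sorted by position."""
    found = [
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found)


spans = detect("john@example.com / 123-45-6789")
print(spans)
```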
Masking strategies
from anonypii.masking.strategies import (
    TagMaskingStrategy,       # <EMAIL> (default for mask())
    RedactedMaskingStrategy,  # [REDACTED]
    StarMaskingStrategy,      # j**************m
    TokenMaskingStrategy,     # {{EMAIL_001}} (default for anonymize())
)
# Star masking: keep the first and last character
from anonypii import Anonymizer
from anonypii.masking.strategies import StarMaskingStrategy
anon = Anonymizer(detector=..., strategy=StarMaskingStrategy(keep_start=1, keep_end=1))
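The effect of `keep_start` and `keep_end` can be shown with a small standalone function. This is a sketch of the general technique, not the library's implementation; in particular, masking the whole value when it is too short to keep both edges is an assumption made here.

```python
def star_mask(value: str, keep_start: int = 1, keep_end: int = 1) -> str:
    """Replace the middle of value with '*', keeping keep_start/keep_end characters."""
    if keep_start < 0 or keep_end < 0:
        raise ValueError("keep_start and keep_end must be non-negative")
    # If the kept edges would cover the whole value, mask everything instead
    # of returning it unchanged (an assumption of this sketch).
    if keep_start + keep_end >= len(value):
        return "*" * len(value)
    hidden = len(value) - keep_start - keep_end
    # Slice from an explicit index so keep_end=0 yields an empty suffix.
    return value[:keep_start] + "*" * hidden + value[len(value) - keep_end:]


print(star_mask("john@example.com"))
```

Note that star masking preserves the length of the original value, which can itself reveal information; tag or token masking does not.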
Entity configuration
Restrict detection to a subset of entities via a YAML or JSON config file:
# my_config.yaml
schema_version: "1.0"
active_entity_types:
- EMAIL
- SSN
- CREDIT_CARD
# Or activate entire coarse groups:
active_coarse_groups:
- CREDENTIAL
- FINANCIAL_ID
Pass the config file when constructing the anonymizer:
anon = Anonymizer(config_path="my_config.yaml", ...)
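One plausible way to interpret such a config is to take the union of the explicitly listed entity types and every type belonging to a listed coarse group. The sketch below uses the JSON form of the config and a small illustrative subset of the group table; `COARSE_GROUPS` and `resolve_active_types` are hypothetical names, and whether anonypii combines the two keys as a union is an assumption.

```python
import json

# Illustrative subset of the coarse-group table; not the full 82-type mapping.
COARSE_GROUPS = {
    "CREDENTIAL": {"SSN", "PASSWORD", "API_KEY", "PIN"},
    "FINANCIAL_ID": {"CREDIT_CARD", "IBAN", "CVV"},
    "CONTACT": {"EMAIL", "PHONE", "PHONE_NUMBER", "FAX_NUMBER"},
}


def resolve_active_types(config: dict) -> set[str]:
    """Union of explicitly listed types and all types in the listed groups."""
    active = set(config.get("active_entity_types", []))
    for group in config.get("active_coarse_groups", []):
        if group not in COARSE_GROUPS:
            raise ValueError(f"unknown coarse group: {group}")
        active |= COARSE_GROUPS[group]
    return active


config = json.loads("""
{
  "schema_version": "1.0",
  "active_entity_types": ["EMAIL"],
  "active_coarse_groups": ["FINANCIAL_ID"]
}
""")
active_types = sorted(resolve_active_types(config))
print(active_types)
```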
Allowlist
Suppress known-safe values from detection results:
import re
from anonypii.detectors.regex import RegexPIIDetector
detector = RegexPIIDetector(
    allowlist=[
        "noreply@company.com",              # exact literal
        re.compile(r".*@internal\.com$"),   # regex pattern
    ]
)
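Conceptually, an allowlist is a post-detection filter: each detected value is dropped if it equals a literal entry or matches a compiled pattern. The standalone sketch below shows that logic; `is_allowed` is a hypothetical helper, and the use of `fullmatch` (rather than a substring search) is an assumption about matching semantics.

```python
import re
from typing import Union

AllowEntry = Union[str, re.Pattern]


def is_allowed(value: str, allowlist: list[AllowEntry]) -> bool:
    """True if value equals a literal entry or fully matches a compiled pattern."""
    for entry in allowlist:
        if isinstance(entry, re.Pattern):
            if entry.fullmatch(value):
                return True
        elif entry == value:
            return True
    return False


allowlist = ["noreply@company.com", re.compile(r".*@internal\.com$")]
detected = ["noreply@company.com", "bob@internal.com", "john@example.com"]
# Keep only the detections that are not explicitly allowed.
kept = [v for v in detected if not is_allowed(v, allowlist)]
print(kept)
```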
Vault options
from anonypii.vault.memory import InMemoryVault # default, session-only
from anonypii.vault.memory import ThreadSafeInMemoryVault # thread-safe variant
from anonypii.vault.json_file import JsonFileVault # persistent across sessions
from anonypii import ReversibleAnonymizer
ra = ReversibleAnonymizer(
    detector=...,
    vault=JsonFileVault("~/.anonypii/vault.json"),
)
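To make the difference between a session-only and a persistent vault concrete, here is a minimal file-backed store written from scratch. `JsonVaultSketch` and its `put`/`get` methods are hypothetical and do not mirror the interface of `JsonFileVault`; the on-disk layout (a single JSON object of token to value) is likewise an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class JsonVaultSketch:
    """Hypothetical token -> value store persisted as one JSON object on disk."""

    def __init__(self, path: str) -> None:
        # expanduser() resolves a leading "~", as in "~/.anonypii/vault.json".
        self.path = Path(path).expanduser()
        self._data: dict[str, str] = {}
        if self.path.exists():
            self._data = json.loads(self.path.read_text(encoding="utf-8"))

    def put(self, token: str, value: str) -> None:
        """Store a mapping and write the whole file back immediately."""
        self._data[token] = value
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self._data, indent=2), encoding="utf-8")

    def get(self, token: str) -> Optional[str]:
        """Return the original value for token, or None if it is unknown."""
        return self._data.get(token)


with tempfile.TemporaryDirectory() as tmp:
    vault_path = str(Path(tmp) / "vault.json")
    JsonVaultSketch(vault_path).put("{{EMAIL_001}}", "alice@corp.com")
    # A fresh instance reads the mapping back from disk.
    reloaded = JsonVaultSketch(vault_path).get("{{EMAIL_001}}")
print(reloaded)
```

A file like this holds the original values in plain text, so it needs the same protection as the source data.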
DataFrame processing
import pandas as pd
from anonypii import Anonymizer
from anonypii.io.dataframe import process_dataframe
df = pd.DataFrame({"email": ["alice@x.com"], "notes": ["SSN 123-45-6789"]})
redacted_df, results = process_dataframe(df, Anonymizer(...))
CLI
anonypii detect "My email is john@example.com"
anonypii mask "My email is john@example.com"
anonypii anonymize "My email is john@example.com" --output-mapping mapping.json
anonypii restore "My email is {{EMAIL_001}}" --mapping mapping.json
anonypii info
anonypii download all
Entity types (82 total)
| Coarse group | Entity types |
|---|---|
| CREDENTIAL | SSN, PASSWORD, API_KEY, PIN, PASSPORT_NUMBER, DRIVER_LICENSE, TAX_ID, NATIONAL_ID, ... |
| FINANCIAL_ID | CREDIT_CARD, IBAN, ACCOUNT_NUMBER, BANK_ROUTING_NUMBER, BIC, SWIFT_BIC, CVV, ... |
| CONTACT | EMAIL, PHONE, PHONE_NUMBER, FAX_NUMBER |
| NETWORK | IP_ADDRESS, IPV4, IPV6, MAC_ADDRESS, URL, USERNAME, HTTP_COOKIE, DEVICE_IDENTIFIER |
| PERSON_GROUP | PERSON, FIRST_NAME, LAST_NAME, NAME, AGE, GENDER |
| LOCATION | ADDRESS, CITY, STATE, COUNTRY, POSTCODE, COORDINATE, STREET_ADDRESS, ... |
| ORG_ROLE | ORG, COMPANY, COMPANY_NAME, JOB, OCCUPATION |
| TEMPORAL | DATE, TIME, DATE_TIME, DATE_OF_BIRTH |
| MISC | CRYPTO_ADDRESS, VEHICLE, CURRENCY, AMOUNT, BLOOD_TYPE, LICENSE_PLATE, ... |
| FINANCIAL_NER | FINANCIAL_ENTITY |
Research
The underlying models are described in:
- Dataset: PIIBench: A Unified Multi-Source Benchmark Corpus for PII Detection — Jha (2026)
- Models: Fine-Tuning Over Architectural Complexity: PII Detection on PIIBench with DeBERTa — Jha (2026)
License
Apache License 2.0 — see LICENSE.