anonypii

Production-grade PII detection, masking, and reversible anonymization for Python.

Backed by two fine-tuned DeBERTa-v3-base models from the PIIBench research:

| Model | HuggingFace | Full-test F1 | Wins |
|---|---|---|---|
| piibench-deberta-base | Pritesh-2711/piibench-deberta-base | 0.6455 | 54/82 entity types |
| piibench-deberta-sch | Pritesh-2711/piibench-deberta-sch | 0.5894 | 28/82 entity types (HTTP_COOKIE, DATE_TIME, ...) |

Both models cover all 82 entity types across 10 coarse categories (CREDENTIAL, FINANCIAL_ID, CONTACT, NETWORK, LOCATION, PERSON_GROUP, ORG_ROLE, TEMPORAL, MISC, FINANCIAL_NER).


Installation

# Core library only (regex detector, no model)
pip install anonypii

# Quote the extras so shells such as zsh do not expand the brackets as a glob.

# With model support (torch, transformers, huggingface-hub)
pip install "anonypii[model]"

# With model support + auto-download of both models
pip install "anonypii[models]"

# With pandas DataFrame support
pip install "anonypii[pandas]"

# Everything
pip install "anonypii[all]"

Download models manually

anonypii download all                    # both models
anonypii download piibench-deberta-base  # recommended model only
anonypii download piibench-deberta-sch   # SC+H only

Or at runtime:

from anonypii.detectors.model import ModelPIIDetector
detector = ModelPIIDetector(model="piibench-deberta-base", download=True)

Quick start

Irreversible masking

from anonypii import Anonymizer

anon = Anonymizer(model="piibench-deberta-base", download=True)

anon.mask("My email is john@example.com")
# "My email is <EMAIL>"

anon.mask("SSN: 123-45-6789 and card 4111-1111-1111-1111")
# "SSN: <SSN> and card <CREDIT_CARD>"

Reversible anonymization

from anonypii import Anonymizer

anon = Anonymizer(model="piibench-deberta-base", download=True)

result = anon.anonymize("My email is john@example.com")
print(result.text)     # "My email is {{EMAIL_001}}"
print(result.restore()) # "My email is john@example.com"
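
To make the token round trip concrete, here is a minimal standalone sketch of the idea behind reversible anonymization. It is not the library's implementation: the e-mail regex and the `anonymize_emails` and `restore` helpers are hypothetical names introduced for illustration, and only the `{{EMAIL_001}}` token format is taken from the example above.

```python
import re

# Illustrative pattern only (an assumption); anonypii's own detectors are not shown here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize_emails(text):
    """Replace each e-mail with a {{EMAIL_NNN}} token; return (new_text, mapping)."""
    mapping = {}  # token -> original value, needed to restore later
    seen = {}     # original value -> token, so repeated values reuse one token
    counter = 0

    def _sub(match):
        nonlocal counter
        value = match.group(0)
        if value not in seen:
            counter += 1
            token = "{{EMAIL_%03d}}" % counter
            seen[value] = token
            mapping[token] = value
        return seen[value]

    return EMAIL_RE.sub(_sub, text), mapping


def restore(text, mapping):
    """Substitute every token back with the value it replaced."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


masked, mapping = anonymize_emails("My email is john@example.com")
print(masked)                    # My email is {{EMAIL_001}}
print(restore(masked, mapping))  # My email is john@example.com
```

The mapping is the only thing that makes the operation reversible, so whoever holds it can recover the original text.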

Stateful reversible anonymizer

from anonypii import ReversibleAnonymizer

ra = ReversibleAnonymizer(model="piibench-deberta-base", download=True)

r = ra.anonymize("Contact alice@corp.com or call 555-123-4567")
print(r.text)           # "Contact {{EMAIL_001}} or call {{PHONE_001}}"
print(ra.restore(r.text)) # original text restored

Using the regex detector (no download needed)

from anonypii import Anonymizer
from anonypii.detectors.regex import RegexPIIDetector

anon = Anonymizer(detector=RegexPIIDetector())
print(anon.mask("john@example.com / 123-45-6789"))
# "<EMAIL> / <SSN>"
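
For readers curious what regex-based detection with tag masking looks like in general, the following is a rough sketch under stated assumptions. The two patterns and the `detect` and `tag_mask` helpers are hypothetical; the patterns actually shipped in `RegexPIIDetector` may differ.

```python
import re

# Hypothetical patterns for illustration; the package's real pattern set may differ.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text):
    """Return non-overlapping (start, end, label) spans, sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    spans.sort()
    kept, last_end = [], 0
    for start, end, label in spans:
        if start >= last_end:  # skip any span that overlaps an earlier one
            kept.append((start, end, label))
            last_end = end
    return kept


def tag_mask(text):
    """Rebuild the text, replacing each detected span with a <LABEL> tag."""
    out, cursor = [], 0
    for start, end, label in detect(text):
        out.append(text[cursor:start])
        out.append(f"<{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


print(tag_mask("john@example.com / 123-45-6789"))  # <EMAIL> / <SSN>
```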

Masking strategies

from anonypii import Anonymizer
from anonypii.masking.strategies import (
    TagMaskingStrategy,        # <EMAIL>          (default for mask())
    RedactedMaskingStrategy,   # [REDACTED]
    StarMaskingStrategy,       # j**************m
    TokenMaskingStrategy,      # {{EMAIL_001}}    (default for anonymize())
)

# Star masking: keep first and last character
anon = Anonymizer(detector=..., strategy=StarMaskingStrategy(keep_start=1, keep_end=1))
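
The effect of `keep_start` and `keep_end` can be illustrated with a small standalone function. This is a sketch of the star-masking idea, not the library's code: `star_mask` is a hypothetical helper, and fully masking values that are too short to keep any characters is an assumption made here.

```python
def star_mask(value, keep_start=1, keep_end=1, fill="*"):
    """Keep the first/last characters and replace everything in between."""
    visible = keep_start + keep_end
    if len(value) <= visible:
        # Assumption: values too short to hide anything are masked entirely.
        return fill * len(value)
    hidden = len(value) - visible
    return value[:keep_start] + fill * hidden + value[len(value) - keep_end:]


print(star_mask("john@example.com"))  # j**************m
```

Note that this strategy preserves the length of the original value, which can itself leak information for short identifiers.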

Entity configuration

Restrict detection to a subset of entities via a YAML or JSON config file:

# my_config.yaml
schema_version: "1.0"

active_entity_types:
  - EMAIL
  - SSN
  - CREDIT_CARD

# Or activate entire coarse groups:
active_coarse_groups:
  - CREDENTIAL
  - FINANCIAL_ID

Then pass the config file to the anonymizer:

anon = Anonymizer(config_path="my_config.yaml", ...)
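
One way to picture what such a config does is as a filter over detection results. The sketch below uses the JSON form of the config and a small excerpt of the coarse-group table from this README. The `active_labels` and `filter_detections` helpers, the dict shape of a detection, and the assumption that listed types and listed groups are combined as a union are all hypothetical; the library may resolve them differently.

```python
import json

# Excerpt of the coarse-group table in this README (not the full 82 types).
COARSE_GROUPS = {
    "CONTACT": {"EMAIL", "PHONE", "PHONE_NUMBER", "FAX_NUMBER"},
    "TEMPORAL": {"DATE", "TIME", "DATE_TIME", "DATE_OF_BIRTH"},
}


def active_labels(config):
    """Union of listed entity types and all types in listed groups (assumed semantics)."""
    labels = set(config.get("active_entity_types", []))
    for group in config.get("active_coarse_groups", []):
        labels |= COARSE_GROUPS.get(group, set())
    return labels


def filter_detections(detections, config):
    """Keep only detections whose label is active under the config."""
    allowed = active_labels(config)
    return [d for d in detections if d["label"] in allowed]


config = json.loads(
    '{"schema_version": "1.0",'
    ' "active_entity_types": ["SSN"],'
    ' "active_coarse_groups": ["CONTACT"]}'
)
detections = [
    {"label": "EMAIL", "text": "john@example.com"},
    {"label": "SSN", "text": "123-45-6789"},
    {"label": "DATE", "text": "2024-01-01"},
]
kept = filter_detections(detections, config)
print([d["label"] for d in kept])  # ['EMAIL', 'SSN']
```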

Allowlist

Suppress known-safe values from detection results:

import re
from anonypii.detectors.regex import RegexPIIDetector

detector = RegexPIIDetector(
    allowlist=[
        "noreply@company.com",              # exact literal
        re.compile(r".*@internal\.com$"),   # regex pattern
    ]
)
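
Conceptually, an allowlist that mixes literals and compiled patterns can be checked as in the sketch below. The `is_allowlisted` and `drop_allowlisted` helpers are hypothetical, and using `fullmatch` for patterns is an assumption; the README does not say whether the library uses `match`, `search`, or `fullmatch`.

```python
import re


def is_allowlisted(value, allowlist):
    """True if value equals a literal entry or fully matches a compiled pattern."""
    for entry in allowlist:
        if isinstance(entry, re.Pattern):
            if entry.fullmatch(value):  # assumption: the whole value must match
                return True
        elif value == entry:
            return True
    return False


def drop_allowlisted(values, allowlist):
    """Remove known-safe values from a list of detected strings."""
    return [v for v in values if not is_allowlisted(v, allowlist)]


allowlist = ["noreply@company.com", re.compile(r".*@internal\.com$")]
found = ["noreply@company.com", "bob@internal.com", "john@example.com"]
print(drop_allowlisted(found, allowlist))  # ['john@example.com']
```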

Vault options

from anonypii import ReversibleAnonymizer
from anonypii.vault.memory import InMemoryVault           # default, session-only
from anonypii.vault.memory import ThreadSafeInMemoryVault # thread-safe variant
from anonypii.vault.json_file import JsonFileVault        # persistent across sessions

ra = ReversibleAnonymizer(
    detector=...,
    vault=JsonFileVault("~/.anonypii/vault.json"),
)
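
To show why a file-backed vault survives across sessions while an in-memory one does not, here is a minimal sketch of a JSON-backed token store. `TinyJsonVault` and its `put`/`get` methods are hypothetical stand-ins, not the interface of `JsonFileVault`, whose methods are not documented in this README.

```python
import json
import os
import tempfile
from pathlib import Path


class TinyJsonVault:
    """Hypothetical stand-in for a persistent token -> original value store."""

    def __init__(self, path):
        self.path = Path(path).expanduser()
        self._data = {}
        if self.path.exists():
            # Reload mappings written by an earlier session.
            self._data = json.loads(self.path.read_text(encoding="utf-8"))

    def put(self, token, value):
        self._data[token] = value
        self._flush()

    def get(self, token):
        return self._data[token]

    def _flush(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file and rename it, so a crash cannot leave a half-written vault.
        fd, tmp = tempfile.mkstemp(dir=str(self.path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self._data, fh)
        os.replace(tmp, self.path)


vault_path = os.path.join(tempfile.mkdtemp(), "vault.json")
TinyJsonVault(vault_path).put("{{EMAIL_001}}", "alice@corp.com")

# A fresh instance (as in a new session) still resolves the token.
print(TinyJsonVault(vault_path).get("{{EMAIL_001}}"))  # alice@corp.com
```

A store like this holds the original PII in plain text, so the file deserves the same access controls as the data it protects.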

DataFrame processing

import pandas as pd
from anonypii import Anonymizer
from anonypii.io.dataframe import process_dataframe

df = pd.DataFrame({"email": ["alice@x.com"], "notes": ["SSN 123-45-6789"]})
redacted_df, results = process_dataframe(df, Anonymizer(...))

CLI

anonypii detect  "My email is john@example.com"
anonypii mask    "My email is john@example.com"
anonypii anonymize "My email is john@example.com" --output-mapping mapping.json
anonypii restore   "My email is {{EMAIL_001}}"     --mapping mapping.json
anonypii info
anonypii download all

Entity types (82 total)

| Coarse group | Entity types |
|---|---|
| CREDENTIAL | SSN, PASSWORD, API_KEY, PIN, PASSPORT_NUMBER, DRIVER_LICENSE, TAX_ID, NATIONAL_ID, ... |
| FINANCIAL_ID | CREDIT_CARD, IBAN, ACCOUNT_NUMBER, BANK_ROUTING_NUMBER, BIC, SWIFT_BIC, CVV, ... |
| CONTACT | EMAIL, PHONE, PHONE_NUMBER, FAX_NUMBER |
| NETWORK | IP_ADDRESS, IPV4, IPV6, MAC_ADDRESS, URL, USERNAME, HTTP_COOKIE, DEVICE_IDENTIFIER |
| PERSON_GROUP | PERSON, FIRST_NAME, LAST_NAME, NAME, AGE, GENDER |
| LOCATION | ADDRESS, CITY, STATE, COUNTRY, POSTCODE, COORDINATE, STREET_ADDRESS, ... |
| ORG_ROLE | ORG, COMPANY, COMPANY_NAME, JOB, OCCUPATION |
| TEMPORAL | DATE, TIME, DATE_TIME, DATE_OF_BIRTH |
| MISC | CRYPTO_ADDRESS, VEHICLE, CURRENCY, AMOUNT, BLOOD_TYPE, LICENSE_PLATE, ... |
| FINANCIAL_NER | FINANCIAL_ENTITY |

Research

The underlying models are described in:


License

Apache License 2.0 — see LICENSE.
