Microsoft Presidio plugin: multi-language recognizers with optional reversible anonymization.

Project description

pii-presidio

Microsoft Presidio plugin: multi-language PII recognizers with reversible anonymization, built on pii-core and pii-veil.

Install

pip install pii-presidio
python -m spacy download pl_core_news_sm   # required for Polish NLP analysis

pii-presidio pulls in presidio-analyzer, presidio-anonymizer, pii-core, and pii-veil. spaCy itself comes via Presidio; the Polish language model has to be downloaded separately (Presidio's standard pattern).
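Because the model is a separate download, it can help to check for it before building the analyzer. The snippet below is a minimal sketch using only the standard library. It assumes the spaCy model is installed as an importable package named pl_core_news_sm, and the helper name model_installed is made up for this example.

```python
import importlib.util


def model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    # find_spec returns None when no importable package of that name exists.
    return importlib.util.find_spec(package_name) is not None


if not model_installed("pl_core_news_sm"):
    print("Polish model missing; run: python -m spacy download pl_core_news_sm")
```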

Recognizers

from pii_presidio import get_recognizers
from presidio_analyzer import AnalyzerEngine, RecognizerRegistry
from presidio_analyzer.nlp_engine import NlpEngineProvider

# Build a spaCy-backed NLP engine that loads the Polish model.
nlp_engine = NlpEngineProvider(nlp_configuration={
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "pl", "model_name": "pl_core_news_sm"}],
}).create_engine()

# Register the pii-core based recognizers for Polish.
registry = RecognizerRegistry(supported_languages=["pl"])
for r in get_recognizers(["pl"]):
    registry.add_recognizer(r)

analyzer = AnalyzerEngine(registry=registry, nlp_engine=nlp_engine, supported_languages=["pl"])
# results is a list of Presidio RecognizerResult objects.
results = analyzer.analyze(text="PESEL 44051401359, email jan@example.pl", language="pl")

Each pii_core detector becomes one PatternRecognizer. Confidence scores are 0.85 for checksum-validated detectors (PESEL, NIP, REGON, IBAN, credit card) and 0.4 for regex-only ones (ID card, passport, phone, email). Per-detector context words are pre-set to common Polish keywords; pass your own via PiiCoreRecognizer(detector, context=[...]) if you need different boosts.
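To illustrate why checksum-validated detectors can carry a higher score than regex-only ones, here is a minimal sketch of the publicly documented PESEL check-digit rule. It is not pii-core's implementation, and the function name pesel_checksum_ok is an assumption made for this example.

```python
# Standard PESEL weights for the first ten digits.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def pesel_checksum_ok(value: str) -> bool:
    """Return True if value is 11 ASCII digits with a valid PESEL check digit."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    digits = [int(ch) for ch in value]
    # zip stops after the ten weights, so the check digit is not included.
    total = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    return (10 - total % 10) % 10 == digits[10]


print(pesel_checksum_ok("44051401359"))  # True: the sample number used above
print(pesel_checksum_ok("44051401358"))  # False: wrong check digit
```

A random 11-digit string passes this check only about one time in ten, which is why a match that also validates is stronger evidence than a bare pattern match.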

KRS and postal-code detectors are excluded by default (their raw regexes match ordinary 10-digit and XX-XXX strings); enable them with include_opt_in=True and pair with strict context filtering.
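The false-positive problem can be seen with a small standard-library sketch. The two patterns below are hypothetical stand-ins, not the regexes shipped in pii-core, and with_context is an illustrative helper rather than Presidio's context mechanism.

```python
import re

# Hypothetical stand-ins for "10-digit" and "XX-XXX" patterns.
KRS_LIKE = re.compile(r"\b\d{10}\b")
POSTAL_LIKE = re.compile(r"\b\d{2}-\d{3}\b")


def with_context(pattern, text, keywords, window=20):
    """Keep only matches preceded by a keyword within `window` characters."""
    hits = []
    lowered = text.lower()
    for m in pattern.finditer(text):
        before = lowered[max(0, m.start() - window):m.start()]
        if any(k in before for k in keywords):
            hits.append(m.group())
    return hits


text = "Invoice 1234567890 for part 12-345; company KRS 0000123456"

print(KRS_LIKE.findall(text))     # ['1234567890', '0000123456']
print(POSTAL_LIKE.findall(text))  # ['12-345']
print(with_context(KRS_LIKE, text, ["krs"]))  # ['0000123456']
```

Without the context check, the invoice number and the part number would both be reported as PII.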

Reversible anonymization

from pii_veil import Mapping, Shield
from pii_presidio import ReversibleReplaceOperator, reversible_operators
from presidio_anonymizer import AnonymizerEngine

mapping = Mapping()  # collects the placeholder <-> original value pairs
engine = AnonymizerEngine()
engine.add_anonymizer(ReversibleReplaceOperator)  # register the custom operator

result = engine.anonymize(
    text="PESEL 44051401359, email jan@example.pl",
    analyzer_results=results,
    operators=reversible_operators(mapping),
)
# result.text -> "PESEL [PL_PESEL_001], email [EMAIL_001]"

# Send result.text to an LLM, get a response back, then
# (llm_response_text is a placeholder for that response):
restored = Shield(mapping=mapping).deanonymize(llm_response_text)

The Mapping is the round-trip handle: it records which placeholder stands for which original value. It uses the same JSON format as standalone pii-veil, so you can interleave the two: anonymize via Presidio and deanonymize via Shield, or vice versa.
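The round-trip idea can be sketched with a toy stand-in. ToyMapping below is hypothetical and is not pii-veil's Mapping or its JSON format. It only mimics the placeholder style shown in the example output above.

```python
import re
from collections import defaultdict


class ToyMapping:
    """Toy placeholder store: hands out tokens and restores them later."""

    def __init__(self):
        self._counters = defaultdict(int)  # next index per entity type
        self._by_value = {}                # (type, value) -> token
        self._by_token = {}                # token -> original value

    def token_for(self, entity_type: str, value: str) -> str:
        key = (entity_type, value)
        if key not in self._by_value:
            self._counters[entity_type] += 1
            token = f"[{entity_type}_{self._counters[entity_type]:03d}]"
            self._by_value[key] = token
            self._by_token[token] = value
        return self._by_value[key]

    def restore(self, text: str) -> str:
        # Unknown tokens are left as they are.
        return re.sub(
            r"\[[A-Z_]+_\d+\]",
            lambda m: self._by_token.get(m.group(), m.group()),
            text,
        )


toy = ToyMapping()
masked = (
    f"PESEL {toy.token_for('PL_PESEL', '44051401359')}, "
    f"email {toy.token_for('EMAIL', 'jan@example.pl')}"
)
print(masked)               # PESEL [PL_PESEL_001], email [EMAIL_001]
print(toy.restore(masked))  # PESEL 44051401359, email jan@example.pl
```

The same value always gets the same token, so text that comes back from an LLM with the tokens moved around can still be restored.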

Entity name mapping

| pii_core.PIIType | Presidio entity name |
| --- | --- |
| PL_PESEL, PL_NIP, PL_REGON, PL_ID_CARD, PL_PASSPORT, PL_KRS, PL_POSTAL_CODE | same string (country-prefixed) |
| PL_PHONE | PHONE_NUMBER |
| PL_IBAN | IBAN_CODE |
| EMAIL | EMAIL_ADDRESS |
| CREDIT_CARD | CREDIT_CARD |

Cross-language types use Presidio's standard names so existing pipelines that filter entities=["EMAIL_ADDRESS"] pick our recognizers up unchanged.
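The table above can be read as a lookup with a pass-through default. The sketch below restates it in code. The names PRESIDIO_NAMES and presidio_entity_name are illustrative and are not part of the pii-presidio API.

```python
# Only the types whose names differ need an entry; the rest pass through.
PRESIDIO_NAMES = {
    "PL_PHONE": "PHONE_NUMBER",
    "PL_IBAN": "IBAN_CODE",
    "EMAIL": "EMAIL_ADDRESS",
}


def presidio_entity_name(pii_type: str) -> str:
    """Map a pii_core type name to the Presidio entity name from the table."""
    return PRESIDIO_NAMES.get(pii_type, pii_type)


print(presidio_entity_name("EMAIL"))     # EMAIL_ADDRESS
print(presidio_entity_name("PL_PESEL"))  # PL_PESEL
```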

Sibling packages

  • pii-core -- multi-language detection primitives this plugin reuses.
  • pii-veil -- non-Presidio reversible anonymization with the same Mapping format.

License

Apache-2.0. See LICENSE.

Project details


Download files

Source Distribution

pii_presidio-0.1.0.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

pii_presidio-0.1.0-py3-none-any.whl (12.9 kB view details)

Uploaded Python 3

File details

Details for the file pii_presidio-0.1.0.tar.gz.

File metadata

  • Download URL: pii_presidio-0.1.0.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pii_presidio-0.1.0.tar.gz
Algorithm Hash digest
SHA256 742a1938b1f25477f8edca52d87c58a5d1ca6381b8f7b9bda585becaa4e57ddc
MD5 4243d557fd5e50bd570a2d68bbf751af
BLAKE2b-256 9fd867c080098ed13af867efce4c34b127df50a073ba3b6cc5757803e331e7ee

Provenance

The following attestation bundles were made for pii_presidio-0.1.0.tar.gz:

Publisher: publish.yml on pii-toolkit/pii-presidio

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pii_presidio-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pii_presidio-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pii_presidio-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a9dd5843d73e529650c8b4c8fbc2e248e6524db9bf033823eefe5ba6146f0583
MD5 06ab94ea81e4cf6ee261ea8be4f545b1
BLAKE2b-256 25e55c5f571fdfa58fe89cc94585bbef12ec14087f02ae2393a0f8230a449489

Provenance

The following attestation bundles were made for pii_presidio-0.1.0-py3-none-any.whl:

Publisher: publish.yml on pii-toolkit/pii-presidio

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
