pii-veil

Reversible PII anonymization for LLM workflows. Replace PII with stable tokens, send to an LLM, then deanonymize the response using the persisted mapping.

Built on pii-core for detection. Detector-agnostic: any pii_core.Detector plugs in.

Install

pip install pii-veil

Quick usage

from pii_veil import Shield

shield = Shield()
result = shield.anonymize("Mój PESEL: 44051401358, kontakt: jan@example.pl.")
# result.text   -> "Mój PESEL: [PL_PESEL_001], kontakt: [EMAIL_001]."
# result.mapping persists the reversible mapping

# ... send result.text to an LLM, get a response back ...
restored = shield.deanonymize(llm_response)

The same value gets the same token within a Shield's lifetime, so a token that an LLM quotes back resolves to the original value. Persist the mapping JSON if you need round-trips across processes:

mapping_json = result.mapping.to_json()
# later, in a different process:
from pii_veil import Mapping, Shield
loaded = Shield(mapping=Mapping.from_json(mapping_json))
loaded.deanonymize(text_from_llm)
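
To make the stable-token idea concrete, here is a minimal, self-contained sketch of a reversible mapping. It is not pii-veil's implementation: the ToyTokenMap class, its method names, and its JSON layout are hypothetical and only illustrate why the same value always yields the same token and why a serialized mapping is enough to restore text in another process.

```python
import json
import re


class ToyTokenMap:
    """Illustrative stand-in for a reversible mapping; NOT pii-veil's Mapping."""

    def __init__(self, forward=None):
        # forward maps (kind, original value) -> token,
        # e.g. ("EMAIL", "jan@example.pl") -> "[EMAIL_001]"
        self.forward = dict(forward or {})
        self.reverse = {tok: val for (_, val), tok in self.forward.items()}

    def token_for(self, kind, value):
        key = (kind, value)
        if key not in self.forward:
            # Number tokens per kind, in order of first appearance.
            n = sum(1 for k, _ in self.forward if k == kind) + 1
            token = f"[{kind}_{n:03d}]"
            self.forward[key] = token
            self.reverse[token] = value
        return self.forward[key]

    def restore(self, text):
        # Swap known tokens back; unknown bracketed text is left untouched.
        pattern = r"\[[A-Z_]+_\d{3}\]"
        return re.sub(pattern, lambda m: self.reverse.get(m.group(0), m.group(0)), text)

    def to_json(self):
        # Hypothetical layout: a list of [kind, value, token] triples.
        return json.dumps([[k, v, t] for (k, v), t in self.forward.items()])

    @classmethod
    def from_json(cls, payload):
        return cls({(k, v): t for k, v, t in json.loads(payload)})


m = ToyTokenMap()
first = m.token_for("EMAIL", "jan@example.pl")
again = m.token_for("EMAIL", "jan@example.pl")   # same value, same token
other = m.token_for("EMAIL", "ola@example.pl")   # new value, next number

# Simulate a second process that only has the serialized mapping.
loaded = ToyTokenMap.from_json(m.to_json())
restored = loaded.restore("Write to [EMAIL_001] and [EMAIL_999].")
```

In this sketch restored keeps [EMAIL_999] as-is because that token was never issued; how pii-veil treats unknown tokens is not stated here.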

CLI

pii-veil anonymize input.txt -o anon.txt -m mapping.json
pii-veil deanonymize anon.txt -m mapping.json -o restored.txt
pii-veil detect input.txt --format json

Passing - as the input path reads from stdin. deanonymize -o - (or omitting -o) writes to stdout. Input may be UTF-8 (with or without a BOM) or UTF-16 (with a BOM); output is always UTF-8 without a BOM.
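
The encoding rules above can be sketched with the standard library alone. The function names decode_input and encode_output are hypothetical, and this is only one way to get the described behavior, not necessarily how the CLI does it.

```python
import codecs


def decode_input(raw: bytes) -> str:
    """Decode UTF-8 (BOM optional) or UTF-16 (BOM required) to str."""
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the UTF-8 BOM so it does not leak into the text.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and strips it.
        # (This sketch does not try to tell UTF-32 apart from UTF-16.)
        return raw.decode("utf-16")
    # No BOM: treat the input as plain UTF-8.
    return raw.decode("utf-8")


def encode_output(text: str) -> bytes:
    # "utf-8" never writes a BOM (unlike "utf-8-sig").
    return text.encode("utf-8")
```

Input that is neither valid UTF-8 nor BOM-prefixed UTF-16 raises UnicodeDecodeError in this sketch.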

Custom detectors

from pii_core import PlPeselDetector, EmailDetector
from pii_veil import Shield

# Only PESEL and email; everything else passes through.
shield = Shield(detectors=[PlPeselDetector(), EmailDetector()])

Detector order is the tiebreak for overlap resolution: when two detectors emit identical spans, the one earlier in the list wins. Overlapping spans of different lengths are resolved by "longest match wins".
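
As a rough illustration of that rule, the sketch below ranks candidate spans by length and then by detector position, and keeps each span only if it does not overlap one already kept. The Span type and resolve_overlaps function are hypothetical, and the greedy strategy is an assumption; the library's actual resolution code may differ.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int           # inclusive character offset
    end: int             # exclusive character offset
    detector_index: int  # position of the emitting detector in the list


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    # Longest span first; for equal length, the earlier detector first;
    # start offset is a final key so the ordering is deterministic.
    ranked = sorted(spans, key=lambda s: (-(s.end - s.start), s.detector_index, s.start))
    kept: List[Span] = []
    for s in ranked:
        # Keep a span only if it is disjoint from everything kept so far.
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


candidates = [
    Span(0, 11, 1),   # identical span from the second detector: loses the tie
    Span(0, 11, 0),   # identical span from the first detector: wins
    Span(3, 8, 0),    # shorter span inside the winner: dropped
    Span(20, 25, 1),  # disjoint span: kept
]
resolved = resolve_overlaps(candidates)
```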

Hardening for untrusted input

shield = Shield(max_input_bytes=1_000_000)  # 1 MB (10^6 bytes) cap; raises InputSizeError above
shield.reset()  # clear accumulated mapping between unrelated documents

Shield.anonymize runs in O(n) time in the input size and is not thread-safe. Use one Shield per request, and call reset() between unrelated documents to prevent token-shape collisions across users.
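
A size cap expressed in bytes is not the same as a cap on characters. The sketch below shows one way such a guard could work, assuming the limit is measured on the UTF-8 encoding of the input; that assumption, the check_size function, and the InputTooLarge class (a stand-in for InputSizeError) are all hypothetical.

```python
class InputTooLarge(ValueError):
    """Hypothetical stand-in for pii-veil's InputSizeError."""


def check_size(text: str, max_input_bytes: int) -> None:
    # Assumption: the cap applies to the UTF-8 encoded size, not len(text).
    size = len(text.encode("utf-8"))
    if size > max_input_bytes:
        raise InputTooLarge(f"input is {size} bytes; limit is {max_input_bytes}")


check_size("a" * 10, 10)       # 10 ASCII characters are 10 bytes: accepted

try:
    check_size("ż" * 6, 10)    # 6 characters, but 12 bytes in UTF-8
    rejected = False
except InputTooLarge:
    rejected = True
```

Under this assumption, non-ASCII text reaches the cap sooner than its character count suggests.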

API stability

The public surface (Shield, Mapping, AnonymizeResult, Match, PIIType, the four exception classes) is SemVer-stable. Mapping JSON has a schema_version field; the loader rejects unknown versions rather than guessing.
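
Rejecting unknown schema versions can look like the following sketch. Only the schema_version field name comes from the description above; the supported version number, the UnsupportedSchema exception, and the load_mapping function are assumptions made for illustration.

```python
import json

# Assumed for illustration; the real supported versions are not stated here.
SUPPORTED_SCHEMA_VERSIONS = (1,)


class UnsupportedSchema(ValueError):
    """Hypothetical error for a mapping file this loader cannot interpret."""


def load_mapping(payload: str) -> dict:
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise UnsupportedSchema("mapping JSON must be an object")
    version = data.get("schema_version")
    # Fail loudly instead of guessing how to read an unknown layout.
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise UnsupportedSchema(f"unsupported schema_version: {version!r}")
    return data


ok = load_mapping('{"schema_version": 1, "entries": []}')
```

Failing early keeps a newer or corrupted mapping file from silently restoring the wrong values.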

Sibling packages

  • pii-core — multi-language detection primitives.
  • pii-presidio — Microsoft Presidio plugin with its own optional reversible operator.

License

Apache-2.0. See LICENSE.
