pii-veil
Reversible PII anonymization for LLM workflows. Replace PII with stable tokens, send to an LLM, then deanonymize the response using the persisted mapping.
Built on pii-core for detection. Detector-agnostic: any pii_core.Detector plugs in.
Install
pip install pii-veil
Quick usage
from pii_veil import Shield
shield = Shield()
result = shield.anonymize("Mój PESEL: 44051401358, kontakt: jan@example.pl.")
# result.text -> "Mój PESEL: [PL_PESEL_001], kontakt: [EMAIL_001]."
# result.mapping persists the reversible mapping
# ... send result.text to an LLM, get a response back ...
llm_response = result.text  # stand-in: echo the anonymized text in place of a real LLM reply
restored = shield.deanonymize(llm_response)
The same value gets the same token within a Shield's lifetime, so a token that the LLM quotes back resolves to the original value. Persist the mapping JSON if you need round-trips across processes:
mapping_json = result.mapping.to_json()
# later, in a different process:
from pii_veil import Mapping, Shield
loaded = Shield(mapping=Mapping.from_json(mapping_json))
loaded.deanonymize(text_from_llm)
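To make the "same value, same token" behaviour concrete, here is a minimal standalone sketch of a reversible token mapping. It is not pii-veil's implementation: ToyVeil, EMAIL_RE, and TOKEN_RE are hypothetical names, and only simple email addresses are detected.

```python
import re

# Hypothetical stand-in for a detector: matches simple email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
TOKEN_RE = re.compile(r"\[EMAIL_\d{3}\]")


class ToyVeil:
    """Illustrative only; this is not pii-veil's Shield."""

    def __init__(self):
        self.value_to_token = {}
        self.token_to_value = {}

    def anonymize(self, text):
        def replace(match):
            value = match.group(0)
            token = self.value_to_token.get(value)
            if token is None:
                # First sighting: mint the next numbered token and record both directions.
                token = f"[EMAIL_{len(self.value_to_token) + 1:03d}]"
                self.value_to_token[value] = token
                self.token_to_value[token] = value
            return token

        return EMAIL_RE.sub(replace, text)

    def deanonymize(self, text):
        # Unknown tokens are left untouched rather than guessed at.
        return TOKEN_RE.sub(
            lambda m: self.token_to_value.get(m.group(0), m.group(0)), text
        )
```

Because the mapping lives on the instance, a repeated address reuses its token, and any reply that mentions the token can be restored as long as the same instance (or a persisted copy of its mapping) is available.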
CLI
pii-veil anonymize input.txt -o anon.txt -m mapping.json
pii-veil deanonymize anon.txt -m mapping.json -o restored.txt
pii-veil detect input.txt --format json
Passing - as the input path reads from stdin. For deanonymize, -o - (or omitting -o) writes to stdout. UTF-8 (with or without BOM) and UTF-16 (with BOM) are accepted on read; output is always UTF-8 without BOM.
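The encoding rules above can be sketched with the standard library alone. This is an assumption about how such handling might look, not the CLI's actual code; decode_input and encode_output are hypothetical helper names.

```python
import codecs


def decode_input(raw: bytes) -> str:
    """Decode UTF-8 (BOM optional) or UTF-16 (BOM required) bytes to str."""
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the UTF-8 BOM before decoding.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order, and strips it.
        return raw.decode("utf-16")
    # No BOM: treat the input as plain UTF-8.
    return raw.decode("utf-8")


def encode_output(text: str) -> bytes:
    # The plain "utf-8" codec never writes a BOM (unlike "utf-8-sig").
    return text.encode("utf-8")
```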
Custom detectors
from pii_core import PlPeselDetector, EmailDetector
from pii_veil import Shield
# Only PESEL and email; everything else passes through.
shield = Shield(detectors=[PlPeselDetector(), EmailDetector()])
Detector order breaks ties during overlap resolution: when two detectors emit identical spans, the one earlier in the list wins. Overlapping spans of different lengths are resolved by "longest match wins".
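One way to picture the two rules is a greedy pass over candidate spans. The sketch below is a guess at the general shape, not the library's algorithm; Span and resolve_overlaps are hypothetical names, and priority stands for the detector's index in the list.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int      # inclusive offset
    end: int        # exclusive offset
    priority: int   # index of the emitting detector in the list; lower wins ties
    label: str


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    # Rank candidates: longest first, then earliest detector, then leftmost.
    ranked = sorted(spans, key=lambda s: (-(s.end - s.start), s.priority, s.start))
    kept: List[Span] = []
    for span in ranked:
        # Keep a span only if it does not overlap anything already kept.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: s.start)
```

With two detectors claiming the same eleven characters, the one listed first keeps the span; a shorter span nested inside it is dropped.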
Hardening for untrusted input
shield = Shield(max_input_bytes=1_000_000) # 1 MB (10^6 bytes) cap; raises InputSizeError above that
shield.reset() # clear accumulated mapping between unrelated documents
Shield.anonymize runs in O(n) time in the input size. A Shield is not thread-safe: use one Shield per request, and call reset() between unrelated documents to prevent token-shape collisions across users.
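A size cap of this kind is usually checked in encoded bytes rather than characters. The following sketch assumes the limit applies to the UTF-8 encoding of the input, which the text above does not state; check_size is a hypothetical helper, and InputSizeError here is a local stand-in for the library's exception of the same name.

```python
class InputSizeError(ValueError):
    """Local stand-in; pii-veil ships its own exception of this name."""


def check_size(text: str, max_input_bytes: int) -> None:
    # Measure encoded bytes, not characters: "ó" is 1 character but 2 UTF-8 bytes.
    size = len(text.encode("utf-8"))
    if size > max_input_bytes:
        raise InputSizeError(f"input is {size} bytes; limit is {max_input_bytes}")
```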
API stability
The public surface (Shield, Mapping, AnonymizeResult, Match, PIIType, the four exception classes) is SemVer-stable. Mapping JSON has a schema_version field; the loader rejects unknown versions rather than guessing.
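Rejecting unknown schema versions can be sketched as follows. The JSON layout, the "entries" key, the supported version number, and the names load_mapping and UnsupportedSchemaError are all assumptions made for illustration; only the schema_version field comes from the description above.

```python
import json

SUPPORTED_SCHEMA_VERSIONS = {1}  # assumed value, for illustration only


class UnsupportedSchemaError(ValueError):
    """Hypothetical exception name."""


def load_mapping(payload: str) -> dict:
    data = json.loads(payload)
    version = data.get("schema_version")
    # Fail loudly on a missing or unknown version instead of guessing the layout.
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise UnsupportedSchemaError(f"unsupported schema_version: {version!r}")
    return data["entries"]  # "entries" is an assumed key name
```

Failing early keeps a newer mapping file from being silently misread by an older loader.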
Sibling packages
pii-core: multi-language detection primitives.
pii-presidio: Microsoft Presidio plugin with its own optional reversible operator.
License
Apache-2.0. See LICENSE.