Advanced PII pseudonymization for LLM context preservation.
🛡️ Privalyse Mask: Privacy Layer for LLMs & RAG
Make your LLM & RAG pipelines privacy-aware in 3 lines of code.
privalyse-mask is middleware for privacy-first AI. It pseudonymizes sensitive data (PII) before it reaches OpenAI, Anthropic, or your vector DB, then restores the original values in the response, preserving full context for the model.
- Peter → {Name_s73nd} (preserves entity type and uniqueness)
- 12.10.2000 → {Date_October_2000} (preserves temporal context)
- DE93... → {German_IBAN} (preserves financial context)
⚡ Quick Start
from privalyse_mask import PrivalyseMasker
# 1. Mask PII (Peter -> {Name_x92}, Berlin -> {City_B})
masker = PrivalyseMasker()
safe_prompt, mapping = masker.mask("Peter lives in Berlin and uses IBAN DE12...")
# 2. Run the LLM (the model sees structure, not secrets).
#    Placeholder: replace with your own call, e.g. llm_response = llm.invoke(safe_prompt)
llm_response = safe_prompt
# 3. Unmask (restore the original data for the user)
final_response = masker.unmask(llm_response, mapping)
🎯 Why Privalyse Mask?
- RAG & Chatbots: Perfect for vector search and conversational AI.
- Context-Aware: Unlike plain ***** redaction, we preserve gender, nationality, and formats so the LLM stays smart.
- Zero Leakage: Your raw data never leaves your infrastructure.
flowchart LR
A["User Input<br/>(PII)"] -->|Mask| B("Privalyse Mask")
B -->|"Safe Prompt"| C["LLM"]
C -->|"Safe Response"| D("Privalyse Unmask")
D -->|"Final Response"| E["User"]
style B fill:#e6f3ff,stroke:#2196f3,stroke-width:2px,color:#000
style D fill:#e6f3ff,stroke:#2196f3,stroke-width:2px,color:#000
style A fill:#fff,stroke:#333,color:#000
style C fill:#fff,stroke:#333,color:#000
style E fill:#fff,stroke:#333,color:#000
🚀 Installation
pip install privalyse-mask
python -m spacy download en_core_web_lg
🛠️ Usage
The core workflow is simple: mask the input, send it to the LLM, and unmask the response.
from privalyse_mask import PrivalyseMasker
# Initialize the masker
masker = PrivalyseMasker()
# Your sensitive input
user_input = """
My name is Peter. I was born on 12.10.2000.
My IBAN is DE93 3432 2346 4355.
"""
# 1. Mask the data
masked_text, mapping = masker.mask(user_input)
print(f"Masked: {masked_text}")
# Output:
# "My name is {Name_s73nd}. I was born on {Date_October_2000}.
# My IBAN is {German_IBAN}."
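# 2. Send to LLM (Simulation)
# llm_response = client.chat.completions.create(model=..., messages=[{"role": "user", "content": masked_text}])
# Simulate a reply that echoes the surrogates (strip the trailing period from each token)
tokens = masked_text.split()
llm_response_text = f"Hello {tokens[3].rstrip('.')}, I see your bank account is {tokens[-1].rstrip('.')}."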
# 3. Unmask the response
final_response = masker.unmask(llm_response_text, mapping)
print(f"Response: {final_response}")
# Output:
# "Hello Peter, I see your bank account is DE93 3432 2346 4355."
🔌 Integrations
Using with OpenAI SDK
Protect your prompts before they leave your server.
from openai import OpenAI
from privalyse_mask import PrivalyseMasker
client = OpenAI()
masker = PrivalyseMasker()
prompt = "My email is alice@example.com"
masked_prompt, mapping = masker.mask(prompt)
# Send safe prompt to OpenAI
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": masked_prompt}]
)
# Restore PII in the response (this text contains real data again)
final_response = masker.unmask(response.choices[0].message.content, mapping)
Using with LangChain
Easily integrate into your chains.
from privalyse_mask import PrivalyseMasker
masker = PrivalyseMasker()
def safe_invoke(chain, input_text):
    # Mask PII before the text enters the chain
    masked_text, mapping = masker.mask(input_text)
    # Assumes the chain returns a plain string (e.g. it ends with StrOutputParser)
    response = chain.invoke(masked_text)
    # Restore the original values in the chain's output
    return masker.unmask(response, mapping)
Secure Tool Calling
When LLMs call tools (e.g., database lookups), they will use the masked values (e.g., {Name_x92}). You must unmask these arguments before executing the tool.
See examples/tool_calling_example.py for a full implementation pattern.
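A minimal sketch of the unmasking step, assuming the tool arguments have already been parsed into Python objects (call json.loads first if they arrive as a JSON string) and that mapping is the surrogate → original dict returned by mask(). The helper name unmask_tool_args is hypothetical and not part of privalyse-mask; it does plain substring replacement instead of calling masker.unmask, so it runs without the library.

```python
from typing import Any, Dict


def unmask_tool_args(args: Any, mapping: Dict[str, str]) -> Any:
    """Recursively swap surrogates (e.g. "{Name_x92}") back to original values.

    Hypothetical helper, not part of privalyse-mask. Assumes `mapping` is the
    surrogate -> original dict returned by mask(). Strings inside dicts and
    lists are unmasked; other values (numbers, booleans, None) pass through.
    """
    if isinstance(args, str):
        for surrogate, original in mapping.items():
            args = args.replace(surrogate, original)
        return args
    if isinstance(args, dict):
        return {key: unmask_tool_args(value, mapping) for key, value in args.items()}
    if isinstance(args, list):
        return [unmask_tool_args(item, mapping) for item in args]
    return args


# The LLM requested a lookup using the masked name.
mapping = {"{Name_x92}": "Peter"}
tool_args = {"query": "orders for {Name_x92}", "limit": 5, "tags": ["{Name_x92}"]}
print(unmask_tool_args(tool_args, mapping))
# {'query': 'orders for Peter', 'limit': 5, 'tags': ['Peter']}
```

Dictionary keys are left untouched; only values are unmasked, since tool schemas define the keys and the LLM fills in the values.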
🧩 Features
- Context-Aware Masking: Dates are generalized to Month/Year. IDs are mapped to their type and origin.
- Reversible: mask() returns a mapping (surrogate → original) that unmask() uses to restore the original values in the LLM's response.
- Stateless & Secure: No data is stored persistently; mappings are ephemeral per request.
- Extensible: Built on top of Microsoft Presidio and spaCy.
Roadmap
We are building the standard for privacy-preserving AI.
- ✅ Multi-language Support (EN, DE supported, more coming)
- ✅ Custom Masking Rules (Add your own Regex/Logic)
- 🔄 LangChain Integration Helper (In Progress)
- 🔜 Streaming Support (Critical for Chatbots)
- 🔜 PII-Presidio Adapter (Easy migration)
🌟 Vision
We believe in maximizing the utility of LLMs without compromising user privacy. By mapping sensitive data to context-rich placeholders, we allow models to understand the structure and nature of the data without seeing the actual data.
📦 License
MIT License. See LICENSE for details.
👩‍💻 Developer Guide
Architecture & Core Concepts
- Entry Point: PrivalyseMasker in src/privalyse_mask/core.py is the main class.
- Analysis: Uses presidio-analyzer to detect entities.
- Custom Recognizers: Extends Presidio with custom patterns (e.g., German ID, Spaced IBAN) in src/privalyse_mask/recognizers.py.
- Masking Logic:
  - Surrogates: Replaces entities with {Type_Context} or {Type_Hash} placeholders.
  - Reversibility: mask() returns a mapping dict (Surrogate → Original) that unmask() uses to restore the original text.
  - Selective Masking: Some entities (like generic Locations/Cities) are intentionally not masked (the surrogate generator returns None) to preserve context.
- Data Flow: Input Text → Analyzer → Entity Detection → Overlap Removal → Surrogate Generation → Replacement → Masked Text + Mapping.
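To make the last stages of the data flow concrete, here is a self-contained sketch of overlap removal, surrogate generation, and replacement, plus the inverse unmask step. It does not use Presidio: detected entities are given as hypothetical (start, end, entity_type, score) tuples, the functions mask_text and unmask_text are illustrative rather than the library's methods, and surrogates are simple counters instead of the project's hashed or context-based placeholders. The overlap rule reads "Score > Length" as: higher score wins, longer span breaks ties.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Presidio result: (start, end, entity_type, score)
Entity = Tuple[int, int, str, float]


def remove_overlaps(entities: List[Entity]) -> List[Entity]:
    """Greedy overlap removal: higher score first, longer span breaks ties."""
    ranked = sorted(entities, key=lambda e: (e[3], e[1] - e[0]), reverse=True)
    kept: List[Entity] = []
    for ent in ranked:
        # Keep an entity only if it does not intersect any entity already kept.
        if all(ent[1] <= k[0] or ent[0] >= k[1] for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e[0])


def mask_text(text: str, entities: List[Entity]) -> Tuple[str, Dict[str, str]]:
    """Replace detected spans with numbered surrogates; return text + mapping."""
    spans = remove_overlaps(entities)
    mapping: Dict[str, str] = {}  # surrogate -> original (what unmasking needs)
    seen: Dict[str, str] = {}     # original -> surrogate (repeats reuse one token)
    surrogates: List[str] = []
    for start, end, entity_type, _score in spans:
        original = text[start:end]
        if original not in seen:
            surrogate = "{%s_%d}" % (entity_type.title(), len(mapping) + 1)
            seen[original] = surrogate
            mapping[surrogate] = original
        surrogates.append(seen[original])
    # Replace right to left so the offsets of earlier spans stay valid.
    for (start, end, _type, _score), surrogate in reversed(list(zip(spans, surrogates))):
        text = text[:start] + surrogate + text[end:]
    return text, mapping


def unmask_text(text: str, mapping: Dict[str, str]) -> str:
    """Inverse step: put the original values back wherever a surrogate appears."""
    for surrogate, original in mapping.items():
        text = text.replace(surrogate, original)
    return text


text = "Peter emailed peter@example.com"
entities = [
    (0, 5, "PERSON", 0.85),
    (14, 31, "EMAIL_ADDRESS", 1.0),
    (14, 19, "PERSON", 0.6),  # overlaps the email; dropped by remove_overlaps
]
masked, mapping = mask_text(text, entities)
print(masked)                                # {Person_1} emailed {Email_Address_2}
print(unmask_text(masked, mapping) == text)  # True
```

Replacing from right to left is the key design choice: substituting a surrogate changes the string length, so working backwards keeps the original character offsets of the remaining spans valid.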
Developer Workflow
- Installation:
  pip install -e .
  python -m spacy download en_core_web_lg  # Required for Presidio
- Testing:
  - Run tests with pytest.
  - Tests are located in tests/.
- Adding Recognizers:
  - Define a Pattern and PatternRecognizer in src/privalyse_mask/recognizers.py.
  - Register it in PrivalyseMasker.__init__ in src/privalyse_mask/core.py.
Conventions & Patterns
- Surrogate Format: Always use curly braces {...}.
  - Person: {Name_<hash>}
  - Date: {Date_<Month>_<Year>} (via dateparser)
  - IBAN: {<Country>_IBAN}
  - Email: {Email_at_<domain>}
- Hashing: Use utils.generate_hash_suffix with the instance's seed for consistent but secure hashes.
- Overlap Handling: Custom greedy strategy in _remove_overlaps (Score > Length: score takes priority, span length breaks ties).
- Structure Masking: mask_struct handles recursive masking of JSON/Dict objects.
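The surrogate conventions above can be sketched as follows. This is not utils.generate_hash_suffix: the helper names are hypothetical, HMAC-SHA256 truncated to five hex characters is an assumed choice for a keyed, deterministic suffix, and the IBAN country table covers only a few codes for illustration (unknown codes fall back to a hypothetical "Unknown" label).

```python
import hashlib
import hmac


def hash_suffix(value: str, seed: str, length: int = 5) -> str:
    """Keyed, deterministic suffix: same value + same seed -> same suffix."""
    digest = hmac.new(seed.encode(), value.encode(), hashlib.sha256).hexdigest()
    return digest[:length]


# Only a few country codes, for illustration.
IBAN_COUNTRIES = {"DE": "German", "FR": "French", "NL": "Dutch"}


def person_surrogate(name: str, seed: str) -> str:
    # {Name_<hash>}
    return "{Name_" + hash_suffix(name, seed) + "}"


def iban_surrogate(iban: str) -> str:
    # {<Country>_IBAN}, derived from the two-letter country prefix
    country = IBAN_COUNTRIES.get(iban[:2].upper(), "Unknown")
    return "{" + country + "_IBAN}"


def email_surrogate(email: str) -> str:
    # {Email_at_<domain>}, keeping only the part after the last "@"
    domain = email.rsplit("@", 1)[-1]
    return "{Email_at_" + domain + "}"


seed = "per-instance-secret"
print(person_surrogate("Peter", seed) == person_surrogate("Peter", seed))  # True
print(iban_surrogate("DE93 3432 2346 4355"))  # {German_IBAN}
print(email_surrogate("alice@example.com"))   # {Email_at_example.com}
```

Keying the hash with a per-instance seed keeps surrogates stable within one masker (the same name always maps to the same token) while preventing someone without the seed from recomputing the hash for a guessed name.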
Common Pitfalls
- spaCy Model: Ensure en_core_web_lg is installed; otherwise AnalyzerEngine initialization fails.
- Presidio Overlaps: Presidio often returns overlapping entities, so _remove_overlaps is critical.
- Date Parsing: dateparser extracts semantic date information; if parsing fails, the surrogate falls back to {Date_General}.
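The project uses dateparser for this step; the sketch below swaps in the standard library's datetime.strptime with a fixed, hypothetical list of formats so it runs without extra dependencies. It shows the generalization to {Date_<Month>_<Year>} and the {Date_General} fallback; date_surrogate and DATE_FORMATS are illustrative names, not the library's API.

```python
from datetime import datetime

# Formats tried in order. DD.MM.YYYY comes first because the README's example
# maps 12.10.2000 to October 2000.
DATE_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def date_surrogate(text: str) -> str:
    """Generalize a date to month and year, or fall back to {Date_General}."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format does not match; try the next one
        # Keep only month and year: enough temporal context, less identifying.
        return "{Date_%s_%d}" % (parsed.strftime("%B"), parsed.year)
    return "{Date_General}"  # fallback when no format matches


print(date_surrogate("12.10.2000"))            # {Date_October_2000}
print(date_surrogate("2000-10-12"))            # {Date_October_2000}
print(date_surrogate("sometime last spring"))  # {Date_General}
```

Month names come from strftime("%B"), which follows the process locale; under the default C locale they are English.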
File details
Details for the file privalyse_mask-0.1.1.tar.gz.
File metadata
- Download URL: privalyse_mask-0.1.1.tar.gz
- Upload date:
- Size: 36.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fda108daef81d100eaa7baac91b9a516aaeea63990b7f6bd0274c66af544cae8 |
| MD5 | fcc4ddfb5d53ac3617f0275711b9123a |
| BLAKE2b-256 | c80042115d5226e5f712adcb36e6cdab8a79368d3f358b3430fc11a40cdeb7fa |
Provenance
The following attestation bundles were made for privalyse_mask-0.1.1.tar.gz:
Publisher: release.yml on Privalyse/privalyse-mask
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: privalyse_mask-0.1.1.tar.gz
- Subject digest: fda108daef81d100eaa7baac91b9a516aaeea63990b7f6bd0274c66af544cae8
- Sigstore transparency entry: 779590322
- Sigstore integration time:
- Permalink: Privalyse/privalyse-mask@a3a0159ea7af83f2ccfe02438a1f68b0caf93734
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Privalyse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a3a0159ea7af83f2ccfe02438a1f68b0caf93734
- Trigger Event: release
File details
Details for the file privalyse_mask-0.1.1-py3-none-any.whl.
File metadata
- Download URL: privalyse_mask-0.1.1-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2fe2b2ebca8fb6379fab8b05f902aa6b0b2cb3e0d364926dca31cd616f4a2b7e |
| MD5 | e466e2c603dfe93273d5976fb826f008 |
| BLAKE2b-256 | 3996042124d6183a513ed26b57ce493ff3f7ad45187a4c7629d9efa6bdcd5e20 |
Provenance
The following attestation bundles were made for privalyse_mask-0.1.1-py3-none-any.whl:
Publisher: release.yml on Privalyse/privalyse-mask
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: privalyse_mask-0.1.1-py3-none-any.whl
- Subject digest: 2fe2b2ebca8fb6379fab8b05f902aa6b0b2cb3e0d364926dca31cd616f4a2b7e
- Sigstore transparency entry: 779590323
- Sigstore integration time:
- Permalink: Privalyse/privalyse-mask@a3a0159ea7af83f2ccfe02438a1f68b0caf93734
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Privalyse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a3a0159ea7af83f2ccfe02438a1f68b0caf93734
- Trigger Event: release