Advanced PII pseudonymization for LLM context preservation.
🛡️ Privalyse Mask
Redefining Privacy in AI Applications.
The Privacy-Protection Layer for your LLM Pipeline.
💥 Stop choosing between Privacy and High-Quality Answers.
Privalyse Mask is the missing link that keeps PII out of your LLM prompts, supporting GDPR compliance without degrading answer quality.
Most tools destroy data to save it. We don't. We transform sensitive PII into Semantic Surrogates—tokens that preserve gender, culture, geography, and structure—so your AI still "gets it" while the data stays safe.
Zero Leaks. Full Context. 100% Reversible.
⭐ Star this repository if you believe in Privacy-First AI!
🧠 The Dilemma: Utility vs. Privacy
When sending data to an LLM, you usually have two bad options:
- Send Everything: You risk GDPR fines and data leaks.
- Redact Everything: The LLM becomes stupid. "John from Berlin" becomes "[PERSON] from [LOCATION]". The model loses gender, culture, and geography.
💡 The Solution: Semantic Masking
Privalyse Mask solves this by replacing sensitive entities with Context-Aware, Reversible Surrogates. We preserve the meaning while hiding the identity.
| Original Input | Standard Redaction | Privalyse Mask |
|---|---|---|
| "John Smith lives at 123 Main St, New York." | "[PERSON] lives at [ADDRESS]." | "{User_61173_Prename_John} lives at {Address_in_New York_Street_cb7e6}." |
| "Max Mustermann wohnt in Berlin." | "[PERSON] wohnt in [LOCATION]." | "{User_44aa4_Prename_Max} wohnt in {Address_in_Berlin}." |
| "Call me at +49 30 123456." | "Call me at [PHONE]." | "Call me at {Phone_DE}." |
✅ The Model Understands: "This is a male person named John living in NYC."
❌ The Model Doesn't Know: Who exactly it is or where exactly they live.
⚡ Usage in 3 Lines
from privalyse_mask import PrivalyseMasker
# Automatically loads EN, DE, FR, ES, IT models
masker = PrivalyseMasker()
masked_text, mapping = masker.mask("John lives in Berlin.")
# Result: "{User_a1b2_Prename_John} lives in {Address_in_Berlin}."
✨ Why Privalyse Mask?
1. 🌍 True Multilingual Support
We don't just support English. We have native, fine-tuned recognition for:
- 🇺🇸 English (US/UK)
- 🇩🇪 German (DACH)
- 🇫🇷 French
- 🇪🇸 Spanish
- 🇮🇹 Italian
2. 🎭 Granular Control
Decide exactly how much context you want to reveal.
- MASK_ALL: {PERSON} (Maximum Privacy)
- PARTIAL_MASK: {User_Hash_Prename_John} (Maximum Utility)
- KEEP_VISIBLE: Berlin (Keep cities visible for context)
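To make the three levels concrete, here is a minimal, standard-library sketch of how a single person entity could be rendered at each level. It does not use or reproduce the privalyse-mask internals: the `Level` enum, the `render_person` helper, and the choice of a truncated SHA-256 hash are all assumptions made for illustration.

```python
import hashlib
from enum import Enum


class Level(Enum):
    """Hypothetical stand-in for the three levels listed above."""

    MASK_ALL = "mask_all"
    PARTIAL_MASK = "partial_mask"
    KEEP_VISIBLE = "keep_visible"


def render_person(first_name: str, full_name: str, level: Level) -> str:
    """Render a person entity at the requested level (illustrative only)."""
    if level is Level.KEEP_VISIBLE:
        # Nothing is hidden: the original text passes through.
        return full_name
    if level is Level.MASK_ALL:
        # Only the entity type survives.
        return "{PERSON}"
    # PARTIAL_MASK: hide the identity behind a short hash, keep the first name
    # so the model can still infer gender and cultural context.
    short_id = hashlib.sha256(full_name.encode("utf-8")).hexdigest()[:5]
    return f"{{User_{short_id}_Prename_{first_name}}}"


for level in Level:
    print(level.name, "->", render_person("John", "John Smith", level))
```

The trade-off is visible in the output: each step from MASK_ALL towards KEEP_VISIBLE gives the model more context and the surrogate less privacy.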
3. 🔄 100% Reversible & Consistent
Every masking operation generates a secure, ephemeral mapping between surrogates and original values, so you can restore the real values in the LLM's response.
- Input: "Hello {User_a1b2_Prename_John}..."
- Output: "Hello John..."
By using a Seed, you ensure that "John" is always masked to the same ID across different sessions or chat messages.
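The idea behind seeded consistency and reversal can be sketched with the standard library alone. This is not the library's implementation: `surrogate_for`, `mask_names`, and `unmask` are hypothetical helpers, the list of names stands in for real entity detection, and deriving the ID from an HMAC of the name keyed with the seed is an assumption about how a stable ID might be produced.

```python
import hashlib
import hmac
import re


def surrogate_for(name: str, seed: str) -> str:
    """Derive a stable surrogate for a first name from a secret seed."""
    digest = hmac.new(seed.encode("utf-8"), name.encode("utf-8"), hashlib.sha256)
    return f"{{User_{digest.hexdigest()[:4]}_Prename_{name}}}"


def mask_names(text: str, names: list[str], seed: str) -> tuple[str, dict[str, str]]:
    """Replace each known name with its surrogate; return text and mapping."""
    mapping: dict[str, str] = {}
    if not names:
        return text, mapping
    # Longest names first so a short name never shadows a longer one.
    longest_first = sorted(set(names), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in longest_first) + r")\b"
    )

    def replace(match: re.Match) -> str:
        token = surrogate_for(match.group(0), seed)
        mapping[token] = match.group(0)  # surrogate -> original
        return token

    return pattern.sub(replace, text), mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore original values in a (possibly LLM-generated) text."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


masked, mapping = mask_names("John met Anna.", ["John", "Anna"], seed="demo-seed")
print(masked)
print(unmask(masked, mapping))
```

Because the ID depends only on the name and the seed, masking "John" in a later message with the same seed yields the same surrogate, so a mapping from an earlier turn can still restore it.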
4. 🆔 Specialized Recognizers
We go beyond standard NER. We detect:
- German IBANs (even with spaces)
- German IDs (Personalausweis)
- Complex Addresses (Street vs. City separation)
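As an illustration of what recognizing a German IBAN "even with spaces" involves, the sketch below pairs a tolerant regular expression with the standard ISO 13616 mod-97 checksum. It is an independent example, not the recognizer shipped with the package; `find_german_ibans` and `iban_checksum_ok` are hypothetical names.

```python
import re

# DE + 2 check digits + 18 digits, with an optional space before any digit,
# so both "DE89370400440532013000" and "DE89 3704 0044 0532 0130 00" match.
GERMAN_IBAN = re.compile(r"\bDE\d{2}(?: ?\d){18}\b")


def iban_checksum_ok(iban: str) -> bool:
    """Validate an IBAN with the mod-97 check defined in ISO 13616."""
    compact = iban.replace(" ", "").upper()
    # Move country code and check digits to the end, then map A=10 ... Z=35.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def find_german_ibans(text: str) -> list[str]:
    """Return all substrings that look like a German IBAN and pass the checksum."""
    return [m.group(0) for m in GERMAN_IBAN.finditer(text) if iban_checksum_ok(m.group(0))]


print(find_german_ibans("Please transfer to DE89 3704 0044 0532 0130 00 by Friday."))
```

The checksum step matters for precision: it filters out 22-character strings that merely look like an IBAN, such as order numbers that happen to start with "DE".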
🚀 Installation
pip install privalyse-mask
Note: You will need to download the spaCy models for your desired languages (e.g., python -m spacy download en_core_web_lg).
🛠️ Advanced Configuration
from privalyse_mask import PrivalyseMasker, MaskingConfig, MaskingLevel
# Configure masking granularity
config = MaskingConfig(
    default_level=MaskingLevel.PARTIAL_MASK,  # Default: {User_Hash_Prename_John}
    entity_overrides={
        "LOCATION": MaskingLevel.KEEP_VISIBLE,  # Keep cities like "Paris" visible
        "PHONE_NUMBER": MaskingLevel.MASK_ALL,  # Just {PHONE_NUMBER}
        "EMAIL_ADDRESS": MaskingLevel.MASK_WITH_CONTEXT,  # {Email_at_gmail.com}
    },
)
masker = PrivalyseMasker(config=config)
📂 Handling JSON & Chat History
You can mask entire JSON objects (e.g., chat history) recursively.
chat_history = [
    {"role": "user", "content": "My name is John."},
    {"role": "assistant", "content": "Hello John!"},
]
# mask_struct handles Dicts and Lists recursively
masked_history, mapping = masker.mask_struct(chat_history)
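For readers curious what "recursively" means here, the following is a small sketch of the traversal pattern, written without the library. `map_strings` and `toy_mask` are hypothetical helpers, and `toy_mask` is a hard-coded stand-in for real PII detection; how `mask_struct` is actually implemented is not shown in this README.

```python
from typing import Any, Callable


def map_strings(value: Any, fn: Callable[[str], str]) -> Any:
    """Apply fn to every string inside nested dicts and lists."""
    if isinstance(value, str):
        return fn(value)
    if isinstance(value, dict):
        # Keys are left untouched; only values are transformed.
        return {key: map_strings(item, fn) for key, item in value.items()}
    if isinstance(value, list):
        return [map_strings(item, fn) for item in value]
    # Numbers, booleans, None, etc. pass through unchanged.
    return value


def toy_mask(text: str) -> str:
    """Stand-in for a real masker: replaces one hard-coded name."""
    return text.replace("John", "{User_a1b2_Prename_John}")


history = [
    {"role": "user", "content": "My name is John."},
    {"role": "assistant", "content": "Hello John!"},
]
print(map_strings(history, toy_mask))
```

The function builds new containers instead of mutating its input, so the unmasked history remains available to the caller.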
🚀 The Vision: The Privacy Hub for AI
We are building the central nervous system for secure AI development.
- Privalyse CLI: The Eyes (Visibility & Detection).
  - Illuminates the black box.
  - Scans your codebase and runtime for vulnerabilities.
  - Detects leaks before they happen.
- Privalyse Mask: The Shield (Proactive Protection).
  - Safeguards data in real-time.
  - Ensures compliance by design.
  - Preserves utility through semantic masking.
Don't just find leaks. Prevent them.
🌐 The Privalyse Ecosystem
We are creating a unified ecosystem where privacy is a catalyst for better AI.
- Privalyse.com: Our Vision & Platform.
- Privalyse CLI: Scan your codebase.
- Privalyse Mask: Protect your pipeline.
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file privalyse_mask-0.1.0.tar.gz.
File metadata
- Download URL: privalyse_mask-0.1.0.tar.gz
- Upload date:
- Size: 32.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5ecd1c37bd5f75bc51ce2ed69bdd27934f253cc7d640ddb00f7fe2dec82dc2a |
| MD5 | 7c839c9919981166ca3aae6728a0c973 |
| BLAKE2b-256 | 64998d77cc0470ba25ea402a82b93528571c9cb12c8c95d50ca5218e6f7333ec |
Provenance
The following attestation bundles were made for privalyse_mask-0.1.0.tar.gz:
Publisher: release.yml on Privalyse/privalyse-mask
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: privalyse_mask-0.1.0.tar.gz
  - Subject digest: f5ecd1c37bd5f75bc51ce2ed69bdd27934f253cc7d640ddb00f7fe2dec82dc2a
  - Sigstore transparency entry: 778952398
  - Sigstore integration time:
  - Permalink: Privalyse/privalyse-mask@7d7fe8080f9a3beb117a2d7890b69f89ee0b6d86
  - Branch / Tag: refs/heads/main
  - Owner: https://github.com/Privalyse
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@7d7fe8080f9a3beb117a2d7890b69f89ee0b6d86
  - Trigger Event: workflow_dispatch
File details
Details for the file privalyse_mask-0.1.0-py3-none-any.whl.
File metadata
- Download URL: privalyse_mask-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc54e45173dad5074c57180ee3e162085e49de1f64cfe2c6e5a17847fe78c8e7 |
| MD5 | a66ec34341b1baa7c8ce652813e574e3 |
| BLAKE2b-256 | 07afb393eb11461bbc0c4f24440842752a0615e6d6365eee1e9c6e96ab6391d8 |
Provenance
The following attestation bundles were made for privalyse_mask-0.1.0-py3-none-any.whl:
Publisher: release.yml on Privalyse/privalyse-mask
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: privalyse_mask-0.1.0-py3-none-any.whl
  - Subject digest: fc54e45173dad5074c57180ee3e162085e49de1f64cfe2c6e5a17847fe78c8e7
  - Sigstore transparency entry: 778952403
  - Sigstore integration time:
  - Permalink: Privalyse/privalyse-mask@7d7fe8080f9a3beb117a2d7890b69f89ee0b6d86
  - Branch / Tag: refs/heads/main
  - Owner: https://github.com/Privalyse
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@7d7fe8080f9a3beb117a2d7890b69f89ee0b6d86
  - Trigger Event: workflow_dispatch