Advanced PII pseudonymization for LLM context preservation.

🛡️ Privalyse Mask

Redefining Privacy in AI-Applications.

The Privacy-Protection Layer for your LLM Pipeline.

License: MIT | Python 3.8+


💥 Stop choosing between Privacy and High-Quality Answers.

Privalyse Mask is the missing link that makes LLMs GDPR-compliant without making them stupid.

Most tools destroy data to save it. We don't. We transform sensitive PII into Semantic Surrogates—tokens that preserve gender, culture, geography, and structure—so your AI still "gets it" while the data stays safe.

Zero Leaks. Full Context. 100% Reversible.

Star this repository if you believe in Privacy-First AI!


🧠 The Dilemma: Utility vs. Privacy

When sending data to an LLM, you usually have two bad options:

  1. Send Everything: You risk GDPR fines and data leaks.
  2. Redact Everything: The LLM becomes stupid. "John from Berlin" becomes [PERSON] from [LOCATION]. The model loses gender, culture, and geography.

💡 The Solution: Semantic Masking

Privalyse Mask solves this by replacing sensitive entities with Context-Aware, Reversible Surrogates. We preserve the meaning while hiding the identity.

| Original Input | Standard Redaction | Privalyse Mask |
| --- | --- | --- |
| "John Smith lives at 123 Main St, New York." | [PERSON] lives at [ADDRESS]. | "{User_61173_Prename_John} lives at {Address_in_New York_Street_cb7e6}." |
| "Max Mustermann wohnt in Berlin." | [PERSON] wohnt in [LOCATION]. | "{User_44aa4_Prename_Max} wohnt in {Address_in_Berlin}." |
| "Call me at +49 30 123456." | Call me at [PHONE]. | Call me at {Phone_DE}. |

✅ The Model Understands: "This is a male person named John living in NYC."
❌ The Model Doesn't Know: Who exactly it is or where exactly they live.
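To make the idea of a surrogate concrete, here is a minimal sketch of how a phone number could be reduced to a token such as {Phone_DE}: the digits are dropped and only the country is kept. This is an illustration of the principle, not Privalyse Mask's implementation. The `CALLING_CODES` table, the `phone_surrogate` function, and the `{Phone}` fallback are hypothetical names chosen for this example.

```python
import re

# Hypothetical, deliberately tiny lookup table (assumption for this sketch).
# "+1" is shared by several countries; it is mapped to US here for simplicity.
CALLING_CODES = {"+49": "DE", "+33": "FR", "+34": "ES", "+39": "IT", "+44": "GB", "+1": "US"}


def phone_surrogate(number: str) -> str:
    """Return a token that keeps the country of a phone number but none of its digits."""
    # Keep only digits and the plus sign, so "+49 30 123456" becomes "+4930123456".
    compact = re.sub(r"[^\d+]", "", number)
    # Try longer prefixes first so that "+1" cannot shadow a longer code.
    for prefix in sorted(CALLING_CODES, key=len, reverse=True):
        if compact.startswith(prefix):
            return "{Phone_" + CALLING_CODES[prefix] + "}"
    # Unknown or missing country code: fall back to a token without a country.
    return "{Phone}"


print(phone_surrogate("+49 30 123456"))
```

The model can still reason about the country (time zone, language, dialing format), but the number itself never leaves your system.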


⚡ Usage in 3 Lines

from privalyse_mask import PrivalyseMasker

# Automatically loads the EN, DE, FR, ES, IT models
# (the spaCy models must be installed first, see Installation)
masker = PrivalyseMasker()

masked_text, mapping = masker.mask("John lives in Berlin.")
# Result: "{User_a1b2_Prename_John} lives in {Address_in_Berlin}."

✨ Why Privalyse Mask?

1. 🌍 True Multilingual Support

We don't just support English. We have native, fine-tuned recognition for:

  • 🇺🇸 English (US/UK)
  • 🇩🇪 German (DACH)
  • 🇫🇷 French
  • 🇪🇸 Spanish
  • 🇮🇹 Italian

2. 🎭 Granular Control

Decide exactly how much context you want to reveal.

  • MASK_ALL: {PERSON} (Maximum Privacy)
  • MASK_WITH_CONTEXT: {Email_at_gmail.com} (Hides the value, keeps part of its context)
  • PARTIAL_MASK: {User_Hash_Prename_John} (Maximum Utility)
  • KEEP_VISIBLE: Berlin (Keep cities visible for context)

3. 🔄 100% Reversible & Consistent

Every masking operation returns a secure, ephemeral mapping alongside the masked text, so the surrogates in the LLM's response can be swapped back for the original values.

  • Input: "Hello {User_a1b2_Prename_John}..."
  • Output: "Hello John..."
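The reverse step can be sketched in a few lines of plain Python. This is only an illustration of the principle: it assumes the mapping is a dict from surrogate token to original value, which the examples above do not confirm, and the `unmask` helper is a hypothetical name, not necessarily the library's API.

```python
def unmask(text: str, mapping: dict) -> str:
    """Replace every surrogate token in `text` with its original value."""
    # Replace longer tokens first so a token that is a prefix of another
    # token cannot corrupt the longer one.
    for surrogate in sorted(mapping, key=len, reverse=True):
        text = text.replace(surrogate, mapping[surrogate])
    return text


# Assumed mapping shape: surrogate token -> original value.
mapping = {
    "{User_a1b2_Prename_John}": "John",
    "{Address_in_Berlin}": "Berlin",
}
llm_response = "Hello {User_a1b2_Prename_John}, how is the weather in {Address_in_Berlin}?"
restored = unmask(llm_response, mapping)
print(restored)
```

Restoring works only if the LLM repeats the tokens verbatim, which is why the surrogates use a distinctive, brace-delimited format.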

By using a Seed, you ensure that "John" is always masked to the same ID across different sessions or chat messages.
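One common way to get this kind of consistency is a keyed hash: the same seed and the same input always produce the same ID, while a different seed produces unrelated IDs. The sketch below uses HMAC-SHA256 from the standard library. It is an assumption about how such a scheme can work, not a description of Privalyse Mask's internals, and `surrogate_id` and `person_surrogate` are hypothetical helpers.

```python
import hashlib
import hmac


def surrogate_id(value: str, seed: str, length: int = 5) -> str:
    """Derive a short, deterministic ID for `value` that depends on `seed`."""
    digest = hmac.new(seed.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncating keeps tokens short; shorter IDs collide more often.
    return digest[:length]


def person_surrogate(first_name: str, full_name: str, seed: str) -> str:
    """Build a token in the style of the examples above (format assumed from them)."""
    return "{User_" + surrogate_id(full_name, seed) + "_Prename_" + first_name + "}"


# The same seed gives the same token in every session or chat message.
first = person_surrogate("John", "John Smith", seed="my-project-seed")
second = person_surrogate("John", "John Smith", seed="my-project-seed")
print(first == second)
```

Treat the seed like a secret: anyone who knows it can recompute the IDs for guessed names and check them against masked text.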

4. 🆔 Specialized Recognizers

We go beyond standard NER. We detect:

  • German IBANs (even with spaces)
  • German IDs (Personalausweis)
  • Complex Addresses (Street vs. City separation)
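As an illustration of what a pattern-based recognizer involves, the sketch below finds German IBANs with or without spaces and confirms them with the standard mod-97 checksum. It is a simplified, standalone example, not the recognizer shipped with Privalyse Mask; `IBAN_DE`, `is_valid_iban`, and `find_german_ibans` are hypothetical names.

```python
import re

# German IBAN: "DE", 2 check digits, then 18 digits (22 characters in total).
# An optional single space is allowed before each of the 18 digits.
IBAN_DE = re.compile(r"\bDE\d{2}(?: ?\d){18}\b")


def is_valid_iban(candidate: str) -> bool:
    """Check an IBAN with the mod-97 rule."""
    compact = candidate.replace(" ", "").upper()
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Convert letters to numbers (A=10 ... Z=35); digits stay unchanged.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def find_german_ibans(text: str) -> list:
    """Return all substrings that look like German IBANs and pass the checksum."""
    return [m.group(0) for m in IBAN_DE.finditer(text) if is_valid_iban(m.group(0))]


print(find_german_ibans("Please pay to DE89 3704 0044 0532 0130 00 by Friday."))
```

The checksum step filters out digit sequences that merely look like an IBAN, which reduces false positives compared with a regex alone.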

🚀 Installation

pip install privalyse-mask

Note: You will need to download the spaCy models for your desired languages (e.g., python -m spacy download en_core_web_lg).


🛠️ Advanced Configuration

from privalyse_mask import PrivalyseMasker, MaskingConfig, MaskingLevel

# Configure masking granularity
config = MaskingConfig(
    default_level=MaskingLevel.PARTIAL_MASK, # Default: {User_Hash_Prename_John}
    entity_overrides={
        "LOCATION": MaskingLevel.KEEP_VISIBLE,   # Keep cities like "Paris" visible
        "PHONE_NUMBER": MaskingLevel.MASK_ALL,   # Just {PHONE_NUMBER}
        "EMAIL_ADDRESS": MaskingLevel.MASK_WITH_CONTEXT # {Email_at_gmail.com}
    }
)

masker = PrivalyseMasker(config=config)

📂 Handling JSON & Chat History

You can mask entire JSON objects (e.g., chat history) recursively.

chat_history = [
    {"role": "user", "content": "My name is John."},
    {"role": "assistant", "content": "Hello John!"}
]

# mask_struct handles Dicts and Lists recursively
masked_history, mapping = masker.mask_struct(chat_history)
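For readers curious what "recursively" means here, the following sketch walks a nested structure of dicts and lists and applies a function to every string it finds. It is a generic illustration under stated assumptions, not the code behind mask_struct: `map_strings` and `toy_mask` are hypothetical, and unlike mask_struct the sketch does not build a mapping.

```python
from typing import Any, Callable


def map_strings(node: Any, fn: Callable[[str], str]) -> Any:
    """Return a copy of `node` with `fn` applied to every string value."""
    if isinstance(node, str):
        return fn(node)
    if isinstance(node, dict):
        # Keys are left untouched; only values are visited.
        return {key: map_strings(value, fn) for key, value in node.items()}
    if isinstance(node, list):
        return [map_strings(item, fn) for item in node]
    # Numbers, booleans, None, and other types pass through unchanged.
    return node


def toy_mask(text: str) -> str:
    # Stand-in for a real masker, for demonstration only.
    return text.replace("John", "{User_a1b2_Prename_John}")


history = [
    {"role": "user", "content": "My name is John."},
    {"role": "assistant", "content": "Hello John!"},
]
masked = map_strings(history, toy_mask)
print(masked)
```

Because the walk returns a new structure, the original chat history stays unmodified and can still be used locally.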

🚀 The Vision: The Privacy Hub for AI

We are building the central nervous system for secure AI development.

  • Privalyse CLI: The Eyes (Visibility & Detection).

    • Illuminates the black box.
    • Scans your codebase and runtime for vulnerabilities.
    • Detects leaks before they happen.
  • Privalyse Mask: The Shield (Proactive Protection).

    • Safeguards data in real-time.
    • Ensures compliance by design.
    • Preserves utility through semantic masking.

Don't just find leaks. Prevent them.


🌐 The Privalyse Ecosystem

We are creating a unified ecosystem where privacy is a catalyst for better AI.


🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

