PrivacyLens

Transparent PII masking for LLM clients — keep sensitive data out of your AI prompts.


Why?

Every prompt you send to an LLM can leak PII: names, emails, phone numbers, SSNs. PrivacyLens intercepts your prompts, replaces detected PII with anonymous tokens, and restores the original values when the response comes back. The LLM sees only the tokens, never the values they stand for.

Input:  "Email john@example.com about the project"
Sent:   "Email [EMAIL_1] about the project"         ← LLM sees this
Output: "I've emailed john@example.com"              ← Your app sees this
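
To make the round trip above concrete, here is a minimal sketch of the mask-then-restore idea using only the standard library. It is not PrivacyLens's implementation; the function names `mask` and `unmask` and the simplified email pattern are assumptions made for illustration.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def mask(text):
    """Replace each distinct email with a numbered token; return text and mapping."""
    mapping = {}  # token -> original value
    seen = {}     # original value -> token, so repeats reuse the same token

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            token = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = token
            mapping[token] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def unmask(text, mapping):
    """Put the original values back wherever a known token appears."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


sent, mapping = mask("Email john@example.com about the project")
print(sent)  # Email [EMAIL_1] about the project

reply = "I've emailed [EMAIL_1]"  # what a model might answer
print(unmask(reply, mapping))  # I've emailed john@example.com
```

The mapping never leaves your process, which is what keeps the original value out of the request.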

Install

pip install privacylens

Usage

Step 1: Wrap your client

from privacylens import shield

# Pick your LLM client — wrap it with shield()
import openai
client = shield(openai.OpenAI())

Step 2: Use it normally

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "My name is John Doe and my email is john@example.com. Write me a welcome email."
    }],
)

print(response.choices[0].message.content)
# Output contains "John Doe" and "john@example.com" — restored automatically

That's it. No other code changes needed. The PII is masked before it reaches the LLM and unmasked in the response.


Works With Every Major LLM Client

import anthropic
import openai
from privacylens import shield

# The my_* names below stand for objects you have already created.

# OpenAI
client = shield(openai.OpenAI())
client = shield(openai.AsyncOpenAI())

# Anthropic
client = shield(anthropic.Anthropic())
client = shield(anthropic.AsyncAnthropic())

# LangChain (returns a callback handler)
handler = shield(my_langchain_chat_model)

# CrewAI
adapter = shield(my_crewai_agent)

# Strands
wrapper = shield(my_strands_model)

Each wrapped client behaves exactly like the original. Same methods, same parameters, same return types.
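
One common way to get this kind of drop-in behavior is a proxy object that overrides only the calls it needs to intercept and forwards everything else. The sketch below shows that pattern with a fake client; `FakeClient`, `MaskingProxy`, and the `complete` method are hypothetical names, and nothing here is taken from how `shield()` works internally.

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


class FakeClient:
    """Stand-in for an LLM client; records what it was sent and echoes it back."""

    model_name = "fake-model"

    def complete(self, prompt):
        self.last_prompt = prompt
        return f"Echo: {prompt}"


class MaskingProxy:
    """Hypothetical wrapper: masks emails going in, restores them coming out."""

    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself,
        # so everything else is forwarded to the wrapped client unchanged.
        return getattr(self._client, name)

    def complete(self, prompt):
        mapping = {}  # token -> original value, local to this call

        def substitute(match):
            token = f"[EMAIL_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token

        reply = self._client.complete(EMAIL_RE.sub(substitute, prompt))
        for token, value in mapping.items():
            reply = reply.replace(token, value)
        return reply


client = MaskingProxy(FakeClient())
reply = client.complete("Email john@example.com")
print(client.last_prompt)  # Email [EMAIL_1]
print(reply)               # Echo: Email john@example.com
print(client.model_name)   # fake-model
```

The caller uses the same method name and gets the same return type, while the wrapped client only ever receives the tokenized prompt.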


What Gets Detected

Built-in (regex, zero dependencies)

Entity   Example Input       What LLM Sees
Email    john@example.com    [EMAIL_1]
Phone    555-123-4567        [PHONE_1]
SSN      123-45-6789         [SSN_1]

Optional: Presidio (50+ entity types)

pip install privacylens[pii]

Detects names, addresses, credit card numbers, dates of birth, passport numbers, and more using Microsoft Presidio.

Optional: GLiNER (ML-based semantic detection)

pip install privacylens[semantic]

Uses a neural model to detect entities that regex can't catch.


Custom Detectors

Add your own patterns via privacylens.yaml in your project root:

detectors:
  - type: regex
    name: email
    pattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
  - type: regex
    name: employee_id
    pattern: 'EMP-\d{5,}'
  - type: regex
    name: project_code
    pattern: 'PROJ-[A-Z]{2,4}-\d{3,}'
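
Before adding a pattern to privacylens.yaml, it can help to try it against sample text. The sketch below does that with Python's `re` module, under the assumption that the library's regex detectors behave like `re`; the sample sentence is made up.

```python
import re

# Patterns copied from the privacylens.yaml example above.
patterns = {
    "employee_id": r"EMP-\d{5,}",
    "project_code": r"PROJ-[A-Z]{2,4}-\d{3,}",
}

# Made-up sample with two valid IDs and two near misses.
sample = "EMP-00123 is assigned to PROJ-AB-001, not PROJ-abc-1 or EMP-12."

for name, pattern in patterns.items():
    print(name, re.findall(pattern, sample))

# employee_id ['EMP-00123']
# project_code ['PROJ-AB-001']
```

The near misses (`PROJ-abc-1`, `EMP-12`) are skipped because they fail the case and digit-count constraints, which is a quick check that a pattern is neither too loose nor too strict.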

Vault Backends

Tokens are stored in a vault so they can be restored later. Three backends are available:

# In-memory (default) — fast, lost on restart
vault: memory

# SQLite — persists to disk
vault: sqlite

# Redis — shared across processes
vault: redis

For Redis:

pip install privacylens[redis]
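
The trade-off between the memory and SQLite backends can be sketched with two small classes that share one interface. This is an illustration only: the class names, the `put`/`get` methods, and keying entries by session are assumptions, not the library's vault API.

```python
import sqlite3


class MemoryVault:
    """Keeps token mappings in a dict; contents vanish when the process exits."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, token, value):
        self._data[(session_id, token)] = value

    def get(self, session_id, token):
        return self._data.get((session_id, token))


class SqliteVault:
    """Keeps token mappings in SQLite; with a file path they survive a restart."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS vault ("
            "session_id TEXT, token TEXT, value TEXT, "
            "PRIMARY KEY (session_id, token))"
        )

    def put(self, session_id, token, value):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO vault VALUES (?, ?, ?)",
                (session_id, token, value),
            )

    def get(self, session_id, token):
        row = self._conn.execute(
            "SELECT value FROM vault WHERE session_id = ? AND token = ?",
            (session_id, token),
        ).fetchone()
        return row[0] if row else None


# ":memory:" keeps the demo self-contained; a real file path would persist.
for vault in (MemoryVault(), SqliteVault(":memory:")):
    vault.put("s1", "[EMAIL_1]", "john@example.com")
    print(vault.get("s1", "[EMAIL_1]"))  # john@example.com
    print(vault.get("s2", "[EMAIL_1]"))  # None
```

Because both classes expose the same two methods, the code that masks and restores text does not need to know which backend is in use.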

Inspect Without Masking

See what would be detected without actually masking anything:

from privacylens import inspect

entities = inspect("Contact john@example.com or call 555-123-4567")

for entity in entities:
    print(f"{entity.entity_type}: '{entity.value}' at [{entity.start}:{entity.end}]")

# EMAIL: 'john@example.com' at [8:24]
# PHONE: '555-123-4567' at [33:45]
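
The `[start:end]` offsets in that output follow Python's slicing convention: `text[start:end]` gives back the matched value. A rough sketch of producing spans like these with `re.finditer` follows; the `Span` class, `find_spans` function, and both patterns are assumptions for illustration, not the library's detectors.

```python
import re
from dataclasses import dataclass

# Simplified patterns, for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


@dataclass
class Span:
    entity_type: str
    value: str
    start: int
    end: int


def find_spans(text):
    """Run every pattern over the text and return matches ordered by position."""
    spans = [
        Span(name, m.group(0), m.start(), m.end())
        for name, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda s: s.start)


text = "Contact john@example.com or call 555-123-4567"
for span in find_spans(text):
    # Slicing with the reported offsets recovers the detected value.
    assert text[span.start:span.end] == span.value
    print(f"{span.entity_type}: '{span.value}' at [{span.start}:{span.end}]")

# EMAIL: 'john@example.com' at [8:24]
# PHONE: '555-123-4567' at [33:45]
```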

Low-Level API

For full control over the pipeline:

from privacylens.core.pipeline import Pipeline
from privacylens.core.config import load_config

config = load_config()
pipeline = Pipeline(config)

# Tokenize
messages = [{"role": "user", "content": "Email john@example.com"}]
tokenized = pipeline.tokenize_messages(messages, session_id="s1")

# Send to LLM (tokenized messages have PII replaced).
# call_your_llm is a placeholder for your own function.
llm_response = call_your_llm(tokenized)

# Detokenize
restored = pipeline.detokenize(llm_response, session_id="s1")

License

MIT © 2026 Madan Gopal
