PrivacyLens

Transparent PII masking for LLM clients — keep sensitive data out of your AI prompts.

Why?

Every prompt you send to an LLM can leak PII: names, emails, phone numbers, SSNs. PrivacyLens intercepts your prompts, replaces detected PII with anonymous tokens, and restores the original values when the response comes back. The LLM only sees the tokens, never the values that were detected.

Input:  "Email john@example.com about the project"
Sent:   "Email [EMAIL_1] about the project"         ← LLM sees this
Output: "I've emailed john@example.com"              ← Your app sees this
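The round trip above can be sketched in a few lines of plain Python. This is a minimal illustration of the mask-then-restore idea, not PrivacyLens's implementation: the `mask` and `unmask` helpers, the dict used as a vault, and the email pattern are all assumptions made for the example.

```python
import re

# Illustrative pattern only; the library's built-in regex may differ.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def mask(text, vault):
    """Replace each email with a numbered token and record it in vault."""

    def _replace(match):
        value = match.group(0)
        # Reuse the existing token if this value was seen before.
        for token, original in vault.items():
            if original == value:
                return token
        token = f"[EMAIL_{len(vault) + 1}]"
        vault[token] = value
        return token

    return EMAIL_RE.sub(_replace, text)


def unmask(text, vault):
    """Substitute every known token in text with its original value."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


vault = {}
sent = mask("Email john@example.com about the project", vault)
print(sent)                                     # Email [EMAIL_1] about the project
print(unmask("I've emailed [EMAIL_1]", vault))  # I've emailed john@example.com
```

The same value always maps to the same token within one vault, so the model can refer to `[EMAIL_1]` consistently across a conversation.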

Install

pip install privacylens

Usage

Step 1: Wrap your client

from privacylens import shield

# Pick your LLM client — wrap it with shield()
import openai
client = shield(openai.OpenAI())

Step 2: Use it normally

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "My name is John Doe and my email is john@example.com. Write me a welcome email."
    }],
)

print(response.choices[0].message.content)
# Output contains "John Doe" and "john@example.com" — restored automatically

That's it. No other code changes needed. The PII is masked before it reaches the LLM and unmasked in the response.


Works With Every Major LLM Client

import anthropic
import openai

from privacylens import shield

# OpenAI (sync and async)
client = shield(openai.OpenAI())
client = shield(openai.AsyncOpenAI())

# Anthropic (sync and async)
client = shield(anthropic.Anthropic())
client = shield(anthropic.AsyncAnthropic())

# LangChain: returns a callback handler
handler = shield(my_langchain_chat_model)

# CrewAI
adapter = shield(my_crewai_agent)

# Strands
wrapper = shield(my_strands_model)

Each wrapped client behaves exactly like the original. Same methods, same parameters, same return types.
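One common way to get this kind of drop-in behavior is attribute delegation: the wrapper intercepts the calls it cares about and forwards everything else to the wrapped object. The sketch below shows that pattern with a stand-in client. `FakeClient` and `MaskingProxy` are hypothetical names for illustration, and nothing here is taken from how `shield()` is actually built.

```python
class FakeClient:
    """Stand-in for an LLM client; echoes the prompt back."""

    model = "fake-model"

    def __init__(self):
        self.last_prompt = None

    def complete(self, prompt):
        self.last_prompt = prompt  # remember what the "LLM" received
        return f"echo: {prompt}"


class MaskingProxy:
    """Hypothetical wrapper: intercepts complete(), delegates the rest."""

    def __init__(self, client, secrets):
        self._client = client
        self._secrets = secrets  # maps original value -> token

    def complete(self, prompt):
        for value, token in self._secrets.items():
            prompt = prompt.replace(value, token)
        reply = self._client.complete(prompt)
        for value, token in self._secrets.items():
            reply = reply.replace(token, value)
        return reply

    def __getattr__(self, name):
        # Called only when normal lookup fails, so any attribute the
        # proxy does not define falls through to the wrapped client.
        return getattr(self._client, name)


client = FakeClient()
proxy = MaskingProxy(client, {"john@example.com": "[EMAIL_1]"})
print(proxy.complete("Email john@example.com"))  # echo: Email john@example.com
print(client.last_prompt)                        # Email [EMAIL_1]
print(proxy.model)                               # fake-model
```

The caller gets the original value back, while the wrapped client only ever received the token.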


What Gets Detected

Built-in (regex, zero dependencies)

Entity   Example Input       What LLM Sees
Email    john@example.com    [EMAIL_1]
Phone    555-123-4567        [PHONE_1]
SSN      123-45-6789         [SSN_1]

Optional: Presidio (50+ entity types)

pip install privacylens[pii]

Detects names, addresses, credit card numbers, dates of birth, passport numbers, and more using Microsoft Presidio.

Optional: GLiNER (ML-based semantic detection)

pip install privacylens[semantic]

Uses a neural model to detect entities that regex can't catch.


Custom Detectors

Add your own patterns via privacylens.yaml in your project root:

detectors:
  regex:
    patterns:
      - entity_type: EMAIL
        pattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
      - entity_type: EMPLOYEE_ID
        pattern: 'EMP-\d{5,}'
      - entity_type: PROJECT_CODE
        pattern: 'PROJ-[A-Z]{2,4}-\d{3,}'
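Before adding a pattern to the config, it can help to try it against sample text. The snippet below does that with Python's standard `re` module, on the assumption that PrivacyLens evaluates patterns with compatible regex semantics. The `find_matches` helper and the sample sentence are made up for this example.

```python
import re

# The two custom patterns from the config above.
patterns = {
    "EMPLOYEE_ID": r"EMP-\d{5,}",
    "PROJECT_CODE": r"PROJ-[A-Z]{2,4}-\d{3,}",
}


def find_matches(text):
    """Return {entity_type: [matched strings]} for every pattern."""
    return {
        entity_type: re.findall(pattern, text)
        for entity_type, pattern in patterns.items()
    }


sample = "EMP-00421 is assigned to PROJ-AB-1234; EMP-12 is too short to match."
for entity_type, found in find_matches(sample).items():
    print(entity_type, found)

# EMPLOYEE_ID ['EMP-00421']
# PROJECT_CODE ['PROJ-AB-1234']
```

`EMP-12` is skipped because `\d{5,}` requires at least five digits.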

Vault Backends

Tokens are stored in a vault so the original values can be restored later. Three backends are available:

# In-memory (default) — fast, lost on restart
vault: memory

# SQLite — persists to disk
vault: sqlite

# Redis — shared across processes
vault: redis

For Redis:

pip install privacylens[redis]
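To make the trade-off concrete, here is a rough sketch of what a disk-backed vault could look like using Python's standard `sqlite3` module. The `SQLiteVault` class, its `put`/`get` methods, and the table layout are assumptions for illustration and do not reflect PrivacyLens's actual storage schema.

```python
import sqlite3


class SQLiteVault:
    """Hypothetical token store keyed by session and token."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to persist across restarts.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS tokens ("
            "session_id TEXT, token TEXT, value TEXT, "
            "PRIMARY KEY (session_id, token))"
        )

    def put(self, session_id, token, value):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO tokens VALUES (?, ?, ?)",
                (session_id, token, value),
            )

    def get(self, session_id, token):
        row = self._conn.execute(
            "SELECT value FROM tokens WHERE session_id = ? AND token = ?",
            (session_id, token),
        ).fetchone()
        return row[0] if row else None


vault = SQLiteVault()
vault.put("s1", "[EMAIL_1]", "john@example.com")
print(vault.get("s1", "[EMAIL_1]"))  # john@example.com
print(vault.get("s2", "[EMAIL_1]"))  # None
```

Scoping rows by session keeps `[EMAIL_1]` in one conversation from resolving to a value that belongs to another.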

Inspect Without Masking

See what would be detected without actually masking anything:

from privacylens import inspect

entities = inspect("Contact john@example.com or call 555-123-4567")

for entity in entities:
    print(f"{entity.entity_type}: '{entity.value}' at [{entity.start}:{entity.end}]")

# EMAIL: 'john@example.com' at [8:24]
# PHONE: '555-123-4567' at [33:45]

Low-Level API

For full control over the pipeline:

from privacylens.core.pipeline import Pipeline
from privacylens.core.config import load_config

config = load_config()
pipeline = Pipeline(config)

# Tokenize
messages = [{"role": "user", "content": "Email john@example.com"}]
tokenized = pipeline.tokenize_messages(messages, session_id="s1")

# Send to LLM (tokenized messages have PII replaced)
llm_response = call_your_llm(tokenized)

# Detokenize
restored = pipeline.detokenize(llm_response, session_id="s1")

License

MIT © 2026 Madan Gopal
