Shields your confidential data from third party LLM providers
Project description
LLMShield
A zero-dependency Python library for protecting personally identifiable information (PII) in Large Language Model interactions through automatic entity detection and cloaking.
Overview
LLMShield provides enterprise-grade protection for sensitive information in LLM interactions by automatically detecting and replacing PII with placeholders before transmission, then restoring original values in responses. The library employs a multi-layered detection approach combining pattern recognition, dictionary matching, and contextual analysis.
Key Features:
- Zero external dependencies
- Automatic entity detection and cloaking
- Multi-turn conversation support with entity consistency
- Streaming response handling
- Universal LLM provider compatibility
- Production-optimized performance
High-Level Data Flow
graph LR
A["Raw Input<br/>'Contact Dr. Smith at smith@hospital.org'"] --> B["Entity Detection<br/>PERSON: Dr. Smith<br/>EMAIL: smith@hospital.org"]
B --> C["PII Anonymization<br/>'Contact <PERSON_0> at <EMAIL_1>'"]
C --> D["LLM Processing<br/>Safe text sent to<br/>OpenAI, Claude, etc."]
D --> E["Response Restoration<br/>Placeholders → Original PII"]
E --> F["Protected Output<br/>'I'll help you contact Dr. Smith<br/>at smith@hospital.org'"]
%% Styling
classDef flowStyle fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#212529
classDef detectionStyle fill:#e8f4f8,stroke:#0c63e4,stroke-width:2px,color:#212529
classDef anonymizationStyle fill:#fff3cd,stroke:#856404,stroke-width:2px,color:#212529
classDef llmStyle fill:#f0e6ff,stroke:#6f42c1,stroke-width:2px,color:#212529
classDef restorationStyle fill:#d1ecf1,stroke:#0c5460,stroke-width:2px,color:#212529
class A flowStyle
class B detectionStyle
class C anonymizationStyle
class D llmStyle
class E restorationStyle
class F flowStyle
Under the Hood: System Architecture
graph LR
subgraph Input ["Input Layer"]
A["Textual Input<br/>Contains PII Entities"]
end
subgraph Detection ["Entity Detection Engine"]
B["Waterfall Detection Algorithm<br/>• Phase 1: Pattern Recognition (RegEx)<br/>• Phase 2: Numerical Validation (Luhn)<br/>• Phase 3: Linguistic Analysis (NLP)<br/>Entity Types: EMAIL, PERSON, PHONE, URL"]
end
subgraph Cloaking ["Entity Anonymization"]
C["Classification & Tokenization<br/>PII → Typed Placeholders<br/>Deterministic Mapping<br/>Format: <TYPE_INDEX>"]
end
subgraph Provider ["LLM Provider Interface"]
D["Provider-Agnostic API Gateway<br/>Supported: OpenAI, Anthropic Claude,<br/>Google Gemini, Azure OpenAI,<br/>AWS Bedrock, Custom Endpoints"]
end
subgraph Restoration ["Entity De-anonymization"]
E["Inverse Token Mapping<br/>Placeholder Detection<br/>Bidirectional Text Reconstruction<br/>Integrity Preservation"]
end
subgraph Output ["Output Layer"]
F["Reconstructed Response<br/>Original PII Restored<br/>Stream-Compatible"]
end
subgraph Memory ["State Management System"]
G["LRU Cache Implementation<br/>Conversation Continuity<br/>Hash-Based Entity Mapping<br/>O(1) Lookup Complexity"]
end
%% Primary data flow
A --> B
B --> C
C --> D
D --> E
E --> F
%% State management interactions
Memory -.->|"Read/Write<br/>Entity Maps"| C
Memory -.->|"Consistency<br/>Validation"| E
%% Styling
classDef inputStyle fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#212529
classDef detectionStyle fill:#e8f4f8,stroke:#0c63e4,stroke-width:2px,color:#212529
classDef cloakingStyle fill:#fff3cd,stroke:#856404,stroke-width:2px,color:#212529
classDef providerStyle fill:#f0e6ff,stroke:#6f42c1,stroke-width:2px,color:#212529
classDef restorationStyle fill:#d1ecf1,stroke:#0c5460,stroke-width:2px,color:#212529
classDef outputStyle fill:#d4edda,stroke:#155724,stroke-width:2px,color:#212529
classDef memoryStyle fill:#f8d7da,stroke:#721c24,stroke-width:2px,color:#212529
class A inputStyle
class B detectionStyle
class C cloakingStyle
class D providerStyle
class E restorationStyle
class F outputStyle
class G memoryStyle
Installation
pip install llmshield
Quick Start
Basic Usage
from llmshield import LLMShield
# Initialise shield
shield = LLMShield()
# Protect sensitive information
cloaked_prompt, entity_map = shield.cloak(
"Contact John Doe at john.doe@company.com or call +1-555-0123"
)
print(cloaked_prompt)
# Output: "Contact <PERSON_0> at <EMAIL_1> or call <PHONE_NUMBER_2>"
# Process with LLM
llm_response = your_llm_function(cloaked_prompt)
# Restore original entities
restored_response = shield.uncloak(llm_response, entity_map)
Note: Individual cloak() and uncloak() methods support single messages only and do not maintain conversation history. For multi-turn conversations with entity consistency across messages, use the ask() method.
Direct LLM Integration
from openai import OpenAI
from llmshield import LLMShield
client = OpenAI(api_key="your-api-key")
shield = LLMShield(llm_func=client.chat.completions.create)
# Single request with automatic protection
response = shield.ask(
model="gpt-4",
prompt="Draft an email to Sarah Johnson at sarah.j@techcorp.com"
)
# Multi-turn conversation
messages = [
{"role": "user", "content": "I'm John Smith from DataCorp"},
{"role": "assistant", "content": "Hello! How can I help you?"},
{"role": "user", "content": "Email me at john@datacorp.com"}
]
response = shield.ask(model="gpt-4", messages=messages)
Streaming Support
response_stream = shield.ask(
model="gpt-4",
prompt="Generate a report about Jane Doe (jane@example.com)",
stream=True
)
for chunk in response_stream:
print(chunk, end="", flush=True)
Entity Detection
The library detects and protects the following entity types:
| Entity Type | Examples | Placeholder Format |
|---|---|---|
| Person | John Doe, Dr. Smith | <PERSON_0> |
| Organization | Acme Corp, NHS | <ORGANISATION_0> |
| Place | London, Main Street | <PLACE_0> |
| user@domain.com | <EMAIL_0> |
|
| Phone | +1-555-0123 | <PHONE_NUMBER_0> |
| URL | https://example.com | <URL_0> |
| Credit Card | 4111-1111-1111-1111 | <CREDIT_CARD_0> |
| IP Address | 192.168.1.1 | <IP_ADDRESS_0> |
Configuration
Custom Delimiters
shield = LLMShield(
start_delimiter='[[',
end_delimiter=']]'
)
# Entities appear as [[PERSON_0]], [[EMAIL_1]], etc.
Conversation Caching
LLMShield implements an LRU (Least Recently Used) cache to maintain entity consistency across multi-turn conversations. The cache stores entity mappings for conversation histories, ensuring that all entities (persons, organizations, emails, phones, etc.) mentioned in different messages receive the same placeholders.
shield = LLMShield(
llm_func=your_llm_function,
max_cache_size=1000 # Default: 1000
)
Cache Sizing Guidelines:
- Small applications (< 100 daily conversations):
max_cache_size=100-500 - Medium applications (100-1000 daily conversations):
max_cache_size=1000-5000 - Large applications (> 1000 daily conversations):
max_cache_size=5000-10000
Memory considerations: Each cache entry stores approximately 1-2KB. A cache size of 1000 uses roughly 1-2MB of memory.
Performance impact: Cache hit rates above 80% significantly improve performance for multi-turn conversations by avoiding re-detection of previously seen entities.
Provider Compatibility
LLMShield includes specialized support for OpenAI APIs and universal compatibility with other providers:
Optimized Support:
- OpenAI (GPT-4, GPT-3.5, all chat models)
- OpenAI Beta APIs (structured outputs)
Universal Compatibility:
- Anthropic Claude
- Google Gemini
- Cohere
- Hugging Face Transformers
- Azure OpenAI
- AWS Bedrock
- Custom LLM functions
Language Support
- English: Full optimization and accuracy
- Other languages: Experimental support with reduced accuracy
Requirements
- Python 3.12+
- Zero external dependencies
Development
Setup
git clone https://github.com/yourusername/llmshield.git
cd llmshield
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -e ".[dev]"
Testing
# Run tests
make tests
# Coverage analysis
make coverage
# Code quality checks
make ruff
# Documentation coverage
make doc-coverage
Building and Publishing
Building the Package
# Install build dependencies
make dev-dependencies
# Build the package
make build
Publishing to PyPI
- Update version in
pyproject.toml - Run quality checks:
make tests make coverage make ruff
- Build and publish:
make build twine upload dist/*
Security Considerations
- Validate cloaked outputs before LLM transmission
- Securely store entity mappings for persistent sessions
- Choose delimiters that don't conflict with your data format
- Implement comprehensive input validation
- Regularly audit entity detection accuracy
Contributing
See CONTRIBUTING.md for development guidelines and contribution process.
License
GNU Affero General Public License v3.0 - See LICENSE.txt for details.
Production Usage
LLMShield is used in production at brainful.ai.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llmshield-0.0.8.tar.gz.
File metadata
- Download URL: llmshield-0.0.8.tar.gz
- Upload date:
- Size: 412.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
066f0af12ddafffdb7dc0b2631d7b6886662b7a6dc67cdb839727a421a470658
|
|
| MD5 |
ff765ffe039419580800ed8d4949902d
|
|
| BLAKE2b-256 |
9c9c8505a4591691520d0c5786b5010a896d8fbed14d971ef347179d354d993c
|
File details
Details for the file llmshield-0.0.8-py3-none-any.whl.
File metadata
- Download URL: llmshield-0.0.8-py3-none-any.whl
- Upload date:
- Size: 389.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
16f8eee23e3e58fc4f7673a98d8033666855660fd48ed680dfd9bbe27b463ebd
|
|
| MD5 |
1780d1d77b804cbdc832b1393d583957
|
|
| BLAKE2b-256 |
e3087a91794143be6c5988ea1335913971c21ee42a6024fd14292350e7938e9b
|