Shields your confidential data from third-party LLM providers
llmshield
Enterprise-grade PII protection for Large Language Model interactions
Table of Contents
- Overview
- Architecture
- Installation
- Quick Start
- Advanced Configuration
- Supported LLM Providers
- Entity Detection
- Best Practices
- Language Support
- Requirements
- Development
- Contributing
- License
- Production Usage
Overview
llmshield is a production-ready, zero-dependency Python library engineered for enterprise-grade protection of sensitive information in Large Language Model (LLM) interactions. It provides automatic detection and cloaking of personally identifiable information (PII) and sensitive entities, ensuring data privacy compliance while maintaining seamless LLM integration.
Built for accuracy and performance, llmshield detects entities with a multi-layered approach that combines dictionary-based matching, pattern recognition, contextual analysis, and rule-based classification.
- 🔒 Comprehensive Entity Detection: Advanced multi-layered detection system
  - Proper Nouns: Persons, Places, Organizations, Concepts
  - Contact Information: Email addresses, URLs, social handles
  - Numeric Data: Phone numbers, Credit card numbers, IDs
  - Custom Entities: Extensible detection framework
- 🚀 Enterprise Performance: Production-optimized with minimal latency overhead
- 🔌 Zero Dependencies: Pure Python implementation with no external requirements
- 🛡️ End-to-End Protection: Bidirectional security for prompts and responses
- 🎯 Universal LLM Support: Provider-agnostic with specialized optimizations
- 📊 Conversation Memory: Intelligent caching for multi-turn dialogues
Architecture
llmshield employs a sophisticated Provider System that automatically detects and optimizes for different LLM APIs, ensuring seamless integration across platforms while maintaining enterprise-grade security and performance standards.
Provider System
The library features an extensible provider architecture that handles LLM-specific parameter formatting and API quirks:
graph TD
A["LLMShield.ask()"] --> B["Provider Detection"]
B --> C{LLM Function Analysis}
C -->|"openai.chat.completions.create"| D["OpenAIProvider"]
C -->|"openai.beta.chat.completions.parse"| D
C -->|"Custom/Unknown APIs"| E["DefaultProvider"]
D --> F["OpenAI Optimizations"]
E --> G["Generic Handling"]
F --> H["Parameter Conversion"]
F --> I["Streaming Management"]
F --> J["Beta API Support"]
G --> K["Function Introspection"]
G --> L["Best-effort Formatting"]
H --> M["LLM Function Call"]
I --> M
J --> M
K --> M
L --> M
M --> N["Response Processing"]
N --> O["Entity Uncloaking"]
O --> P["Final Response"]
Provider Types
OpenAI Provider (OpenAIProvider)
- Automatic detection of OpenAI API functions
- Intelligent parameter conversion (prompt → messages format)
- Streaming support with proper chunk handling
- Beta API compatibility (structured outputs, function calling)
- Automatic stream parameter management for APIs that don't support it
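The prompt → messages conversion listed above can be pictured with a small sketch. It is an illustration only, not llmshield's internal code: the helper name prompt_to_messages and the choice of the "user" role are assumptions made for this example.

```python
from typing import Any


def prompt_to_messages(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of kwargs with a bare `prompt` rewritten as chat `messages`.

    Hypothetical helper for illustration; not part of llmshield's public API.
    """
    converted = dict(kwargs)
    # Only convert when the caller did not already supply a message list.
    if "prompt" in converted and "messages" not in converted:
        prompt = converted.pop("prompt")
        converted["messages"] = [{"role": "user", "content": prompt}]
    return converted


call_kwargs = prompt_to_messages({"model": "gpt-4", "prompt": "Hello <PERSON_0>"})
print(call_kwargs)
```

Working on a copy keeps the caller's dictionary untouched, which matters when the same arguments are reused across calls.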
Default Provider (DefaultProvider)
- Universal fallback for unknown LLM functions
- Function signature introspection
- Best-effort parameter mapping
- Compatible with most LLM libraries (Anthropic, Cohere, Hugging Face, etc.)
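Function signature introspection and best-effort parameter mapping can be sketched with the standard library's inspect module. The names filter_supported_kwargs and toy_llm are hypothetical, and this is one plausible way to do it rather than a description of how DefaultProvider works.

```python
import inspect
from typing import Any, Callable


def filter_supported_kwargs(func: Callable[..., Any], kwargs: dict[str, Any]) -> dict[str, Any]:
    """Keep only the keyword arguments that `func` can accept (illustrative helper)."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the function accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {name: value for name, value in kwargs.items() if name in accepted}


def toy_llm(prompt: str, temperature: float = 0.0) -> str:
    """Stand-in for a third-party LLM function (hypothetical)."""
    return f"echo: {prompt}"


# `stream` is dropped because toy_llm does not declare it.
safe_kwargs = filter_supported_kwargs(toy_llm, {"prompt": "hi", "stream": True})
print(toy_llm(**safe_kwargs))
```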
Automatic Detection
The provider system automatically detects your LLM function based on:
- Function name patterns (chat.completions.create, beta.chat.completions.parse)
- Module paths (openai.*)
- Function signatures and parameter inspection
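As a rough illustration of module-path detection, the sketch below chooses a provider name from a function's __module__ attribute. The function pick_provider and the stand-in fake_openai_create are hypothetical, and the real detection logic may weigh name patterns and signatures differently.

```python
from typing import Any, Callable


def pick_provider(llm_func: Callable[..., Any]) -> str:
    """Choose a provider name from the function's module path (illustrative only)."""
    module = getattr(llm_func, "__module__", None) or ""
    if module == "openai" or module.startswith("openai."):
        return "OpenAIProvider"
    # Anything unrecognised falls back to generic handling.
    return "DefaultProvider"


def fake_openai_create(**kwargs: Any) -> None:
    """Stand-in whose module path is overridden to look like an OpenAI function."""


fake_openai_create.__module__ = "openai.resources.chat.completions"

print(pick_provider(fake_openai_create))
print(pick_provider(len))
```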
Installation
Production Installation
Install llmshield from PyPI using pip:
pip install llmshield
Development Installation
For contributors and advanced users:
# Clone the repository
git clone https://github.com/yourusername/llmshield.git
cd llmshield
# Install in development mode
pip install -e .
Verification
Verify your installation:
import llmshield
print(llmshield.__version__)
Quick Start
Basic Usage
from llmshield import LLMShield
# Initialize with automatic provider detection
shield = LLMShield()
# Manual cloaking/uncloaking
cloaked_prompt, entity_map = shield.cloak(
"Hi, I'm John Doe from Acme Corp (john.doe@acmecorp.com)"
)
print(cloaked_prompt)
# Output: "Hi, I'm <PERSON_0> from <ORG_1> (<EMAIL_2>)"
# Send to your LLM
llm_response = your_llm_function(cloaked_prompt)
# Restore original entities
original_response = shield.uncloak(llm_response, entity_map)
Direct LLM Integration
from openai import OpenAI
from llmshield import LLMShield
# OpenAI example with automatic provider detection
client = OpenAI(api_key="your-api-key")
shield = LLMShield(llm_func=client.chat.completions.create)
# Single-turn conversation
response = shield.ask(
model="gpt-4",
prompt="Hi, I'm Sarah Johnson (sarah.j@techcorp.com), help me write an email."
)
# Multi-turn conversation with automatic entity consistency
messages = [
{"role": "user", "content": "I'm John Smith from DataCorp"},
{"role": "assistant", "content": "Hello John! How can I help you today?"},
{"role": "user", "content": "Can you email me at john@datacorp.com?"}
]
response = shield.ask(model="gpt-4", messages=messages)
Streaming Support
# Streaming with real-time entity protection
response_stream = shield.ask(
model="gpt-4",
prompt="Generate a report about Jane Doe (jane@example.com)",
stream=True
)
for chunk in response_stream:
    print(chunk, end="", flush=True)
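One difficulty with streamed responses is that a placeholder such as <PERSON_0> can be split across two chunks. The sketch below shows one way to handle that by holding back text after an unclosed start delimiter. It assumes the default single-character delimiters, and the generator uncloak_stream is a hypothetical illustration, not the library's streaming code.

```python
from typing import Iterable, Iterator


def uncloak_stream(chunks: Iterable[str], entity_map: dict[str, str]) -> Iterator[str]:
    """Restore placeholders in a chunked stream, buffering incomplete ones."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Hold back everything from the last "<" that has not been closed yet.
        last_open = buffer.rfind("<")
        if last_open != -1 and ">" not in buffer[last_open:]:
            ready, buffer = buffer[:last_open], buffer[last_open:]
        else:
            ready, buffer = buffer, ""
        for placeholder, original in entity_map.items():
            ready = ready.replace(placeholder, original)
        if ready:
            yield ready
    # Flush leftover text, such as a literal "<" that never closed.
    if buffer:
        yield buffer


demo_map = {"<PERSON_0>": "Jane Doe", "<EMAIL_1>": "jane@example.com"}
demo_chunks = ["Report for <PER", "SON_0> sent to <EMAIL_1", ">."]
restored = "".join(uncloak_stream(demo_chunks, demo_map))
print(restored)
```

A literal "<" in the model's output delays emission until a ">" arrives or the stream ends, but no text is lost.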
Advanced Configuration
Custom Delimiters
# Configure entity placeholder format
shield = LLMShield(
start_delimiter='[[', # Default: '<'
end_delimiter=']]' # Default: '>'
)
# Entities will appear as [[PERSON_0]], [[EMAIL_1]], etc.
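To see why delimiter choice matters, the following sketch builds placeholders from a delimiter pair and finds them again with a regular expression. The helpers make_placeholder and find_placeholders are assumptions for illustration, and the TYPE_index naming simply mirrors the placeholders shown in this README.

```python
import re


def make_placeholder(entity_type: str, index: int, start: str = "<", end: str = ">") -> str:
    """Format a placeholder such as <PERSON_0> or [[PERSON_0]]."""
    return f"{start}{entity_type}_{index}{end}"


def find_placeholders(text: str, start: str = "<", end: str = ">") -> list[str]:
    """Find placeholders in text; re.escape keeps delimiters like '[[' literal."""
    pattern = re.compile(re.escape(start) + r"[A-Z_]+_\d+" + re.escape(end))
    return pattern.findall(text)


sample = "Dear [[PERSON_0]], we wrote to [[EMAIL_1]] (see note [1])."
print(make_placeholder("PERSON", 0, "[[", "]]"))
print(find_placeholders(sample, "[[", "]]"))
```

Delimiters that also occur in ordinary input, for example angle brackets in HTML, make placeholders harder to tell apart from real content, so a less common pair can be the safer choice.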
Conversation Caching
# Configure cache size for multi-turn conversations
shield = LLMShield(
llm_func=your_llm_function,
max_cache_size=256 # Default: 128
)
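A bounded cache of this kind is often implemented with least-recently-used eviction. The sketch below shows that idea with an OrderedDict. The eviction policy, the class name BoundedEntityCache, and the conversation-id keys are assumptions, since the README only states that the cache size is configurable.

```python
from collections import OrderedDict
from typing import Optional


class BoundedEntityCache:
    """Least-recently-used cache of entity maps (illustrative, not llmshield's class)."""

    def __init__(self, max_size: int = 128) -> None:
        self.max_size = max_size
        self._items = OrderedDict()

    def get(self, key: str) -> Optional[dict]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, entity_map: dict) -> None:
        self._items[key] = entity_map
        self._items.move_to_end(key)
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = BoundedEntityCache(max_size=2)
cache.put("conv-a", {"<PERSON_0>": "John Smith"})
cache.put("conv-b", {"<PERSON_0>": "Jane Doe"})
cache.get("conv-a")  # touch conv-a so conv-b becomes the oldest entry
cache.put("conv-c", {"<ORG_0>": "DataCorp"})
print(cache.get("conv-b"))  # None, because conv-b was evicted
```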
Supported LLM Providers
llmshield works with any LLM library or API, with specialized optimizations for:
- OpenAI (GPT-4, GPT-3.5, all chat models)
- OpenAI Beta APIs (structured outputs, function calling)
Compatible (via DefaultProvider)
- Anthropic Claude (via anthropic library)
- Google Gemini (via google-generativeai)
- Cohere (via cohere library)
- Hugging Face Transformers
- Azure OpenAI
- AWS Bedrock
- Any custom LLM function
Entity Detection
llmshield uses a multi-layered approach combining:
- Dictionary-based matching for known entities
- Pattern recognition for structured data (emails, phones, etc.)
- Contextual analysis for proper nouns
- Rule-based classification for entity types
Supported Entity Types
| Category | Examples | Placeholder Format |
|---|---|---|
| Persons | John Doe, Dr. Smith | <PERSON_0> |
| Organizations | Acme Corp, NHS | <ORG_0> |
| Places | London, Main Street | <PLACE_0> |
| Emails | user@domain.com | <EMAIL_0> |
| Phone Numbers | +1-555-0123 | <PHONE_0> |
| URLs | https://example.com | <URL_0> |
| Credit Cards | 4111-1111-1111-1111 | <CREDIT_CARD_0> |
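The pattern-recognition layer for structured data can be approximated with a few regular expressions. The sketch below covers only emails and phone numbers and uses deliberately simple patterns. The functions toy_cloak and toy_uncloak are hypothetical stand-ins, far less thorough than the library's own detection.

```python
import re

# Simplified patterns for illustration; real-world detection needs more care.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{6,}\d")),
]


def toy_cloak(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with numbered placeholders and record the mapping."""
    entity_map: dict[str, str] = {}

    for label, pattern in PATTERNS:
        def substitute(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            # Reuse the placeholder when the same value appears again.
            for placeholder, original in entity_map.items():
                if original == value:
                    return placeholder
            placeholder = f"<{label}_{len(entity_map)}>"
            entity_map[placeholder] = value
            return placeholder

        text = pattern.sub(substitute, text)
    return text, entity_map


def toy_uncloak(text: str, entity_map: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    for placeholder, original in entity_map.items():
        text = text.replace(placeholder, original)
    return text


source = "Mail john.doe@acmecorp.com or call +1-555-0123."
cloaked, mapping = toy_cloak(source)
print(cloaked)
print(toy_uncloak(cloaked, mapping))
```

Proper nouns such as persons and organizations cannot be caught this way, which is where the dictionary and contextual layers described above come in.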
Best Practices
🔒 Security Guidelines
- Pre-transmission validation: Always verify sensitive data is properly cloaked before LLM transmission
- Entity map security: Store entity mappings securely using encryption for persistent storage
- Delimiter selection: Choose delimiters that don't conflict with your data format or LLM training
- Input sanitization: Validate and sanitize all inputs before processing
- Regular audits: Periodically review cloaked outputs to ensure no PII leakage
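Pre-transmission validation can be as simple as scanning the cloaked prompt for anything that still looks like raw PII. The following sketch checks for email-like and card-like strings. The function find_leaks and its two patterns are assumptions for illustration and would need extending for real audits.

```python
import re

# Deliberately small set of checks; extend to match your own data.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def find_leaks(cloaked_prompt: str) -> list[str]:
    """Return the names of the checks that still match the cloaked prompt."""
    return [
        name
        for name, pattern in LEAK_PATTERNS.items()
        if pattern.search(cloaked_prompt)
    ]


print(find_leaks("Hi, I'm <PERSON_0> from <ORG_1> (<EMAIL_2>)"))
print(find_leaks("Please email john@datacorp.com"))
```

A non-empty result is a signal to block the request or log it for review before anything is sent to the LLM.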
⚡ Performance Optimization
- Instance reuse: Maintain single LLMShield instances to leverage conversation caching
- Cache monitoring: Track cache hit rates for multi-turn conversations (aim for >80%)
- Delimiter alignment: Select delimiters compatible with your LLM's tokenization
- Batch processing: Process multiple prompts in batches when possible
- Memory management: Configure appropriate cache sizes based on usage patterns
🔧 Integration Best Practices
- Exception handling: Implement comprehensive error handling for ValueError exceptions
- Provider testing: Validate functionality with your specific LLM provider before production
- Structured outputs: Leverage structured output capabilities for complex response processing
- Monitoring: Implement logging and monitoring for entity detection accuracy
- Testing: Include PII protection tests in your CI/CD pipeline
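The exception-handling advice can be sketched as a thin wrapper around whatever callable sends the request. The wrapper ask_safely and the stand-in rejecting_ask are hypothetical. Which conditions raise ValueError is not spelled out here, so treat the triggering case below as an assumption.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("llm_calls")


def ask_safely(ask: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call `ask` and turn a ValueError into a logged failure (illustrative wrapper)."""
    try:
        return ask(**kwargs)
    except ValueError as exc:
        # Log only the error type; the message could echo prompt text.
        logger.error("LLM request rejected: %s", type(exc).__name__)
        return None


def rejecting_ask(**kwargs: Any) -> str:
    """Stand-in for a call that fails validation (hypothetical)."""
    raise ValueError("neither prompt nor messages was supplied")


result = ask_safely(rejecting_ask, model="gpt-4")
print(result)
```

Logging the exception type rather than its message is a cautious default, since error text can contain fragments of the original prompt.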
Language Support
- Primary: English (optimized)
- Secondary: Spanish (good accuracy)
- Experimental: Other languages (reduced accuracy, potential PII leakage)
Requirements
- Python: 3.10+
- Dependencies: None (zero-dependency architecture)
- Memory: Minimal footprint with efficient caching
- Performance: Sub-millisecond entity detection for typical prompts
Development
Setup
# Clone and setup development environment
git clone https://github.com/yourusername/llmshield.git
cd llmshield
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
make dev-dependencies
Testing
# Run full test suite
make tests
# Run with coverage
make coverage
# Test specific providers (requires API keys)
OPENAI_API_KEY=your-key python -m unittest tests/providers/test_openai.py
Code Quality
# Format code
black llmshield/ tests/
isort llmshield/ tests/
# Lint
flake8 llmshield/ tests/
Building and Distribution
Local Installation
# Install package locally in development mode
pip install -e .
# Or install from local build
python -m build
pip install dist/llmshield-*.whl
Building for Distribution
# Install build dependencies
pip install build twine
# Build the package
python -m build
# This creates:
# - dist/llmshield-*.tar.gz (source distribution)
# - dist/llmshield-*.whl (wheel distribution)
Publishing to PyPI
# Check the built package
twine check dist/*
# Upload to test PyPI first (recommended)
twine upload --repository testpypi dist/*
# Upload to production PyPI
twine upload dist/*
Prerequisites for publishing:
- PyPI account with API key configured
- Maintainer permissions on the llmshield project
- All tests passing and version bumped in pyproject.toml
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Principles
- Zero Dependencies: Maintain pure Python implementation
- Performance First: Optimize for production workloads
- Security Focus: Prioritize data protection and privacy
- Universal Compatibility: Support all major LLM providers
License
GNU Affero General Public License v3.0 - See LICENSE.txt for details.
Production Usage
llmshield is trusted by:
- brainful.ai - AI-powered enterprise solutions
Get Started
Ready to secure your LLM interactions? Install llmshield today:
pip install llmshield