
Shields your confidential data from third-party LLM providers


llmshield

Python 3.10+ | License: AGPL v3 | Zero Dependencies

Enterprise-grade PII protection for Large Language Model interactions


Overview

llmshield is a production-ready, zero-dependency Python library engineered for enterprise-grade protection of sensitive information in Large Language Model (LLM) interactions. It automatically detects personally identifiable information (PII) and other sensitive entities, replaces them with placeholders before a prompt reaches the LLM, and restores the originals in the response. This helps you meet data privacy requirements without changing how you call your LLM.

Built for accuracy and performance, llmshield uses a multi-layered approach that combines dictionary-based matching, pattern recognition, contextual analysis, and rule-based classification to maximize entity detection accuracy.

  • 🔒 Comprehensive Entity Detection: Advanced multi-layered detection system

    • Proper Nouns: Persons, Places, Organizations, Concepts
    • Contact Information: Email addresses, URLs, social handles
    • Numeric Data: Phone numbers, Credit card numbers, IDs
    • Custom Entities: Extensible detection framework
  • 🚀 Enterprise Performance: Production-optimized with minimal latency overhead

  • 🔌 Zero Dependencies: Pure Python implementation with no external requirements

  • 🛡️ End-to-End Protection: Bidirectional security for prompts and responses

  • 🎯 Universal LLM Support: Provider-agnostic, with specialized handling for OpenAI APIs

  • 📊 Conversation Memory: Intelligent caching for multi-turn dialogues

Architecture

llmshield uses a Provider System that automatically detects which LLM API you are calling and adapts each request to it, so the same cloaking workflow works across platforms without compromising security or performance.

Provider System

The library features an extensible provider architecture that handles LLM-specific parameter formatting and API quirks:

graph TD
    A["LLMShield.ask()"] --> B["Provider Detection"]
    B --> C{LLM Function Analysis}
    C -->|"openai.chat.completions.create"| D["OpenAIProvider"]
    C -->|"openai.beta.chat.completions.parse"| D
    C -->|"Custom/Unknown APIs"| E["DefaultProvider"]

    D --> F["OpenAI Optimizations"]
    E --> G["Generic Handling"]

    F --> H["Parameter Conversion"]
    F --> I["Streaming Management"]
    F --> J["Beta API Support"]

    G --> K["Function Introspection"]
    G --> L["Best-effort Formatting"]

    H --> M["LLM Function Call"]
    I --> M
    J --> M
    K --> M
    L --> M

    M --> N["Response Processing"]
    N --> O["Entity Uncloaking"]
    O --> P["Final Response"]

Provider Types

OpenAI Provider (OpenAIProvider)

  • Automatic detection of OpenAI API functions
  • Intelligent parameter conversion (prompt → messages format)
  • Streaming support with proper chunk handling
  • Beta API compatibility (structured outputs, function calling)
  • Automatic stream parameter management for APIs that don't support it
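The two OpenAI-specific behaviours above (turning a bare `prompt` into a `messages` list and dropping `stream` where an endpoint rejects it) can be sketched as a plain function over keyword arguments. `to_openai_kwargs` and its `supports_stream` flag are hypothetical names used for illustration; this is not llmshield's internal code.

```python
from typing import Any


def to_openai_kwargs(kwargs: dict[str, Any], supports_stream: bool = True) -> dict[str, Any]:
    """Hypothetical sketch: adapt shield-style kwargs for an OpenAI chat call."""
    converted = dict(kwargs)  # copy so the caller's dict is never mutated
    # prompt -> messages: wrap a bare prompt string as a single user message.
    if "prompt" in converted:
        if "messages" in converted:
            raise ValueError("pass either 'prompt' or 'messages', not both")
        converted["messages"] = [{"role": "user", "content": converted.pop("prompt")}]
    # Drop 'stream' for endpoints that do not accept it.
    if not supports_stream:
        converted.pop("stream", None)
    return converted


print(to_openai_kwargs({"model": "gpt-4", "prompt": "Hi <PERSON_0>"}))
# {'model': 'gpt-4', 'messages': [{'role': 'user', 'content': 'Hi <PERSON_0>'}]}
```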

Default Provider (DefaultProvider)

  • Universal fallback for unknown LLM functions
  • Function signature introspection
  • Best-effort parameter mapping
  • Compatible with most LLM libraries (Anthropic, Cohere, Hugging Face, etc.)
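Best-effort mapping for an unknown function can be approximated with the standard library's `inspect.signature`: look for a familiar parameter name and pass the prompt under it. The candidate names and the `map_prompt_param` helper below are assumptions chosen for illustration, not the DefaultProvider's actual rules.

```python
import inspect
from typing import Any, Callable

# Hypothetical parameter names a generic LLM function might use for its input.
PROMPT_PARAM_CANDIDATES = ("prompt", "messages", "input", "text", "contents")


def map_prompt_param(func: Callable[..., Any], prompt: str, **kwargs: Any) -> dict[str, Any]:
    """Best-effort sketch: pass `prompt` under whichever name `func` accepts."""
    params = inspect.signature(func).parameters
    for name in PROMPT_PARAM_CANDIDATES:
        if name in params:
            # Chat-style functions expect a list of role/content messages.
            value = [{"role": "user", "content": prompt}] if name == "messages" else prompt
            return {**kwargs, name: value}
    # A **kwargs catch-all accepts any keyword, so default to "prompt".
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return {**kwargs, "prompt": prompt}
    raise ValueError(f"cannot find a prompt parameter on {func!r}")
```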

Automatic Detection

The provider system automatically detects your LLM function based on:

  • Function name patterns (chat.completions.create, beta.chat.completions.parse)
  • Module paths (openai.*)
  • Function signatures and parameter inspection
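A minimal sketch of name- and module-based detection, assuming a simple substring match on the function's dotted path (`__module__` plus `__qualname__`). The `classify` and `detect_provider` helpers are hypothetical; the exact rules llmshield applies may differ.

```python
from typing import Any, Callable

# Hypothetical name patterns; llmshield's real detection rules may differ.
OPENAI_NAME_PATTERNS = ("chat.completions.create", "beta.chat.completions.parse")


def classify(dotted_name: str) -> str:
    """Pick a provider name from a dotted path like 'openai.resources.Completions.create'."""
    if dotted_name.startswith("openai.") or any(p in dotted_name for p in OPENAI_NAME_PATTERNS):
        return "OpenAIProvider"
    return "DefaultProvider"


def detect_provider(func: Callable[..., Any]) -> str:
    # Functions and bound methods expose the module and qualified name they were defined with.
    module = getattr(func, "__module__", None) or ""
    qualname = getattr(func, "__qualname__", None) or ""
    return classify(f"{module}.{qualname}")
```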

Installation

Production Installation

Install llmshield from PyPI using pip:

pip install llmshield

Development Installation

For contributors and advanced users:

# Clone the repository
git clone https://github.com/yourusername/llmshield.git
cd llmshield

# Install in development mode
pip install -e .

Verification

Verify your installation:

import llmshield
print(llmshield.__version__)

Quick Start

Basic Usage

from llmshield import LLMShield

# Initialize without an LLM function; cloak() and uncloak() work standalone
shield = LLMShield()

# Manual cloaking/uncloaking
cloaked_prompt, entity_map = shield.cloak(
    "Hi, I'm John Doe from Acme Corp (john.doe@acmecorp.com)"
)
print(cloaked_prompt)
# Output: "Hi, I'm <PERSON_0> from <ORG_1> (<EMAIL_2>)"

# Send to your LLM (your_llm_function is a placeholder for your own client call)
llm_response = your_llm_function(cloaked_prompt)

# Restore original entities
original_response = shield.uncloak(llm_response, entity_map)
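Conceptually, cloaking swaps each detected value for a numbered placeholder and records the mapping so that uncloaking can reverse it. The toy below detects only email addresses with a single regex (an assumption far simpler than llmshield's detector) and stores a placeholder-to-original mapping, which may not match the real `entity_map` layout.

```python
import re

# Toy detector: email addresses only (llmshield detects far more).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def cloak(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with <EMAIL_n> and return the mapping."""
    entity_map: dict[str, str] = {}  # placeholder -> original value
    reverse: dict[str, str] = {}     # original value -> placeholder

    def _replace(match: re.Match[str]) -> str:
        value = match.group(0)
        if value not in reverse:  # reuse the placeholder for repeated values
            placeholder = f"<EMAIL_{len(entity_map)}>"
            reverse[value] = placeholder
            entity_map[placeholder] = value
        return reverse[value]

    return EMAIL_RE.sub(_replace, text), entity_map


def uncloak(text: str, entity_map: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    for placeholder, value in entity_map.items():
        text = text.replace(placeholder, value)
    return text
```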

Direct LLM Integration

from openai import OpenAI
from llmshield import LLMShield

# OpenAI example with automatic provider detection
client = OpenAI(api_key="your-api-key")
shield = LLMShield(llm_func=client.chat.completions.create)

# Single-turn conversation
response = shield.ask(
    model="gpt-4",
    prompt="Hi, I'm Sarah Johnson (sarah.j@techcorp.com), help me write an email."
)

# Multi-turn conversation with automatic entity consistency
messages = [
    {"role": "user", "content": "I'm John Smith from DataCorp"},
    {"role": "assistant", "content": "Hello John! How can I help you today?"},
    {"role": "user", "content": "Can you email me at john@datacorp.com?"}
]

response = shield.ask(model="gpt-4", messages=messages)
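Entity consistency across turns comes down to sharing one mapping for the whole conversation, so the same value always receives the same placeholder. In this hypothetical sketch, `DETECTORS` is a hard-coded toy (one fixed name and an email regex) standing in for real detection, and `cloak_messages` is an illustrative name.

```python
import re

# Toy detectors (assumptions): one fixed person name and an email pattern.
DETECTORS = {
    "PERSON": re.compile(r"\bJohn Smith\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def cloak_messages(messages: list[dict[str, str]]) -> tuple[list[dict[str, str]], dict[str, str]]:
    entity_map: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}        # original value -> placeholder, shared by every turn

    def placeholder_for(label: str, value: str) -> str:
        if value not in seen:
            # One global counter, so numbering continues across entity types and turns.
            seen[value] = f"<{label}_{len(entity_map)}>"
            entity_map[seen[value]] = value
        return seen[value]

    cloaked = []
    for msg in messages:
        text = msg["content"]
        for label, pattern in DETECTORS.items():
            text = pattern.sub(lambda m: placeholder_for(label, m.group(0)), text)
        cloaked.append({**msg, "content": text})  # keep role and any other keys
    return cloaked, entity_map
```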

Streaming Support

# Streaming with real-time entity protection
response_stream = shield.ask(
    model="gpt-4",
    prompt="Generate a report about Jane Doe (jane@example.com)",
    stream=True
)

for chunk in response_stream:
    print(chunk, end="", flush=True)
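One difficulty with streaming is that a placeholder such as `<PERSON_0>` can arrive split across two chunks. A common approach, shown here as an assumption rather than llmshield's implementation, is to hold back any text after an unclosed start delimiter (or a partial one at the end of a chunk) until the placeholder is complete. `uncloak_stream` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def _split_safe(buffer: str, start: str, end: str) -> tuple[str, str]:
    """Split buffer into (ready to emit, held back for the next chunk)."""
    cut = buffer.rfind(start)
    if cut != -1 and buffer.find(end, cut + len(start)) == -1:
        return buffer[:cut], buffer[cut:]  # placeholder still open
    # Hold back a trailing partial start delimiter, e.g. the first "[" of "[[".
    for k in range(len(start) - 1, 0, -1):
        if buffer.endswith(start[:k]):
            return buffer[:-k], buffer[-k:]
    return buffer, ""


def uncloak_stream(chunks: Iterable[str], entity_map: dict[str, str],
                   start: str = "<", end: str = ">") -> Iterator[str]:
    buffer = ""
    for chunk in chunks:
        ready, buffer = _split_safe(buffer + chunk, start, end)
        for placeholder, value in entity_map.items():
            ready = ready.replace(placeholder, value)
        if ready:
            yield ready
    # Flush anything left over, such as a stray start delimiter that never closed.
    for placeholder, value in entity_map.items():
        buffer = buffer.replace(placeholder, value)
    if buffer:
        yield buffer
```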

Advanced Configuration

Custom Delimiters

# Configure entity placeholder format
shield = LLMShield(
    start_delimiter='[[',  # Default: '<'
    end_delimiter=']]'     # Default: '>'
)
# Entities will appear as [[PERSON_0]], [[EMAIL_1]], etc.
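Custom delimiters only need to be unambiguous in your text. The hypothetical `placeholder_pattern` helper below builds a regex that finds placeholders for any delimiter pair; `re.escape` is needed because characters like `[` are regex metacharacters. The `LABEL_index` shape is taken from the examples in this README.

```python
import re


def placeholder_pattern(start: str = "<", end: str = ">") -> re.Pattern[str]:
    """Match placeholders such as <PERSON_0> or [[CREDIT_CARD_3]]."""
    # Group 1 is the entity label, group 2 the numeric index.
    return re.compile(re.escape(start) + r"([A-Z][A-Z_]*)_(\d+)" + re.escape(end))
```

The stricter the pattern, the less likely ordinary text in your data is to be mistaken for a placeholder.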

Conversation Caching

# Configure cache size for multi-turn conversations
shield = LLMShield(
    llm_func=your_llm_function,
    max_cache_size=256  # Default: 128
)
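A bounded conversation cache like the one `max_cache_size` configures is typically a least-recently-used (LRU) map. The sketch below assumes the cached value is an entity map keyed by the conversation history; the key scheme, `history_key`, and `LRUCache` are illustrative assumptions, not llmshield's internals.

```python
from collections import OrderedDict
from typing import Optional

EntityMap = dict[str, str]


def history_key(messages: list[dict[str, str]]) -> tuple[tuple[str, str], ...]:
    """Turn a message list into a hashable cache key of (role, content) pairs."""
    return tuple((m["role"], m["content"]) for m in messages)


class LRUCache:
    """Bounded least-recently-used cache of entity maps."""

    def __init__(self, max_size: int = 128) -> None:
        self.max_size = max_size
        self._data: "OrderedDict[tuple, EntityMap]" = OrderedDict()

    def get(self, key: tuple) -> Optional[EntityMap]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: tuple, value: EntityMap) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```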

Supported LLM Providers

llmshield works with any LLM library or API, with specialized optimizations for:

  • OpenAI (GPT-4, GPT-3.5, all chat models)
  • OpenAI Beta APIs (structured outputs, function calling)

Compatible (via DefaultProvider)

  • Anthropic Claude (via anthropic library)
  • Google Gemini (via google-generativeai)
  • Cohere (via cohere library)
  • Hugging Face Transformers
  • Azure OpenAI
  • AWS Bedrock
  • Any custom LLM function

Entity Detection

llmshield uses a multi-layered approach combining:

  • Dictionary-based matching for known entities
  • Pattern recognition for structured data (emails, phones, etc.)
  • Contextual analysis for proper nouns
  • Rule-based classification for entity types
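Two of the layers listed above, dictionary matching and pattern recognition, can be sketched in a few lines, with overlapping matches resolved by preferring the earliest, then longest, span. The word list and regexes are toy assumptions, and the contextual-analysis and rule-based-classification layers are omitted.

```python
import re
from dataclasses import dataclass

# Toy dictionary layer: known entities and their types (hypothetical data).
KNOWN_ENTITIES = {"Acme Corp": "ORG", "London": "PLACE"}

# Toy pattern layer for structured data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s)]+"),
    "PHONE": re.compile(r"\+?\d[\d\-\s]{7,}\d"),
}


@dataclass(frozen=True)
class Entity:
    start: int
    end: int
    label: str
    text: str


def detect(text: str) -> list[Entity]:
    found: list[Entity] = []
    for value, label in KNOWN_ENTITIES.items():  # dictionary layer
        for m in re.finditer(re.escape(value), text):
            found.append(Entity(m.start(), m.end(), label, value))
    for label, pattern in PATTERNS.items():  # pattern layer
        for m in pattern.finditer(text):
            found.append(Entity(m.start(), m.end(), label, m.group(0)))
    # Resolve overlaps: earliest start wins, then the longest span.
    found.sort(key=lambda e: (e.start, -(e.end - e.start)))
    result: list[Entity] = []
    last_end = -1
    for e in found:
        if e.start >= last_end:
            result.append(e)
            last_end = e.end
    return result
```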

Supported Entity Types

Category      | Examples             | Placeholder format
Persons       | John Doe, Dr. Smith  | <PERSON_0>
Organizations | Acme Corp, NHS       | <ORG_0>
Places        | London, Main Street  | <PLACE_0>
Emails        | user@domain.com      | <EMAIL_0>
Phone numbers | +1-555-0123          | <PHONE_0>
URLs          | https://example.com  | <URL_0>
Credit cards  | 4111-1111-1111-1111  | <CREDIT_CARD_0>

Best Practices

🔒 Security Guidelines

  • Pre-transmission validation: Always verify sensitive data is properly cloaked before LLM transmission
  • Entity map security: Store entity mappings securely using encryption for persistent storage
  • Delimiter selection: Choose delimiters that don't conflict with your data format or LLM training
  • Input sanitization: Validate and sanitize all inputs before processing
  • Regular audits: Periodically review cloaked outputs to ensure no PII leakage
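Pre-transmission validation can start with a simple check that none of the original values recorded in the entity map survive in the outgoing text. This assumes a placeholder-to-original map and only catches values the detector already found; it cannot flag PII that was never detected, which is what periodic audits are for. The helper names are hypothetical.

```python
def find_leaks(cloaked_prompt: str, entity_map: dict[str, str]) -> list[str]:
    """Return original values that still appear in the text about to be sent."""
    return [value for value in entity_map.values() if value in cloaked_prompt]


def assert_no_leaks(cloaked_prompt: str, entity_map: dict[str, str]) -> None:
    leaks = find_leaks(cloaked_prompt, entity_map)
    if leaks:
        # Report only the count so the error message itself carries no PII.
        raise ValueError(f"{len(leaks)} sensitive value(s) left uncloaked")
```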

⚡ Performance Optimization

  • Instance reuse: Maintain single LLMShield instances to leverage conversation caching
  • Cache monitoring: Track cache hit rates for multi-turn conversations (aim for >80%)
  • Delimiter alignment: Select delimiters compatible with your LLM's tokenization
  • Batch processing: Process multiple prompts in batches when possible
  • Memory management: Configure appropriate cache sizes based on usage patterns
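This README does not say how llmshield exposes cache statistics, so as an illustration of measuring a hit rate, the sketch below uses the standard library's `functools.lru_cache`, whose `cache_info()` reports hits and misses. `expensive_detect` is a hypothetical stand-in for any cached detection step.

```python
from functools import lru_cache
from typing import Callable


@lru_cache(maxsize=128)
def expensive_detect(text: str) -> tuple[str, ...]:
    # Hypothetical stand-in for a costly detection pass: capitalised words.
    return tuple(word for word in text.split() if word.istitle())


def hit_rate(cached_func: Callable) -> float:
    info = cached_func.cache_info()  # CacheInfo(hits, misses, maxsize, currsize)
    total = info.hits + info.misses
    return info.hits / total if total else 0.0
```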

🔧 Integration Best Practices

  • Exception handling: Implement comprehensive error handling for ValueError exceptions
  • Provider testing: Validate functionality with your specific LLM provider before production
  • Structured outputs: Leverage structured output capabilities for complex response processing
  • Monitoring: Implement logging and monitoring for entity detection accuracy
  • Testing: Include PII protection tests in your CI/CD pipeline
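The guidance above mentions `ValueError` but not what raises it, so the wrapper below simply treats any `ValueError` from an `ask`-style call as a handled failure. The `ask_safely` name and the log-and-return-`None` policy are assumptions; it logs only the exception type, so prompt contents never reach the logs.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("llm_privacy")


def ask_safely(ask: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call an ask-style function (e.g. shield.ask), handling ValueError."""
    try:
        return ask(**kwargs)
    except ValueError as exc:
        # Log the exception type only; the message could echo sensitive input.
        logger.warning("LLM request failed with %s", type(exc).__name__)
        return None
```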

Language Support

  • Primary: English (optimized)
  • Secondary: Spanish (good accuracy)
  • Experimental: Other languages (reduced accuracy, potential PII leakage)

Requirements

  • Python: 3.10+
  • Dependencies: None (zero-dependency architecture)
  • Memory: Minimal footprint with efficient caching
  • Performance: Sub-millisecond entity detection for typical prompts

Development

Setup

# Clone and setup development environment
git clone https://github.com/yourusername/llmshield.git
cd llmshield

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
make dev-dependencies

Testing

# Run full test suite
make tests

# Run with coverage
make coverage

# Test specific providers (requires API keys)
OPENAI_API_KEY=your-key python -m unittest tests/providers/test_openai.py

Code Quality

# Format code
black llmshield/ tests/
isort llmshield/ tests/

# Lint
flake8 llmshield/ tests/

Building and Distribution

Local Installation

# Install package locally in development mode
pip install -e .

# Or install from a local build (requires the build package: pip install build)
python -m build
pip install dist/llmshield-*.whl

Building for Distribution

# Install build dependencies
pip install build twine

# Build the package
python -m build

# This creates:
# - dist/llmshield-*.tar.gz (source distribution)
# - dist/llmshield-*.whl (wheel distribution)

Publishing to PyPI

# Check the built package
twine check dist/*

# Upload to test PyPI first (recommended)
twine upload --repository testpypi dist/*

# Upload to production PyPI
twine upload dist/*

Prerequisites for publishing:

  • PyPI account with an API token configured
  • Maintainer permissions on the llmshield project
  • All tests passing and version bumped in pyproject.toml

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Principles

  • Zero Dependencies: Maintain pure Python implementation
  • Performance First: Optimize for production workloads
  • Security Focus: Prioritize data protection and privacy
  • Universal Compatibility: Support all major LLM providers

License

GNU Affero General Public License v3.0 - See LICENSE.txt for details.

Get Started

Ready to secure your LLM interactions? Install llmshield today:

pip install llmshield

Download files


Source Distribution

llmshield-0.0.7.tar.gz (415.0 kB)

Uploaded Source

Built Distribution


llmshield-0.0.7-py3-none-any.whl (390.0 kB)

Uploaded Python 3

File details

Details for the file llmshield-0.0.7.tar.gz.

File metadata

  • Download URL: llmshield-0.0.7.tar.gz
  • Size: 415.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for llmshield-0.0.7.tar.gz
Algorithm Hash digest
SHA256 ef8df5d1aea6048dfd4fd980d5d42c25c6aa89266da6cfc7843f1abb88dfe733
MD5 37d3f5365e9a1a749be88d2453b585d9
BLAKE2b-256 c978a5fdefb01a913f4d6e2ad3352d91c846238dee6af8c70608160d584f9f88


File details

Details for the file llmshield-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: llmshield-0.0.7-py3-none-any.whl
  • Size: 390.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for llmshield-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 4ecd16dda730a1ea0faf1a34f257e44fe0530954378caf30c6eb4663232b00f9
MD5 32bbd918c5ffc41a0541d76bf77a95a2
BLAKE2b-256 acf8d46bcf2cf5cb97ec33fa9c5226bf754073032fe7ebc9dfe97b3b36dc0bf7

