Shields your confidential data from third-party LLM providers

llmshield

Enterprise-grade PII protection for Large Language Model interactions

Overview
Architecture
Installation
Quick Start
Advanced Configuration
- Custom Delimiters
- Conversation Caching
Supported LLM Providers
Entity Detection
- Supported Entity Types
Best Practices
Language Support
Requirements
Development
Contributing
License
Production Usage

Overview

llmshield is a production-ready, zero-dependency Python library that protects sensitive information in Large Language Model (LLM) interactions. It automatically detects personally identifiable information (PII) and other sensitive entities, cloaks them before a prompt leaves your system, and restores them in the response, which supports data privacy compliance without changing how you call your LLM.

Detection uses a multi-layered approach that combines dictionary-based matching, pattern recognition, contextual analysis, and rule-based classification, with the goal of high entity detection precision.

🔒 Comprehensive Entity Detection: Advanced multi-layered detection system
- Proper Nouns: Persons, Places, Organizations, Concepts
- Contact Information: Email addresses, URLs, social handles
- Numeric Data: Phone numbers, Credit card numbers, IDs
- Custom Entities: Extensible detection framework
🚀 Enterprise Performance: Production-optimized with minimal latency overhead
🔌 Zero Dependencies: Pure Python implementation with no external requirements
🛡️ End-to-End Protection: Bidirectional security for prompts and responses
🎯 Universal LLM Support: Provider-agnostic with specialized optimizations
📊 Conversation Memory: Intelligent caching for multi-turn dialogues

Architecture

llmshield employs a sophisticated Provider System that automatically detects and optimizes for different LLM APIs, ensuring seamless integration across platforms while maintaining enterprise-grade security and performance standards.

Provider System

The library features an extensible provider architecture that handles LLM-specific parameter formatting and API quirks:

graph TD
    A["LLMShield.ask()"] --> B["Provider Detection"]
    B --> C{LLM Function Analysis}
    C -->|"openai.chat.completions.create"| D["OpenAIProvider"]
    C -->|"openai.beta.chat.completions.parse"| D
    C -->|"Custom/Unknown APIs"| E["DefaultProvider"]

    D --> F["OpenAI Optimizations"]
    E --> G["Generic Handling"]

    F --> H["Parameter Conversion"]
    F --> I["Streaming Management"]
    F --> J["Beta API Support"]

    G --> K["Function Introspection"]
    G --> L["Best-effort Formatting"]

    H --> M["LLM Function Call"]
    I --> M
    J --> M
    K --> M
    L --> M

    M --> N["Response Processing"]
    N --> O["Entity Uncloaking"]
    O --> P["Final Response"]

Provider Types

OpenAI Provider (OpenAIProvider)

Automatic detection of OpenAI API functions
Intelligent parameter conversion (prompt → messages format)
Streaming support with proper chunk handling
Beta API compatibility (structured outputs, function calling)
Automatic management of the stream parameter for APIs that don't support streaming
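The prompt → messages conversion listed above can be pictured with a small sketch. This is not llmshield's implementation; `to_chat_kwargs` is a hypothetical helper, and the only outside fact it relies on is the OpenAI chat format, where a conversation is a list of `{"role": ..., "content": ...}` dictionaries.

```python
from typing import Any


def to_chat_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of kwargs with a bare `prompt` rewritten as `messages`.

    Hypothetical helper for illustration only.
    """
    converted = dict(kwargs)  # work on a copy so the caller's dict is untouched
    if "prompt" in converted and "messages" not in converted:
        # A single prompt becomes a one-message conversation from the user.
        converted["messages"] = [
            {"role": "user", "content": converted.pop("prompt")}
        ]
    return converted
```

If the caller already supplies `messages`, the sketch leaves the arguments alone rather than guessing how to merge the two.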

Default Provider (DefaultProvider)

Universal fallback for unknown LLM functions
Function signature introspection
Best-effort parameter mapping
Compatible with most LLM libraries (Anthropic, Cohere, Hugging Face, etc.)

Automatic Detection

The provider system automatically selects a provider for your LLM function based on:

Function name patterns (chat.completions.create, beta.chat.completions.parse)
Module paths (openai.*)
Function signatures and parameter inspection
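A rough sketch of how these three signals could be combined is shown below. It is an illustration under assumptions, not the library's detection code: `pick_provider` and `accepts_param` are hypothetical names, and the returned strings simply stand in for `OpenAIProvider` and `DefaultProvider`.

```python
import inspect
from typing import Callable


def accepts_param(func: Callable, name: str) -> bool:
    """Report whether `func` declares a parameter called `name`."""
    try:
        return name in inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some builtins and C extensions expose no inspectable signature.
        return False


def pick_provider(llm_func: Callable) -> str:
    """Guess a provider label from the function's module path and name."""
    module = getattr(llm_func, "__module__", None) or ""
    qualname = getattr(llm_func, "__qualname__", None) or ""
    # Signal 1: module path (openai.*)
    if module.split(".")[0] == "openai":
        return "openai"
    # Signal 2: function name pattern such as chat.completions.create
    if "chat.completions" in f"{module}.{qualname}".lower():
        return "openai"
    # Anything else falls back to generic handling.
    return "default"
```

For a function that lands in the fallback branch, `accepts_param(llm_func, "messages")` versus `accepts_param(llm_func, "prompt")` is one way to decide which argument name to pass.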

Installation

Production Installation

Install llmshield from PyPI using pip:

pip install llmshield

Development Installation

For contributors and advanced users:

# Clone the repository
git clone https://github.com/yourusername/llmshield.git
cd llmshield

# Install in development mode
pip install -e .

Verification

Verify your installation:

import llmshield
print(llmshield.__version__)

Quick Start

Basic Usage

from llmshield import LLMShield

# Initialize with automatic provider detection
shield = LLMShield()

# Manual cloaking/uncloaking
cloaked_prompt, entity_map = shield.cloak(
    "Hi, I'm John Doe from Acme Corp (john.doe@acmecorp.com)"
)
print(cloaked_prompt)
# Output: "Hi, I'm <PERSON_0> from <ORG_1> (<EMAIL_2>)"

# Send to your LLM
llm_response = your_llm_function(cloaked_prompt)

# Restore original entities
original_response = shield.uncloak(llm_response, entity_map)

Direct LLM Integration

from openai import OpenAI
from llmshield import LLMShield

# OpenAI example with automatic provider detection
client = OpenAI(api_key="your-api-key")
shield = LLMShield(llm_func=client.chat.completions.create)

# Single-turn conversation
response = shield.ask(
    model="gpt-4",
    prompt="Hi, I'm Sarah Johnson (sarah.j@techcorp.com), help me write an email."
)

# Multi-turn conversation with automatic entity consistency
messages = [
    {"role": "user", "content": "I'm John Smith from DataCorp"},
    {"role": "assistant", "content": "Hello John! How can I help you today?"},
    {"role": "user", "content": "Can you email me at john@datacorp.com?"}
]

response = shield.ask(model="gpt-4", messages=messages)

Streaming Support

# Streaming with real-time entity protection
response_stream = shield.ask(
    model="gpt-4",
    prompt="Generate a report about Jane Doe (jane@example.com)",
    stream=True
)

for chunk in response_stream:
    print(chunk, end="", flush=True)
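Streaming adds one complication: a placeholder such as `<PERSON_0>` can arrive split across two chunks. The sketch below shows one way to handle that by buffering from an unclosed start delimiter until its end delimiter arrives. It is not how llmshield does it internally; `uncloak_stream` is a hypothetical name, and the entity map is assumed to map placeholder to original value. It also assumes single-character delimiters, and a literal `<` in the text is held back until a `>` or the end of the stream.

```python
from typing import Iterable, Iterator


def uncloak_stream(
    chunks: Iterable[str],
    entity_map: dict[str, str],  # assumed shape: placeholder -> original value
    start: str = "<",
    end: str = ">",
) -> Iterator[str]:
    """Yield uncloaked text, holding back placeholders that are still arriving."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while buffer:
            open_at = buffer.find(start)
            if open_at == -1:
                # No start delimiter: everything buffered is safe to emit.
                yield buffer
                buffer = ""
                break
            close_at = buffer.find(end, open_at + len(start))
            if close_at == -1:
                # Placeholder may be incomplete: emit the text before it, keep the rest.
                if open_at:
                    yield buffer[:open_at]
                    buffer = buffer[open_at:]
                break
            token = buffer[open_at : close_at + len(end)]
            # Unknown tokens pass through unchanged.
            yield buffer[:open_at] + entity_map.get(token, token)
            buffer = buffer[close_at + len(end):]
    if buffer:
        yield buffer  # stream ended with an unclosed delimiter
```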

Advanced Configuration

Custom Delimiters

# Configure entity placeholder format
shield = LLMShield(
    start_delimiter='[[',  # Default: '<'
    end_delimiter=']]'     # Default: '>'
)
# Entities will appear as [[PERSON_0]], [[EMAIL_1]], etc.
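When the input itself may contain `<` or `>` (HTML, comparisons in code), it can help to check for collisions before choosing delimiters. The helper below is a hypothetical sketch, not part of llmshield; the candidate pairs are arbitrary examples.

```python
def first_safe_delimiters(
    text: str,
    candidates: tuple[tuple[str, str], ...] = (("<", ">"), ("[[", "]]"), ("{{", "}}")),
) -> tuple[str, str]:
    """Return the first (start, end) pair that does not occur in `text`."""
    for start, end in candidates:
        if start not in text and end not in text:
            return start, end
    raise ValueError("every candidate delimiter pair occurs in the text")
```

The chosen pair can then be passed as `start_delimiter` and `end_delimiter` when constructing `LLMShield`.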

Conversation Caching

# Configure cache size for multi-turn conversations
shield = LLMShield(
    llm_func=your_llm_function,
    max_cache_size=256  # Default: 128
)
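To give a feel for what a bounded cache implies, here is a minimal sketch of a size-limited store for entity maps. It assumes least-recently-used eviction, which the text above does not specify, and `BoundedEntityCache` is a hypothetical class rather than llmshield's own.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class BoundedEntityCache:
    """Keep at most `max_size` entity maps, evicting the least recently used."""

    def __init__(self, max_size: int = 128) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._items: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Optional[dict]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, entity_map: dict) -> None:
        self._items[key] = entity_map
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # drop the oldest entry
```

A larger `max_cache_size` trades memory for fewer evictions when many conversations are active at once.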

Supported LLM Providers

llmshield works with any LLM library or API, with specialized optimizations for:

OpenAI (GPT-4, GPT-3.5, all chat models)
OpenAI Beta APIs (structured outputs, function calling)

Compatible (via DefaultProvider)

Anthropic Claude (via anthropic library)
Google Gemini (via google-generativeai)
Cohere (via cohere library)
Hugging Face Transformers
Azure OpenAI
AWS Bedrock
Any custom LLM function

Entity Detection

llmshield uses a multi-layered approach combining:

Dictionary-based matching for known entities
Pattern recognition for structured data (emails, phones, etc.)
Contextual analysis for proper nouns
Rule-based classification for entity types
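The first two layers can be illustrated with a deliberately small sketch: a regular expression for emails and a dictionary lookup for known names. This is not llmshield's detector; the regex is simplified, `known_names` is a hypothetical input, and placeholders are numbered in detection order rather than in order of appearance.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def cloak(text: str, known_names: tuple[str, ...] = ()) -> tuple[str, dict[str, str]]:
    """Replace emails and known names with placeholders; return text and map."""
    entity_map: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}        # original value -> placeholder

    def placeholder_for(value: str, label: str) -> str:
        # Repeated values reuse the same placeholder.
        if value not in seen:
            token = f"<{label}_{len(seen)}>"
            seen[value] = token
            entity_map[token] = value
        return seen[value]

    # Layer 1: pattern recognition for structured data.
    text = EMAIL_RE.sub(lambda m: placeholder_for(m.group(0), "EMAIL"), text)
    # Layer 2: dictionary-based matching for known entities.
    for name in known_names:
        if name in text:
            text = text.replace(name, placeholder_for(name, "PERSON"))
    return text, entity_map


def uncloak(text: str, entity_map: dict[str, str]) -> str:
    """Restore the original values from the placeholder map."""
    for token, value in entity_map.items():
        text = text.replace(token, value)
    return text
```

Running the patterns first keeps a name fragment inside an email address from being cloaked separately.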

Supported Entity Types

Category	Examples	Placeholder Format
Persons	John Doe, Dr. Smith	`<PERSON_0>`
Organizations	Acme Corp, NHS	`<ORG_0>`
Places	London, Main Street	`<PLACE_0>`
Emails	user@domain.com	`<EMAIL_0>`
Phone Numbers	+1-555-0123	`<PHONE_0>`
URLs	https://example.com	`<URL_0>`
Credit Cards	4111-1111-1111-1111	`<CREDIT_CARD_0>`

Best Practices

🔒 Security Guidelines

Pre-transmission validation: Always verify sensitive data is properly cloaked before LLM transmission
Entity map security: Store entity mappings securely using encryption for persistent storage
Delimiter selection: Choose delimiters that don't conflict with your data format or LLM training
Input sanitization: Validate and sanitize all inputs before processing
Regular audits: Periodically review cloaked outputs to ensure no PII leakage
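Pre-transmission validation can be as simple as checking that none of the original values survive in the cloaked prompt. The sketch below assumes the entity map maps placeholder to original value, which the text above does not confirm; `find_leaks` and `require_cloaked` are hypothetical helpers.

```python
def find_leaks(cloaked_prompt: str, entity_map: dict[str, str]) -> list[str]:
    """Return original values that still appear in the cloaked prompt."""
    lowered = cloaked_prompt.lower()
    # Case-insensitive so "JOHN DOE" is caught when "John Doe" was mapped.
    return [value for value in entity_map.values() if value.lower() in lowered]


def require_cloaked(cloaked_prompt: str, entity_map: dict[str, str]) -> str:
    """Raise instead of returning a prompt that still contains mapped values."""
    leaks = find_leaks(cloaked_prompt, entity_map)
    if leaks:
        raise ValueError(f"{len(leaks)} sensitive value(s) remain in the prompt")
    return cloaked_prompt
```

The check only covers values the detector already found; it cannot reveal entities that were never detected, so periodic manual audits still matter.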

⚡ Performance Optimization

Instance reuse: Maintain single LLMShield instances to leverage conversation caching
Cache monitoring: Track cache hit rates for multi-turn conversations (aim for >80%)
Delimiter alignment: Select delimiters compatible with your LLM's tokenization
Batch processing: Process multiple prompts in batches when possible
Memory management: Configure appropriate cache sizes based on usage patterns

🔧 Integration Best Practices

Exception handling: Implement comprehensive error handling for ValueError exceptions
Provider testing: Validate functionality with your specific LLM provider before production
Structured outputs: Leverage structured output capabilities for complex response processing
Monitoring: Implement logging and monitoring for entity detection accuracy
Testing: Include PII protection tests in your CI/CD pipeline
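For the exception-handling point, a thin wrapper around the shielded call keeps failures from propagating unhandled. The text names `ValueError` as the exception to handle but does not say which conditions raise it, so the sketch below treats it generically; `ask_safely` and the logger name are hypothetical.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("llmshield_app")  # hypothetical logger name


def ask_safely(ask: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call `ask(**kwargs)`; log and return None if it raises ValueError."""
    try:
        return ask(**kwargs)
    except ValueError as exc:
        # Log only the error text, never the prompt, to avoid writing PII to logs.
        logger.warning("shielded request failed: %s", exc)
        return None
```

In an application this would be called as `ask_safely(shield.ask, model="gpt-4", prompt=...)`, with the caller deciding how to handle a `None` result.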

Language Support

Primary: English (optimized)
Secondary: Spanish (good accuracy)
Experimental: Other languages (reduced accuracy, potential PII leakage)

Requirements

Python: 3.10+
Dependencies: None (zero-dependency architecture)
Memory: Minimal footprint with efficient caching
Performance: Sub-millisecond entity detection for typical prompts

Development

Setup

# Clone and setup development environment
git clone https://github.com/yourusername/llmshield.git
cd llmshield

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
make dev-dependencies

Testing

# Run full test suite
make tests

# Run with coverage
make coverage

# Test specific providers (requires API keys)
OPENAI_API_KEY=your-key python -m unittest tests/providers/test_openai.py

Code Quality

# Format code
black llmshield/ tests/
isort llmshield/ tests/

# Lint
flake8 llmshield/ tests/

Building and Distribution

Local Installation

# Install package locally in development mode
pip install -e .

# Or install from local build
python -m build
pip install dist/llmshield-*.whl

Building for Distribution

# Install build dependencies
pip install build twine

# Build the package
python -m build

# This creates:
# - dist/llmshield-*.tar.gz (source distribution)
# - dist/llmshield-*.whl (wheel distribution)

Publishing to PyPI

# Check the built package
twine check dist/*

# Upload to test PyPI first (recommended)
twine upload --repository testpypi dist/*

# Upload to production PyPI
twine upload dist/*

Prerequisites for publishing:

PyPI account with API key configured
Maintainer permissions on the llmshield project
All tests passing and version bumped in pyproject.toml

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Principles

Zero Dependencies: Maintain pure Python implementation
Performance First: Optimize for production workloads
Security Focus: Prioritize data protection and privacy
Universal Compatibility: Support all major LLM providers

License

GNU Affero General Public License v3.0 - See LICENSE.txt for details.

Production Usage

llmshield is trusted by:

brainful.ai - AI-powered enterprise solutions

Get Started

Ready to secure your LLM interactions? Install llmshield today:

pip install llmshield

Release history

2.1.0 (Mar 5, 2026)
2.0.0 (Feb 4, 2026)
1.0.0 (Jul 2, 2025)
0.0.9 (Jun 29, 2025)
0.0.8 (Jun 29, 2025)
0.0.7 (Jun 27, 2025), this version
0.0.6 (Jun 18, 2025)
0.0.5 (Apr 5, 2025)
0.0.4 (Apr 5, 2025)
0.0.3 (Feb 2, 2025)
0.0.2 (Feb 2, 2025)
0.0.1 (Feb 2, 2025)

Download files

Source Distribution

llmshield-0.0.7.tar.gz (415.0 kB), uploaded Jun 27, 2025 (Source)

Built Distribution

llmshield-0.0.7-py3-none-any.whl (390.0 kB), uploaded Jun 27, 2025 (Python 3)

File details

Details for the file llmshield-0.0.7.tar.gz.

File metadata

Download URL: llmshield-0.0.7.tar.gz
Upload date: Jun 27, 2025
Size: 415.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for llmshield-0.0.7.tar.gz
Algorithm	Hash digest
SHA256	`ef8df5d1aea6048dfd4fd980d5d42c25c6aa89266da6cfc7843f1abb88dfe733`
MD5	`37d3f5365e9a1a749be88d2453b585d9`
BLAKE2b-256	`c978a5fdefb01a913f4d6e2ad3352d91c846238dee6af8c70608160d584f9f88`


File details

Details for the file llmshield-0.0.7-py3-none-any.whl.

File metadata

Download URL: llmshield-0.0.7-py3-none-any.whl
Upload date: Jun 27, 2025
Size: 390.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for llmshield-0.0.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4ecd16dda730a1ea0faf1a34f257e44fe0530954378caf30c6eb4663232b00f9`
MD5	`32bbd918c5ffc41a0541d76bf77a95a2`
BLAKE2b-256	`acf8d46bcf2cf5cb97ec33fa9c5226bf754073032fe7ebc9dfe97b3b36dc0bf7`

