Shields your confidential data from third party LLM providers

These details have not been verified by PyPI

Project links

Project description

LLMShield

A zero-dependency Python library for protecting personally identifiable information (PII) in Large Language Model interactions through automatic entity detection and cloaking.

Overview

LLMShield provides enterprise-grade protection for sensitive information in LLM interactions by automatically detecting and replacing PII with placeholders before transmission, then restoring original values in responses. The library employs a multi-layered detection approach combining pattern recognition, dictionary matching, and contextual analysis.

Key Features:

Zero external dependencies
Automatic entity detection and cloaking
Multi-turn conversation support with entity consistency
Streaming response handling
Universal LLM provider compatibility
Production-optimized performance

High-Level Data Flow

graph LR
    A["Raw Input<br/>'Contact Dr. Smith at smith@hospital.org'"] --> B["Entity Detection<br/>PERSON: Dr. Smith<br/>EMAIL: smith@hospital.org"]

    B --> C["PII Anonymization<br/>'Contact &lt;PERSON_0&gt; at &lt;EMAIL_1&gt;'"]

    C --> D["LLM Processing<br/>Safe text sent to<br/>OpenAI, Claude, etc."]

    D --> E["Response Restoration<br/>Placeholders → Original PII"]

    E --> F["Protected Output<br/>'I'll help you contact Dr. Smith<br/>at smith@hospital.org'"]

    %% Styling
    classDef flowStyle fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#212529
    classDef detectionStyle fill:#e8f4f8,stroke:#0c63e4,stroke-width:2px,color:#212529
    classDef anonymizationStyle fill:#fff3cd,stroke:#856404,stroke-width:2px,color:#212529
    classDef llmStyle fill:#f0e6ff,stroke:#6f42c1,stroke-width:2px,color:#212529
    classDef restorationStyle fill:#d1ecf1,stroke:#0c5460,stroke-width:2px,color:#212529

    class A flowStyle
    class B detectionStyle
    class C anonymizationStyle
    class D llmStyle
    class E restorationStyle
    class F flowStyle

Under the Hood: System Architecture

graph LR
    subgraph Input ["Input Layer"]
        A["Textual Input<br/>Contains PII Entities"]
    end

    subgraph Detection ["Entity Detection Engine"]
        B["Waterfall Detection Algorithm<br/>• Phase 1: Pattern Recognition (RegEx)<br/>• Phase 2: Numerical Validation (Luhn)<br/>• Phase 3: Linguistic Analysis (NLP)<br/>Entity Types: EMAIL, PERSON, PHONE, URL"]
    end

    subgraph Cloaking ["Entity Anonymization"]
        C["Classification & Tokenization<br/>PII → Typed Placeholders<br/>Deterministic Mapping<br/>Format: &lt;TYPE_INDEX&gt;"]
    end

    subgraph Provider ["LLM Provider Interface"]
        D["Provider-Agnostic API Gateway<br/>Supported: OpenAI, Anthropic Claude,<br/>Google Gemini, Azure OpenAI,<br/>AWS Bedrock, Custom Endpoints"]
    end

    subgraph Restoration ["Entity De-anonymization"]
        E["Inverse Token Mapping<br/>Placeholder Detection<br/>Bidirectional Text Reconstruction<br/>Integrity Preservation"]
    end

    subgraph Output ["Output Layer"]
        F["Reconstructed Response<br/>Original PII Restored<br/>Stream-Compatible"]
    end

    subgraph Memory ["State Management System"]
        G["LRU Cache Implementation<br/>Conversation Continuity<br/>Hash-Based Entity Mapping<br/>O(1) Lookup Complexity"]
    end

    %% Primary data flow
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F

    %% State management interactions
    Memory -.->|"Read/Write<br/>Entity Maps"| C
    Memory -.->|"Consistency<br/>Validation"| E

    %% Styling
    classDef inputStyle fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#212529
    classDef detectionStyle fill:#e8f4f8,stroke:#0c63e4,stroke-width:2px,color:#212529
    classDef cloakingStyle fill:#fff3cd,stroke:#856404,stroke-width:2px,color:#212529
    classDef providerStyle fill:#f0e6ff,stroke:#6f42c1,stroke-width:2px,color:#212529
    classDef restorationStyle fill:#d1ecf1,stroke:#0c5460,stroke-width:2px,color:#212529
    classDef outputStyle fill:#d4edda,stroke:#155724,stroke-width:2px,color:#212529
    classDef memoryStyle fill:#f8d7da,stroke:#721c24,stroke-width:2px,color:#212529

    class A inputStyle
    class B detectionStyle
    class C cloakingStyle
    class D providerStyle
    class E restorationStyle
    class F outputStyle
    class G memoryStyle

Installation

pip install llmshield

Quick Start

Basic Usage

from llmshield import LLMShield

# Initialise shield
shield = LLMShield()

# Protect sensitive information
cloaked_prompt, entity_map = shield.cloak(
    "Contact John Doe at john.doe@company.com or call +1-555-0123"
)
print(cloaked_prompt)
# Output: "Contact <PERSON_0> at <EMAIL_1> or call <PHONE_NUMBER_2>"

# Process with LLM
llm_response = your_llm_function(cloaked_prompt)

# Restore original entities
restored_response = shield.uncloak(llm_response, entity_map)

Note: Individual cloak() and uncloak() methods support single messages only and do not maintain conversation history. For multi-turn conversations with entity consistency across messages, use the ask() method.

Direct LLM Integration

from openai import OpenAI
from llmshield import LLMShield

client = OpenAI(api_key="your-api-key")
shield = LLMShield(llm_func=client.chat.completions.create)

# Single request with automatic protection
response = shield.ask(
    model="gpt-4",
    prompt="Draft an email to Sarah Johnson at sarah.j@techcorp.com"
)

# Multi-turn conversation
messages = [
    {"role": "user", "content": "I'm John Smith from DataCorp"},
    {"role": "assistant", "content": "Hello! How can I help you?"},
    {"role": "user", "content": "Email me at john@datacorp.com"}
]

response = shield.ask(model="gpt-4", messages=messages)

Streaming Support

response_stream = shield.ask(
    model="gpt-4",
    prompt="Generate a report about Jane Doe (jane@example.com)",
    stream=True
)

for chunk in response_stream:
    print(chunk, end="", flush=True)

Entity Detection

The library detects and protects the following entity types:

Entity Type	Examples	Placeholder Format
Person	John Doe, Dr. Smith	`<PERSON_0>`
Organization	Acme Corp, NHS	`<ORGANISATION_0>`
Place	London, Main Street	`<PLACE_0>`
Email	user@domain.com	`<EMAIL_0>`
Phone	+1-555-0123	`<PHONE_NUMBER_0>`
URL	https://example.com	`<URL_0>`
Credit Card	4111-1111-1111-1111	`<CREDIT_CARD_0>`
IP Address	192.168.1.1	`<IP_ADDRESS_0>`

Configuration

Custom Delimiters

shield = LLMShield(
    start_delimiter='[[',
    end_delimiter=']]'
)
# Entities appear as [[PERSON_0]], [[EMAIL_1]], etc.

Conversation Caching

LLMShield implements an LRU (Least Recently Used) cache to maintain entity consistency across multi-turn conversations. The cache stores entity mappings for conversation histories, ensuring that all entities (persons, organizations, emails, phones, etc.) mentioned in different messages receive the same placeholders.

shield = LLMShield(
    llm_func=your_llm_function,
    max_cache_size=1000  # Default: 1000
)

Cache Sizing Guidelines:

Small applications (< 100 daily conversations): max_cache_size=100-500
Medium applications (100-1000 daily conversations): max_cache_size=1000-5000
Large applications (> 1000 daily conversations): max_cache_size=5000-10000

Memory considerations: Each cache entry stores approximately 1-2KB. A cache size of 1000 uses roughly 1-2MB of memory.

Performance impact: Cache hit rates above 80% significantly improve performance for multi-turn conversations by avoiding re-detection of previously seen entities.

Provider Compatibility

LLMShield includes specialized support for OpenAI APIs and universal compatibility with other providers:

Optimized Support:

OpenAI (GPT-4, GPT-3.5, all chat models)
OpenAI Beta APIs (structured outputs)

Universal Compatibility:

Anthropic Claude
Google Gemini
Cohere
Hugging Face Transformers
Azure OpenAI
AWS Bedrock
Custom LLM functions

Language Support

English: Full optimization and accuracy
Other languages: Experimental support with reduced accuracy

Requirements

Python 3.12+
Zero external dependencies

Development

Setup

git clone https://github.com/yourusername/llmshield.git
cd llmshield
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e ".[dev]"

Testing

# Run tests
make tests

# Coverage analysis
make coverage

# Code quality checks
make ruff

# Documentation coverage
make doc-coverage

Building and Publishing

Building the Package

# Install build dependencies
make dev-dependencies

# Build the package
make build

Publishing to PyPI

Update version in pyproject.toml
Run quality checks:
```
make tests
make coverage
make ruff
```
Build and publish:
```
make build
twine upload dist/*
```

Security Considerations

Validate cloaked outputs before LLM transmission
Securely store entity mappings for persistent sessions
Choose delimiters that don't conflict with your data format
Implement comprehensive input validation
Regularly audit entity detection accuracy

Contributing

See CONTRIBUTING.md for development guidelines and contribution process.

License

GNU Affero General Public License v3.0 - See LICENSE.txt for details.

Production Usage

LLMShield is used in production at brainful.ai.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

2.1.0

Mar 5, 2026

2.0.0

Feb 4, 2026

1.0.0

Jul 2, 2025

0.0.9

Jun 29, 2025

This version

0.0.8

Jun 29, 2025

0.0.7

Jun 27, 2025

0.0.6

Jun 18, 2025

0.0.5

Apr 5, 2025

0.0.4

Apr 5, 2025

0.0.3

Feb 2, 2025

0.0.2

Feb 2, 2025

0.0.1

Feb 2, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmshield-0.0.8.tar.gz (412.6 kB view details)

Uploaded Jun 29, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llmshield-0.0.8-py3-none-any.whl (389.1 kB view details)

Uploaded Jun 29, 2025 Python 3

File details

Details for the file llmshield-0.0.8.tar.gz.

File metadata

Download URL: llmshield-0.0.8.tar.gz
Upload date: Jun 29, 2025
Size: 412.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for llmshield-0.0.8.tar.gz
Algorithm	Hash digest
SHA256	`066f0af12ddafffdb7dc0b2631d7b6886662b7a6dc67cdb839727a421a470658`
MD5	`ff765ffe039419580800ed8d4949902d`
BLAKE2b-256	`9c9c8505a4591691520d0c5786b5010a896d8fbed14d971ef347179d354d993c`

See more details on using hashes here.

File details

Details for the file llmshield-0.0.8-py3-none-any.whl.

File metadata

Download URL: llmshield-0.0.8-py3-none-any.whl
Upload date: Jun 29, 2025
Size: 389.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for llmshield-0.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`16f8eee23e3e58fc4f7673a98d8033666855660fd48ed680dfd9bbe27b463ebd`
MD5	`1780d1d77b804cbdc832b1393d583957`
BLAKE2b-256	`e3087a91794143be6c5988ea1335913971c21ee42a6024fd14292350e7938e9b`

See more details on using hashes here.

llmshield 0.0.8

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

LLMShield

Overview

High-Level Data Flow

Under the Hood: System Architecture

Installation

Quick Start

Basic Usage

Direct LLM Integration

Streaming Support

Entity Detection

Configuration

Custom Delimiters

Conversation Caching

Provider Compatibility

Language Support

Requirements

Development

Setup

Testing

Building and Publishing

Building the Package

Publishing to PyPI

Security Considerations

Contributing

License

Production Usage

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes