Cognitive Security Middleware - The 'Electronic Stability Program' (ESP) for Large Language Models. Bidirectional containment system with defense-in-depth architecture (6 validation layers), stateful tracking, and mathematical safety constraints. Validated against Unicode/encoding attacks, pattern evasion, multilingual/polyglot attacks (12+ languages including Basque, Maltese), and memory/session attacks. Protocol-based hexagonal architecture with LangChain integration.
LLM Security Firewall
Bidirectional security framework for human/LLM interfaces implementing defense-in-depth architecture with multiple validation layers.
Version: 2.5.0 | Python: >=3.12 | License: MIT | Status: Production
TL;DR
A bidirectional security layer for LLM-based systems. Validates input, output, and agent state transitions.
- Input, output, memory, and state validation
- Supports agent frameworks, tool-using models, and API gateways
- Detection: prompt injection, jailbreaks, obfuscation, evasive encodings, unauthorized tool actions, cognitive leak risks, children-safety failures, malformed JSON, memory-poisoning
- Local execution: no telemetry, no external calls
- API: guard.check_input(text) / guard.check_output(text)
Minimal Example
from llm_firewall import guard

# Example: input validation. The v2.4.1 update reduced false positives for
# benign educational queries matching the 'explain how...' pattern.
user_prompt = "Explain how rain forms."
decision = guard.check_input(user_prompt)
print(f"Blocked: {not decision.allowed}, Reason: {decision.reason}")

if decision.allowed:
    # LLM backend call. `llm` stands for your own model client;
    # it is not provided by this package.
    response = llm(user_prompt)
    out = guard.check_output(response)
    if not out.allowed:
        print(f"Response sanitized: {out.reason}")
    else:
        print(f"LLM Output: {out.cleaned_text}")
Installation
Core Installation (Recommended - ~54 MB baseline)
Lightweight baseline with ONNX-only inference:
pip install llm-security-firewall
# OR
pip install -r requirements-core.txt
This provides:
- Pattern matching and basic validation
- ONNX-based semantic guard (CUDA-enabled)
- Memory footprint: ~54 MB (96% reduction from original 1.3 GB)
Full ML Capabilities (Optional - ~1.3 GB when loaded)
For advanced validators (TruthPreservationValidator, TopicFence):
pip install llm-security-firewall[full]
# OR
pip install -r requirements.txt
Heavy components (PyTorch, transformers) are loaded on-demand only - they don't affect the baseline.
Development Installation
For development (local installation):
pip install -e .
For development dependencies:
pip install -e .[dev]
Optional extras:
pip install llm-security-firewall[langchain] # LangChain integration
pip install llm-security-firewall[dev] # Development tools
pip install llm-security-firewall[monitoring] # Monitoring tools
Features
- Bidirectional validation: Input, output, and memory integrity validation
- Sequential validation layers: UnicodeSanitizer, NormalizationLayer, RegexGate, Input Analysis, Tool Inspection, Output Validation
- Statistical methods: CUSUM for drift detection, Dempster-Shafer theory for evidence fusion, fail-closed risk gating
- Multilingual detection: Polyglot attack detection across 12+ languages including low-resource languages (Basque, Maltese tested)
- Unicode normalization: Zero-width character removal, bidirectional override detection, homoglyph normalization, encoding anomaly detection
- Session state tracking: Session state management, drift detection, cumulative risk tracking
- Tool call validation: HEPHAESTUS protocol for tool call validation and killchain detection
- Published metrics: False Positive Rate (FPR), P99 latency, and memory usage documented in /docs/
- Hexagonal architecture: Protocol-based adapters for framework independence
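The "Statistical methods" bullet mentions Dempster-Shafer evidence fusion and fail-closed gating. The following is a minimal sketch of Dempster's rule of combination for two detectors, not the package's actual implementation. The detector names, the mass values, the `should_block` helper, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import product

Mass = dict[frozenset[str], float]

MALICIOUS = frozenset({"malicious"})
BENIGN = frozenset({"benign"})
UNKNOWN = MALICIOUS | BENIGN  # mass assigned to "cannot tell"


def combine(m1: Mass, m2: Mass) -> Mass:
    """Fuse two mass functions with Dempster's rule of combination."""
    fused: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            fused[intersection] = fused.get(intersection, 0.0) + wa * wb
        else:
            # The two sources contradict each other on this pair.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in fused.items()}


def plausibility_malicious(mass: Mass) -> float:
    """Sum of all mass that does not rule out 'malicious'."""
    return sum(w for hypothesis, w in mass.items() if hypothesis & MALICIOUS)


def should_block(mass: Mass, threshold: float = 0.5) -> bool:
    """Fail-closed gate (assumed policy): uncertainty counts against the input."""
    return plausibility_malicious(mass) >= threshold


# Illustrative (made-up) detector outputs.
regex_detector: Mass = {MALICIOUS: 0.6, UNKNOWN: 0.4}
semantic_detector: Mass = {MALICIOUS: 0.5, BENIGN: 0.2, UNKNOWN: 0.3}

fused = combine(regex_detector, semantic_detector)
blocked = should_block(fused)
```

Gating on plausibility rather than belief is one way to read "fail-closed": mass left on the unknown set is treated as potential risk.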
Architecture
Architectural Approach
The system implements a stateful, bidirectional containment mechanism for large language models. Requests are processed through sequential validation layers with mathematical constraints and stateful tracking.
Architectural principles:
- Bidirectional validation: All data paths validated (input, output, in-memory state transitions)
- Hexagonal architecture: Protocol-based Port/Adapter interfaces with dependency injection
- Domain separation: Core business logic separated from infrastructure concerns
- Framework independence: Domain layer uses Protocol-based adapters (DecisionCachePort, DecoderPort, ValidatorPort)
- Deterministic normalization: Multi-pass Unicode normalization, homoglyph resolution, Base64/Hex/URL decoding, JSON-hardening
- Statistical methods: CUSUM detectors for oscillation detection, Dempster-Shafer uncertainty modeling for evidence fusion, fail-closed risk gating
- Stateful protection: Policies operate on text, agent state, tool sequences, and memory mutation events
- Local execution: No telemetry, no external API calls, no data exfiltration
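To illustrate the Protocol-based Port/Adapter principle listed above, here is a minimal sketch using `typing.Protocol`. The port name `ValidatorPort` appears in this README, but its method signature, the `Verdict` type, the `LengthValidator` adapter, and the `Engine` class are hypothetical and only show the pattern.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


@runtime_checkable
class ValidatorPort(Protocol):
    """Port: the domain layer depends only on this shape (assumed signature)."""

    def validate(self, text: str) -> Verdict: ...


class LengthValidator:
    """Adapter: satisfies the port structurally, without inheriting from it."""

    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars

    def validate(self, text: str) -> Verdict:
        if len(text) > self.max_chars:
            return Verdict(False, "input too long")
        return Verdict(True, "ok")


class Engine:
    """Domain logic: receives its validators via dependency injection."""

    def __init__(self, validators: list[ValidatorPort]) -> None:
        self._validators = validators

    def check(self, text: str) -> Verdict:
        for validator in self._validators:
            verdict = validator.validate(text)
            if not verdict.allowed:
                return verdict  # first failing layer decides
        return Verdict(True, "ok")


engine = Engine([LengthValidator(max_chars=20)])
short_verdict = engine.check("hello")
long_verdict = engine.check("x" * 21)
```

Because the port is structural, an adapter for another framework can be swapped in without touching `Engine`.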
Bidirectional Processing Pipeline
The pipeline covers three paths: the two directions of traffic plus memory integrity.
- Human → LLM (Input Protection)
  - Normalization and sanitization
  - Pattern matching and evasion detection
  - Risk scoring and policy evaluation
  - Session state tracking
- LLM → Human (Output Protection)
  - Evidence validation
  - Tool call validation
  - Output sanitization
  - Truth preservation checks
- Memory Integrity
  - Session state management
  - Drift detection
  - Influence tracking
Core Components
Firewall Engine (src/llm_firewall/core/firewall_engine_v2.py)
- Main decision engine
- Risk score aggregation
- Policy application
- Unicode security analysis
Normalization Layer (src/hak_gal/layers/inbound/normalization_layer.py)
- Recursive URL/percent decoding
- Unicode normalization (NFKC)
- Zero-width character removal
- Directional override character removal
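The four normalization steps above can be sketched with the standard library alone. This is an assumed reading of the layer, not the code in normalization_layer.py: the function name, the pass limit, and the exact character sets are illustrative choices.

```python
import unicodedata
from urllib.parse import unquote

# Zero-width and invisible joiner characters (illustrative subset).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
# Bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}
_STRIP = ZERO_WIDTH | BIDI_CONTROLS


def normalize(text: str, max_decode_passes: int = 5) -> str:
    """Percent-decode until stable, apply NFKC, then drop invisible controls."""
    # Repeat decoding so nested encodings such as %256F -> %6F -> o unwrap.
    # The pass limit bounds the work done on adversarial input.
    for _ in range(max_decode_passes):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    # NFKC folds compatibility forms (e.g. fullwidth letters) to canonical ones.
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch not in _STRIP)


double_encoded = normalize("ign%256Fre")
zero_width = normalize("ig\u200bnore")
fullwidth = normalize("\uff49\uff47\uff4e\uff4f\uff52\uff45")
bidi = normalize("a\u202eb")
```

All four inputs normalize to plain ASCII text, so later pattern matching sees the same string regardless of which obfuscation was used.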
Pattern Matching (src/llm_firewall/rules/patterns.py)
- Regex-based pattern detection
- Concatenation-aware matching
- Evasion pattern detection
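"Concatenation-aware matching" is not defined further in this README. One plausible reading is that a rule should still fire when a trigger phrase is split across string-concatenation joins. The sketch below shows that idea only; the regular expressions and function names are hypothetical and are not taken from patterns.py.

```python
import re

# Matches the join between two quoted fragments, e.g.  "ig" + "nore"
_CONCAT_JOIN = re.compile(r"""["']\s*\+\s*["']""")

# Hypothetical rule, for illustration only.
_RULE = re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE)


def collapse_concatenation(text: str) -> str:
    """Remove quote-plus-quote joins so split fragments become contiguous."""
    return _CONCAT_JOIN.sub("", text)


def matches(text: str) -> bool:
    """Check the rule against both the raw and the collapsed text."""
    return bool(_RULE.search(text) or _RULE.search(collapse_concatenation(text)))


plain_hit = matches("Please ignore previous instructions.")
split_hit = matches('"ig" + "nore previous instructions"')
benign = matches("Please ignore the weather report.")
```

Checking both forms keeps the rule effective on ordinary text while closing the split-string gap.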
Risk Scoring (src/llm_firewall/core/risk_scorer.py)
- Multi-factor risk calculation
- Cumulative risk tracking
- Threshold-based decisions
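Cumulative risk tracking with a threshold can be illustrated by a one-sided CUSUM, the technique this README names for drift and oscillation detection. The sketch below is a textbook form of the algorithm, not the code in risk_scorer.py; the class name and the baseline, slack, and threshold values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CusumDetector:
    baseline: float = 0.2   # expected per-turn risk for benign sessions (assumed)
    slack: float = 0.1      # tolerated deviation above baseline per turn (assumed)
    threshold: float = 1.0  # alarm level for the accumulated score (assumed)
    score: float = 0.0

    def update(self, risk: float) -> bool:
        """Accumulate excess risk; return True once the alarm threshold is passed."""
        excess = risk - self.baseline - self.slack
        # Clamp at zero so long benign stretches do not build negative credit.
        self.score = max(0.0, self.score + excess)
        return self.score > self.threshold


# A session where each turn is moderately risky but never extreme.
detector = CusumDetector()
alarms = [detector.update(risk) for risk in [0.6, 0.6, 0.6, 0.6]]

# A benign session stays at zero and never alarms.
quiet = CusumDetector()
quiet_alarms = [quiet.update(0.1) for _ in range(10)]
```

With these values the alarm fires on the fourth turn, although no single turn would exceed a per-turn limit of, say, 0.7. That is the case a cumulative score is meant to catch.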
Cache System (src/llm_firewall/cache/decision_cache.py)
- Exact match caching (Redis)
- Semantic caching (LangCache)
- Hybrid mode support
- Circuit breaker pattern
- Fail-safe behavior (blocks on cache failure, prevents security bypass)
Adapter Health (src/llm_firewall/core/adapter_health.py)
- Circuit breaker implementation
- Health metrics tracking
- Failure threshold management
- Recovery timeout handling
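The circuit breaker pattern with a failure threshold and a recovery timeout can be sketched as follows. This is a generic illustration, not the contents of adapter_health.py; the class, the exception, and the parameter names are hypothetical. The injectable clock exists only to make the example testable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised while the breaker is open; callers should fail closed (block)."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._recovery_timeout:
                raise CircuitOpenError("adapter unavailable")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def _failing_adapter() -> str:
    raise ConnectionError("cache backend down")


now = [0.0]  # fake clock for the demonstration
breaker = CircuitBreaker(failure_threshold=2, recovery_timeout=10.0, clock=lambda: now[0])
for _ in range(2):
    try:
        breaker.call(_failing_adapter)
    except ConnectionError:
        pass

try:
    breaker.call(lambda: "ok")
    rejected_while_open = False
except CircuitOpenError:
    rejected_while_open = True

now[0] = 11.0  # recovery timeout has elapsed
recovered = breaker.call(lambda: "ok")
```

Treating `CircuitOpenError` as a block decision matches the fail-safe behavior described for the cache system above.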
Developer Adoption API (src/llm_firewall/guard.py)
- API: guard.check_input(text), guard.check_output(text)
- Backward compatible with existing API
- Integration guide: QUICKSTART.md
LangChain Integration (src/llm_firewall/integrations/langchain/callbacks.py)
- FirewallCallbackHandler for LangChain chains
- Automatic input/output validation
- See examples/langchain_integration.py for usage
Configuration
Cache Modes
Configure via CACHE_MODE environment variable:
- exact (default): Redis exact match cache
- semantic: LangCache semantic search
- hybrid: Both caches in sequence
Redis Configuration
export REDIS_URL=redis://:password@host:6379/0
export REDIS_TTL=3600 # Optional: Cache TTL in seconds
For Redis Cloud:
export REDIS_CLOUD_HOST=host
export REDIS_CLOUD_PORT=port
export REDIS_CLOUD_USERNAME=username
export REDIS_CLOUD_PASSWORD=password
Examples
Integration examples in examples/ directory:
- quickstart.py - Basic integration using guard.py API
- langchain_integration.py - LangChain integration with FirewallCallbackHandler
- minimal_fastapi.py - FastAPI middleware integration
- quickstart_fastapi.py - FastAPI example with input/output validation
Run examples:
python examples/quickstart.py
python examples/langchain_integration.py
python examples/minimal_fastapi.py
Basic usage:
from llm_firewall import guard

decision = guard.check_input("user input text")
if decision.allowed:
    # Process request
    pass
Testing
Test suite includes unit tests, integration tests, and adversarial test cases.
pytest tests/ -v
With coverage:
pytest tests/ -v --cov=src/llm_firewall --cov-report=term
Dependencies
Core (Required for basic functionality):
- numpy>=1.24.0
- scipy>=1.11.0
- scikit-learn>=1.3.0
- pyyaml>=6.0
- blake3>=0.3.0
- requests>=2.31.0
- psycopg[binary]>=3.1.0
- redis>=5.0.0
- pydantic>=2.0.0
- psutil>=5.9.0
- cryptography>=41.0.0
Machine Learning (Optional, for advanced features):
- sentence-transformers>=2.2.0 (SemanticVectorCheck, embedding-based detection)
- torch>=2.0.0 (ML model inference)
- transformers>=4.30.0 (Transformer-based detectors)
- onnx>=1.14.0 (ONNX model support)
- onnxruntime>=1.16.0 (ONNX runtime)
Note: Core functionality (Unicode normalization, pattern matching, risk scoring, basic validation) operates without ML dependencies. Semantic similarity detection and Kids Policy Engine require optional ML dependencies.
System Requirements:
- Python >=3.12 (by design, no legacy support)
- RAM: ~300MB for core functionality, ~1.3GB for adversarial inputs with full ML features
- GPU: Optional, only required for certain ML-based detectors
- Redis: Optional but recommended for caching (local or cloud)
Known Limitations
- False Positive Rate: Kids Policy false positive rate is 0.00% on validation dataset (target: ≤5.0%, met in v2.4.1)
- Memory Usage: Current memory usage exceeds 300MB cap for adversarial inputs (measured: ~1.3GB)
- Unicode Normalization: Some edge cases in mathematical alphanumeric symbol handling
- Python Version: Requires Python >=3.12 (by design, no legacy support for 3.10/3.11)
- Dependencies: Core functionality requires numpy, scipy, scikit-learn; full ML features require torch, transformers, sentence-transformers (see Dependencies section)
Security Notice
This library reduces risk but does not guarantee complete protection.
Required additional security controls:
- Authentication and authorization
- Network isolation
- Logging and monitoring
- Rate limiting
- Sandboxing of tool environments
The maintainers assume no liability for misuse.
Use only in compliance with local law and data-protection regulations.
Security Hardening
Implemented Measures
- Multi-Tenant Isolation
  - Session hashing via HMAC-SHA256(tenant_id + user_id + DAILY_SALT)
  - Redis key isolation via ACLs and prefixes
- Oscillation Defense
  - CUSUM (Cumulative Sum Control Chart) algorithm
  - Cumulative risk tracking across session turns
- Parser Differential Protection
  - StrictJSONDecoder with duplicate key detection
  - Immediate exception on key duplication
- Unicode Security
  - Zero-width character detection and removal
  - Directional override character detection
  - Homoglyph normalization
- Multilingual Attack Detection
  - Polyglot attack detection across 12+ languages
  - Low-resource language hardening (Basque, Maltese tested)
  - Language switching detection
  - Multilingual keyword detection (Chinese, Japanese, Russian, Arabic, Hindi, Korean, and others)
- Pattern Evasion Detection
  - Concatenation-aware pattern matching
  - Encoding anomaly detection
  - Obfuscation pattern recognition
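Two of the measures above, session hashing and duplicate-key rejection, can be sketched with the standard library. Both functions are illustrative assumptions rather than the package's code. In particular, the README does not say which value serves as the HMAC key or how the fields are joined; this sketch uses the daily salt as the key and a ":" separator. `strict_loads` is a hypothetical stand-in for StrictJSONDecoder.

```python
import hashlib
import hmac
import json
from typing import Any


def session_key(tenant_id: str, user_id: str, daily_salt: bytes) -> str:
    """Derive an opaque per-day session identifier (assumed construction)."""
    # The separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    # It assumes the identifiers themselves never contain ":".
    message = f"{tenant_id}:{user_id}".encode("utf-8")
    return hmac.new(daily_salt, message, hashlib.sha256).hexdigest()


class DuplicateKeyError(ValueError):
    """Raised as soon as a JSON object repeats a key."""


def _reject_duplicates(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    seen: dict[str, Any] = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate JSON key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(payload: str) -> Any:
    """Parse JSON, refusing objects whose keys repeat at any nesting level."""
    return json.loads(payload, object_pairs_hook=_reject_duplicates)


salt = b"example-daily-salt"  # placeholder; a real salt must be secret and rotated
key_a = session_key("ab", "c", salt)
key_b = session_key("a", "bc", salt)
parsed = strict_loads('{"tool": "search", "args": {"q": "rain"}}')
```

Rejecting duplicates matters because parsers disagree on which value wins for a repeated key, which is the parser-differential gap this measure is meant to close.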
Performance Characteristics
- P99 Latency: <200ms for standard inputs (measured)
- Cache Hit Rate: 30-50% (exact), 70-90% (hybrid)
- Cache Latency: <100ms (Redis Cloud), <1ms (local Redis)
Monitoring
MCP monitoring tools available for health checks and metrics:
- firewall_health_check: Redis/Session health inspection
- firewall_deployment_status: Traffic percentage and rollout phase
- firewall_metrics: Real-time block rates and CUSUM scores
- firewall_check_alerts: Critical P0 alerts
- firewall_redis_status: ACL and connection pool health
Implementation Status
P0 Items (Critical):
- Circuit breaker pattern: Implemented
- False positive tracking: Implemented (rate: ~5% as of v2.4.1)
- P99 latency metrics: Implemented (<200ms verified)
- Cache mode switching: Implemented
- Adversarial bypass detection: Implemented (0/50 bypasses in test suite)
P1 Items (High Priority):
- Shadow-allow mechanism: Configuration-only
- Cache invalidation strategy: TTL-based
- Bloom filter parameters: Configurable
P2 Items (Medium Priority):
- Concurrency model: Single-threaded
- Progressive decoding: Not implemented
- Forensic capabilities: Basic logging
- STRIDE threat model: Partial
Evaluation & Benchmarks
The Phase 2 evaluation pipeline provides self-contained, standard-library-only tools for evaluating AnswerPolicy effectiveness:
- ASR/FPR Metrics: Attack Success Rate and False Positive Rate computation
- Multi-Policy Comparison: Compare baseline, default, kids, and internal_debug policies
- Latency Measurement: Optional per-request latency tracking
- Bootstrap Confidence Intervals: Optional non-parametric CIs for ASR/FPR
- Dataset Validation: Schema compliance, ASCII-only checks, statistics
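As an illustration of the ASR/FPR and bootstrap items above, the sketch below computes a rate and a percentile bootstrap interval using only the standard library. It is not the code in scripts/run_phase2_suite.py. The function names, the resample count, and the sample outcomes are made up for the example.

```python
import random


def rate(outcomes: list[bool]) -> float:
    """Fraction of True outcomes (attack succeeded, or benign item blocked)."""
    return sum(outcomes) / len(outcomes)


def bootstrap_ci(
    outcomes: list[bool],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for a rate."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    n = len(outcomes)
    # Resample with replacement and recompute the rate each time.
    stats = sorted(rate(rng.choices(outcomes, k=n)) for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Made-up outcomes for 20 attack prompts: True means the attack got through.
attack_outcomes = [True] * 8 + [False] * 12
asr = rate(attack_outcomes)
ci_low, ci_high = bootstrap_ci(attack_outcomes)
```

With only 20 items the interval is wide, which is consistent with the note below that these intervals are approximate indicators.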
Quick Start:
python scripts/run_phase2_suite.py --config smoke_test_core
Documentation:
- AnswerPolicy Phase 2 Evaluation (v2.4.1) – Technical Handover – Complete technical documentation
- AnswerPolicy Evaluation User Workflow – User guide
Evaluation Scope & Limitations:
- Current evaluation uses small sample sizes (20-200 items) suitable for local smoke tests
- p_correct estimator is uncalibrated (heuristic-based, not probabilistic model)
- Datasets use template-based generation, not real-world distributions
- Block attribution is conservative (lower bound for AnswerPolicy contributions)
- Bootstrap CIs are approximate indicators, not publication-grade statistics
For production-grade evaluation with larger datasets and calibrated models, see Future Work in the technical handover document.
System Status
Latest Version: v2.5.0 (2025-12-05)
Kids Policy Performance:
- False Positive Rate: 0.00% (target: ≤5.0%, met in v2.4.1)
- Attack Success Rate: 40.00% (stable)
- Validation Report: VALIDATION_REPORT_v2.4.1.md
Recent Changes:
- v2.4.1: UNSAFE_TOPIC false positive reduction (whitelist filter for benign educational queries)
- UNSAFE_TOPIC false positives: 17 eliminated (100% of identified cases)
- FPR change: 22% → 0.00% (100% elimination on validation dataset), ASR unchanged
References
- Architecture documentation: docs/SESSION_HANDOVER_2025_12_01.md (v2.4.0rc1)
- Technical handover: docs/TECHNICAL_HANDOVER_2025_12_01.md (pre-v2.4.0rc1)
- Test results: docs/TEST_RESULTS_SUMMARY.md
- External review response: docs/EXTERNAL_REVIEW_RESPONSE.md
- PyPI release report: docs/PYPI_RELEASE_REPORT_2025_12_02.md
- AnswerPolicy Phase 2 Evaluation: docs/ANSWER_POLICY_EVALUATION_PHASE2_2_4_0.md (v2.4.1)
- Adaptive Learning Architecture: docs/ADAPTIVE_SESSION_LEARNING_ARCHITECTURE.md (Design Proposal)
License
MIT License
Copyright (c) 2025 Joerg Bollwahn
Author
Joerg Bollwahn Email: sookoothaii@proton.me
File details
Details for the file llm_security_firewall-2.5.0.tar.gz.
File metadata
- Download URL: llm_security_firewall-2.5.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea89a6a07536fcaab3df51f309935dafed9128f6892b8979359df28ac4ea0512 |
| MD5 | a24336ad804440a5ef33b1a48e8b4aa1 |
| BLAKE2b-256 | edd83b531dcae8db2b3f1f15f147082bbea487f588e62097e16cc19bf101adf2 |
File details
Details for the file llm_security_firewall-2.5.0-py3-none-any.whl.
File metadata
- Download URL: llm_security_firewall-2.5.0-py3-none-any.whl
- Upload date:
- Size: 528.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bed66ce50536d87e24962ebfab57d0098b77bdca98b6532a50f617be8f01ef8 |
| MD5 | 2223d1cf4a220d8f4888cfe50564ea9b |
| BLAKE2b-256 | 4f1146ad28a08af9f3328feab90f1a044ff6ba68853c110787aa958a15145197 |