Skip to main content

Cognitive Security Middleware - The 'Electronic Stability Program' (ESP) for Large Language Models. Bidirectional containment system with defense-in-depth architecture (6 validation layers), stateful tracking, and mathematical safety constraints. Validated against Unicode/encoding attacks, pattern evasion, multilingual/polyglot attacks (12+ languages including Basque, Maltese), and memory/session attacks. Protocol-based hexagonal architecture with LangChain integration.

Project description

LLM Security Firewall

Bidirectional security framework for human/LLM interfaces implementing defense-in-depth architecture with multiple validation layers.

Version: 2.5.0 Python: >=3.12 License: MIT Status: Production

TL;DR

A bidirectional security layer for LLM-based systems. Validates input, output, and agent state transitions.

  • Input, output, memory, and state validation
  • Supports agent frameworks, tool-using models, and API gateways
  • Detection: prompt injection, jailbreaks, obfuscation, evasive encodings, unauthorized tool actions, cognitive leak risks, children-safety failures, malformed JSON, memory-poisoning
  • Local execution: no telemetry, no external calls
  • API: guard.check_input(text) / guard.check_output(text)

Minimal Example

from llm_firewall import guard

# Example: Input validation. The v2.4.1 update reduced false positives for
# benign educational queries matching the 'explain how...' pattern.
user_prompt = "Explain how rain forms."

decision = guard.check_input(user_prompt)
print(f"Blocked: {not decision.allowed}, Reason: {decision.reason}")

if decision.allowed:
    # LLM backend call
    response = llm(user_prompt)
    out = guard.check_output(response)
    if not out.allowed:
        print(f"Response sanitized: {out.reason}")
    else:
        print(f"LLM Output: {out.cleaned_text}")

Installation

Core Installation (Recommended - ~54 MB baseline)

Lightweight baseline with ONNX-only inference:

pip install llm-security-firewall
# OR
pip install -r requirements-core.txt

This provides:

  • Pattern matching and basic validation
  • ONNX-based semantic guard (CUDA-enabled)
  • Memory footprint: ~54 MB (96% reduction from original 1.3 GB)

Full ML Capabilities (Optional - ~1.3 GB when loaded)

For advanced validators (TruthPreservationValidator, TopicFence):

pip install llm-security-firewall[full]
# OR
pip install -r requirements.txt

Heavy components (PyTorch, transformers) are loaded on-demand only - they don't affect the baseline.

Development Installation

For development (local installation):

pip install -e .

For development dependencies:

pip install -e .[dev]

Optional extras:

pip install llm-security-firewall[langchain]  # LangChain integration
pip install llm-security-firewall[dev]       # Development tools
pip install llm-security-firewall[monitoring] # Monitoring tools

Features

  • Bidirectional validation: Input, output, and memory integrity validation
  • Sequential validation layers: UnicodeSanitizer, NormalizationLayer, RegexGate, Input Analysis, Tool Inspection, Output Validation
  • Statistical methods: CUSUM for drift detection, Dempster-Shafer theory for evidence fusion, fail-closed risk gating
  • Multilingual detection: Polyglot attack detection across 12+ languages including low-resource languages (Basque, Maltese tested)
  • Unicode normalization: Zero-width character removal, bidirectional override detection, homoglyph normalization, encoding anomaly detection
  • Session state tracking: Session state management, drift detection, cumulative risk tracking
  • Tool call validation: HEPHAESTUS protocol for tool call validation and killchain detection
  • Published metrics: False Positive Rate (FPR), P99 latency, and memory usage documented in /docs/
  • Hexagonal architecture: Protocol-based adapters for framework independence

Architecture

Architectural Approach

The system implements a stateful, bidirectional containment mechanism for large language models. Requests are processed through sequential validation layers with mathematical constraints and stateful tracking.

Architectural principles:

  1. Bidirectional validation: All data paths validated (input, output, in-memory state transitions)
  2. Hexagonal architecture: Protocol-based Port/Adapter interfaces with dependency injection
  3. Domain separation: Core business logic separated from infrastructure concerns
  4. Framework independence: Domain layer uses Protocol-based adapters (DecisionCachePort, DecoderPort, ValidatorPort)
  5. Deterministic normalization: Multi-pass Unicode normalization, homoglyph resolution, Base64/Hex/URL decoding, JSON-hardening
  6. Statistical methods: CUSUM detectors for oscillation detection, Dempster-Shafer uncertainty modeling for evidence fusion, fail-closed risk gating
  7. Stateful protection: Policies operate on text, agent state, tool sequences, and memory mutation events
  8. Local execution: No telemetry, no external API calls, no data exfiltration

Bidirectional Processing Pipeline

The system operates in three directions:

  1. Human → LLM (Input Protection)

    • Normalization and sanitization
    • Pattern matching and evasion detection
    • Risk scoring and policy evaluation
    • Session state tracking
  2. LLM → Human (Output Protection)

    • Evidence validation
    • Tool call validation
    • Output sanitization
    • Truth preservation checks
  3. Memory Integrity

    • Session state management
    • Drift detection
    • Influence tracking

Core Components

Firewall Engine (src/llm_firewall/core/firewall_engine_v2.py)

  • Main decision engine
  • Risk score aggregation
  • Policy application
  • Unicode security analysis

Normalization Layer (src/hak_gal/layers/inbound/normalization_layer.py)

  • Recursive URL/percent decoding
  • Unicode normalization (NFKC)
  • Zero-width character removal
  • Directional override character removal

Pattern Matching (src/llm_firewall/rules/patterns.py)

  • Regex-based pattern detection
  • Concatenation-aware matching
  • Evasion pattern detection

Risk Scoring (src/llm_firewall/core/risk_scorer.py)

  • Multi-factor risk calculation
  • Cumulative risk tracking
  • Threshold-based decisions

Cache System (src/llm_firewall/cache/decision_cache.py)

  • Exact match caching (Redis)
  • Semantic caching (LangCache)
  • Hybrid mode support
  • Circuit breaker pattern
  • Fail-safe behavior (blocks on cache failure, prevents security bypass)

Adapter Health (src/llm_firewall/core/adapter_health.py)

  • Circuit breaker implementation
  • Health metrics tracking
  • Failure threshold management
  • Recovery timeout handling

Developer Adoption API (src/llm_firewall/guard.py)

  • API: guard.check_input(text), guard.check_output(text)
  • Backward compatible with existing API
  • Integration guide: QUICKSTART.md

LangChain Integration (src/llm_firewall/integrations/langchain/callbacks.py)

  • FirewallCallbackHandler for LangChain chains
  • Automatic input/output validation
  • See examples/langchain_integration.py for usage

Configuration

Cache Modes

Configure via CACHE_MODE environment variable:

  • exact (default): Redis exact match cache
  • semantic: LangCache semantic search
  • hybrid: Both caches in sequence

Redis Configuration

export REDIS_URL=redis://:password@host:6379/0
export REDIS_TTL=3600  # Optional: Cache TTL in seconds

For Redis Cloud:

export REDIS_CLOUD_HOST=host
export REDIS_CLOUD_PORT=port
export REDIS_CLOUD_USERNAME=username
export REDIS_CLOUD_PASSWORD=password

Examples

Integration examples in examples/ directory:

  • quickstart.py - Basic integration using guard.py API
  • langchain_integration.py - LangChain integration with FirewallCallbackHandler
  • minimal_fastapi.py - FastAPI middleware integration
  • quickstart_fastapi.py - FastAPI example with input/output validation

Run examples:

python examples/quickstart.py
python examples/langchain_integration.py
python examples/minimal_fastapi.py

Basic usage:

from llm_firewall import guard

decision = guard.check_input("user input text")
if decision.allowed:
    # Process request
    pass

Testing

Test suite includes unit tests, integration tests, and adversarial test cases.

pytest tests/ -v

With coverage:

pytest tests/ -v --cov=src/llm_firewall --cov-report=term

Dependencies

Core (Required for basic functionality):

  • numpy>=1.24.0
  • scipy>=1.11.0
  • scikit-learn>=1.3.0
  • pyyaml>=6.0
  • blake3>=0.3.0
  • requests>=2.31.0
  • psycopg[binary]>=3.1.0
  • redis>=5.0.0
  • pydantic>=2.0.0
  • psutil>=5.9.0
  • cryptography>=41.0.0

Machine Learning (Optional, for advanced features):

  • sentence-transformers>=2.2.0 (SemanticVectorCheck, embedding-based detection)
  • torch>=2.0.0 (ML model inference)
  • transformers>=4.30.0 (Transformer-based detectors)
  • onnx>=1.14.0 (ONNX model support)
  • onnxruntime>=1.16.0 (ONNX runtime)

Note: Core functionality (Unicode normalization, pattern matching, risk scoring, basic validation) operates without ML dependencies. Semantic similarity detection and Kids Policy Engine require optional ML dependencies.

System Requirements:

  • Python >=3.12 (by design, no legacy support)
  • RAM: ~300MB for core functionality, ~1.3GB for adversarial inputs with full ML features
  • GPU: Optional, only required for certain ML-based detectors
  • Redis: Optional but recommended for caching (local or cloud)

Known Limitations

  1. False Positive Rate: Kids Policy false positive rate is 0.00% on validation dataset (target: ≤5.0%, met in v2.4.1)
  2. Memory Usage: Current memory usage exceeds 300MB cap for adversarial inputs (measured: ~1.3GB)
  3. Unicode Normalization: Some edge cases in mathematical alphanumeric symbol handling
  4. Python Version: Requires Python >=3.12 (by design, no legacy support for 3.10/3.11)
  5. Dependencies: Core functionality requires numpy, scipy, scikit-learn; full ML features require torch, transformers, sentence-transformers (see Dependencies section)

Security Notice

This library reduces risk but does not guarantee complete protection.

Required additional security controls:

  • Authentication and authorization
  • Network isolation
  • Logging and monitoring
  • Rate limiting
  • Sandboxing of tool environments

The maintainers assume no liability for misuse.

Use only in compliance with local law and data-protection regulations.

Security Hardening

Implemented Measures

  1. Multi-Tenant Isolation

    • Session hashing via HMAC-SHA256(tenant_id + user_id + DAILY_SALT)
    • Redis key isolation via ACLs and prefixes
  2. Oscillation Defense

    • CUSUM (Cumulative Sum Control Chart) algorithm
    • Accumulative risk tracking across session turns
  3. Parser Differential Protection

    • StrictJSONDecoder with duplicate key detection
    • Immediate exception on key duplication
  4. Unicode Security

    • Zero-width character detection and removal
    • Directional override character detection
    • Homoglyph normalization
  5. Multilingual Attack Detection

    • Polyglot attack detection across 12+ languages
    • Low-resource language hardening (Basque, Maltese tested)
    • Language switching detection
    • Multilingual keyword detection (Chinese, Japanese, Russian, Arabic, Hindi, Korean, and others)
  6. Pattern Evasion Detection

    • Concatenation-aware pattern matching
    • Encoding anomaly detection
    • Obfuscation pattern recognition

Performance Characteristics

  • P99 Latency: <200ms for standard inputs (measured)
  • Cache Hit Rate: 30-50% (exact), 70-90% (hybrid)
  • Cache Latency: <100ms (Redis Cloud), <1ms (local Redis)

Monitoring

MCP monitoring tools available for health checks and metrics:

  • firewall_health_check: Redis/Session health inspection
  • firewall_deployment_status: Traffic percentage and rollout phase
  • firewall_metrics: Real-time block rates and CUSUM scores
  • firewall_check_alerts: Critical P0 alerts
  • firewall_redis_status: ACL and connection pool health

Implementation Status

P0 Items (Critical):

  • Circuit breaker pattern: Implemented
  • False positive tracking: Implemented (rate: ~5% as of v2.4.1)
  • P99 latency metrics: Implemented (<200ms verified)
  • Cache mode switching: Implemented
  • Adversarial bypass detection: Implemented (0/50 bypasses in test suite)

P1 Items (High Priority):

  • Shadow-allow mechanism: Configuration-only
  • Cache invalidation strategy: TTL-based
  • Bloom filter parameters: Configurable

P2 Items (Medium Priority):

  • Concurrency model: Single-threaded
  • Progressive decoding: Not implemented
  • Forensic capabilities: Basic logging
  • STRIDE threat model: Partial

Evaluation & Benchmarks

The Phase 2 evaluation pipeline provides self-contained, standard-library-only tools for evaluating AnswerPolicy effectiveness:

  • ASR/FPR Metrics: Attack Success Rate and False Positive Rate computation
  • Multi-Policy Comparison: Compare baseline, default, kids, and internal_debug policies
  • Latency Measurement: Optional per-request latency tracking
  • Bootstrap Confidence Intervals: Optional non-parametric CIs for ASR/FPR
  • Dataset Validation: Schema compliance, ASCII-only checks, statistics

Quick Start:

python scripts/run_phase2_suite.py --config smoke_test_core

Documentation:

Evaluation Scope & Limitations:

  • Current evaluation uses small sample sizes (20-200 items) suitable for local smoke tests
  • p_correct estimator is uncalibrated (heuristic-based, not probabilistic model)
  • Datasets use template-based generation, not real-world distributions
  • Block attribution is conservative (lower bound for AnswerPolicy contributions)
  • Bootstrap CIs are approximate indicators, not publication-grade statistics

For production-grade evaluation with larger datasets and calibrated models, see Future Work in the technical handover document.

System Status

Latest Version: v2.5.0 (2025-12-05)

Kids Policy Performance:

  • False Positive Rate: 0.00% (target: ≤5.0%, met in v2.4.1)
  • Attack Success Rate: 40.00% (stable)
  • Validation Report: VALIDATION_REPORT_v2.4.1.md

Recent Changes:

  • v2.4.1: UNSAFE_TOPIC false positive reduction (whitelist filter for benign educational queries)
  • UNSAFE_TOPIC false positives: 17 eliminated (100% of identified cases)
  • FPR change: 22% → 0.00% (100% elimination on validation dataset), ASR unchanged

References

  • Architecture documentation: docs/SESSION_HANDOVER_2025_12_01.md (v2.4.0rc1)
  • Technical handover: docs/TECHNICAL_HANDOVER_2025_12_01.md (pre-v2.4.0rc1)
  • Test results: docs/TEST_RESULTS_SUMMARY.md
  • External review response: docs/EXTERNAL_REVIEW_RESPONSE.md
  • PyPI release report: docs/PYPI_RELEASE_REPORT_2025_12_02.md
  • AnswerPolicy Phase 2 Evaluation: docs/ANSWER_POLICY_EVALUATION_PHASE2_2_4_0.md (v2.4.1)
  • Adaptive Learning Architecture: docs/ADAPTIVE_SESSION_LEARNING_ARCHITECTURE.md (Design Proposal)

License

MIT License

Copyright (c) 2025 Joerg Bollwahn

Author

Joerg Bollwahn Email: sookoothaii@proton.me

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_security_firewall-2.5.0.tar.gz (1.0 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_security_firewall-2.5.0-py3-none-any.whl (528.5 kB view details)

Uploaded Python 3

File details

Details for the file llm_security_firewall-2.5.0.tar.gz.

File metadata

  • Download URL: llm_security_firewall-2.5.0.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for llm_security_firewall-2.5.0.tar.gz
Algorithm Hash digest
SHA256 ea89a6a07536fcaab3df51f309935dafed9128f6892b8979359df28ac4ea0512
MD5 a24336ad804440a5ef33b1a48e8b4aa1
BLAKE2b-256 edd83b531dcae8db2b3f1f15f147082bbea487f588e62097e16cc19bf101adf2

See more details on using hashes here.

File details

Details for the file llm_security_firewall-2.5.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llm_security_firewall-2.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6bed66ce50536d87e24962ebfab57d0098b77bdca98b6532a50f617be8f01ef8
MD5 2223d1cf4a220d8f4888cfe50564ea9b
BLAKE2b-256 4f1146ad28a08af9f3328feab90f1a044ff6ba68853c110787aa958a15145197

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page