Security guards for LLM-powered and agentic AI applications. Zero dependencies. Covers OWASP Top 10 for LLMs 2025.
llm-trust-guard
31 security guards for LLM-powered and agentic AI applications. Zero dependencies. <5ms latency. Covers OWASP Top 10 for LLMs 2025, OWASP Agentic AI 2026, and MCP Security.
Also available as an npm package for TypeScript/JavaScript.
What This Package Does (And What It Doesn't)
"The LLM proposes. The orchestrator disposes."
This package is your first line of defense, like a WAF (Web Application Firewall) for LLM applications. It sits in the orchestration layer and screens for known attack patterns twice: on input before it reaches the LLM, and on output after the LLM responds.
What it catches well (~97% on curated benchmarks)
- Known prompt injection phrases (170+ patterns, 11 languages)
- Encoding bypass attacks (9 formats: Base64, URL, Unicode, Hex, HTML, ROT13, Octal, Base32, mixed)
- Policy Puppetry attacks (JSON/INI/XML/YAML-formatted injection) — 100% detection
- Role-play/persona attacks (translator trick, academic pretext, emotional manipulation) — 100% detection
- PAP/persuasion attacks (authority, urgency, emotional manipulation) — 100% detection
- Multilingual injection (10 languages) — 100% detection
- Homoglyph attacks (Cyrillic/Greek character substitution) — normalized and detected
- PII and secret leakage in outputs
- Tool hallucination, RBAC bypass, multi-tenant violations
- Tool result poisoning, context window stuffing
- MCP tool shadowing, rug pull attacks, SSRF
- Malicious agent plugins (OpenClaw backdoor signatures, typosquatting, capability mismatch)
- External data validation (source verification, injection scanning, secret detection)
- Session integrity (permission escalation, session hijacking, replay attacks)
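The encoding-bypass item in the list above relies on decoding the input before pattern matching. A minimal sketch of that idea follows, using only the standard library and covering just two of the nine formats (Base64 and URL encoding). The pattern list and the names `decode_layers` and `contains_injection` are hypothetical and are not the package's API.

```python
import base64
import binascii
import re
from typing import List, Optional
from urllib.parse import unquote

# Hypothetical pattern list; the package's real patterns are not shown in this README.
INJECTION_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]


def _try_base64(text: str) -> Optional[str]:
    """Return the decoded text if `text` is valid Base64 that decodes to UTF-8."""
    stripped = text.strip()
    if len(stripped) < 8 or not re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", stripped):
        return None
    try:
        return base64.b64decode(stripped, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def decode_layers(text: str, max_depth: int = 3) -> List[str]:
    """Collect the raw input plus each decoded form, peeling up to max_depth layers."""
    layers = [text]
    current = text
    for _ in range(max_depth):
        decoded = _try_base64(current)
        if decoded is None:
            decoded = unquote(current)  # URL decoding; a no-op when nothing is escaped
        if decoded == current:
            break  # nothing left to decode
        layers.append(decoded)
        current = decoded
    return layers


def contains_injection(text: str) -> bool:
    """Match the patterns against the raw input and every decoded layer."""
    return any(p.search(layer) for layer in decode_layers(text) for p in INJECTION_PATTERNS)
```

Scanning every layer, rather than only the final one, keeps a match on the raw input from being lost when a later decoding step mangles it.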
What it catches partially (~50-80% detection)
- Multi-turn escalation (pattern-based, not semantic)
- Indirect injection via external data (ExternalDataGuard validates sources)
- Encoding bypass with mixed/partial encoding (~86% detection)
- Compression-based structural similarity (NCD — catches paraphrased known attacks)
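For readers unfamiliar with NCD (normalized compression distance), the sketch below shows the standard formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. It uses `zlib` (the same DEFLATE algorithm gzip uses) and is an illustration only; how CompressionDetector computes or thresholds the distance internally is not shown in this README.

```python
import zlib


def _compressed_size(data: bytes) -> int:
    """Size in bytes of the DEFLATE-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: near 0 for similar texts, near 1 for unrelated ones."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy = _compressed_size(x), _compressed_size(y)
    return (_compressed_size(x + y) - min(cx, cy)) / max(cx, cy)
```

Two texts that share long substrings compress well together, so their distance is small. That is why the approach can flag light paraphrases of known templates while still missing attacks that share no wording with them.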
What it cannot catch (<20% detection on real-world datasets)
- Semantically paraphrased attacks — regex can't understand meaning. (~10% detection on 1,000 real jailbreaks from CCS'24 dataset)
- Adversarial ML attacks (GCG, AutoDAN, JBFuzz) — generated suffixes achieve 93-99% attack success rate.
- Novel zero-day prompt techniques — no static filter catches what hasn't been seen before.
- Even ML defenses have limits — "The Attacker Moves Second" (OpenAI/Anthropic/DeepMind, Oct 2025) showed all 12 tested defenses bypassed at >90% ASR.
Why architectural guards matter more than detection
Detection has a ceiling. Even with ML, adaptive attackers bypass defenses. That's why this package includes 20+ architectural guards that limit blast radius regardless of whether an attack is detected.
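To make the distinction concrete, here is a minimal sketch of an architectural check: it never inspects the prompt, so it holds even when an injection goes undetected. The role names, tool names, and the `authorize_tool_call` function are hypothetical and do not reflect the package's ToolRegistry or PolicyGate API.

```python
from dataclasses import dataclass

# Hypothetical role hierarchy and tool allowlist, for illustration only.
ROLE_RANK = {"guest": 0, "user": 1, "admin": 2}
TOOL_MIN_ROLE = {"search": "guest", "read_file": "user", "delete_user": "admin"}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize_tool_call(tool: str, role: str) -> Decision:
    """Allow a tool call only if the tool is registered and the role ranks high enough."""
    if tool not in TOOL_MIN_ROLE:
        return Decision(False, f"unknown tool: {tool}")  # hallucinated or unregistered tool
    if role not in ROLE_RANK:
        return Decision(False, f"unknown role: {role}")
    if ROLE_RANK[role] < ROLE_RANK[TOOL_MIN_ROLE[tool]]:
        return Decision(False, f"role '{role}' may not call '{tool}'")
    return Decision(True, "ok")
```

Even if an injected prompt convinces the model to request `delete_user`, a session bound to the `user` role is refused by the orchestrator, which is the sense in which the blast radius is limited.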
How to close the gap
Use the DetectionClassifier interface to plug in ML-based detection alongside regex:
from llm_trust_guard import TrustGuard, create_regex_classifier
from llm_trust_guard import DetectionClassifier, DetectionContext, DetectionResult
# Your ML classifier (your_ml_model is a placeholder for your own model)
def ml_classifier(input_text: str, ctx: DetectionContext) -> DetectionResult:
    score = your_ml_model.predict(input_text)
    return DetectionResult(safe=score < 0.5, confidence=score, threats=[])
guard = TrustGuard({
    "sanitizer": {"enabled": True},
    "classifier": ml_classifier,
})
# check_async() runs regex + ML classifier; call it from inside a coroutine
# (params, session, and text come from your application)
result = await guard.check_async("tool", params, session, user_input=text)
Installation
pip install llm-trust-guard
Quick Start
from llm_trust_guard import InputSanitizer, EncodingDetector, CompressionDetector
# Check for prompt injection
sanitizer = InputSanitizer(threshold=0.3)
result = sanitizer.sanitize(user_input)
if not result.allowed:
    print(f"Blocked: {result.matches}")
# Check for encoding bypass attacks
encoder = EncodingDetector()
result = encoder.detect(user_input)
if not result.allowed:
    print(f"Encoded threat: {result.violations}")
# Check structural similarity to known attacks (NCD)
detector = CompressionDetector()
result = detector.detect(user_input)
if not result.allowed:
    print(f"Similar to: {result.ncd_analysis.closest_category}")
Using TrustGuard Facade (All Guards)
from llm_trust_guard import TrustGuard
guard = TrustGuard({
    "sanitizer": {"enabled": True, "threshold": 0.3},
    "encoding": {"enabled": True},
    "prompt_leakage": {"enabled": True},
    "circuit_breaker": {"enabled": True},
})
# Sync check (regex guards only, <5ms)
result = guard.check("search", {"query": "test"}, session, user_input=text)
# Filter LLM output (PII + prompt leakage detection)
output = guard.filter_output(llm_response, role="user")
# Validate tool results before feeding back to LLM
tool_result = guard.validate_tool_result("search", tool_output)
All 31 Guards
Input Guards (before LLM)
| Guard | Purpose | Detection |
|---|---|---|
| InputSanitizer | Prompt injection, PAP, Policy Puppetry | 170+ regex patterns, 11 languages |
| EncodingDetector | Encoding bypass (9 formats, multi-layer) | Decode + pattern match |
| CompressionDetector | Structural similarity to known attacks (NCD) | gzip compression distance, 135 templates |
| HeuristicAnalyzer | Synonym expansion, structural + statistical analysis | 8 attack categories, 130+ synonyms |
| PromptLeakageGuard | System prompt extraction attempts | Direct + encoded + indirect |
| ConversationGuard | Multi-turn manipulation, escalation | Session risk scoring |
| ContextBudgetGuard | Many-shot jailbreaking, context overflow | Token budget tracking |
| MultiModalGuard | Image/audio metadata injection | Metadata + steganography scan |
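As an illustration of the "token budget tracking" entry for ContextBudgetGuard in the table above, the sketch below keeps a per-session running total and refuses additions that would exceed the budget. The class name `ContextBudget` and the whitespace-based token estimate are assumptions; a real deployment would count tokens with the model's own tokenizer.

```python
class ContextBudget:
    """Tracks an approximate token count for one session's context window."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def try_add(self, text: str) -> bool:
        """Charge `text` against the budget; return False (and charge nothing) on overflow."""
        cost = len(text.split())  # crude estimate: one token per whitespace-separated word
        if self.used + cost > self.max_tokens:
            return False
        self.used += cost
        return True
```

A hard cap like this is one way to blunt many-shot jailbreaking, since the attack depends on packing a large number of examples into the context.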
Access Control Guards
| Guard | Purpose | Detection |
|---|---|---|
| ToolRegistry | Tool hallucination prevention | Allowlist |
| PolicyGate | RBAC enforcement | Role hierarchy |
| TenantBoundary | Multi-tenant isolation | Resource ownership |
| SchemaValidator | Parameter injection (SQL, NoSQL, XSS, command) | Contextual pattern matching |
| ExecutionMonitor | Rate limiting, resource quotas | Time-window counting |
| TokenCostGuard | LLM API cost tracking, financial circuit breaking | Token + dollar budget |
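The "time-window counting" listed for ExecutionMonitor in the table above can be pictured as a sliding-window rate limiter. The sketch below is a generic version of that technique, not the package's implementation; `WindowRateLimiter` and its parameters are hypothetical names.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class WindowRateLimiter:
    """Allows at most max_calls per key within any window_seconds interval."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        calls = self._calls[key]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        if len(calls) >= self._max_calls:
            return False
        calls.append(now)
        return True
```

Keying the limiter by session or tenant ID keeps one noisy caller from exhausting the quota of others.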
Output Guards (after LLM)
| Guard | Purpose | Detection |
|---|---|---|
| OutputFilter | PII/secret masking | Regex + role-based filtering |
| OutputSchemaGuard | Structured output validation | Schema + injection scan |
| ToolResultGuard | Tool return value validation | Injection + state claims |
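Regex-based masking of the kind the OutputFilter row in the table above describes can be sketched as follows. The three patterns and the `mask_pii` function are illustrative assumptions; they are far narrower than a production rule set and are not taken from the package.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real rule sets cover many more formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def mask_pii(text: str) -> Tuple[str, List[str]]:
    """Replace each match with a labeled placeholder and report which labels fired."""
    found: List[str] = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}_REDACTED]", text)
        if count:
            found.append(label)
    return text, found
```

Returning the list of labels alongside the masked text lets the caller log or alert on a leak without storing the sensitive value itself.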
Agentic Guards
| Guard | Purpose | Detection |
|---|---|---|
| ToolChainValidator | Dangerous tool sequences | Sequence matching |
| AgentCommunicationGuard | Inter-agent message security | HMAC + nonce |
| TrustExploitationGuard | Human-agent trust boundary | Action validation |
| AutonomyEscalationGuard | Unauthorized autonomy expansion | Capability tracking |
| MemoryGuard | Memory poisoning prevention | Injection patterns + HMAC |
| StatePersistenceGuard | State corruption prevention | Integrity hashing |
| CodeExecutionGuard | Unsafe code execution | Static analysis |
| RAGGuard | RAG document poisoning | Source trust + injection |
| MCPSecurityGuard | MCP tool shadowing, rug pull, SSRF | Registration + mutation hash |
| CircuitBreaker | Cascading failure prevention | State machine |
| DriftDetector | Behavioral anomaly detection | Statistical profiling |
| ExternalDataGuard | External data validation before LLM context | Source trust + injection + secret scan |
| AgentSkillGuard | Malicious plugin/tool detection (OpenClaw) | Backdoor signatures + typosquatting |
| SessionIntegrityGuard | Session hijacking, permission escalation | Binding + sequence + timeout |
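The "HMAC + nonce" approach listed for AgentCommunicationGuard in the table above combines a keyed signature (to detect tampering) with single-use nonces (to detect replays). The sketch below shows the general technique with the standard library; the message layout, `sign_message`, and `MessageVerifier` are hypothetical and not the package's wire format or API.

```python
import hashlib
import hmac
import json
import secrets
from typing import Dict, Set


def _mac(key: bytes, sender: str, nonce: str, body: str) -> str:
    # JSON-encode the fields so their boundaries are unambiguous before signing.
    payload = json.dumps([sender, nonce, body]).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign_message(key: bytes, sender: str, body: str) -> Dict[str, str]:
    """Attach a fresh random nonce and an HMAC-SHA256 tag to a message."""
    nonce = secrets.token_hex(16)
    return {"sender": sender, "nonce": nonce, "body": body,
            "mac": _mac(key, sender, nonce, body)}


class MessageVerifier:
    """Rejects messages with a bad tag or a nonce that was already accepted."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._seen_nonces: Set[str] = set()  # unbounded here; expire entries in real use

    def verify(self, msg: Dict[str, str]) -> bool:
        expected = _mac(self._key, msg["sender"], msg["nonce"], msg["body"])
        if not hmac.compare_digest(expected, msg["mac"]):
            return False  # tampered or signed with a different key
        if msg["nonce"] in self._seen_nonces:
            return False  # replayed message
        self._seen_nonces.add(msg["nonce"])
        return True
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters of the tag matched.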
Pluggable Detection
| Component | Purpose |
|---|---|
| DetectionClassifier | Plug in any ML backend (sync or async) alongside regex guards |
| create_regex_classifier() | Built-in regex classifier as a DetectionClassifier callback |
OWASP Coverage
LLM Top 10 2025
| Threat | Guards | Coverage |
|---|---|---|
| LLM01: Prompt Injection | InputSanitizer, EncodingDetector, ContextBudgetGuard | Strong (known patterns), Weak (novel semantic) |
| LLM02: Sensitive Information Disclosure | OutputFilter, PromptLeakageGuard | Strong |
| LLM03: Supply Chain | MCPSecurityGuard | Moderate (MCP-focused) |
| LLM04: Data and Model Poisoning | RAGGuard, MemoryGuard | Moderate |
| LLM05: Improper Output Handling | OutputSchemaGuard, OutputFilter | Strong |
| LLM06: Excessive Agency | AutonomyEscalationGuard, ToolChainValidator | Strong |
| LLM07: System Prompt Leakage | PromptLeakageGuard | Strong |
| LLM08: Vector and Embedding Weaknesses | RAGGuard | Moderate |
| LLM09: Misinformation | DetectionClassifier (pluggable) | Requires ML backend |
| LLM10: Unbounded Consumption | ExecutionMonitor, TokenCostGuard | Strong |
Agentic AI 2026
| Threat | Guards | Coverage |
|---|---|---|
| ASI01: Agent Goal Hijack | InputSanitizer, ConversationGuard | Moderate |
| ASI02: Tool Misuse | ToolChainValidator, ToolRegistry | Strong |
| ASI03: Privilege Mismanagement | PolicyGate, TenantBoundary | Strong |
| ASI04: Supply Chain | MCPSecurityGuard | Moderate |
| ASI05: Code Execution | CodeExecutionGuard | Strong |
| ASI06: Memory Poisoning | MemoryGuard, StatePersistenceGuard | Strong |
| ASI07: Inter-Agent Communication | AgentCommunicationGuard | Strong |
| ASI08: Cascading Failures | CircuitBreaker, DriftDetector | Strong |
| ASI09: Trust Exploitation | TrustExploitationGuard | Strong |
| ASI10: Rogue Agents | DriftDetector, AutonomyEscalationGuard | Moderate |
Defense In Depth
This package is one layer. For production systems, combine with:
Layer 1: llm-trust-guard (regex pattern matching — fast, zero deps)
Layer 2: ML classifier via DetectionClassifier (semantic detection — slower, more accurate)
Layer 3: Model provider safety (OpenAI moderation, Anthropic safety, etc.)
Layer 4: Human review for high-risk actions
Layer 5: Monitoring + alerting (DriftDetector + circuit breakers)
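How Layers 1 and 2 might be chained is sketched below: the cheap regex check runs first, and the slower ML scorer is consulted only for input the regex layer lets through. This is a generic illustration; the single pattern, the `Verdict` type, `layered_check`, and the 0.5 threshold are assumptions, and `ml_score` stands in for whatever model you plug in.

```python
import re
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    safe: bool
    layer: str  # which layer blocked the input, or "none"


# A single illustrative pattern standing in for the regex layer.
REGEX_LAYER = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


def layered_check(text: str, ml_score: Callable[[str], float],
                  threshold: float = 0.5) -> Verdict:
    """Run the fast regex layer first; call the ML scorer only if regex finds nothing."""
    if REGEX_LAYER.search(text):
        return Verdict(False, "regex")
    if ml_score(text) >= threshold:
        return Verdict(False, "ml")
    return Verdict(True, "none")
```

Ordering the layers from cheapest to most expensive keeps typical latency low, because most known attacks are rejected before the model is ever invoked.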
Framework Integrations
- FastAPI/Starlette: TrustGuardMiddleware ASGI middleware
- LangChain: TrustGuardLangChain for chain validation
- OpenAI: SecureOpenAI or wrap_openai_client() for API wrapping
Links
- npm package (TypeScript — 31 guards)
- OWASP Top 10 for LLMs 2025
- OWASP Top 10 for Agentic Applications 2026
License
MIT