Security guards for LLM-powered and agentic AI applications. Zero dependencies. Covers OWASP Top 10 for LLMs 2025.

These details have not been verified by PyPI

Project links

Project description

llm-trust-guard

31 security guards for LLM-powered and agentic AI applications. Zero dependencies. <5ms latency. Covers OWASP Top 10 for LLMs 2025, OWASP Agentic AI 2026, and MCP Security.

Also available as an npm package for TypeScript/JavaScript.

What This Package Does (And What It Doesn't)

"The LLM proposes. The orchestrator disposes."

This package is your first line of defense — like a WAF (Web Application Firewall) for LLM applications. It sits in the orchestration layer and catches known attack patterns before they reach the LLM and after the LLM responds.

What it catches well (~97% on curated benchmarks)

Known prompt injection phrases (170+ patterns, 11 languages)
Encoding bypass attacks (9 formats: Base64, URL, Unicode, Hex, HTML, ROT13, Octal, Base32, mixed)
Policy Puppetry attacks (JSON/INI/XML/YAML-formatted injection) — 100% detection
Role-play/persona attacks (translator trick, academic pretext, emotional manipulation) — 100% detection
PAP/persuasion attacks (authority, urgency, emotional manipulation) — 100% detection
Multilingual injection (10 languages) — 100% detection
Homoglyph attacks (Cyrillic/Greek character substitution) — normalized and detected
PII and secret leakage in outputs
Tool hallucination, RBAC bypass, multi-tenant violations
Tool result poisoning, context window stuffing
MCP tool shadowing, rug pull attacks, SSRF
Malicious agent plugins (OpenClaw backdoor signatures, typosquatting, capability mismatch)
External data validation (source verification, injection scanning, secret detection)
Session integrity (permission escalation, session hijacking, replay attacks)

What it catches partially (~50-80% detection)

Multi-turn escalation (pattern-based, not semantic)
Indirect injection via external data (ExternalDataGuard validates sources)
Encoding bypass with mixed/partial encoding (~86% detection)
Compression-based structural similarity (NCD — catches paraphrased known attacks)

What it cannot catch (<20% detection on real-world datasets)

Semantically paraphrased attacks — regex can't understand meaning. (~10% detection on 1,000 real jailbreaks from CCS'24 dataset)
Adversarial ML attacks (GCG, AutoDAN, JBFuzz) — generated suffixes achieve 93-99% attack success rate.
Novel zero-day prompt techniques — no static filter catches what hasn't been seen before.
Even ML defenses have limits — "The Attacker Moves Second" (OpenAI/Anthropic/DeepMind, Oct 2025) showed all 12 tested defenses bypassed at >90% ASR.

Why architectural guards matter more than detection

Detection has a ceiling. Even with ML, adaptive attackers bypass defenses. That's why this package includes 20+ architectural guards that limit blast radius regardless of whether an attack is detected.

How to close the gap

Use the DetectionClassifier interface to plug in ML-based detection alongside regex:

from llm_trust_guard import TrustGuard, create_regex_classifier
from llm_trust_guard import DetectionClassifier, DetectionContext, DetectionResult

# Your ML classifier
def ml_classifier(input_text: str, ctx: DetectionContext) -> DetectionResult:
    score = your_ml_model.predict(input_text)
    return DetectionResult(safe=score < 0.5, confidence=score, threats=[])

guard = TrustGuard({
    "sanitizer": {"enabled": True},
    "classifier": ml_classifier,
})

# check_async() runs regex + ML classifier
result = await guard.check_async("tool", params, session, user_input=text)

Installation

pip install llm-trust-guard

Quick Start

from llm_trust_guard import InputSanitizer, EncodingDetector, CompressionDetector

# Check for prompt injection
sanitizer = InputSanitizer(threshold=0.3)
result = sanitizer.sanitize(user_input)
if not result.allowed:
    print(f"Blocked: {result.matches}")

# Check for encoding bypass attacks
encoder = EncodingDetector()
result = encoder.detect(user_input)
if not result.allowed:
    print(f"Encoded threat: {result.violations}")

# Check structural similarity to known attacks (NCD)
detector = CompressionDetector()
result = detector.detect(user_input)
if not result.allowed:
    print(f"Similar to: {result.ncd_analysis.closest_category}")

Using TrustGuard Facade (All Guards)

from llm_trust_guard import TrustGuard

guard = TrustGuard({
    "sanitizer": {"enabled": True, "threshold": 0.3},
    "encoding": {"enabled": True},
    "prompt_leakage": {"enabled": True},
    "circuit_breaker": {"enabled": True},
})

# Sync check (regex guards only — <5ms)
result = guard.check("search", {"query": "test"}, session, user_input=text)

# Filter LLM output (PII + prompt leakage detection)
output = guard.filter_output(llm_response, role="user")

# Validate tool results before feeding back to LLM
tool_result = guard.validate_tool_result("search", tool_output)

All 31 Guards

Input Guards (before LLM)

Guard	Purpose	Detection
InputSanitizer	Prompt injection, PAP, Policy Puppetry	170+ regex patterns, 11 languages
EncodingDetector	Encoding bypass (9 formats, multi-layer)	Decode + pattern match
CompressionDetector	Structural similarity to known attacks (NCD)	gzip compression distance, 135 templates
HeuristicAnalyzer	Synonym expansion, structural + statistical analysis	8 attack categories, 130+ synonyms
PromptLeakageGuard	System prompt extraction attempts	Direct + encoded + indirect
ConversationGuard	Multi-turn manipulation, escalation	Session risk scoring
ContextBudgetGuard	Many-shot jailbreaking, context overflow	Token budget tracking
MultiModalGuard	Image/audio metadata injection	Metadata + steganography scan

Access Control Guards

Guard	Purpose	Detection
ToolRegistry	Tool hallucination prevention	Allowlist
PolicyGate	RBAC enforcement	Role hierarchy
TenantBoundary	Multi-tenant isolation	Resource ownership
SchemaValidator	Parameter injection (SQL, NoSQL, XSS, command)	Contextual pattern matching
ExecutionMonitor	Rate limiting, resource quotas	Time-window counting
TokenCostGuard	LLM API cost tracking, financial circuit breaking	Token + dollar budget

Output Guards (after LLM)

Guard	Purpose	Detection
OutputFilter	PII/secret masking	Regex + role-based filtering
OutputSchemaGuard	Structured output validation	Schema + injection scan
ToolResultGuard	Tool return value validation	Injection + state claims

Agentic Guards

Guard	Purpose	Detection
ToolChainValidator	Dangerous tool sequences	Sequence matching
AgentCommunicationGuard	Inter-agent message security	HMAC + nonce
TrustExploitationGuard	Human-agent trust boundary	Action validation
AutonomyEscalationGuard	Unauthorized autonomy expansion	Capability tracking
MemoryGuard	Memory poisoning prevention	Injection patterns + HMAC
StatePersistenceGuard	State corruption prevention	Integrity hashing
CodeExecutionGuard	Unsafe code execution	Static analysis
RAGGuard	RAG document poisoning	Source trust + injection
MCPSecurityGuard	MCP tool shadowing, rug pull, SSRF	Registration + mutation hash
CircuitBreaker	Cascading failure prevention	State machine
DriftDetector	Behavioral anomaly detection	Statistical profiling
ExternalDataGuard	External data validation before LLM context	Source trust + injection + secret scan
AgentSkillGuard	Malicious plugin/tool detection (OpenClaw)	Backdoor signatures + typosquatting
SessionIntegrityGuard	Session hijacking, permission escalation	Binding + sequence + timeout

Pluggable Detection

Component	Purpose
DetectionClassifier	Plug in any ML backend (sync or async) alongside regex guards
create_regex_classifier()	Built-in regex classifier as a DetectionClassifier callback

OWASP Coverage

LLM Top 10 2025

Threat	Guards	Coverage
LLM01: Prompt Injection	InputSanitizer, EncodingDetector, ContextBudgetGuard	Strong (known patterns), Weak (novel semantic)
LLM02: Sensitive Data Exposure	OutputFilter, PromptLeakageGuard	Strong
LLM03: Supply Chain	MCPSecurityGuard	Moderate (MCP-focused)
LLM04: Data Poisoning	RAGGuard, MemoryGuard	Moderate
LLM05: Improper Output Handling	OutputSchemaGuard, OutputFilter	Strong
LLM06: Excessive Agency	AutonomyEscalationGuard, ToolChainValidator	Strong
LLM07: System Prompt Leakage	PromptLeakageGuard	Strong
LLM08: Vector/Embedding Weakness	RAGGuard	Moderate
LLM09: Misinformation	DetectionClassifier (pluggable)	Requires ML backend
LLM10: Unbounded Consumption	ExecutionMonitor, TokenCostGuard	Strong

Agentic AI 2026

Threat	Guards	Coverage
ASI01: Agent Goal Hijack	InputSanitizer, ConversationGuard	Moderate
ASI02: Tool Misuse	ToolChainValidator, ToolRegistry	Strong
ASI03: Privilege Mismanagement	PolicyGate, TenantBoundary	Strong
ASI04: Supply Chain	MCPSecurityGuard	Moderate
ASI05: Code Execution	CodeExecutionGuard	Strong
ASI06: Memory Poisoning	MemoryGuard, StatePersistenceGuard	Strong
ASI07: Inter-Agent Communication	AgentCommunicationGuard	Strong
ASI08: Cascading Failures	CircuitBreaker, DriftDetector	Strong
ASI09: Trust Exploitation	TrustExploitationGuard	Strong
ASI10: Rogue Agents	DriftDetector, AutonomyEscalationGuard	Moderate

Defense In Depth

This package is one layer. For production systems, combine with:

Layer 1: llm-trust-guard (regex pattern matching — fast, zero deps)
Layer 2: ML classifier via DetectionClassifier (semantic detection — slower, more accurate)
Layer 3: Model provider safety (OpenAI moderation, Anthropic safety, etc.)
Layer 4: Human review for high-risk actions
Layer 5: Monitoring + alerting (DriftDetector + circuit breakers)

Framework Integrations

FastAPI/Starlette — TrustGuardMiddleware ASGI middleware
LangChain — TrustGuardLangChain for chain validation
OpenAI — SecureOpenAI or wrap_openai_client() for API wrapping

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.10.1

Apr 25, 2026

0.10.0

Apr 24, 2026

0.9.1

Apr 24, 2026

0.9.0

Apr 23, 2026

0.8.1

Apr 21, 2026

0.8.0

Apr 11, 2026

0.7.1

Apr 6, 2026

0.7.0

Apr 5, 2026

0.6.0

Apr 4, 2026

This version

0.5.2

Apr 3, 2026

0.5.0

Apr 2, 2026

0.4.4

Mar 31, 2026

0.4.3

Mar 26, 2026

0.4.2

Mar 26, 2026

0.4.1

Mar 26, 2026

0.4.0

Mar 26, 2026

0.3.1

Mar 26, 2026

0.3.0

Mar 26, 2026

0.2.2

Mar 26, 2026

0.2.1

Mar 26, 2026

0.2.0

Mar 26, 2026

0.1.0

Mar 25, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_trust_guard-0.5.2.tar.gz (206.8 kB view details)

Uploaded Apr 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llm_trust_guard-0.5.2-py3-none-any.whl (180.1 kB view details)

Uploaded Apr 3, 2026 Python 3

File details

Details for the file llm_trust_guard-0.5.2.tar.gz.

File metadata

Download URL: llm_trust_guard-0.5.2.tar.gz
Upload date: Apr 3, 2026
Size: 206.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_trust_guard-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`15a451e772ec8f1c55f4b6a673571a4980b161a57dfcb8afdfe40a186960a1a1`
MD5	`5694b3860ae4a560e6baaff34fe4e6be`
BLAKE2b-256	`9c55424834a82388c50ec91b97c300867d82de01dd89a470683b549559f9269c`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llm_trust_guard-0.5.2.tar.gz:

Publisher: release.yml on nkratk/llm-trust-guard-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_trust_guard-0.5.2.tar.gz
- Subject digest: 15a451e772ec8f1c55f4b6a673571a4980b161a57dfcb8afdfe40a186960a1a1
- Sigstore transparency entry: 1222821089
- Sigstore integration time: Apr 3, 2026
Source repository:
- Permalink: nkratk/llm-trust-guard-python@d950c7483623bc951e16986e33e2e75a8caf20cb
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/nkratk
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d950c7483623bc951e16986e33e2e75a8caf20cb
- Trigger Event: release

File details

Details for the file llm_trust_guard-0.5.2-py3-none-any.whl.

File metadata

Download URL: llm_trust_guard-0.5.2-py3-none-any.whl
Upload date: Apr 3, 2026
Size: 180.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_trust_guard-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e254df9e669c40b1a37fac6a38c8fb80c3bdc8fdf3f935f19ef89f2d507fd63d`
MD5	`63703cca9b32cdbd24238f8809845443`
BLAKE2b-256	`b3b0e867133d7bbfb8c7c0f24ca815b612b9a3b3942a31eb687b84d641eb9638`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llm_trust_guard-0.5.2-py3-none-any.whl:

Publisher: release.yml on nkratk/llm-trust-guard-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_trust_guard-0.5.2-py3-none-any.whl
- Subject digest: e254df9e669c40b1a37fac6a38c8fb80c3bdc8fdf3f935f19ef89f2d507fd63d
- Sigstore transparency entry: 1222821213
- Sigstore integration time: Apr 3, 2026
Source repository:
- Permalink: nkratk/llm-trust-guard-python@d950c7483623bc951e16986e33e2e75a8caf20cb
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/nkratk
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d950c7483623bc951e16986e33e2e75a8caf20cb
- Trigger Event: release

llm-trust-guard 0.5.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

llm-trust-guard

What This Package Does (And What It Doesn't)

What it catches well (~97% on curated benchmarks)

What it catches partially (~50-80% detection)

What it cannot catch (<20% detection on real-world datasets)

Why architectural guards matter more than detection

How to close the gap

Installation

Quick Start

Using TrustGuard Facade (All Guards)

All 31 Guards

Input Guards (before LLM)

Access Control Guards

Output Guards (after LLM)

Agentic Guards

Pluggable Detection

OWASP Coverage

LLM Top 10 2025

Agentic AI 2026

Defense In Depth

Framework Integrations

Links

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance