Cost ceiling, audit log, and kill switch for LLM agents.

llm-leash

Stop your LLM agent from burning money, leaking data, or breaking production — without locking you into a framework.

llm-leash is a runtime firewall for LLM agents. It owns the boring, high-consequence half of agent safety: money, paperwork, panic button.

The five ways your agent kills you

Every team that ships an LLM agent eventually meets one of these. We've all seen the post-mortem.

1. The runaway bill — "$2,387 in 14 minutes"

A retry loop. A tool that doesn't return what the agent expects. A user prompt that nudges it into infinite reasoning. The agent burns through your API quota in the time it takes you to read a Slack message.

How we stop it: Hard USD cap per session, enforced before the call. Cumulative cost is tracked across every model call; the agent can't even issue request #N+1 once the cap is hit. A soft cap warns earlier so you can investigate before it bites. The proxy also cancels mid-stream if a single SSE response would blow the cap halfway through.
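
The pre-call check described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `SessionBudget`, `BudgetExceeded`, `check_before_call`, and the per-call cost estimate are hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised before a call that would push the session past its hard cap."""


@dataclass
class SessionBudget:
    # Hypothetical structure; field names are assumptions, not llm-leash's API.
    hard_cap_usd: float
    soft_cap_usd: float
    spent_usd: float = 0.0

    def check_before_call(self, estimated_cost_usd: float) -> str:
        """Return "ok" or "warn"; raise if the call would exceed the hard cap."""
        projected = self.spent_usd + estimated_cost_usd
        if projected > self.hard_cap_usd:
            raise BudgetExceeded(
                f"projected ${projected:.2f} exceeds cap ${self.hard_cap_usd:.2f}"
            )
        return "warn" if projected > self.soft_cap_usd else "ok"

    def record(self, actual_cost_usd: float) -> None:
        """Add the real cost once the provider reports usage."""
        self.spent_usd += actual_cost_usd


budget = SessionBudget(hard_cap_usd=10.00, soft_cap_usd=8.00)
for cost in (3.00, 3.00, 3.00, 3.00):
    try:
        status = budget.check_before_call(cost)
    except BudgetExceeded as exc:
        print(f"blocked: {exc}")
        break
    budget.record(cost)
    print(status, f"${budget.spent_usd:.2f}")
```

The point of checking the projected total rather than the running total is that the request that would cross the cap is never sent, so the overshoot is bounded by the accuracy of the estimate rather than by the size of the response.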

2. The leak — "agent read `.env` and POSTed it to `attacker.com`"

Two flavors:

Direct: a tool returns os.environ or cat .env and the LLM faithfully includes the contents in its next message.
Indirect: the agent reads an attacker-controlled page or file containing hidden instructions, and the LLM follows the new orders. This is the #1 unaddressed attack vector for agentic systems in 2026 (prompt injection, OWASP LLM01).

How we stop it:

SecretsRule scans every outgoing argument for AWS keys, GitHub PATs, Stripe keys, JWTs, SSH keys, generic high-entropy blobs.
ArtifactLeakageRule catches developer-host paths (/Users/<you>/..., .env, .git/config, .aws/credentials) before they reach the LLM or get committed.
ToolResultScanner — the indirect-injection breaker. Scans content coming back from tools (file contents, web pages, DB rows) for hidden instructions, role-confusion tags, exfil phrases, unicode obfuscation, and base64 blobs.
ExfilChainDetector correlates calls across the whole session and flags the classic three-step chain: read sensitive → encode → POST external.

Action is configurable per rule: block, redact, hitl (pause for human review), or warn.
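
As a rough picture of what scanning an outgoing argument and applying the `redact` action might look like, here is a small sketch. The two regexes, the entropy threshold, and the `scan` / `redact` helpers are illustrative assumptions; the patterns SecretsRule actually ships are not shown in this README.

```python
import math
import re
from collections import Counter

# Illustrative patterns only (assumed, not the library's real pattern set).
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def shannon_entropy(text: str) -> float:
    """Bits per character; random tokens score higher than ordinary words."""
    counts = Counter(text)
    return -sum((n / len(text)) * math.log2(n / len(text)) for n in counts.values())


def scan(argument: str, entropy_threshold: float = 4.0) -> list[str]:
    """Return the names of every detector that fired on this argument."""
    findings = [name for name, rx in PATTERNS.items() if rx.search(argument)]
    # Generic fallback: long runs of token-like characters with high entropy.
    for token in re.findall(r"[A-Za-z0-9+/=_-]{32,}", argument):
        if shannon_entropy(token) >= entropy_threshold:
            findings.append("high_entropy_blob")
            break
    return findings


def redact(argument: str) -> str:
    """Replace each known-pattern match with a labeled placeholder."""
    for name, rx in PATTERNS.items():
        argument = rx.sub(f"[REDACTED:{name}]", argument)
    return argument


arg = "key is AKIAIOSFODNN7EXAMPLE"  # the example key ID used in AWS documentation
print(scan(arg))
print(redact(arg))
```

A `block` action would simply refuse the call when `scan` returns anything, while `redact` lets the call proceed with the placeholder in place of the secret.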

3. The destructive call — "agent decided to `DROP TABLE users`"

A vague user message ("clean up the staging DB"). A misclassified intent. A misunderstanding of which environment "production" refers to. The agent issues an irreversible command and your incident channel lights up.

How we stop it:

BlockedSql parses SQL and rejects DROP, TRUNCATE, DELETE without a WHERE clause, GRANT, etc.
BlockedShell rejects rm -rf, dd, mkfs, fork bombs.
BlockedPatterns lets you add custom regex for your own destructive verbs.
HitlThreshold + Human review queue — for tools that can legitimately do destructive work (DB migrations, payments, mass email), pause and require a human in the operator console to click ✓ approve before the call goes through.
Kill switch — when something is in flight that shouldn't be, one operator click on the console (or one CLI command, or one HTTP call, or one Redis SET) stops every subsequent call from that session with sub-300 ms propagation.
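
To make the "DELETE without WHERE" idea concrete, here is a deliberately simplified sketch. The README says BlockedSql parses SQL; this example swaps in a keyword check instead, so it is an approximation under stated assumptions, and `is_destructive` and `BLOCKED_KEYWORDS` are hypothetical names.

```python
import re

# Statement types rejected outright (assumed list, mirroring the examples above).
BLOCKED_KEYWORDS = ("DROP", "TRUNCATE", "GRANT")


def is_destructive(sql: str) -> bool:
    """Rough keyword check; a real parser also handles comments, CTEs and quoting."""
    # Naive split: a ';' inside a string literal would be mis-split here.
    for statement in sql.split(";"):
        stmt = statement.strip()
        if not stmt:
            continue
        first = stmt.split(None, 1)[0].upper()
        if first in BLOCKED_KEYWORDS:
            return True
        # DELETE is allowed only when it is scoped by a WHERE clause.
        if first == "DELETE" and not re.search(r"\bWHERE\b", stmt, re.IGNORECASE):
            return True
    return False


for query in ("DROP TABLE users",
              "DELETE FROM users",
              "delete from users where id = 7",
              "SELECT 1; truncate table logs"):
    print(f"{query!r}: {'blocked' if is_destructive(query) else 'allowed'}")
```

A keyword check like this is easy to evade (for example with a leading comment), which is presumably why the library parses the statement instead.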

4. The audit nightmare — "show me every action this agent took for customer X last month"

Compliance calls. SOC 2 evidence. EU AI Act Article 12. Your own post-mortem the day after the incident. Without a tamper-evident log that ties model calls to sessions, tenants, tools, costs, and policy decisions, you can't answer the question — and that's now a regulatory problem.

How we stop it:

Every model call, every policy decision, every tool invocation, every kill event, every human-review decision is appended as one JSONL line to an append-only, hash-chained audit log. Tamper-evident: llm-leash verify audit.jsonl re-checks the chain.
Optional HMAC signing for off-host shipping.
llm-leash soc2 generates a complete SOC 2 evidence pack (executive summary, CC6 access control matrix, CC7 monitoring data, anomalies CSV, bill of materials) in one command.
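
The hash-chaining idea behind a tamper-evident JSONL log can be sketched in a few lines. The field names (`event`, `prev_hash`, `hash`), the all-zero genesis value, and the helper functions below are assumptions for illustration; the actual schema is documented in API.md, not here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first record's prev_hash


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(lines: list[str], event: dict) -> None:
    """Append one JSONL line whose hash covers the event and the previous hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"event": event, "prev_hash": prev_hash}
    record["hash"] = _digest(record)
    lines.append(json.dumps(record, sort_keys=True))


def verify(lines: list[str]) -> bool:
    """Recompute every hash and check that each record points at its predecessor."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        claimed = record.pop("hash")
        if record["prev_hash"] != prev_hash or _digest(record) != claimed:
            return False
        prev_hash = claimed
    return True


log: list[str] = []
append_event(log, {"type": "model_call", "cost_usd": 0.12})
append_event(log, {"type": "kill", "reason": "budget"})
print("intact:", verify(log))

# Editing an earlier record invalidates its hash and every link after it.
tampered = json.loads(log[0])
tampered["event"]["cost_usd"] = 0.01
log[0] = json.dumps(tampered, sort_keys=True)
print("after tampering:", verify(log))
```

Because each record commits to the hash of the one before it, rewriting history requires recomputing every later hash, which is what an HMAC signature with an off-host key is meant to prevent.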

5. The silent drift — "the regex stopped working three weeks ago"

Anthropic ships a new model. The response format changes just enough that your LocalLLMGuardRule starts missing 20% of jailbreaks. Or an attacker reads your open-source repo, iterates against your regex until they find one that bypasses it. By the time you notice from an incident, weeks of attacks have slipped through.

How we stop it:

Continuous eval pipeline — runs your rules against a labeled dataset (292 cases bundled; bring your own) on a cron / k8s CronJob and writes precision / recall / F1 per rule per run to a JSONL log.
Drift detection — current F1 is compared against the 7-day baseline; if it dropped more than 5 percentage points, an audit event fires and the operator console shows a red 🚨 DRIFT marker on the affected rule.
Operator feedback loop — every human-review approve/reject is logged with the rule that fired, so the console can compute per-rule false-positive rate and recommend tuning before operators start ignoring noisy rules.
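
The drift rule described above (current F1 against a 7-day baseline, alert on a drop of more than 5 percentage points) might be computed along these lines. How the baseline is aggregated is not stated in this README; the mean over the window used here is an assumption, and `f1_score` / `has_drifted` are hypothetical helpers.

```python
from datetime import datetime, timedelta
from statistics import mean


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def has_drifted(runs, now, window_days=7, max_drop=0.05):
    """runs: list of (timestamp, f1) for one rule, oldest first; last entry is current."""
    history, (_, current) = runs[:-1], runs[-1]
    cutoff = now - timedelta(days=window_days)
    baseline_values = [f1 for ts, f1 in history if ts >= cutoff]
    if not baseline_values:
        return False  # no baseline yet, nothing to compare against
    # Assumed baseline: mean F1 of earlier runs inside the window.
    return mean(baseline_values) - current > max_drop


now = datetime(2026, 5, 21)
runs = [
    (now - timedelta(days=3), 0.90),
    (now - timedelta(days=1), 0.92),
    (now, 0.80),
]
print("drift:", has_drifted(runs, now))
```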

Quickstart — in-process

A few lines. Wrap your existing LLM client.

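from llm_leash import Firewall, LeashKilled
from anthropic import Anthropic

# Hard cap of $10 for the session; every decision is appended to audit.jsonl.
fw = Firewall(budget_usd=10.00, audit_log="audit.jsonl")
client = fw.wrap(Anthropic())  # used below exactly like the unwrapped client

try:
    # Deliberate runaway loop: the firewall raises once the budget is exhausted.
    while True:
        client.messages.create(model="claude-opus-4-7", max_tokens=200,
                               messages=[{"role": "user", "content": "..."}])
except LeashKilled as e:
    print(f"Saved you the rest. Reason: {e.reason}")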

Try the offline demo (no API key needed):

python demo.py
llm-leash verify audit.jsonl

Same wrapper works with Anthropic, OpenAI, LangGraph, CrewAI, OpenHands, Pydantic-AI, MCP. Full list and per-adapter examples in API.md.

Quickstart — HTTP proxy

For agents you can't (or don't want to) modify — change one env var, get the firewall:

pip install "llm-leash[proxy]"
llm-leash-proxy --listen 127.0.0.1:8000 --audit-log audit.jsonl \
                --budget-usd 50

# Point any agent at it
export ANTHROPIC_BASE_URL=http://localhost:8000
export OPENAI_BASE_URL=http://localhost:8000
python my_agent.py

Works with any client speaking the OpenAI / Anthropic on-wire protocol (OpenAI / Anthropic SDKs, OpenRouter, LangChain.js, Vercel AI SDK, custom clients in any language). Streaming SSE is fully supported including mid-stream cancel when a runaway response would blow the cap.
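
The mid-stream cancel mentioned above can be pictured as a generator that stops forwarding chunks once the next one would exceed the remaining budget. This is only a sketch under loud assumptions: `capped_stream` is a hypothetical helper, and counting one token per whitespace-separated word is a crude stand-in for real token accounting.

```python
from typing import Iterable, Iterator


def capped_stream(chunks: Iterable[str], remaining_usd: float,
                  usd_per_token: float) -> Iterator[str]:
    """Yield chunks until the next one would exceed the remaining budget."""
    spent = 0.0
    for chunk in chunks:
        cost = len(chunk.split()) * usd_per_token  # crude: one token per word
        if spent + cost > remaining_usd:
            # Stop consuming; a real proxy would also close the upstream connection.
            return
        spent += cost
        yield chunk


upstream = ["Thinking about", "the problem", "again and again", "and again"]
for piece in capped_stream(upstream, remaining_usd=0.05, usd_per_token=0.01):
    print(piece)
```

Stopping inside the loop, rather than after the response completes, is what keeps a single very long response from overshooting the cap.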

For deployment recipes (systemd, Docker, k8s, gunicorn multi-worker, nginx WS timeouts) see docs/deployment.md.

Operator console

A read-only Web UI (llm-leash-console) that visualises the proxy's live state and audit stream. Runs on its own port so a UI crash never takes down agent traffic.

Console — dark mode

At a glance:

Sticky nav with live counters and a red urgency marker when there's something to look at.
KPI strips — threats prevented (HIGH / MEDIUM / LOW / review queue) + proxy state (active sessions / spend / rules / PII redactor).
Trends charts — spend per hour (24 h), threats by agent. Click a bar → drill into the agent.
Human review queue — pending requests waiting for approval. One click per row, or bulk approve / reject / kill multiple at once.
Active sessions — top-spend sessions with inline kill button.
Threats by rule + Threat detail — every policy decision, click any row for full context.
Rule performance — operator-feedback metric: per-rule FP rate estimate with healthy / borderline_tune / high_fp_consider_relax recommendations.
Detection quality — eval-pipeline F1 over time with drift markers.
Export — one-click CSV (threats) and JSON (audit) downloads, ready for SOC 2 evidence binders.
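
For the Rule performance panel, a per-rule false-positive estimate could be derived from review decisions roughly like this. The three labels come from the list above; the 10% and 30% thresholds, the `recommend` helper, and the reading of "approve" as "the rule flagged a benign call" are assumptions made for this sketch.

```python
from collections import Counter


def recommend(decisions, tune_at=0.10, relax_at=0.30):
    """decisions: iterable of (rule_name, "approve" | "reject") from the review queue."""
    fired, approved = Counter(), Counter()
    for rule, decision in decisions:
        fired[rule] += 1
        if decision == "approve":  # assumed: approval means the flag was a false positive
            approved[rule] += 1
    report = {}
    for rule, total in fired.items():
        fp_rate = approved[rule] / total
        if fp_rate >= relax_at:
            label = "high_fp_consider_relax"
        elif fp_rate >= tune_at:
            label = "borderline_tune"
        else:
            label = "healthy"
        report[rule] = (round(fp_rate, 2), label)
    return report


decisions = (
    [("SecretsRule", "reject")] * 9 + [("SecretsRule", "approve")]
    + [("BlockedShell", "approve")] * 2 + [("BlockedShell", "reject")] * 2
    + [("BlockedSql", "reject")] * 3
)
for rule, (rate, label) in recommend(decisions).items():
    print(rule, rate, label)
```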

Detail drawer

Click any row in any table to open a 480 px side panel with the full event JSON, related events from the same session or agent, and inline contextual actions. Keyboard nav: Esc closes, ↑ / ↓ cycle. The Copy link button copies a ?event=<id> URL — shareable during incident review.

Detail drawer with related events and inline actions

Bulk actions, filters, dark mode

Checkbox column on Human review queue and Active sessions for bulk approve / reject / kill. Free-text search above every table. Manual dark / light / auto mode toggle.

Bulk-select in the human-review queue

Trends — spend & threats

Live SVG charts

Running it

llm-leash-console --proxy http://localhost:8000 \
                  --audit-log audit.jsonl --port 8801

What we do NOT do

Two things matter here. One: not everything is in scope, and pretending otherwise lowers trust. Two: we want you to plug in best-in-class tools for things they're better at. llm-leash is the enforcement and evidence layer; everything else is a rule you compose.

You want	Use this instead
Prompt-injection classifier	Prompt-Guard (call from a rule)
Content guardrails (DSL)	NeMo Guardrails / Guardrails AI
Tool-arg pattern catalog	Invariant Labs (import their rules)
Eval framework	PromptFoo / DeepEval
Observability dashboard	Langfuse / LangSmith (ship our JSONL into them)
Model router	LiteLLM / OpenRouter

Install

pip install llm-leash                  # core, zero runtime deps
pip install "llm-leash[anthropic]"     # + Anthropic adapter
pip install "llm-leash[proxy]"         # + HTTP proxy mode
pip install "llm-leash[redis]"         # + Redis multi-replica state
pip install "llm-leash[all]"           # everything

Adapters auto-detect at runtime — install only what you use.

Roadmap

Version	Highlight
v1.0	Stable public API · PyPI release
v1.3	SOC 2 evidence pack generator
v2.0	HTTP proxy · SSE streaming · Redis/SQLite backends · operator console
v2.11	`LocalLLMGuardRule` (offline Llama-Guard) · 207-case eval dataset
v2.15	Console: kill / export / sparkline / drill-down / HITL panel
v2.16	Console UX: drawer · sticky nav · prod resilience (systemd, gunicorn, nginx)
v2.18	Trends charts · bulk actions · table filters · dark-mode toggle
v2.19	`ToolResultScanner` — indirect prompt injection (OWASP LLM01)
v2.20	`EnsemblePolicyEngine` — weighted multi-rule aggregation
v2.21	Session-correlated detection — exfil chains, enumeration
v2.22	Operator feedback loop — per-rule FP-rate from HITL decisions
v2.23	Continuous eval + drift detection — F1 over time, regression alerts
v2.24	Console UX polish — cost forecast, HITL audio alert, URL filter state, day-over-day KPIs, mobile responsive
v2.25	`ResponseInjectionScanner` — LLM output scanned before reaching the agent (OWASP LLM01 inverse)
v2.26	Per-tenant rate limits — token bucket per tenant with configurable RPS/burst; HTTP 429 on overflow
v2.27	`OpaRule` — write llm-leash rules in Rego against an OPA sidecar
v2.28	Issue grouping + sample drill-down + Cmd-K palette — console becomes a threat issue tracker
v2.29	Session timeline + HITL playground — session drawer, Original/Sanitized/Diff tabs, approve_sanitized
v2.30	Insights — topology graph (agents ↔ models ↔ tools), KPI sparklines (24 h trend), rule comparison modal
v3.0	TypeScript port of the core (planned)

Full per-version changelog: CHANGELOG.md.

Docs

PRODUCT.md — what this is, who buys it, what it is not.
ARCHITECTURE.md — modules, data flow, performance budget.
API.md — public surface, CLI, JSONL schema, custom rules.
docs/PROXY.md — proxy mode operator guide.
docs/deployment.md — production deployment (systemd, Docker, gunicorn, nginx, k8s).
docs/SOC2.md — SOC 2 Trust Service Criteria mapping.
docs/LEAKAGE.md — leak prevention detectors + CI recipes.

License

MIT — see LICENSE.

The OSS firewall is and always will be free. The hosted audit-log service (forthcoming) is the only thing that costs money — and you never need it. The JSONL is yours.

Release history

Version	Released
2.30.0 (this version)	May 21, 2026
2.29.0	May 21, 2026
2.28.0	May 19, 2026
2.27.0	May 18, 2026
2.26.0	May 18, 2026
2.25.0	May 18, 2026
2.24.0	May 18, 2026
2.23.0	May 18, 2026
2.22.0	May 18, 2026
2.21.0	May 18, 2026
2.20.0	May 18, 2026
2.19.0	May 18, 2026
2.18.0	May 18, 2026
2.13.0	May 17, 2026
2.12.0	May 17, 2026
2.11.0	May 17, 2026
2.7.0	May 17, 2026
2.4.0a1 (pre-release)	May 19, 2026
2.3.0a1 (pre-release)	May 17, 2026
2.2.0a1 (pre-release)	May 17, 2026
2.1.0a2 (pre-release)	May 17, 2026
2.1.0a1 (pre-release)	May 17, 2026
2.0.1	May 17, 2026
2.0.0	May 17, 2026
2.0.0a2 (pre-release)	May 17, 2026
2.0.0a1 (pre-release)	May 17, 2026
1.3.1	May 16, 2026
1.3.0	May 16, 2026

Download files

Source Distribution

llm_leash-2.30.0.tar.gz (4.9 MB)

Uploaded May 21, 2026 Source

Built Distribution

llm_leash-2.30.0-py3-none-any.whl (227.5 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file llm_leash-2.30.0.tar.gz.

File metadata

Download URL: llm_leash-2.30.0.tar.gz
Upload date: May 21, 2026
Size: 4.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for llm_leash-2.30.0.tar.gz
Algorithm	Hash digest
SHA256	`3d1535e83a80cc858d743b350a73070c32b2c25e9459e461e9bae3d75b6c1eca`
MD5	`e9011afe0c389e91af2f323ff3703c2e`
BLAKE2b-256	`507d6eda4c371fc2fcc42650df11a208d6ad49db86c3def359e97fd3b2a954dc`

File details

Details for the file llm_leash-2.30.0-py3-none-any.whl.

File metadata

Download URL: llm_leash-2.30.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 227.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for llm_leash-2.30.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5476bbd01fbadb8e5aa6da60e969ad90a28610d15f312aed87002f39f67c8da8`
MD5	`2ce8a2d4da1422340c31f5de705cc249`
BLAKE2b-256	`f1808d8f06205f0475597ee37005f9efdb1989ea809c7d74d3972a387fd9b931`
