Cost ceiling, audit log, and kill switch for LLM agents.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

avelikiy

These details have not been verified by PyPI

Project description

llm-leash

The cost ceiling, audit log, and kill switch your LLM agent should never run without.

llm-leash is a 5-line runtime firewall for LLM agents. It enforces hard USD budgets, writes an immutable audit log, and gives you a kill switch and human-in-the-loop hook — without locking you into any agent framework.

Status: v2.0.0 — Production / Stable. Both the in-process middleware (v1.x surface, unchanged) and the HTTP proxy (v2.x surface) are part of the semver-stable public API. See CHANGELOG.md.

Why

You shipped an agent. Then:

A retry loop spent $2,387 in 14 minutes.
A vague user message coaxed it into DROP TABLE users.
Compliance asked "show me every action this agent took for customer X last month" — you can't.

llm-leash solves the boring B2B half of agent safety that nobody else owns: money, paperwork, panic button. It works alongside the content-safety scanners (LlamaFirewall, Invariant, Prompt-Guard) — not against them.

Two ways to install the leash

Mode	When	Code change required
In-process middleware (v1.x surface, stable)	Python agents you control the source of	1 line: `fw.wrap(client)`
HTTP proxy (v2.0 GA, stable)	Any language, vendor agents, multi-app compliance	0 lines — one env var: `OPENAI_BASE_URL=http://localhost:8000`

Both modes share the same audit format, policy engine, secrets detection, HITL queue, semantic detection (LLMGuardRule), behavioral baseline (BehavioralBaselineRule), and Prometheus metrics. Pick by code-access constraints, not by features.

Quickstart

from llm_leash import Firewall, LeashKilled
from anthropic import Anthropic

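# Cap the session at $10 and record every call in audit.jsonl.
fw = Firewall(budget_usd=10.00, audit_log="audit.jsonl")
client = fw.wrap(Anthropic())

try:
    # Deliberately loop forever to simulate a runaway agent;
    # the firewall ends the loop once the budget is exhausted.
    while True:
        client.messages.create(model="claude-opus-4-7", max_tokens=200,
                               messages=[{"role": "user", "content": "..."}])
except LeashKilled as e:
    print(f"Saved you the rest. Reason: {e.reason}")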

Three things happen on every call:

Budget tracked — cumulative cost per session, raises LeashKilled when the cap is hit.
Audit logged — every model call appends one hash-chained JSONL line. Tamper-evident: llm-leash verify audit.jsonl.
Kill switch — await fw.kill("reason") stops the session immediately; the next call raises LeashKilled.
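
To make "hash-chained" and "tamper-evident" concrete, here is a minimal sketch of the general technique. It is not llm-leash's actual log format: the field names (`prev`, `event`, `hash`), the all-zero genesis value, and the helpers `append_entry` and `verify_chain` are hypothetical and chosen only for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Hash the previous entry's hash together with a canonical JSON encoding.
    payload = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(lines: list, event: dict) -> None:
    """Append one JSONL line whose hash covers the previous line's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    lines.append(json.dumps(entry, sort_keys=True))


def verify_chain(lines: list) -> bool:
    """Return True only if every line links to, and hashes over, its predecessor."""
    prev_hash = GENESIS
    for line in lines:
        entry = json.loads(line)
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"model": "example-model", "cost_usd": 0.01})
append_entry(log, {"model": "example-model", "cost_usd": 0.02})
intact = verify_chain(log)

# Editing an earlier line invalidates its stored hash, so verification fails.
tampered = list(log)
first = json.loads(tampered[0])
first["event"]["cost_usd"] = 0.0
tampered[0] = json.dumps(first, sort_keys=True)
tamper_detected = not verify_chain(tampered)
```

Because each hash covers the previous one, rewriting any line forces an attacker to recompute every later hash as well, which is what an HMAC-signed variant is meant to prevent.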

Run the offline demo (no API key needed):

python demo.py
llm-leash verify audit.jsonl

Proxy mode (v2.0 GA)

For agents you can't (or don't want to) modify — change one env var, get the firewall:

# Install + start
pip install "llm-leash[proxy]"
llm-leash-proxy --listen 127.0.0.1:8000 --audit-log audit.jsonl --budget-usd 50

# Point any agent at it (no source changes)
export ANTHROPIC_BASE_URL=http://localhost:8000
export OPENAI_BASE_URL=http://localhost:8000
python my_agent.py

Works with any client that speaks the OpenAI or Anthropic wire protocol: the OpenAI / Anthropic SDKs, OpenRouter, LangChain.js, Vercel AI SDK, CrewAI via LiteLLM, OpenHands, and custom Rust / Go / TS agents. The proxy identifies sessions via headers (X-LLM-Leash-Session-Id, X-LLM-Leash-Tenant-Id, X-LLM-Leash-Agent-Name) and falls back automatically to a hash of the bearer token when no headers are set.
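
The header-then-token fallback can be sketched as follows. This is an illustration of the idea, not the proxy's implementation: the function name `resolve_session_id`, the `tok-` prefix, the 16-character truncation, and the `anonymous` default are all assumptions.

```python
import hashlib

SESSION_HEADER = "x-llm-leash-session-id"  # header name from the text, lowercased


def resolve_session_id(headers: dict) -> str:
    """Prefer the explicit session header; otherwise derive an id from the bearer token."""
    normalized = {k.lower(): v for k, v in headers.items()}
    explicit = normalized.get(SESSION_HEADER)
    if explicit:
        return explicit
    auth = normalized.get("authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else auth
    if not token:
        return "anonymous"  # hypothetical default when nothing identifies the caller
    # Hash the token so the raw credential never appears in logs or metrics.
    return "tok-" + hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


explicit_id = resolve_session_id({"X-LLM-Leash-Session-Id": "checkout-42"})
fallback_a = resolve_session_id({"Authorization": "Bearer secret-token-a"})
fallback_b = resolve_session_id({"Authorization": "Bearer secret-token-b"})
```

Hashing gives a stable id per credential, so two requests with the same token land in the same session without either request declaring one.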

What v2.0 GA gives you over v1.x:

Streaming SSE with end-of-stream accounting and mid-stream cancellation: if a runaway generation would exceed the hard cap mid-flight, the proxy aborts the upstream connection and emits a synthetic `event: error` frame to the client.
Multi-replica state sharing via the Redis backend (budget.backend = "redis", kill.backend = "redis"): run the proxy behind a load balancer with consistent cumulative spend and kill state across pods.
Per-agent budget caps with [budget.per_agent_caps] TOML: agent-level ceilings override per-tenant ceilings, which override the default cap.
Operator console — llm-leash-console ships a read-only Web UI (live sessions, top spend, kill button, HITL queue, audit tail).
Alerts sidecar — llm-leash-alerts watches the audit stream and fans budget-breach / kill / HITL events out to Slack + PagerDuty.
Docker + Helm + k8s manifests for fleet deployment.
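
As a rough illustration of mid-stream cancellation, the sketch below relays chunks until the next one would exceed the cap, then stops reading upstream and emits an SSE error frame. It is a simplification under stated assumptions: cost is charged as a flat number of cents per chunk, and the function name `guarded_stream` and the `budget_exceeded` payload are hypothetical, not the proxy's real behaviour.

```python
from typing import Iterable, Iterator


def guarded_stream(chunks: Iterable, spent_cents: int, cap_cents: int,
                   cents_per_chunk: int) -> Iterator:
    """Relay upstream chunks until the next one would exceed the hard cap."""
    for chunk in chunks:
        if spent_cents + cents_per_chunk > cap_cents:
            # Stop consuming upstream and tell the client why the stream ended.
            yield 'event: error\ndata: {"reason": "budget_exceeded"}\n\n'
            return
        spent_cents += cents_per_chunk
        yield f"data: {chunk}\n\n"


# Integer cents avoid floating-point drift when comparing against the cap.
upstream = iter(["a", "b", "c", "d", "e"])
frames = list(guarded_stream(upstream, spent_cents=997, cap_cents=1000,
                             cents_per_chunk=1))
```

Here three chunks fit under the cap, the fourth triggers the error frame, and the fifth is never read from upstream.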

Operator workflow:

# Real-time stats from the CLI
llm-leash status --proxy http://localhost:8000

# Stop a runaway session
llm-leash kill <session_id> --proxy http://localhost:8000 --reason "blew budget"

# Approve a pending HITL request
llm-leash hitl list --proxy http://localhost:8000
llm-leash hitl approve <request_id> --proxy http://localhost:8000

# Prometheus scrape
curl http://localhost:8000/metrics

# Operator Web UI (separate binary, separate port so a UI crash never
# takes down agent traffic)
llm-leash-console --proxy http://localhost:8000 --listen 127.0.0.1:8001

See docs/PROXY.md for deployment recipes (Docker, k8s, Helm), Redis tuning, and the full TOML reference.

Adapters — one wrap, every framework (in-process, stable)

from llm_leash import Firewall
fw = Firewall(budget_usd=10.00, audit_log="audit.jsonl")

Framework	Client class	Example
Anthropic	`anthropic.Anthropic`	`fw.wrap(Anthropic()).messages.create(...)`
OpenAI	`openai.OpenAI`	`fw.wrap(OpenAI()).chat.completions.create(...)`
LangChain / LangGraph	`ChatAnthropic`, `ChatOpenAI`	`fw.wrap(ChatAnthropic(model="…")).invoke([…])`
CrewAI	`crewai.LLM`	`fw.wrap(LLM(model="openai/gpt-4o")).call([…])`
OpenHands	`openhands.llm.LLM`	`fw.wrap(LLM(config)).completion(messages=[…])`
Pydantic-AI	`pydantic_ai.models.*`	`await fw.wrap(OpenAIModel(...)).request([…])`
MCP	`mcp.ClientSession`	`await fw.wrap(session).call_tool("read_file", {…})`

All adapters are duck-typed — no SDK imports at module level, no version pinning. The wrapped client preserves every attribute of the original; only the call surface is intercepted.
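
The duck-typed wrapping described above can be illustrated with a small attribute-forwarding proxy. This is a generic sketch of the pattern, not llm-leash's adapter code: `CallInterceptor`, `FakeClient`, and the `on_call` hook are hypothetical names.

```python
class CallInterceptor:
    """Wrap an object, forwarding every attribute except one intercepted method."""

    def __init__(self, target, method_name, on_call):
        self._target = target
        self._method_name = method_name
        self._on_call = on_call

    def __getattr__(self, name):
        # Only reached for attributes that are not set on the wrapper itself.
        attr = getattr(self._target, name)
        if name != self._method_name:
            return attr

        def intercepted(*args, **kwargs):
            # A real firewall would check the budget and write an audit line here.
            self._on_call(name, args, kwargs)
            return attr(*args, **kwargs)

        return intercepted


class FakeClient:
    """Stand-in for an SDK client; not a real library class."""

    api_base = "https://example.invalid"

    def invoke(self, prompt):
        return f"echo: {prompt}"


calls = []
wrapped = CallInterceptor(FakeClient(), "invoke",
                          lambda name, args, kwargs: calls.append((name, args)))
reply = wrapped.invoke("hi")
```

Since nothing in the wrapper imports an SDK, it works with any object that exposes the intercepted method, which is the point of duck typing.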

# LangGraph example: drop the firewall into your existing graph
from langchain_anthropic import ChatAnthropic
from langgraph.graph import StateGraph

chat = fw.wrap(ChatAnthropic(model="claude-haiku-4-5"))
# MyState is your own state schema (for example a TypedDict with a "messages" key)
graph = StateGraph(MyState)
graph.add_node("llm", lambda state: {"reply": chat.invoke(state["messages"])})

# CrewAI example: pass the wrapped LLM to your Agent
from crewai import Agent, Crew, Task, LLM
llm = fw.wrap(LLM(model="anthropic/claude-haiku-4-5"))
agent = Agent(role="researcher", llm=llm, goal="...")
result = Crew(agents=[agent], tasks=[Task(...)]).kickoff()

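# MCP example: every tool call is audited; dangerous tools can require HITL
from mcp import ClientSession

# `read` and `write` are the transport streams, e.g. from mcp.client.stdio.stdio_client(...)
async with ClientSession(read, write) as session:
    wrapped = fw.wrap(session)
    await wrapped.call_tool("read_file", {"path": "/etc/hosts"})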

Pre-push leak prevention

Block accidental commits of internal-tooling state (.great_cto/, .claude/, .beads/), repo-boundary paths (/Users/<name>/...), and git-managed secret files (.env, id_rsa, .aws/credentials) before they reach a public remote. Same rule, two surfaces:

from llm_leash import ArtifactLeakageRule, Firewall

fw = Firewall(rules=[ArtifactLeakageRule(action="block")])
# block | hitl | redact

# As a git pre-push hook:
cp examples/pre-push-hook.sh .git/hooks/pre-push && chmod +x .git/hooks/pre-push

# Or manually:
llm-leash scan --staged                  # current git diff --cached
llm-leash scan --push-range A..B         # commits about to be pushed
llm-leash scan src/ tests/               # arbitrary paths

See docs/LEAKAGE.md for the full detector list, CI recipes, and waiver syntax.
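
For a feel of what such a scan checks, here is a minimal sketch built only from the examples named above. It is not the `ArtifactLeakageRule` implementation; the function names, finding labels, and pattern sets are assumptions, and the real detector list lives in docs/LEAKAGE.md.

```python
import re
from pathlib import PurePosixPath

# Patterns taken from the examples in the text; a real detector list would be longer.
INTERNAL_DIRS = {".great_cto", ".claude", ".beads"}
SECRET_FILENAMES = {".env", "id_rsa"}
SECRET_SUFFIXES = (".aws/credentials",)
HOME_PATH = re.compile(r"/Users/[^/\s]+/")


def scan_path(path: str) -> list:
    """Flag a file path that should not reach a public remote."""
    findings = []
    p = PurePosixPath(path)
    if INTERNAL_DIRS & set(p.parts):
        findings.append("internal-tooling")
    if p.name in SECRET_FILENAMES or path.endswith(SECRET_SUFFIXES):
        findings.append("secret-file")
    return findings


def scan_text(text: str) -> list:
    """Flag file contents that embed a machine-specific home directory."""
    return ["home-path"] if HOME_PATH.search(text) else []


results = {
    ".claude/settings.json": scan_path(".claude/settings.json"),
    "config/.env": scan_path("config/.env"),
    "src/app.py": scan_path("src/app.py"),
}
```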

SOC 2 evidence pack

Generate a complete SOC 2 evidence package from any audit.jsonl log:

llm-leash soc2 /var/log/agent-audit.jsonl \
  --out ./evidence-2026-Q2/ \
  --period-start 2026-04-01T00:00:00Z \
  --period-end   2026-06-30T23:59:59Z \
  --org "Acme Inc"

Produces six artefacts an auditor can attach to their evidence binder directly: executive-summary.html, cc6_access_control.csv, cc7_monitoring.csv, cc7_integrity.json, anomalies.csv, and bom.json. Each file is sha256-hashed and listed in the bill of materials. See docs/SOC2.md for the Trust Service Criteria mapping.
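
The bill-of-materials idea is simple to sketch: hash every generated file and record the digests in one manifest. The example below is an assumption about the general shape, not the actual `bom.json` schema; the `algorithm` and `files` keys and the `build_bom` helper are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_bom(directory: Path) -> dict:
    """Hash every file in `directory` (except the manifest itself)."""
    files = {}
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.name != "bom.json":
            files[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {"algorithm": "sha256", "files": files}


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    # Two placeholder artefacts standing in for the real evidence files.
    (out / "anomalies.csv").write_text("timestamp,kind", encoding="utf-8")
    (out / "cc7_integrity.json").write_text("{}", encoding="utf-8")
    bom = build_bom(out)
    (out / "bom.json").write_text(json.dumps(bom, indent=2), encoding="utf-8")
```

An auditor can later recompute each digest and compare it with the manifest to confirm no artefact changed after generation.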

Persistent state for multi-worker prod

from llm_leash import Firewall, SQLiteBudgetStore, SQLiteKillRegistry

fw = Firewall(
    budget_usd=100.0,
    audit_log="/var/log/agent-audit.jsonl",
    kill_registry=SQLiteKillRegistry("/var/lib/myapp/kill.db"),
)
fw._budget._store = SQLiteBudgetStore("/var/lib/myapp/budget.db")

Redis variants (RedisBudgetStore / RedisKillRegistry) accept any duck-typed client.

For proxy mode, the same backends are config-driven — no code:

# proxy.toml
[budget]
backend = "redis"
redis_url = "redis://redis.internal:6379/0"
default_cap_usd = 50.0
per_tenant_caps = { acme = 500.0, beta = 25.0 }
per_agent_caps  = { "writer-prod" = 100.0, "researcher-prod" = 10.0 }

[kill]
backend = "redis"
redis_url = "redis://redis.internal:6379/0"
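
The cap precedence stated earlier (agent over tenant over default) can be sketched against the values in this config. The `resolve_cap` function is hypothetical and the dictionary simply mirrors the [budget] table as plain Python data; the proxy's own lookup may differ.

```python
from typing import Optional

# Mirrors the [budget] table from proxy.toml as plain Python data.
BUDGET = {
    "default_cap_usd": 50.0,
    "per_tenant_caps": {"acme": 500.0, "beta": 25.0},
    "per_agent_caps": {"writer-prod": 100.0, "researcher-prod": 10.0},
}


def resolve_cap(tenant: Optional[str], agent: Optional[str],
                cfg: dict = BUDGET) -> float:
    """Pick the most specific cap: agent first, then tenant, then the default."""
    if agent in cfg["per_agent_caps"]:
        return cfg["per_agent_caps"][agent]
    if tenant in cfg["per_tenant_caps"]:
        return cfg["per_tenant_caps"][tenant]
    return cfg["default_cap_usd"]


agent_cap = resolve_cap("acme", "writer-prod")    # agent cap wins over tenant cap
tenant_cap = resolve_cap("acme", "unlisted-agent")
default_cap = resolve_cap("unlisted-tenant", None)
```

Note that under this ordering an agent cap can be lower or higher than its tenant's cap; the more specific entry always applies.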

What it does

Hard USD budget per session. Soft cap warns. Hard cap kills.
Append-only JSONL audit log, hash-chained, optionally HMAC-signed. jq-able. SOC 2 / EU AI Act Article 12 evidence-shaped.
Kill switch. Stop a runaway agent from CLI, HTTP, or Redis. Sub-300ms propagation.
Human-in-the-loop webhook for high-stakes tool calls. Default-deny on timeout.
Tool ACL with regex / SQL-AST / shell-AST patterns.
PII redaction before tool dispatch and before audit write.
Adapters for Anthropic, OpenAI, LangGraph, CrewAI, OpenHands, Pydantic-AI, MCP.
Semantic threat detection (LLMGuardRule) — cheap-LLM classifier flags prompt injection / jailbreak / data exfiltration; sampling-rate aware so high-QPS is affordable.
Behavioral baseline (BehavioralBaselineRule) — Welford-online statistics per (tenant, agent); flags token spikes, new models, off-hours, tool-spam. Pluggable InMemoryBaselineStore / SQLiteBaselineStore.
Streaming + mid-stream cancel (proxy) for both Anthropic and OpenAI SSE.
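
Welford's online algorithm, which the behavioral baseline is described as using, keeps a running mean and variance without storing past samples. The sketch below shows the standard algorithm plus a simple z-score check; the class name `RunningStats`, the `is_spike` method, and the threshold of 3.0 are assumptions, not the rule's actual interface.

```python
import math


class RunningStats:
    """Welford's online mean and variance for one (tenant, agent) metric."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; undefined for fewer than two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_spike(self, x: float, z_threshold: float = 3.0) -> bool:
        """Flag x when it lies more than z_threshold deviations from the mean."""
        if self.n < 2 or self.stddev == 0.0:
            return False
        return abs(x - self.mean) / self.stddev > z_threshold


stats = RunningStats()
for tokens in [100, 110, 90, 105, 95]:
    stats.update(tokens)

spike = stats.is_spike(1000)
normal = stats.is_spike(104)
```

Only three numbers are stored per metric, which is why this approach suits a per-(tenant, agent) baseline at high request volume.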

What it does NOT do

You want	Use this instead
Prompt-injection classifier	Prompt-Guard (call from a rule)
Content guardrails (DSL)	NeMo Guardrails / Guardrails AI
Tool-arg pattern catalog	Invariant Labs (import their .rules from a policy)
Eval framework	PromptFoo / DeepEval
Observability dashboard	Langfuse / LangSmith (ship JSONL into them)
Model router	LiteLLM / OpenRouter

llm-leash is the layer beneath all of them. It does enforcement and evidence. Everything else is a rule you can plug in.

Documents

PRODUCT.md — what this is, who buys it, what it is not.
ARCHITECTURE.md — modules, data flow, performance budget.
API.md — public surface, CLI, JSONL schema, custom rules.
docs/PROXY.md — proxy mode operator guide (Docker, k8s, Redis, TOML).
docs/SOC2.md — SOC 2 Trust Service Criteria mapping.
docs/adr/ — architecture decisions (in progress).

Install

pip install llm-leash                  # core, zero runtime deps
pip install "llm-leash[anthropic]"     # + Anthropic adapter
pip install "llm-leash[proxy]"         # + HTTP proxy mode (starlette/uvicorn/httpx)
pip install "llm-leash[redis]"         # + Redis backend for proxy state
pip install "llm-leash[all]"           # everything

Roadmap

Version	Status	What
v0.1	✓	Core firewall + Anthropic adapter + audit chain + CLI `verify`
v0.2	✓	PolicyEngine + PII redactor
v0.3	✓	BlockedSql + BlockedShell rules
v0.4	✓	Redis transports for budget + kill
v0.5	✓	HITL gates (InMemory + Webhook) + `HitlThreshold`
v0.6	✓	LangGraph + CrewAI + MCP adapters + acceptance gate
v0.7	✓	`audit replay/export` + SQLite stores + extended CLI
v1.0	✓	Stable public API · semver lock · PyPI release · per-adapter examples
v1.1	✓	OpenHands + Pydantic-AI adapters · LlamaFirewall / Presidio rule wrappers
v1.2	✓	Durable HITL queue (SQLite/InMemory) · HTTP kill transport · CLI `hitl` ops
v1.3	✓	SOC 2 evidence pack generator · CLI `soc2` · TSC mapping
v2.0	✓	HTTP proxy mode · SSE streaming + mid-stream cancel · Redis/SQLite backends · per-agent caps · operator console (`llm-leash-console`) · alerts sidecar (`llm-leash-alerts`) · `LLMGuardRule` (semantic) · `BehavioralBaselineRule` · Docker / k8s / Helm
v2.1	planned	TypeScript port of the core
v2.2	planned	OPA / Rego policy backend

License

MIT — see LICENSE.

The OSS firewall is and always will be free. The hosted audit-log service (forthcoming) is the only thing that costs money — and you never need it. JSONL is yours.

Project details

Release history

Version	Released	Notes
2.28.0	May 19, 2026
2.27.0	May 18, 2026
2.26.0	May 18, 2026
2.25.0	May 18, 2026
2.24.0	May 18, 2026
2.23.0	May 18, 2026
2.22.0	May 18, 2026
2.21.0	May 18, 2026
2.20.0	May 18, 2026
2.19.0	May 18, 2026
2.18.0	May 18, 2026
2.13.0	May 17, 2026
2.12.0	May 17, 2026	this version
2.11.0	May 17, 2026
2.7.0	May 17, 2026
2.4.0a1	May 19, 2026	pre-release
2.3.0a1	May 17, 2026	pre-release
2.2.0a1	May 17, 2026	pre-release
2.1.0a2	May 17, 2026	pre-release
2.1.0a1	May 17, 2026	pre-release
2.0.1	May 17, 2026
2.0.0	May 17, 2026
2.0.0a2	May 17, 2026	pre-release
2.0.0a1	May 17, 2026	pre-release
1.3.1	May 16, 2026
1.3.0	May 16, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_leash-2.12.0.tar.gz (330.3 kB)

Uploaded May 17, 2026 Source

Built Distribution

llm_leash-2.12.0-py3-none-any.whl (148.3 kB)

Uploaded May 17, 2026 Python 3

File details

Details for the file llm_leash-2.12.0.tar.gz.

File metadata

Download URL: llm_leash-2.12.0.tar.gz
Upload date: May 17, 2026
Size: 330.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llm_leash-2.12.0.tar.gz
Algorithm	Hash digest
SHA256	`21bd4c43b6d98cf83d19b4673e2ba04d53b15f53b48c9a6ac4e19c6f20053f30`
MD5	`cc5b4c24b2561dece1c1ba4c4ff26e02`
BLAKE2b-256	`560fd9c6ee55533089406173e3555653864b76fd50bf2eaa6b3246d2efc3866a`

Provenance

The following attestation bundles were made for llm_leash-2.12.0.tar.gz:

Publisher: publish.yml on avelikiy/llm-leash

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_leash-2.12.0.tar.gz
- Subject digest: 21bd4c43b6d98cf83d19b4673e2ba04d53b15f53b48c9a6ac4e19c6f20053f30
- Sigstore transparency entry: 1563650947
- Sigstore integration time: May 17, 2026
Source repository:
- Permalink: avelikiy/llm-leash@afeb8ee80395f3d25387d4280e20db9f4549f66b
- Branch / Tag: refs/tags/v2.12.0
- Owner: https://github.com/avelikiy
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@afeb8ee80395f3d25387d4280e20db9f4549f66b
- Trigger Event: push

File details

Details for the file llm_leash-2.12.0-py3-none-any.whl.

File metadata

Download URL: llm_leash-2.12.0-py3-none-any.whl
Upload date: May 17, 2026
Size: 148.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llm_leash-2.12.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5c60dcf25f1614798db3dac4c625b6b40c9bea1efd7601a177b8bcf58feaa188`
MD5	`83f84232ec3688ce71e2af8734c18b5e`
BLAKE2b-256	`b99b7675588cb43993913a3f5fad5e55ad6af821c8df7db3ebe30a28b6ee55f0`

Provenance

The following attestation bundles were made for llm_leash-2.12.0-py3-none-any.whl:

Publisher: publish.yml on avelikiy/llm-leash

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_leash-2.12.0-py3-none-any.whl
- Subject digest: 5c60dcf25f1614798db3dac4c625b6b40c9bea1efd7601a177b8bcf58feaa188
- Sigstore transparency entry: 1563650949
- Sigstore integration time: May 17, 2026
Source repository:
- Permalink: avelikiy/llm-leash@afeb8ee80395f3d25387d4280e20db9f4549f66b
- Branch / Tag: refs/tags/v2.12.0
- Owner: https://github.com/avelikiy
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@afeb8ee80395f3d25387d4280e20db9f4549f66b
- Trigger Event: push

llm-leash 2.12.0
