Provider-agnostic action defender SDK for AI agents

These details have not been verified by PyPI

Project links

Project description

Agent Defender

The operating layer that guards what AI agents actually do.

Guardrails check what the model says. We check what the agent does.

VoidHack June 2026 — Theme: The Operating Layer of the Internet.

A transparent, OpenAI-compatible proxy that sits between an AI agent and the LLM/world and enforces action-level policy. Point your agent's base_url at it — nothing else changes — and every tool call, egress destination, secret, and token budget is checked before the agent can act.

The enforcement engine is provider-agnostic: OpenAI-compatible providers (OpenAI, Groq, NVIDIA NIM, Mistral, Together, Fireworks, OpenRouter, local gateways) work through the proxy or SDK wrapper, while native Claude/Anthropic and Gemini adapters translate their tool-call formats into the same policy checks. LangChain integrations block tool execution regardless of the model provider.

Why

Prompt injection is unsolved (OWASP LLM01:2025). Existing guardrails classify text and approve the words — then the agent emails your secrets in the next call. The defender enforces at the layer where damage happens: the action.

It blocks by stripping the disallowed tool_call out of the model's response before the agent ever sees it — prevention, not a warning. Default fail-closed.

What it enforces

Tool allowlist — default-deny; send_email, run_shell, … are blocked.
Egress allowlist — URLs/email domains must be on the allowlist.
Injection scan — heuristic + Meta Prompt Guard 2 on tool results.
PII / secret redaction — in-flight, both directions (regex; Presidio optional).
Cost guard — per-session token budget.
Signed receipts — every decision is HMAC-signed and auditable, streamed live to a control-plane dashboard.
Safeguard reasoner — gpt-oss-safeguard-20b adds an auditable explanation on flagged actions (policy-following; reads policies/policy.yaml).

Demo (before / after)

# BREACH — agent talks straight to the model; it emails data externally
python -m agent.run_attack --task email --direct

# SAFE — same task through the defender; send_email is blocked, 0 exfiltration
python -m agent.run_attack --task email

The dashboard (/) shows decisions stream in live; Run demo attack replays a spread of attacks through the real engine. See docs/RUNBOOK.md.

Mission Control (`/mission`)

An interactive page where you hand an autonomous agent a real goal (editable task

a knowledge document you can poison), toggle the defender ON/OFF, and watch a live LLM execute step by step. Governed: the agent gets hijacked but every dangerous call is stripped — "Defender held." Ungoverned: the same agent exfiltrates — "Breach." An Impact panel proves what reached the outside world.

Stack

Layer	Tech
Proxy	Python 3.12 · FastAPI 0.136 · Uvicorn · httpx · Pydantic 2
Detection	deterministic rules · Prompt Guard 2 (Groq) · regex/Presidio PII
Reasoner	gpt-oss-safeguard-20b (Groq, selective)
Store	SQLite · SQLAlchemy 2.0 · HMAC-SHA256 receipts
Dashboard	Next.js 16 · Tailwind 4 · SSE live feed
Demo agent	OpenAI SDK · llama-3.3-70b-versatile

All models run on the Groq free tier — zero cost, no card.

SDKs and provider adapters

Install from npm:

npm install agent-defender

Install from PyPI once published:

pip install agent-defender

Python: FirewallOpenAI, FirewallAnthropic, FirewallGoogleGenerativeAI, FirewallCallbackHandler, and create_openai_compatible_firewall.
Node.js: FirewallOpenAI, FirewallAnthropic, FirewallGoogleGenerativeAI, LangChain.js callback support, and createFirewallOpenAICompatible.

See docs/SDK_INTEGRATION.md for Claude, Gemini, Groq, NVIDIA, Mistral, Together, and LangChain examples.

Layout

proxy/        FastAPI defender (app/ + tests/)
dashboard/    Next.js 16 control-plane UI
agent/        demo victim agent + poisoned document
policies/     policy.yaml
docs/         DESIGN · ARCHITECTURE · RUNBOOK

Status

Working end-to-end. Backend: ruff + mypy + 34 pytest green. Dashboard: biome + tsc + build + 2 Playwright e2e green. Verified live against Groq.

See docs/DESIGN.md for the design + decision log and docs/ARCHITECTURE.md for components and data flow.

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.1

Jun 29, 2026

This version

1.0.0

Jun 29, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agent_defender-1.0.0.tar.gz (14.6 kB view details)

Uploaded Jun 29, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

agent_defender-1.0.0-py3-none-any.whl (14.3 kB view details)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file agent_defender-1.0.0.tar.gz.

File metadata

Download URL: agent_defender-1.0.0.tar.gz
Upload date: Jun 29, 2026
Size: 14.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for agent_defender-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`210ac7141f01254d577d30d497f76276484b4ee65a69c62591ce7f62d63c13ee`
MD5	`5e5cbc457fef8cd325eae8d42285b7e5`
BLAKE2b-256	`846d98e82323e2ee97018c4cdf86bb6cc6a0527fbf40b490d9d8968494692bee`

See more details on using hashes here.

File details

Details for the file agent_defender-1.0.0-py3-none-any.whl.

File metadata

Download URL: agent_defender-1.0.0-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 14.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for agent_defender-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8a1856c669ee693e95b584b9b98e0e598948999d492625e209edb8c9c51750f6`
MD5	`beba4375714f124b94b93152db141736`
BLAKE2b-256	`e7a428a208bc390e6421b21bd2495861016c3d91326f654c6f0f9c381ddad16f`

See more details on using hashes here.

agent-defender 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

Agent Defender

Why

What it enforces

Demo (before / after)

Mission Control (`/mission`)

Stack

SDKs and provider adapters

Layout

Status

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

agent-defender 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

Agent Defender

Why

What it enforces

Demo (before / after)

Mission Control (/mission)

Stack

SDKs and provider adapters

Layout

Status

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Mission Control (`/mission`)