
Voan Firewall — runtime guard for AI agents. Sits inline on every tool call and blocks unauthorized actions (RCE, exfiltration, data loss) before they execute.


Voan Firewall

The firewall for AI agents.
Catches known-bad and goal-inconsistent agent actions — RCE, data exfiltration, fraud — before they execute.



AI agents now take real actions: they run shell commands, move money, touch your database, call your APIs. One prompt injection or one poisoned tool result and an agent does something it was never asked to. Voan Firewall sits inline on every tool call, in the agent's own process, and decides allow / ask / block before any side effect happens. Think of it as the antivirus/EDR layer for agents: a fast signature tier plus an optional LLM judge for the gray zone.

Voan Firewall  → runtime:    block the exploit as it happens   (this repo)
Voan Scanner   → pre-deploy: find the agent's holes            (companion, private beta)

Why not just regex rules?

Pattern rules catch the loud stuff (rm -rf, known-bad domains). They cannot tell whether a benign-looking action — an email to a normal address, a data export — is what the user asked for, or was hijacked by poisoned tool output. So Voan adds a second tier: an LLM judge that compares each action against the user's actual goal. The judge only ever escalates a verdict to BLOCK; it never loosens one.
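The escalate-only behaviour can be sketched in a few lines. This is a minimal illustration, not Voan's actual code: the `Verdict` enum and the `combine` function are hypothetical names, and only the allow / ask / block verdicts come from the description above.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Hypothetical verdict ordering: a higher value means a stricter verdict."""
    ALLOW = 0
    ASK = 1
    BLOCK = 2


def combine(rule_verdict: Verdict, judge_flags_hijack: bool) -> Verdict:
    """Merge the regex-tier verdict with the judge's opinion.

    The judge can only push a verdict up to BLOCK. If it sees no problem,
    the rule verdict is returned unchanged, so a hard block from the
    rules is never loosened.
    """
    if judge_flags_hijack:
        return Verdict.BLOCK
    return rule_verdict


if __name__ == "__main__":
    # Print the full truth table for the three rule verdicts.
    for rule in Verdict:
        for flagged in (False, True):
            result = combine(rule, flagged)
            print(f"rules={rule.name:<5} judge_flags_hijack={flagged!s:<5} -> {result.name}")
```

Under this scheme the judge has no code path that returns anything weaker than the rule verdict, which is what makes "never loosens" a structural property rather than a prompt instruction.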

On a 36-case eval (20 attacks and 16 benign actions; cases live in eval/ and are grounded in an agentic-attack taxonomy, the OWASP Agentic Top 10; see the klass field in eval/traces.jsonl; judge model: gpt-4o-mini):

36-case eval                                | regex rules only | + Voan judge
Attacks silently allowed (no gate at all)   | 30% (6/20)       | 0% (0/20)
Attacks auto-blocked                        | 35% (7/20)       | 100% (20/20)
Benign hard-blocked (false positive)        | 6% (1/16)        | 6% (1/16)

Read it honestly: rules alone auto-block 35% of attacks and hold another 35% for a human (ASK on money/external sends), but silently allow the remaining 30% — that 30% is the real blind spot. The judge closes it to zero, turning the silently-allowed attacks into blocks and the held ones into auto-blocks. The one false positive is a legitimate DROP TABLE: destructive DB ops are hard-blocked by design (allowlist them explicitly), and the judge can't loosen a hard block. A further 5/16 benign actions are held for approval, not blocked — intended behaviour for money and outbound sends.
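The percentages follow directly from the case counts, and a quick check is easy to write. The snippet below recomputes them from the numbers quoted above. Two things are derived rather than quoted: the benign "allowed" count (16 - 1 - 5 = 10) and the zero held/allowed attack counts with the judge on, which are implied by the 100% and 0% cells. All variable names are illustrative.

```python
ATTACKS = 20  # attack cases in the eval
BENIGN = 16   # benign cases in the eval

# Attack outcomes with the regex tier alone.
rules_only = {"blocked": 7, "held": 7, "allowed": 6}
# Attack outcomes with regex tier + judge (held/allowed implied by 100% blocked).
with_judge = {"blocked": 20, "held": 0, "allowed": 0}
# Benign outcomes; "allowed" is derived as 16 - 1 - 5.
benign = {"blocked": 1, "held": 5, "allowed": 10}


def pct(count: int, total: int) -> int:
    """Return a whole-number percentage of count out of total."""
    return round(100 * count / total)


# Every case must land in exactly one outcome bucket.
assert sum(rules_only.values()) == ATTACKS
assert sum(with_judge.values()) == ATTACKS
assert sum(benign.values()) == BENIGN

if __name__ == "__main__":
    for name, outcomes, total in (
        ("attacks, rules only", rules_only, ATTACKS),
        ("attacks, with judge", with_judge, ATTACKS),
        ("benign actions", benign, BENIGN),
    ):
        summary = ", ".join(f"{k} {pct(v, total)}% ({v}/{total})" for k, v in outcomes.items())
        print(f"{name}: {summary}")
```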

Honest caveat: 36 hand-curated cases is an optimistic ceiling, not a production guarantee, and the judge score is one run of an LLM grader. The defensible number comes from feeding real traces through the same harness — that loop is on the roadmap.

Install

Not yet on PyPI/npm — install from source (it's a single editable install):

git clone https://github.com/voan-ai/voan-firewall && cd voan-firewall
pip install -e .                 # core SDK — zero runtime dependencies
pip install -e ".[dashboard]"    # + live dashboard (fastapi, uvicorn)
pip install -e ".[langchain]"    # + the LangChain adapter demo (langchain-core)

A TypeScript/JS port lives in sdk-js/ (Node 22.6+, native TS). It currently implements the regex policy tier only — the LLM judge is Python-only for now (JS judge is on the roadmap). Consumed locally from the repo; not yet published to npm.

One line to protect an agent

import voan

tools = voan.guard(tools)      # wrap your dict/list of tool functions

That one line gives you the regex tier (the "regex rules only" column above, which still silently allows 30% of attacks). To get the full intent-vs-hijack coverage, add the judge and tell it the user's goal:

import voan
from voan import LLMJudge, ollama_llm    # ollama_llm is only needed for a local backend

fw = voan.Firewall(judge=LLMJudge())     # needs a backend (see note)
fw.set_goal("Check the delivery status of order ORD-1001.")
tools = fw.guard_tools(tools)
# An agent hijacked into emailing customer data now raises BlockedAction.

The judge needs an LLM backend: set OPENAI_API_KEY in .env, or use a local one via LLMJudge(llm=ollama_llm()). With no backend it is a no-op (fails open) and emits a warning. The judge sends the action plus recent (untrusted) tool context to that backend. Secrets and card numbers are auto-redacted first, but for privacy-sensitive or regulated agents use a local backend. See Data handling & threat model.
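To make the fail-open behaviour concrete, here is a minimal sketch of a judge with a pluggable backend. It is not the real LLMJudge: the `SketchJudge` class, the `should_block` method, the prompt wording, and the `fake_llm` toy backend are all assumptions made for illustration. The only properties taken from the text above are that the backend can be any callable and that a missing backend means a warning plus a no-op.

```python
import warnings
from typing import Callable, Optional

# Hypothetical backend type: any callable that maps a prompt to a text reply.
LLMBackend = Callable[[str], str]


class SketchJudge:
    """Illustrative stand-in for an LLM judge; not the real LLMJudge."""

    def __init__(self, llm: Optional[LLMBackend] = None) -> None:
        self.llm = llm
        if llm is None:
            # Fail open, but make the gap visible to the operator.
            warnings.warn("No judge backend configured; judge is a no-op (fails open).")

    def should_block(self, goal: str, action: str) -> bool:
        """Return True only when a backend exists and its reply starts with BLOCK."""
        if self.llm is None:
            return False  # no backend: never block, leave the rule verdict as-is
        reply = self.llm(f"Goal: {goal}\nAction: {action}\nAnswer BLOCK or OK.")
        return reply.strip().upper().startswith("BLOCK")


def fake_llm(prompt: str) -> str:
    """Toy backend for demos: flags any prompt that mentions email."""
    return "BLOCK" if "email" in prompt.lower() else "OK"


if __name__ == "__main__":
    goal = "Check the delivery status of order ORD-1001."
    judge = SketchJudge(llm=fake_llm)
    print(judge.should_block(goal, "lookup_order(order_id='ORD-1001')"))
    print(judge.should_block(goal, "send_email(to='x@example.com', body='customer data')"))
```

Failing open keeps an unconfigured judge from breaking the agent, at the cost of silently falling back to regex-only coverage, which is why the warning matters.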

It works on real frameworks too: genuine LangChain tools (with langchain-core installed) can be wrapped via voan/adapters.py:

from voan.adapters import guard_langchain
guard_langchain(my_langchain_tools)

See it work

uvicorn server.app:app --port 8088     # live dashboard at http://127.0.0.1:8088
python demo/demo_agent.py              # naive agent: 1 allow, 1 held, 3 blocked
python demo/judge_demo.py              # intent-vs-hijack tier (needs a judge backend)
python demo/langchain_demo.py          # real LangChain tools (needs .[langchain])
python eval/run_eval.py                # reproduce the eval numbers above

How it works

  • Sensor (hook.py) — in-process wrapper on every tool call. In-process means it covers Python (and, via the port, JS) agent tools you can wrap; protocol-level/MCP coverage is on the roadmap.
  • Brain (policy.py + rules.py) — fast regex tier; first matching rule wins. It is a signature blocklist that is default-allow out of the box (flip to deny-by-default, or rely on the judge, for unrecognized actions). Local and sub-millisecond.
  • Judge (judge.py) — LLM "intent vs hijack" tier, opt-in, that only ever escalates to BLOCK. Adds an LLM round-trip (latency + cost) per gray-zone action, so it runs off the regex hot path. Pluggable backend (OpenAI / local Ollama / any callable).
  • Audit + dashboard — JSONL trail + live WebSocket feed.
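The sensor, brain, and audit pieces listed above can be sketched together in one small module. This is a simplified illustration under stated assumptions, not the contents of hook.py or policy.py: the rule patterns, the `decide` and `guard` functions, the local `BlockedAction` class, and the audit file layout are all hypothetical. Held ("ask") calls are simply refused here, where a real system would route them to a human.

```python
import functools
import json
import re
from typing import Any, Callable

# Hypothetical rule table of (pattern, verdict). Order matters: first match wins.
RULES = [
    (re.compile(r"rm\s+-rf"), "block"),
    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE), "block"),
    (re.compile(r"\b(transfer|send_email)\b"), "ask"),
]
DEFAULT_VERDICT = "allow"  # default-allow for anything no rule recognizes


class BlockedAction(Exception):
    """Local stand-in exception raised when a call is refused."""


def decide(tool_name: str, args: dict) -> str:
    """Return the verdict of the first rule matching the rendered call."""
    rendered = f"{tool_name} {json.dumps(args, sort_keys=True, default=str)}"
    for pattern, verdict in RULES:
        if pattern.search(rendered):
            return verdict
    return DEFAULT_VERDICT


def guard(tool: Callable[..., Any], audit_path: str = "audit.jsonl") -> Callable[..., Any]:
    """Wrap one tool so each keyword-argument call is checked and logged first."""

    @functools.wraps(tool)
    def wrapper(**kwargs: Any) -> Any:
        verdict = decide(tool.__name__, kwargs)
        record = {"tool": tool.__name__, "args": kwargs, "verdict": verdict}
        # Append one JSON object per line (JSONL) before any side effect runs.
        with open(audit_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, default=str) + "\n")
        if verdict == "block":
            raise BlockedAction(f"{tool.__name__} blocked by policy")
        if verdict == "ask":
            raise BlockedAction(f"{tool.__name__} held for human approval")
        return tool(**kwargs)

    return wrapper
```

With this sketch, wrapping a `run_shell(cmd)` function and calling it with `cmd="rm -rf /"` raises `BlockedAction`, while `cmd="ls"` runs normally; both calls are appended to the audit file. Writing the audit record before the verdict is enforced means blocked attempts are logged too.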

Data handling & threat model

When the judge is enabled, each evaluated action's arguments plus up to the last 5 (untrusted) tool outputs are sent to your chosen LLM backend. With the default OpenAI backend, that context leaves your environment. Two mitigations ship in the box: (1) a redactor masks obvious secrets and card-like numbers before anything is sent (the regex tier still sees the raw values, so blocking is unaffected); (2) the prompt instructs the model to treat untrusted tool output as data only, which is a best-effort defence, not a guarantee, against prompt injection of the judge itself. For sensitive or regulated deployments, run the judge on a local model (LLMJudge(llm=ollama_llm())) so nothing leaves your network. To report a vulnerability, see SECURITY.md.
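As a rough illustration of the two data-handling steps described above (redact first, then send at most the last 5 tool outputs), consider the sketch below. The regex patterns, the `redact` function, and the `JudgeContext` class are hypothetical and deliberately narrow; they are not Voan's redactor, and a production pattern set would need to be much broader and properly tested.

```python
import re
from collections import deque

# Hypothetical patterns: 13 to 19 digits with optional space/hyphen separators,
# and key-like tokens with a common prefix followed by 16+ token characters.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
SECRET_RE = re.compile(r"\b(?:sk|pk|api|key|token)[-_][A-Za-z0-9_-]{16,}\b")


def redact(text: str) -> str:
    """Mask key-like tokens and card-like numbers before text leaves the process."""
    text = SECRET_RE.sub("[REDACTED_SECRET]", text)
    return CARD_RE.sub("[REDACTED_CARD]", text)


class JudgeContext:
    """Keeps a bounded window of recent tool outputs for the judge prompt."""

    def __init__(self, max_outputs: int = 5) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._outputs = deque(maxlen=max_outputs)

    def add_tool_output(self, output: str) -> None:
        """Store the raw output; redaction happens only when building the payload."""
        self._outputs.append(output)

    def build_payload(self, action_args: dict) -> dict:
        """Return the redacted data that would be sent to the LLM backend."""
        return {
            "action_args": {k: redact(str(v)) for k, v in action_args.items()},
            "untrusted_tool_outputs": [redact(o) for o in self._outputs],
        }
```

Storing raw values and redacting only at payload-build time mirrors the point made above: the local regex tier can still match on the real values, while only the masked copy is sent to the backend.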

Roadmap

  • Real-trace eval harness (the production FP/FN number)
  • LLM judge parity in the JS port
  • MCP proxy sensor (protocol-level, framework-agnostic)
  • Deny-by-default presets for sensitive tool families
  • Hosted policy management + team audit (the commercial open-core layer)

Apache-2.0 · github.com/voan-ai/voan-firewall

