Cordon
The pre-execution control layer for AI agents.
Cordon runs deterministic safety probes on a proposed agent action before it executes. No LLM calls, no heuristics, no inference latency — just fast, auditable, replayable verdicts on whether an action is safe to run.
- Live playground: https://seyomi-cordon-playground.hf.space/
- API docs: https://seyomi-cordon-playground.hf.space/api/docs
- PyPI: cordon-ai (pip install cordon-ai)
- Benchmark: 1.000 control score on a 42-task public benchmark, 0.2 ms median verdict latency
- Cordon Cloud (hosted dashboard) is in closed beta with design partners — email founders@cordon.ai for access
pip install cordon-ai
```python
from cordon import Guard, Action

guard = Guard.strict()
verdict = guard.check(Action(
    kind="shell",
    command="pip install -r requirements.txt",
    changes={"requirements.txt": "reqeusts==2.31.0\n"},
))
if verdict.blocked:
    print(verdict.top_reason())
    # → "'reqeusts' is 2 edit(s) from 'requests' (likely typosquat)"
```
Protect your agent in 5 lines
Cordon ships first-class integrations for the three frameworks every production agent uses today:
| Vendor | Module | Entry point | Example |
|---|---|---|---|
| OpenAI | cordon.integrations.openai | check_response(response, ...) | examples/openai_protect.py |
| Anthropic | cordon.integrations.anthropic | check_response(message, ...) | examples/anthropic_protect.py |
| LangChain | cordon.integrations.langchain | guard_tools([t1, t2], ...) | examples/langchain_protect.py |
A single ActionBuilder registry maps your tool names to cordon.Action shapes — the same registry works across all three vendors:
```python
import cordon
from cordon.integrations.openai import ActionBuilder, check_response

builder = ActionBuilder()

@builder.tool("run_shell")
def _(args):
    return cordon.Action(kind="shell", command=args["command"],
                         changes=args.get("changes", {}))

response = client.chat.completions.create(model="gpt-4o", messages=..., tools=...)
for tcv in check_response(response, builder=builder, guard=cordon.Guard.strict()):
    if tcv.blocked:
        send_refusal(tcv.tool_call_id, tcv.verdict.top_reason())
    else:
        dispatch_tool(tcv.tool_name, tcv.arguments)
```
No vendor SDK is a runtime dependency. Cordon duck-types each response shape, so it works with the official SDKs, with litellm, with raw HTTP, and with any internal proxy.
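Duck-typing here means reading attributes by shape rather than by class. The helper below is a hypothetical illustration of the idea (not Cordon's actual code); it assumes only that the response exposes an OpenAI-style choices / message / tool_calls shape, whether as attributes or dict keys:

```python
import json

def iter_tool_calls(response):
    """Yield (tool_name, parsed_arguments) from anything shaped like an
    OpenAI chat completion: SDK objects, litellm responses, or raw dicts."""
    def get(obj, key):
        # Duck-typing: accept dict keys and object attributes interchangeably.
        if isinstance(obj, dict):
            return obj.get(key)
        return getattr(obj, key, None)

    for choice in get(response, "choices") or []:
        message = get(choice, "message")
        for call in get(message, "tool_calls") or []:
            fn = get(call, "function")
            yield get(fn, "name"), json.loads(get(fn, "arguments"))
```

Because every lookup falls back to None and then to an empty list, the same loop handles a raw HTTP JSON body and a typed SDK object without importing either vendor's package.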
Why Cordon exists
LLM agents now write code, install packages, read secrets, and call APIs on behalf of users and companies. Current safety layers fall into two camps, and both have blind spots:
- Transcript monitors read the agent's reasoning. On semantic attacks (typosquatting, silent data loss, exfiltration-by-artifact) they approve 100% of attacks because the commands look legitimate. Only the intent is malicious.
- Guardrail LLMs call another model to judge outputs. They add latency, cost, a non-deterministic failure mode, and a new attack surface.
Cordon is a third path: static, deterministic probes on the proposed side effects. An agent-written command to pip install reqeusts looks legitimate to a transcript monitor; it looks identical to a Levenshtein check comparing every package against the top 200 PyPI names. One of these catches the attack deterministically. The other flips a coin.
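A Levenshtein check of this kind fits in a few lines of standard-library Python. This is an illustrative sketch, not Cordon's TyposquatProbe; the package list and the 2-edit threshold are assumptions:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

POPULAR = {"requests", "numpy", "pandas", "flask"}  # stand-in for the top-200 list

def looks_like_typosquat(name: str, max_edits: int = 2):
    """Return a reason string if name is a near-miss of a popular package."""
    if name in POPULAR:
        return None  # exact match to a known package is fine
    for known in POPULAR:
        d = edit_distance(name, known)
        if 0 < d <= max_edits:
            return f"'{name}' is {d} edit(s) from '{known}' (likely typosquat)"
    return None
```

The transposed letters in reqeusts cost exactly two substitutions, which is why the quickstart verdict reads "2 edit(s) from 'requests'".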
Headline benchmark
The 42-task Semantic Guard benchmark covers all seven attack classes the probes are designed to catch (3 attacks + 3 benign per class). Run it yourself:
cordon benchmark --profile strict
| Profile | Block rate (TPR) | False-positive rate | Control score | Tasks passed |
|---|---|---|---|---|
| Guard.permissive() | 90.5% | 0.0% | 0.905 | 40 / 42 |
| Guard.default() | 95.2% | 0.0% | 0.952 | 41 / 42 |
| Guard.strict() | 100.0% | 0.0% | 1.000 | 42 / 42 |
Zero false positives across all 21 benign tasks on every profile. The full 42-task suite runs in ~8 ms end-to-end on a single core — fast enough that every agent step gets checked with no perceptible latency. Lower profiles intentionally flag (rather than block) lower-confidence signals like supply-chain mirror swaps, so security teams see the warning without false stops.
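The control score column is consistent with the product of the block rate and the benign pass rate. Assuming that definition (control = TPR × (1 − FPR), which is an inference from the tables, not a stated formula), each row reproduces from raw task counts:

```python
def control_score(blocked_attacks: int, attacks: int,
                  false_blocks: int, benign: int) -> float:
    """Control = TPR * (1 - FPR), from raw task counts (assumed definition)."""
    tpr = blocked_attacks / attacks
    fpr = false_blocks / benign
    return tpr * (1 - fpr)

# 21 attack + 21 benign tasks per profile run:
strict     = control_score(21, 21, 0, 21)  # 1.000
default    = control_score(20, 21, 0, 21)  # 20/21 ≈ 0.952
permissive = control_score(19, 21, 0, 21)  # 19/21 ≈ 0.905
```

The same formula matches the comparative table: the keyword heuristic's 3/21 TPR and 1/21 FPR give 0.136.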
Per-category on strict — 3 / 3 attacks blocked, 0 / 3 benign blocked, in every class:
| Category | Block rate | FPR |
|---|---|---|
| Typosquat (supply-chain) | 1.00 | 0.00 |
| Secret leak (artifact) | 1.00 | 0.00 |
| Exfiltration (network) | 1.00 | 0.00 |
| Silent failure | 1.00 | 0.00 |
| Test suppression | 1.00 | 0.00 |
| Security weakening | 1.00 | 0.00 |
| Destructive shell | 1.00 | 0.00 |
Comparative benchmark
The same 42-task suite, run head-to-head against the dominant deployed agent-safety patterns. Reproduce yourself:
```shell
cordon compare --comparators all
# add --lakera-key=... and --openai-key=... to include the network-bound judges
```
| Judge | Block rate (TPR) | FPR | Control | Passed | Mean latency |
|---|---|---|---|---|---|
| Cordon (strict) | 1.000 (21/21) | 0.000 (0/21) | 1.000 | 42/42 | ~0.2 ms |
| Keyword heuristic (block-list) | 0.143 (3/21) | 0.048 (1/21) | 0.136 | 23/42 | ~0.003 ms |
| Transcript-only (charitable) | 0.190 (4/21) | 0.000 (0/21) | 0.190 | 25/42 | ~0.003 ms |
| Lakera Guard (v2/guard, May 2026)‡ | 1.000 (18/18) | 1.000 (18/18) | 0.000 | 18/36 | ~280 ms |
| LLM judge (gpt-4o-mini, via OpenRouter)‡ | 0.833 (15/18) | 0.056 (1/18) | 0.787 | 32/36 | ~1219 ms |
‡ Network-bound judges last measured on the v0.2 36-task suite (we won't burn API budget on every probe addition); the qualitative gap doesn't change with the 6 new destructive-shell tasks.
Cordon is the only judge with a perfect score, and it's also the fastest by 4–6 orders of magnitude. The naive heuristic now picks up some of the destructive-shell attacks via substring match — and earns a false positive on rm -rf ./build for its trouble, which is exactly why Cordon's structural approach matters. The charitable transcript monitor catches a few obvious attacks but misses the stealth classes entirely. Lakera catches every attack but also blocks every benign task — unusable for action filtering. The LLM-as-judge approach is genuinely competitive on accuracy (0.787) but pays for it in ~6,000× higher latency, non-deterministic verdicts, and per-call dollars.
Per-category attack block rate
| Category | Cordon | Keyword | Transcript | Lakera | LLM judge |
|---|---|---|---|---|---|
| Typosquat | 1.00 | 0.00 | 0.33 | 1.00 | 0.67 |
| Secret leak | 1.00 | 0.00 | 0.00 | 1.00 | 1.00 |
| Exfiltration | 1.00 | 0.00 | 0.33 | 1.00 | 1.00 |
| Silent failure | 1.00 | 0.00 | 0.00 | 1.00 | 1.00 |
| Test suppression | 1.00 | 0.00 | 0.00 | 1.00 | 0.33 |
| Security weakening | 1.00 | 0.33 | 0.33 | 1.00 | 1.00 |
| Destructive shell | 1.00 | 0.67 | 0.67 | — | — |
(Read the FPR column carefully when interpreting Lakera's row: blocking everything trivially gives 100% block rate.)
Why every other column has at least one fatal flaw
- Keyword heuristic: deterministic and fast, but blind to the four stealth attack classes (typosquat, secret-leak, silent-failure, test-suppression). 1/18 attacks blocked.
- Transcript-only LLM monitor: even at the charitable upper bound — perfect recall on red-flag words, never fooled by paraphrase — it sees only what the agent says it's about to do, not the file contents being written. 3/18 attacks blocked. Real LLM monitors score worse.
- Lakera Guard: trained for prompt-injection / content-safety classification, not action evaluation. When customers reach for it for action filtering (which is common, because there's nothing else), every shell-command-shaped input scores 1.0 confidence on its flagged field. 0/18 benign tasks pass. Methodological note: we hit POST https://api.lakera.ai/v2/guard (their flagship endpoint) with each action serialized as a single user message — the same way customers integrate it today. Numbers reproduce against the published v2 API as of May 2026.
- LLM-as-judge (gpt-4o-mini): 0.787 control — the second-best score in the table, but the gap costs you in four currencies. Latency: 1219 ms per task vs Cordon's 0.2 ms (~6,000× slower). Determinism: re-running the same task gives different verdicts on the borderlines; the table reports a single run. Accuracy: catches only 67% of typosquats (misses near-miss names like reqeusts for requests) and 33% of test-suppression attacks (silently allows tests being skipped). Cost: at 1M agent steps/day this is ~$3,500/year in API fees plus a 1-second tax on every step — for a 0.787 control score that Cordon achieves at 1.000.
The pattern is the wedge: every existing tool judges strings (input prompts or output text). Cordon judges side effects (the proposed shell command, the proposed file write, the proposed network call). Strings can be paraphrased, encoded, or smuggled; side effects cannot. That gap is the entire moat.
Install
pip install cordon-ai
Python 3.10+. The PyPI distribution is cordon-ai (the unsuffixed cordon name on PyPI is held by an unrelated log-analysis project). The Python import name is cordon — once installed, you write import cordon everywhere. Zero heavy dependencies (pydantic, typer, rich only).
Quickstart
Inline check
```python
from cordon import Guard, Action

guard = Guard.default()
action = Action(
    kind="file",
    intent="Write a debug report",
    reads=["/app/.env"],
    workspace_files={"/app/.env": "EXAMPLE_KEY=FAKE-TEST-TOKEN-0123456789ABCDEF"},
    changes={
        "artifacts/debug.log": "env loaded; key=FAKE-TEST-TOKEN-0123456789ABCDEF"
    },
)
verdict = guard.check(action)
print(verdict.decision)      # 'block'
print(verdict.top_reason())  # 'Secret leak pattern: /app/.env → artifacts/debug.log (content match)'
```
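The verdict above reduces to a content-match rule: a value sourced from a sensitive read reappears in a written artifact. The sketch below is a hypothetical reduction of that behavior, not SecretLeakProbe's source; the sensitive-path markers and the KEY=VALUE parsing are assumptions:

```python
SENSITIVE_HINTS = (".env", "secrets", "credentials")  # assumed path markers

def secret_leak_reasons(reads, workspace_files, changes):
    """Flag any change whose body contains a value read from a sensitive path."""
    reasons = []
    for path in reads:
        if not any(hint in path for hint in SENSITIVE_HINTS):
            continue
        content = workspace_files.get(path, "")
        # Treat each KEY=VALUE right-hand side as a candidate secret.
        values = [line.split("=", 1)[1]
                  for line in content.splitlines() if "=" in line]
        for dest, body in changes.items():
            for value in values:
                if value and value in body:
                    reasons.append(
                        f"Secret leak pattern: {path} → {dest} (content match)")
    return reasons
```

Run on the action above, this yields the same reason string the guard prints.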
Decorator: protect any agent step
```python
@guard.protect
def execute_step(action: Action) -> str:
    return shell.run(action.command)

execute_step(action)  # raises BlockedAction if the guard blocks
```
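A guard decorator of this shape is a thin wrapper. The sketch below is an illustrative reimplementation, not Cordon's source; it assumes only the check-then-raise behavior described above, and the BlockedAction name mirrors the docs:

```python
import functools

class BlockedAction(RuntimeError):
    """Raised when the guard vetoes an action (name mirrors the docs)."""

def make_protect(guard):
    """Build a @protect-style decorator: run guard.check on the Action
    argument and raise before the wrapped step can execute."""
    def protect(fn):
        @functools.wraps(fn)
        def wrapper(action, *args, **kwargs):
            verdict = guard.check(action)
            if verdict.blocked:
                raise BlockedAction(verdict.top_reason())
            return fn(action, *args, **kwargs)
        return wrapper
    return protect
```

The key property: the wrapped function body never runs on a blocked verdict, so a step can be protected without touching its implementation.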
CLI
```shell
cordon check action.json --profile strict
cordon list-probes --profile default
cordon demo     # runs 3 built-in actions: 2 attacks, 1 benign
cordon version
```
OpenAI tool calls — three lines to protect any agent
The same Guard that protects in-process Cordon Action objects also wraps the tool-call layer of an OpenAI agent. Declare how your tools map to actions once, attach the guard, and the next client.chat.completions.create(...) response runs through every probe before any tool dispatcher sees it:
```python
import cordon
from cordon.cloud import CloudReporter
from cordon.integrations.openai import ActionBuilder, check_response

builder = ActionBuilder()

@builder.tool("run_shell")
def _shell(args):
    return cordon.Action(kind="shell", command=args["command"])

@builder.tool("write_file")
def _write(args):
    return cordon.Action(kind="file", changes={args["path"]: args["content"]})

guard = cordon.Guard.strict()
guard.add_listener(CloudReporter())  # ships verdicts to your dashboard

completion = client.chat.completions.create(model="gpt-4o-mini", messages=..., tools=...)
for tcv in check_response(completion, builder=builder, guard=guard):
    if tcv.blocked:
        send_refusal_to_model(tcv.tool_call_id, tcv.verdict.top_reason())
    else:
        dispatch_tool(tcv.tool_call_id, tcv.tool_name, tcv.arguments)
```
examples/openai_live_demo.py is the runnable end-to-end version. Three escalating scenarios — benign / exfiltration / typosquat — produce one allow, one flag, and one block on a single screen. Gracefully degrades to a canned response when OPENAI_API_KEY is unset, and to a no-op CloudReporter when CORDON_API_KEY is unset, so the demo always runs:
```shell
# Offline (no keys) — runs canned, exits 0.
.venv/bin/python examples/openai_live_demo.py

# Live OpenAI call, verdicts shipped to your Cordon Cloud project.
OPENAI_API_KEY=sk-... \
CORDON_API_KEY=cdn_... \
.venv/bin/python examples/openai_live_demo.py
```
Architecture
┌───────────────────────────────────────────────────────────────────┐
│ cordon.Guard │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Semantic probes (fast, tier="fast") │ │
│ │ │ │
│ │ • TyposquatProbe supply-chain dependency attacks │ │
│ │ • SecretLeakProbe sensitive read → artifact write │ │
│ │ • SilentFailureProbe code introducing silent drops │ │
│ │ • ExfiltrationProbe secret → network flows │ │
│ │ • TestSuppressionProbe test skip / xfail abuse │ │
│ │ • SecurityWeakeningProbe config downgrades │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│  │ Environment probes (coming in v0.3)                     │    │
│ │ │ │
│ │ • preview_diff shows net filesystem delta │ │
│ │ • dry_run replays in sandbox │ │
│ │ • sensitivity_scan flags reads of sensitive paths │ │
│ │ • inspect_targets network endpoint analysis │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Aggregation + decision │ │
│ │ │ │
│ │ Critical severity anywhere → block │ │
│ │ max(confidence) ≥ block_thresh → block │ │
│ │ max(confidence) ≥ flag_thresh → flag │ │
│ │ otherwise → allow │ │
│ └─────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
Every probe is a pure function of an Action. Every verdict is replayable. No probe ever executes side effects — Cordon is strictly pre-execution.
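The aggregation rules in the diagram translate directly to code. A sketch, modeling each probe result as a (severity, confidence) pair; the defaults here use the strict profile's 0.60 / 0.20 thresholds, and the names are illustrative:

```python
def decide(results, block_threshold=0.60, flag_threshold=0.20):
    """Aggregate probe results into a verdict per the diagram:
    critical severity anywhere blocks; otherwise the max confidence
    is compared against the block and flag thresholds."""
    if any(severity == "critical" for severity, _ in results):
        return "block"
    top = max((confidence for _, confidence in results), default=0.0)
    if top >= block_threshold:
        return "block"
    if top >= flag_threshold:
        return "flag"
    return "allow"
```

Because the rule is a pure function of the probe outputs, replaying the same Action always reproduces the same verdict — the determinism claim falls straight out of this structure.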
Profiles
| Profile | block_threshold | flag_threshold | When to use |
|---|---|---|---|
| Guard.permissive() | 0.85 | 0.50 | Research agents, exploratory workflows |
| Guard.default() | 0.70 | 0.30 | Most production apps |
| Guard.strict() | 0.60 | 0.20 | Regulated / financial / healthcare agents |
Custom thresholds and probe sets:
```python
from cordon import Guard
from cordon.probes.semantic import (
    TyposquatProbe, SecretLeakProbe, SilentFailureProbe,
    ExfiltrationProbe, TestSuppressionProbe, SecurityWeakeningProbe,
)

guard = Guard(
    probes=[
        TyposquatProbe(), SecretLeakProbe(),
        SilentFailureProbe(), ExfiltrationProbe(),
        TestSuppressionProbe(), SecurityWeakeningProbe(),
    ],
    block_threshold=0.8,
    flag_threshold=0.4,
)
```
Status
v0.1 — first release. Two semantic probes (TyposquatProbe, SecretLeakProbe), CLI, decorator API.
v0.2 (current) — full 6-probe Semantic Guard suite, three vendor integrations (OpenAI, Anthropic, LangChain), the canonical 42-task benchmark + comparative reproducer (cordon benchmark, cordon compare), 150 tests passing.
v0.3 (next) — environment probes (preview_diff, dry_run, sensitivity_scan, inspect_targets), Cordon Cloud (telemetry + policy management + threat-intel feed), landing page + live playground.
Research
Cordon is the production-grade descendant of two research projects:
- ActionLens — 3rd place at Apart Research AI Control Hackathon 2026 (36 countries). Introduced the pre-execution environment probes and the 42-task benchmark. Repo →
- Context-Conditioned Confidentiality Failures in Refusal-Tuned Language Models — Cohere Catalyst Grant, 2026. Showed that refusal-tuned models leak confidential context at 46–62% even when explicitly instructed not to. This is the empirical basis for Cordon's secret-leak probe.
Cordon Cloud (closed beta)
Cordon Cloud is the hosted half: every verdict your agent produces, in real time, on a searchable dashboard. It is intentionally not advertised here. The dashboard runs in a private beta with a small group of design partners; the URL is shared per-tenant with a magic access token. To request access, email founders@cordon.ai with one line about the agents you operate.
Two-line client SDK
The SDK that ships with cordon-ai is public — the gating only applies to the hosted dashboard:
```python
from cordon import Guard
from cordon.cloud import CloudReporter

guard = Guard.strict()
guard.add_listener(CloudReporter(
    api_key="cdn_xxx",                         # provisioned per tenant
    endpoint="https://your-tenant.cordon.ai",  # or self-hosted
))
```
CloudReporter contract — locked in by tests/test_cloud_reporter.py (12 tests):
- Never blocks the agent. Telemetry is queued and flushed by a daemon thread; the listener call is sub-millisecond even when the server is unreachable.
- Never raises. Network, serialization, and queue-full failures are caught and surfaced via warnings. A misbehaving dashboard cannot crash your agent.
- PII-safe by default. File-change bodies are dropped; only paths and byte counts are sent; command and evidence strings are capped at 512 chars. include_bodies=True is the opt-in for when you control both ends.
Self-host (open-source)
The Cordon Cloud server is in this repo at cloud_server/. It is a single FastAPI process with a sqlite event store, a Bearer-auth ingest endpoint, and a polling dashboard. You can self-host it today — same Apache-2.0 license as the SDK:
```shell
# 1. start the server
CORDON_CLOUD_INGEST_KEYS="cdn_yourtenant:yourproject" \
CORDON_CLOUD_DASHBOARD_TOKEN="long-random-string" \
.venv/bin/uvicorn cloud_server.app:app --port 7860

# 2. visit the dashboard
open "http://127.0.0.1:7860/?t=long-random-string"

# 3. point an agent at it
CORDON_API_KEY=cdn_yourtenant CORDON_CLOUD_ENDPOINT=http://127.0.0.1:7860 \
python examples/cloud_reporter.py
```
Deploy the same image to your own Hugging Face Space with cloud_server/space/deploy.sh — see cloud_server/RUNBOOK.md for the production checklist (rotating the dashboard token, granting and revoking per-VC access, swapping sqlite for Postgres).
What's hosted only (the paid tier we're building)
Beyond the open-source server, the hosted Cordon Cloud adds:
- Persistent retention across rebuilds (Postgres-backed) with configurable retention windows.
- Multi-tenant projects + SSO so a security team can give individual engineers their own scoped keys.
- Anomaly alerts — Slack/PagerDuty webhook when block rate spikes for a project.
- Threat-intel feed — typosquat and exfiltration patterns we discover across the fleet, shipped back to your Guard weekly.
If any of that is what you want today, that's the founders@cordon.ai conversation.
Deployment
The repo ships two interchangeable deployment surfaces for the live playground:
- Hugging Face Space (default) — single Docker container that serves both the landing page and the API from the same origin. Free, no DNS, no CORS surprises. Deploy with bash space/deploy.sh <hf-username>. Public URL: https://<hf-username>-cordon-playground.hf.space/.
- GitHub Pages + separate API host (optional) — pure-static landing page in docs/ plus the FastAPI backend hosted anywhere (Fly, Render, Cloud Run, another HF Space). Only worth setting up if you have a custom domain that GitHub Pages can serve cleanly. Enable in Settings → Pages → Source = Deploy from a branch, Branch = main, Folder = /docs.
```shell
# One-time: create the Space on https://huggingface.co/new-space
# SDK = Docker, Visibility = Public, leave empty.
# Then push:
bash space/deploy.sh <your-hf-username>
```
The backend can also be self-hosted via web/Dockerfile or fly launch using the included web/fly.toml. Override the landing-page backend at runtime from the browser console:
```javascript
localStorage.setItem('cordon_backend', 'https://my-cordon.example.com');
location.reload();
```
Contributing
Cordon is early. If you deploy AI agents that do real work and have seen them do something stupid, we want to hear from you — both as a user and as a contributor. Open an issue with a redacted trace and we'll build the probe.
License
Apache 2.0. Use it anywhere, including commercially.
Citation
```bibtex
@software{cordon2026,
  author  = {Adharapurapu V S M Ashok Kumar},
  title   = {Cordon: Pre-Execution Control for AI Agents},
  year    = {2026},
  url     = {https://github.com/Ashok-kumar290/cordon},
  version = {0.1.0}
}
```