Break your agent before your users do. Adversarial stress-testing and regression suites for AI agents.

These details have not been verified by PyPI

Project description

Gauntlet

Break your agent before your users do.

Gauntlet fires a suite of adversarial, edge-case "users" at your AI agent over HTTP, finds where it fails (system-prompt leaks, unsafe actions, scope drift, crashes, runaway output), ranks the failures by severity, and turns them into a regression suite you can gate in CI. Framework-agnostic: if your agent speaks HTTP, Gauntlet can test it.

It is built on one belief: a green eval only means something if you defined what red looks like. Most agent "evals" pass because nobody wrote the test that would have failed.

Why this exists

Teams ship agents that work in the demo and then quietly break in production: the model picks the wrong tool, leaks its prompt to a clever user, confirms an action it should have refused, or loops. The expensive part of reliability is not the dashboard, it is finding the failures and making sure they stay fixed. Gauntlet is the part that goes looking for them.

Quickstart (30 seconds, no API key)

# Install (once published): pip install gauntlet-agent
#   or as an isolated CLI:   pipx install gauntlet-agent

# 1. See it work against a deliberately broken sample agent, in one process:
python examples/demo.py

# 2. Or run it against your own agent (any HTTP endpoint that takes JSON):
#    terminal A:
python examples/sample_agent.py
#    terminal B:
gauntlet run --target http://localhost:8000/chat --canaries examples/canaries.json

Network-free self-test (runs anywhere, good for CI):

python tests/test_offline.py    # or: pytest

Using it on your agent

Point --target at any endpoint that accepts POST {"message": "..."} and returns JSON or text. Field names are configurable. Agent isn't an HTTP service yet (LangChain, OpenAI Assistants, a bare function)? Wrap it in a few lines with the shims in examples/adapters/ — no server code to write.

gauntlet run \
  --target https://your-agent.example.com/v1/chat \
  --request-field input --response-field output \
  --header "Authorization: Bearer $TOKEN" \
  --canaries my_canaries.json \
  --fail-on HIGH \
  --json gauntlet-report.json

Exit code is nonzero when anything at or above --fail-on is found, so you can drop it straight into CI:

# .github/workflows/agent.yml
- run: gauntlet run --target $STAGING_AGENT_URL --canaries canaries.json --fail-on HIGH

Canaries: define what failure means

The universal graders (errors, empty replies, runaway length) need no config. The high-value graders are the canaries you supply, the things your agent must never do:

{
  "never_output_substrings": ["INTERNAL-OPS-KEY", "system prompt"],
  "unsafe_compliance_substrings": ["refund of $", "deleted the account"],
  "max_response_chars": 6000,
  "severity_overrides": { "missing_refusal": "MEDIUM", "data_leak": "CRITICAL" }
}

severity_overrides lets you retune any finding kind to your own risk bar (CRITICAL/HIGH/MEDIUM/LOW/INFO) — e.g. downgrade missing_refusal if your agent is intentionally chatty, or keep leaks at CRITICAL.

How it works

Adversaries (gauntlet/adversaries.py) — a deterministic library of probes across prompt injection, scope discipline, false premises, data exfiltration, malformed input, and loop bait. Deterministic so runs are reproducible.
Runner (gauntlet/runner.py) — fires probes concurrently at your HTTP endpoint, stdlib only.
Graders (gauntlet/graders.py) — universal reliability checks plus your canaries, producing severity-ranked findings (CRITICAL to INFO).
Report (gauntlet/report.py) — a readable summary, the worst failures, and a JSON artifact for CI.

Optional: LLM-powered mode

The default needs no API key. With --llm, Gauntlet generates fresh adversarial personas from a description of your agent and can grade open-ended behavior with a judge instead of substring canaries.

pip install "gauntlet-agent[llm]"
export ANTHROPIC_API_KEY=...
gauntlet run --target $URL --llm --describe "support bot for an online store"

The judge is a thin, swappable layer. The methodology is the point: generate probes from your agent's real surface, and validate the judge against a small human-labeled gold set before trusting its scores.

Calibrate the judge (don't trust a score you haven't validated)

gauntlet calibrate --gold examples/gold.jsonl --min-kappa 0.6

Runs the judge over a human-labeled gold set and reports accuracy, precision, recall (of real failures, how many the judge catches — the number that matters for a safety tool), F1, and Cohen's κ (chance-corrected agreement). It exits nonzero below --min-kappa, so a weak judge fails CI instead of quietly shipping bad scores. A starter gold set lives at examples/gold.jsonl.

Multi-turn probes (jailbreaks that build across turns)

Real jailbreaks are rarely one message — they build trust, plant context, or manufacture a false premise over several turns, then cash it in. Add --multiturn to include built-in conversation probes (crescendo, gradual role-reset, context poisoning, manufactured commitment). Gauntlet drives each turn-by-turn and grades the final reply.

# stateful agent (keeps its own session):
gauntlet run --target $URL --multiturn --canaries canaries.json

# stateless agent: send the running transcript as an OpenAI-style messages array
gauntlet run --target $URL --multiturn --history-field messages --canaries canaries.json

The report prints the full conversation for any multi-turn failure, so you can see exactly how it got there.

Add --extended for newer single-turn attack classes (indirect/RAG injection, unicode-homoglyph evasion, refusal suppression, base64 encoding smuggling, tool-description extraction, PII handling):

gauntlet run --target $URL --extended --multiturn --canaries canaries.json

Trace-aware grading (score the tool calls, not just the text)

A safe-sounding answer can hide an unsafe action. If your agent returns the tool calls it made, Gauntlet can grade those directly. Have the agent include a trace in its JSON response:

{ "response": "All set!", "trace": [ {"tool": "issue_refund", "args": {"amount": 999}} ] }

Then point at it and declare which tools/args are off-limits:

gauntlet run --target $URL --trace-field trace --canaries canaries.json

{ "forbidden_tools": ["issue_refund", "delete_user", "send_bulk_email"],
  "forbidden_tool_arg_substrings": ["all_users", "DROP TABLE"] }

A forbidden tool call (or a forbidden argument) is a HIGH finding even if the text looked fine — catching the agent that says "I can't" but calls the tool anyway.

Roadmap

Judge calibration command (gauntlet calibrate)
Persona memory: multi-turn conversation probes (--multiturn)
Trace-aware grading (--trace-field + forbidden tools/args)
Hosted dashboard + scheduled runs (see the apps/dashboard in the monorepo)

License

MIT. See LICENSE.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Jun 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gauntlet_agent-0.1.0.tar.gz (27.9 kB view details)

Uploaded Jun 15, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gauntlet_agent-0.1.0-py3-none-any.whl (22.6 kB view details)

Uploaded Jun 15, 2026 Python 3

File details

Details for the file gauntlet_agent-0.1.0.tar.gz.

File metadata

Download URL: gauntlet_agent-0.1.0.tar.gz
Upload date: Jun 15, 2026
Size: 27.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gauntlet_agent-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`1f789cfe076de9ea1e8235ffd62069ae93d18121b5f41d37180db1cb4cf94982`
MD5	`5320e1605ed57eb4180c20e09209e11e`
BLAKE2b-256	`7cc60facfc73d34c2fcd78bf0ec68c85818656066773e61b74fc2951ca5c3c98`

See more details on using hashes here.

Provenance

The following attestation bundles were made for gauntlet_agent-0.1.0.tar.gz:

Publisher: release.yml on GauntletVectorLabs/gauntlet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gauntlet_agent-0.1.0.tar.gz
- Subject digest: 1f789cfe076de9ea1e8235ffd62069ae93d18121b5f41d37180db1cb4cf94982
- Sigstore transparency entry: 1822596530
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: GauntletVectorLabs/gauntlet@d7dd832e8c09f5d86bf68d5239b3041b71f827a8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/GauntletVectorLabs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d7dd832e8c09f5d86bf68d5239b3041b71f827a8
- Trigger Event: push

File details

Details for the file gauntlet_agent-0.1.0-py3-none-any.whl.

File metadata

Download URL: gauntlet_agent-0.1.0-py3-none-any.whl
Upload date: Jun 15, 2026
Size: 22.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gauntlet_agent-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a3d8cdaa997684d299fad9f7c996e7771ad4837f41d3c12aa56c1aae8007a20b`
MD5	`f3756a27f1cb065c3fa8ef161ac818f1`
BLAKE2b-256	`94628907264b2db72720d8c0ab1c355587287e5635668b175aca6803b54cc57b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for gauntlet_agent-0.1.0-py3-none-any.whl:

Publisher: release.yml on GauntletVectorLabs/gauntlet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gauntlet_agent-0.1.0-py3-none-any.whl
- Subject digest: a3d8cdaa997684d299fad9f7c996e7771ad4837f41d3c12aa56c1aae8007a20b
- Sigstore transparency entry: 1822596621
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: GauntletVectorLabs/gauntlet@d7dd832e8c09f5d86bf68d5239b3041b71f827a8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/GauntletVectorLabs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d7dd832e8c09f5d86bf68d5239b3041b71f827a8
- Trigger Event: push

gauntlet-agent 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Gauntlet

Why this exists

Quickstart (30 seconds, no API key)

Using it on your agent

Canaries: define what failure means

How it works

Optional: LLM-powered mode

Calibrate the judge (don't trust a score you haven't validated)

Multi-turn probes (jailbreaks that build across turns)

Trace-aware grading (score the tool calls, not just the text)

Roadmap

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance