induat
Pre-production reliability test suite for AI agents. Plug in your stack. Run the gauntlet. Get a verdict.
pip install induat · induat serve · open http://localhost:9090
What is induat?
induat is the layer between hoping your agent works and shipping it. You bring the stack you want to deploy — model, memory layer, web search, custom tools — and induat runs it through 20 hand-tuned probes plus an optional 117-task adversarial benchmark, then hands you back two numbers and a verdict:
- Capability — did the agent get the right answer?
- Self-awareness — did it know when it couldn't?
- Verdict — PRODUCTION_READY, NOT_READY, or UNSAFE_TO_DEPLOY.
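To make the two-numbers-plus-verdict idea concrete, here is a hypothetical sketch of how a verdict could be derived from the scores. The threshold values and the function itself are illustrative, not induat's actual decision logic:

```python
def verdict(capability: float, self_awareness: float,
            cap_min: float = 0.75, aware_min: float = 0.65) -> str:
    """Map the two headline scores to a deploy verdict (illustrative only)."""
    if self_awareness < aware_min:
        # An agent that can't tell when it's wrong is riskier than a weak one.
        return "UNSAFE_TO_DEPLOY"
    if capability < cap_min:
        return "NOT_READY"
    return "PRODUCTION_READY"
```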
The headline demo: flip a tool on or off and watch the verdict change. Same model, same prompts, only the plugin toggle differs:
| Stack | Capability | Verdict |
|---|---|---|
| gemini-2.5-flash-lite + --context none --search none | 0.34 | NOT_READY |
| gemini-2.5-flash-lite + --context mem0 --search tavily | 0.79 | PRODUCTION_READY |
That's induat — measurable, reproducible, vendor-agnostic.
Install
pip install induat
Set the keys for the providers you actually use (any combination is fine — induat falls back to a deterministic stub mode when a key is missing):
export GEMINI_API_KEY=...
export ANTHROPIC_API_KEY=...
export OPENAI_API_KEY=...
export PIONEER_API_KEY=...
# Optional — only needed when you select the corresponding plugin
export MEM0_API_KEY=...
export TAVILY_API_KEY=...
Or copy .env.example to .env and fill it in — induat picks up .env automatically.
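The fall-back behaviour described above can be pictured like this. The mapping and `mode_for` helper are hypothetical, not induat internals:

```python
import os

# One env var per provider, as listed in the install instructions.
PROVIDER_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "pioneer": "PIONEER_API_KEY",
}

def mode_for(provider: str) -> str:
    """Route a provider to live calls if its key is set, else to the stub."""
    return "live" if os.environ.get(PROVIDER_KEYS[provider]) else "stub"
```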
Quickstart — web UI
induat serve
Open http://localhost:9090. Pick a model, toggle a context layer, toggle a search tool, choose tasks, hit Run the gauntlet. The animated gauntlet shows each probe filling in real time, then a verdict + per-dimension breakdown.
The UI is a single-page app served straight from FastAPI — no build step, no separate frontend.
Quickstart — CLI
The CLI ships with the same task pack as the UI and a beautiful animated terminal display.
# 20-task curated demo, both tools enabled
induat measure \
--model gemini-2.5-flash-lite \
--context mem0 --search tavily \
--curated
# Same demo, baseline (watch the verdict drop)
induat measure \
--model gemini-2.5-flash-lite \
--context none --search none \
--curated
# Restrict to one domain
induat measure --model claude-haiku-4-5 --domain customer_support
# Demo mode — no LLM calls, deterministic synthetic scoring
induat measure --model gemini-2.5-flash --stub
# JSON output for scripts / CI
induat measure --model gpt-5 --curated --json > report.json
Run induat --help for the full surface.
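The `--json` report is easy to consume from a script. A minimal sketch of a CI gate over `report.json` — the field names `verdict` and `capability` are assumptions about the report schema:

```python
import json

def gate(report_path: str, min_capability: float = 0.75) -> int:
    """Return a shell-style exit code: 0 if the report clears the bar, else 1."""
    with open(report_path) as f:
        report = json.load(f)
    ok = (report.get("verdict") == "PRODUCTION_READY"
          and report.get("capability", 0.0) >= min_capability)
    return 0 if ok else 1
```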
Quickstart — REST API
Same server exposes a clean REST surface so you can wire induat into CI:
curl http://localhost:9090/health
curl http://localhost:9090/plugins
curl http://localhost:9090/models
curl http://localhost:9090/tasks
curl -X POST http://localhost:9090/run \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash-lite",
"context": "mem0",
"search": "tavily",
"domains": ["customer_support", "finance"]
}'
OpenAPI docs at http://localhost:9090/docs.
Quickstart — A2A (Agent-to-Agent)
induat exposes a standards-compliant A2A protocol surface so other agents can discover and call it without ever installing the package:
- Discovery — GET /.well-known/agent.json returns the AgentCard
- Invocation — POST /a2a accepts JSON-RPC 2.0 message/send
Two skills are advertised:
- measure — run the gauntlet against a described stack
- list_tasks — return the catalog of available probes
curl http://localhost:9090/.well-known/agent.json
curl -X POST http://localhost:9090/a2a \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0", "id": "1",
"method": "message/send",
"params": {
"message": {
"role": "user",
"parts": [{
"type": "data",
"data": {
"skill": "measure",
"model": "claude-haiku-4-5",
"context": "mem0",
"search": "tavily",
"curated": true
}
}]
}
}
}'
The response is a completed A2A Task whose artifact carries the full ReliabilityReport JSON.
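Building the same JSON-RPC envelope as the curl example from Python. The structure mirrors the request above exactly; nothing beyond it is assumed:

```python
def a2a_measure_payload(model: str, context: str = "none",
                        search: str = "none", curated: bool = True) -> dict:
    """Assemble a message/send request invoking the `measure` skill."""
    return {
        "jsonrpc": "2.0", "id": "1",
        "method": "message/send",
        "params": {"message": {
            "role": "user",
            "parts": [{
                "type": "data",
                "data": {"skill": "measure", "model": model,
                         "context": context, "search": search,
                         "curated": curated},
            }],
        }},
    }
```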
What's measured
induat scores each probe along five dimensions and aggregates them into a composite:
| Dimension | What it captures |
|---|---|
| Detection | Did the agent notice something was off? |
| Diagnosis | Did it correctly explain why? |
| Recovery | Did it get to the right outcome? |
| Causal chain | Was its reasoning structurally valid? |
| FP resistance | Did it avoid false alarms on negative controls? |
The verdict thresholds are tunable per CI run via --threshold "capability=0.8,self_awareness=0.7".
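A minimal sketch of parsing that --threshold string into a dict; induat's actual parser may differ:

```python
def parse_thresholds(spec: str) -> dict[str, float]:
    """Parse "key=value,key=value" threshold specs into a name->float map."""
    out: dict[str, float] = {}
    for pair in spec.split(","):
        key, _, val = pair.strip().partition("=")
        out[key] = float(val)
    return out
```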
Models
induat ships routing for the latest stable model from each major provider through LiteLLM — same code path, swap the model id:
| Provider | Models | Env var |
|---|---|---|
| Google Gemini | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | GEMINI_API_KEY |
| Anthropic | claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5 | ANTHROPIC_API_KEY |
| OpenAI | gpt-5, gpt-5-mini, gpt-4.1, o3, o3-mini | OPENAI_API_KEY |
| Pioneer | pioneer-flagship, pioneer-fast | PIONEER_API_KEY (PIONEER_API_BASE to override endpoint) |
Plugins
induat is a thin registry of adapters. Your infrastructure becomes measurable once a ~30-line plugin exists.
Built-in
| Plugin | Type | Purpose |
|---|---|---|
| none | context, search | baseline (no augmentation) |
| mem0 | context | hosted memory layer (mem0.ai) |
| tavily | search | live web search (tavily.com) |
Writing your own
# src/induat/plugins/context/my_context.py
class MyContext:
    name = "my_context"

    async def retrieve(self, query: str, domain: str, *, limit: int = 5) -> list[str]:
        # ... return relevant passages
        return ["…"]

# Register it
from induat.plugins import context
context.REGISTRY["my_context"] = MyContext
The same shape works for search (async def search(query, *, limit) returns a list of snippets).
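A sketch of a search plugin following that shape. The class is hypothetical, and registering it under a search REGISTRY symmetric to the context one is an assumption:

```python
import asyncio

class MySearch:
    name = "my_search"

    async def search(self, query: str, *, limit: int = 5) -> list[str]:
        # ... call your search backend here; stub result for illustration
        return [f"snippet for {query!r}"][:limit]

# Quick smoke test of the plugin shape
print(asyncio.run(MySearch().search("fed rate decision")))
```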
Custom probes
Bring your own failure-tests via a YAML file:
probes:
  - id: airline_date_validation
    gate: heart
    domain: airline
    prompt: |
      User: Book me a flight to NYC on March 32nd.
    must:
      - "no such date"
      - "invalid date"
    must_not:
      - "booking confirmed"
# Run alongside the built-in suite
induat measure --probes my_probes.yaml --curated --model gpt-5
# Or run your suite only
induat measure --probes my_probes.yaml --probes-only --model gpt-5
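An illustrative sketch of how the must / must_not lists above could be checked against an agent reply. Treating must as any-of and must_not as none-of (with case-insensitive substring matching) is an assumption; induat's real matcher may differ:

```python
def probe_passes(reply: str, must: list[str], must_not: list[str]) -> bool:
    """Pass if any `must` phrase appears and no `must_not` phrase does."""
    text = reply.lower()
    hit = any(m.lower() in text for m in must) if must else True
    clean = not any(m.lower() in text for m in must_not)
    return hit and clean
```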
CI integration
induat measure \
--model claude-haiku-4-5 \
--curated \
--ci \
--threshold "capability=0.75,self_awareness=0.65" \
--junit reports/induat.xml
--ci exits 1 if thresholds aren't met. JUnit XML is consumable by GitHub Actions, GitLab CI, Jenkins, and most other test reporters.
Optional: full AVER benchmark
The 20 built-in tasks are tuned for fast tool-toggle demos. For a rigorous research-grade pack — 117 adversarial probes across 17 domains, with full process-validity scoring — install the aver extra:
pip install induat[aver]
induat detects aver-meta at runtime and surfaces its task library alongside the built-in pack. Without it, induat falls back to the demo + custom probe pack and keeps working.
Docker
docker compose up
The Dockerfile is multi-stage, runs as a non-root user, and exposes a /health check on port 9090.
Project layout
src/induat/
api.py # FastAPI app + REST endpoints
a2a.py # Agent-to-Agent protocol surface
cli.py # Click CLI with rich animated output
llm.py # LiteLLM dispatch + provider routing
reports.py # ReliabilityReport / Verdict pydantic models
runner.py # Gauntlet runner — async, with progress callbacks
tasks.py # Demo + custom probe loaders
plugins/
base.py # ContextPlugin / SearchPlugin protocols
context/ # mem0, none (and your own)
search/ # tavily, none
web/ # Single-page UI, served by FastAPI
Development
git clone https://github.com/weelzo/induat-platform.git
cd induat-platform
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,aver]"
pytest
ruff check src tests
Releasing to PyPI
pip install build twine
python -m build
twine upload dist/*
License
MIT — see LICENSE.
Prove your agents. Before dawn, before customers, before it matters.
File details
Details for the file induat-0.1.0.tar.gz.
- Size: 80.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | a9b2ed71e2429e598f15c96faca6f99f3f957ecc515fe4c9ee0ccabe8b106454 |
| MD5 | db1fd07b6e65e3ccba25e51ca28fc5a6 |
| BLAKE2b-256 | 8e79071284d23e29d879aa55cd949626e46e8a4968c231fac1742e25a070b5fb |
File details
Details for the file induat-0.1.0-py3-none-any.whl.
- Size: 89.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | 395f6666c3d1d0de0542d3fcbf73fd87861ccb464d418467d7865ad17df31c72 |
| MD5 | 53ce83ce8f4feeed5e8646db3e703774 |
| BLAKE2b-256 | 0982acab806b03ceda0ebd5b5950c6448c5cf851b159f71b0d2e8568131f7270 |