
induat

Pre-production reliability test suite for AI agents. Plug in your stack. Run the gauntlet. Get a verdict.


pip install induat · induat serve · open http://localhost:9090


What is induat?

induat is the layer between hoping your agent works and shipping it. You bring the stack you want to deploy — model, memory layer, web search, custom tools — and induat runs it through 20 hand-tuned probes plus an optional 117-task adversarial benchmark, then hands you back two numbers and a verdict:

  • Capability — did the agent get the right answer?
  • Self-awareness — did it know when it couldn't?
  • Verdict — PRODUCTION_READY, NOT_READY, or UNSAFE_TO_DEPLOY.

The headline demo: flip a tool on or off and watch the verdict change. Same model, same prompts, only the plugin toggle differs:

Stack Capability Verdict
gemini-2.5-flash-lite + --context none --search none 0.34 NOT_READY
gemini-2.5-flash-lite + --context mem0 --search tavily 0.79 PRODUCTION_READY

That's induat — measurable, reproducible, vendor-agnostic.
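The two scores map to the verdict mechanically. A minimal sketch of one plausible mapping — the threshold values and the exact decision logic here are illustrative assumptions, not induat's actual defaults:

```python
def verdict(capability: float, self_awareness: float,
            cap_min: float = 0.75, aware_min: float = 0.65) -> str:
    """Map the two scores to a verdict. Thresholds are illustrative."""
    # An agent that both fails and doesn't notice it is failing is the
    # dangerous case; low capability alone is merely not ready.
    if capability < cap_min and self_awareness < aware_min:
        return "UNSAFE_TO_DEPLOY"
    if capability < cap_min or self_awareness < aware_min:
        return "NOT_READY"
    return "PRODUCTION_READY"

print(verdict(0.34, 0.70))  # baseline stack from the table above
print(verdict(0.79, 0.80))  # augmented stack
```

Under these assumed thresholds, the baseline stack from the table lands on NOT_READY and the augmented one on PRODUCTION_READY, matching the demo.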


Install

pip install induat

Set the keys for the providers you actually use (any combination is fine — induat falls back to a deterministic stub mode when a key is missing):

export GEMINI_API_KEY=...
export ANTHROPIC_API_KEY=...
export OPENAI_API_KEY=...
export PIONEER_API_KEY=...

# Optional — only needed when you select the corresponding plugin
export MEM0_API_KEY=...
export TAVILY_API_KEY=...

Or copy .env.example to .env and fill it in — induat picks up .env automatically.


Quickstart — web UI

induat serve

Open http://localhost:9090. Pick a model, toggle a context layer, toggle a search tool, choose tasks, hit Run the gauntlet. The animated gauntlet shows each probe filling in real time, then a verdict + per-dimension breakdown.

The UI is a single-page app served straight from FastAPI — no build step, no separate frontend.


Quickstart — CLI

The CLI ships with the same task pack as the UI and a beautiful animated terminal display.

# 20-task curated demo, both tools enabled
induat measure \
  --model gemini-2.5-flash-lite \
  --context mem0 --search tavily \
  --curated

# Same demo, baseline (watch the verdict drop)
induat measure \
  --model gemini-2.5-flash-lite \
  --context none --search none \
  --curated

# Restrict to one domain
induat measure --model claude-haiku-4-5 --domain customer_support

# Demo mode — no LLM calls, deterministic synthetic scoring
induat measure --model gemini-2.5-flash --stub

# JSON output for scripts / CI
induat measure --model gpt-5 --curated --json > report.json

Run induat --help for the full surface.


Quickstart — REST API

The same server exposes a clean REST surface so you can wire induat into CI:

curl http://localhost:9090/health
curl http://localhost:9090/plugins
curl http://localhost:9090/models
curl http://localhost:9090/tasks
curl -X POST http://localhost:9090/run \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-lite",
    "context": "mem0",
    "search": "tavily",
    "domains": ["customer_support", "finance"]
  }'

OpenAPI docs at http://localhost:9090/docs.
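The same /run call can be assembled from Python with the stdlib. A sketch — the request is built but not sent here, since it assumes a server from induat serve is listening on port 9090:

```python
import json
import urllib.request

# Same body as the curl example above.
payload = {
    "model": "gemini-2.5-flash-lite",
    "context": "mem0",
    "search": "tavily",
    "domains": ["customer_support", "finance"],
}
req = urllib.request.Request(
    "http://localhost:9090/run",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With the server running, send it and read the report:
# with urllib.request.urlopen(req) as resp:
#     report = json.load(resp)
```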


Quickstart — A2A (Agent-to-Agent)

induat exposes a standards-compliant A2A protocol surface so other agents can discover and call it without ever installing the package:

  • Discovery — GET /.well-known/agent.json returns the AgentCard
  • Invocation — POST /a2a accepts JSON-RPC 2.0 message/send

Two skills are advertised:

  • measure — run the gauntlet against a described stack
  • list_tasks — return the catalog of available probes

curl http://localhost:9090/.well-known/agent.json
curl -X POST http://localhost:9090/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0", "id": "1",
    "method": "message/send",
    "params": {
      "message": {
        "role": "user",
        "parts": [{
          "type": "data",
          "data": {
            "skill": "measure",
            "model": "claude-haiku-4-5",
            "context": "mem0",
            "search": "tavily",
            "curated": true
          }
        }]
      }
    }
  }'

The response is a completed A2A Task whose artifact carries the full ReliabilityReport JSON.
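The same message/send envelope can be assembled programmatically; this sketch just mirrors the curl payload above as a Python dict:

```python
import json

def measure_request(model: str, context: str, search: str,
                    request_id: str = "1") -> str:
    """Build the JSON-RPC 2.0 message/send body for the measure skill."""
    envelope = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{
                    "type": "data",
                    "data": {
                        "skill": "measure",
                        "model": model,
                        "context": context,
                        "search": search,
                        "curated": True,
                    },
                }],
            }
        },
    }
    return json.dumps(envelope)

body = measure_request("claude-haiku-4-5", "mem0", "tavily")
```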


What's measured

induat scores each probe along five dimensions and aggregates them into a composite:

Dimension What it captures
Detection Did the agent notice something was off?
Diagnosis Did it correctly explain why?
Recovery Did it get to the right outcome?
Causal chain Was its reasoning structurally valid?
FP resistance Did it avoid false alarms on negative controls?

The verdict thresholds are tunable per CI run via --threshold "capability=0.8,self_awareness=0.7".
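The --threshold value is a plain comma-separated key=value list. A sketch of parsing that format (the flag is induat's; this parser is illustrative):

```python
def parse_thresholds(spec: str) -> dict[str, float]:
    """Parse a spec like "capability=0.8,self_awareness=0.7"."""
    out: dict[str, float] = {}
    for pair in spec.split(","):
        key, _, value = pair.strip().partition("=")
        out[key] = float(value)
    return out

print(parse_thresholds("capability=0.8,self_awareness=0.7"))
```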


Models

induat ships routing for the latest stable models from each major provider through LiteLLM — same code path, swap the model id:

Provider Models Env var
Google Gemini gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite GEMINI_API_KEY
Anthropic claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5 ANTHROPIC_API_KEY
OpenAI gpt-5, gpt-5-mini, gpt-4.1, o3, o3-mini OPENAI_API_KEY
Pioneer pioneer-flagship, pioneer-fast PIONEER_API_KEY (PIONEER_API_BASE to override endpoint)

Plugins

induat is a thin registry of adapters. Your infrastructure becomes measurable once a ~30-line plugin exists.

Built-in

Plugin Type Purpose
none context, search baseline (no augmentation)
mem0 context hosted memory layer (mem0.ai)
tavily search live web search (tavily.com)

Writing your own

# src/induat/plugins/context/my_context.py
class MyContext:
    name = "my_context"

    async def retrieve(self, query: str, domain: str, *, limit: int = 5) -> list[str]:
        # ... return relevant passages
        return ["…"]

# Register it
from induat.plugins import context
context.REGISTRY["my_context"] = MyContext

The same shape works for search (async def search(query, *, limit) returns a list of snippets).
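A matching search plugin, following the shape just described — the class name is hypothetical and the result is stubbed where a real backend call would go:

```python
import asyncio

class MySearch:
    name = "my_search"

    async def search(self, query: str, *, limit: int = 5) -> list[str]:
        # Call your search backend here; stubbed for illustration.
        return [f"snippet for {query!r}"][:limit]

# Registration mirrors the context example:
# from induat.plugins import search
# search.REGISTRY["my_search"] = MySearch

snippets = asyncio.run(MySearch().search("refund policy", limit=3))
```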


Custom probes

Bring your own failure-tests via a YAML file:

probes:
  - id: airline_date_validation
    gate: heart
    domain: airline
    prompt: |
      User: Book me a flight to NYC on March 32nd.
    must:
      - "no such date"
      - "invalid date"
    must_not:
      - "booking confirmed"

# Run alongside the built-in suite
induat measure --probes my_probes.yaml --curated --model gpt-5

# Or run your suite only
induat measure --probes my_probes.yaml --probes-only --model gpt-5
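The must / must_not lists act as substring gates on the agent's reply. A minimal sketch of that check — treating must as any-of and matching case-insensitively are assumptions here, not documented induat semantics:

```python
def passes_probe(reply: str, must: list[str], must_not: list[str]) -> bool:
    """True if the reply contains a required phrase and no forbidden one."""
    text = reply.lower()
    has_required = any(phrase.lower() in text for phrase in must)
    has_forbidden = any(phrase.lower() in text for phrase in must_not)
    return has_required and not has_forbidden

reply = "March 32nd is an invalid date, so I can't book that flight."
print(passes_probe(reply, ["no such date", "invalid date"], ["booking confirmed"]))
```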

CI integration

induat measure \
  --model claude-haiku-4-5 \
  --curated \
  --ci \
  --threshold "capability=0.75,self_awareness=0.65" \
  --junit reports/induat.xml

--ci exits 1 if thresholds aren't met. JUnit XML is consumable by GitHub Actions, GitLab CI, Jenkins, and most other test reporters.


Optional: full AVER benchmark

The 20 built-in tasks are tuned for fast tool-toggle demos. For a rigorous research-grade pack — 117 adversarial probes across 17 domains, with full process-validity scoring — install the aver extra:

pip install induat[aver]

induat detects aver-meta at runtime and surfaces its task library alongside the built-in pack. Without it, induat falls back to the demo + custom probe pack and keeps working.
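Detecting an optional extra at runtime without a hard import is a standard pattern; a sketch of how such a fallback can work (the module name aver_meta is illustrative, not induat's confirmed import name):

```python
import importlib.util

def has_extra(module_name: str) -> bool:
    """Check whether an optional dependency is importable, without importing it."""
    return importlib.util.find_spec(module_name) is not None

# Fall back to the built-in pack when the extra is absent.
task_pack = "aver" if has_extra("aver_meta") else "demo"
print(task_pack)
```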


Docker

docker compose up

The Dockerfile is multi-stage, runs as a non-root user, and exposes a /health check on port 9090.


Project layout

src/induat/
  api.py              # FastAPI app + REST endpoints
  a2a.py              # Agent-to-Agent protocol surface
  cli.py              # Click CLI with rich animated output
  llm.py              # LiteLLM dispatch + provider routing
  reports.py          # ReliabilityReport / Verdict pydantic models
  runner.py           # Gauntlet runner — async, with progress callbacks
  tasks.py            # Demo + custom probe loaders
  plugins/
    base.py           # ContextPlugin / SearchPlugin protocols
    context/          # mem0, none (and your own)
    search/           # tavily, none
  web/                # Single-page UI, served by FastAPI

Development

git clone https://github.com/weelzo/induat-platform.git
cd induat-platform
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,aver]"
pytest
ruff check src tests

Releasing to PyPI

pip install build twine
python -m build
twine upload dist/*

License

MIT — see LICENSE.


Prove your agents. Before dawn, before customers, before it matters.
