pytest for AI agents. Test, monitor, and catch behavioral drift.
TestThread
Open-source testing framework that tells you whether your AI agent actually works or just quietly breaks.
You build an AI agent. It passes your manual tests. You ship it.
Then it starts hallucinating. Returning wrong formats. Calling the wrong tools. Breaking your pipeline.
You find out when something downstream crashes, not before.
TestThread fixes that.
What it does
Tell TestThread what your agent should do. It runs the agent, checks the output, and tells you exactly what passed and what failed, plus an AI diagnosis explaining why.
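The loop described above can be sketched locally as follows. This is a hypothetical, self-contained illustration of the idea, not TestThread's implementation. It runs each case through an agent, checks the output, and collects pass/fail results, per-case latency, and a pass rate. `run_cases`, `fake_agent`, and the case dict shape are assumptions, and only a simple "contains" check is used.

```python
import time
from typing import Callable


def run_cases(agent: Callable[[str], str], cases: list[dict]) -> dict:
    """Run each case through `agent` and summarize the results.

    Hypothetical sketch: each case is a dict with 'name', 'input' and
    'expected_output', and a case passes if the expected string appears
    in the agent's output (a 'contains' check).
    """
    results = []
    for case in cases:
        start = time.perf_counter()
        output = agent(case["input"])
        latency_ms = (time.perf_counter() - start) * 1000  # response time for this case
        results.append({
            "name": case["name"],
            "passed": case["expected_output"] in output,
            "latency_ms": latency_ms,
        })
    passed = sum(r["passed"] for r in results)  # True counts as 1
    total = len(results)
    return {
        "passed": passed,
        "failed": total - passed,
        "pass_rate": round(100 * passed / total, 1) if total else 0.0,
        "avg_latency_ms": sum(r["latency_ms"] for r in results) / total if total else 0.0,
        "results": results,
    }


def fake_agent(prompt: str) -> str:
    # Stand-in for a real agent endpoint (hypothetical).
    return "The answer is 4." if "2 + 2" in prompt else "Goodbye."


summary = run_cases(fake_agent, [
    {"name": "math", "input": "What is 2 + 2?", "expected_output": "4"},
    {"name": "greeting", "input": "Say hello", "expected_output": "hello"},
])
print(summary["passed"], summary["failed"], summary["pass_rate"])  # 1 1 50.0
```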
Features
- Test Suites — group test cases per agent
- 4 Match Types — contains, exact, regex, semantic (AI-powered)
- AI Diagnosis — when a test fails, it explains why and suggests a fix
- Regression Detection — flags when pass rate drops vs previous run
- PII Detection — auto-fails if agent leaks emails, keys, SSNs, or credit cards
- Latency Tracking — response time per test, average per run
- Cost Estimation — estimated token cost per run
- Trajectory Assertions — test agent steps, not just output
- Scheduled Runs — run suites hourly, daily, or weekly
- CI/CD Integration — GitHub Action runs tests on every push
- Webhook Alerts — get notified when tests fail or regress
- CSV Import — bulk import test cases from a spreadsheet
- Rate Limiting — API protected from abuse
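PII Detection auto-fails a test when the output leaks sensitive data. This README does not document the patterns TestThread actually uses. The sketch below is a hypothetical, deliberately simple regex-based check meant only to show the idea. `PII_PATTERNS`, `find_pii`, `check_case`, and every pattern in it, including the example `sk-` key format, are assumptions. Production-grade detection needs more robust rules, such as a Luhn checksum for card numbers.

```python
import re

# Hypothetical patterns for illustration only; not TestThread's actual rules.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 16-digit card numbers, optionally grouped by spaces or hyphens.
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    # Example key format only: an "sk-" prefix followed by a long token.
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}


def find_pii(output: str) -> list[str]:
    """Return the names of the PII categories found in an agent's output."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(output)]


def check_case(output: str, passed: bool) -> bool:
    """A case that would otherwise pass is failed if any PII is detected."""
    return passed and not find_pii(output)


print(find_pii("Contact me at jane@example.com, SSN 123-45-6789"))  # ['email', 'ssn']
print(check_case("The answer is 4.", passed=True))  # True
```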
Quick start
```bash
pip install testthread
```

```python
from testthread import TestThread

tt = TestThread(gemini_key="your-gemini-key")

suite = tt.create_suite(
    name="My Agent Tests",
    agent_endpoint="https://your-agent.com/run"
)

tt.add_case(
    suite_id=suite["id"],
    name="Basic response check",
    input="What is 2 + 2?",
    expected_output="4",
    match_type="contains"
)

tt.add_case(
    suite_id=suite["id"],
    name="Semantic check",
    input="Say hello",
    expected_output="a friendly greeting",
    match_type="semantic"
)

result = tt.run_suite(suite["id"])
print(f"Passed: {result['passed']} | Failed: {result['failed']}")
print(f"Pass Rate: {result['curr_pass_rate']}%")
print(f"Estimated Cost: ${result['estimated_cost_usd']}")
```
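The feature list mentions bulk CSV import. This README does not document the column layout TestThread expects. The sketch below hypothetically assumes one row per test case, with columns named after the `add_case` keyword arguments used above, and turns each row into keyword arguments for `tt.add_case`. `rows_to_cases` and the default of `contains` for a blank `match_type` are assumptions.

```python
import csv
import io

# Assumed CSV layout: one test case per row, columns matching add_case's keywords.
CSV_TEXT = """name,input,expected_output,match_type
Basic response check,What is 2 + 2?,4,contains
Semantic check,Say hello,a friendly greeting,semantic
"""

VALID_MATCH_TYPES = {"contains", "exact", "regex", "semantic"}


def rows_to_cases(csv_file) -> list[dict]:
    """Parse CSV rows into dicts of add_case keyword arguments (hypothetical helper)."""
    cases = []
    for row in csv.DictReader(csv_file):
        # Assumption: a blank match_type falls back to "contains".
        match_type = (row.get("match_type") or "contains").strip()
        if match_type not in VALID_MATCH_TYPES:
            raise ValueError(f"unknown match_type {match_type!r} in row {row['name']!r}")
        cases.append({
            "name": row["name"],
            "input": row["input"],
            "expected_output": row["expected_output"],
            "match_type": match_type,
        })
    return cases


cases = rows_to_cases(io.StringIO(CSV_TEXT))
print(len(cases), cases[1]["match_type"])  # 2 semantic
# With a real suite: for kwargs in cases: tt.add_case(suite_id=suite["id"], **kwargs)
```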
Same thing in JavaScript:

```bash
npm install testthread
```

```javascript
const TestThread = require("testthread");

// `await` is only valid inside an async function in a CommonJS module.
(async () => {
  const tt = new TestThread("https://test-thread-production.up.railway.app", "your-gemini-key");
  const suite = await tt.createSuite("My Agent", "https://your-agent.com/run");
  await tt.addCase(suite.id, "Hello test", "Say hi", "hello", "contains");
  const result = await tt.runSuite(suite.id);
  console.log(`Passed: ${result.passed} | Failed: ${result.failed}`);
})();
```
Match types
| Type | What it does |
|---|---|
| `contains` | Output contains the expected string |
| `exact` | Output matches exactly |
| `regex` | Output matches a regex pattern |
| `semantic` | AI judges if meaning matches (requires Gemini key) |
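To make the first three match types concrete, here is a minimal local sketch of how such checks are commonly implemented. It is an assumption about their semantics, not TestThread's code. This README does not state whether `exact` ignores surrounding whitespace, or whether `regex` searches anywhere in the output or requires a full match. `semantic` is left unimplemented because it relies on an LLM judge. `matches` is a hypothetical helper.

```python
import re


def matches(output: str, expected: str, match_type: str) -> bool:
    """Check an agent's output against an expected value (hypothetical semantics)."""
    if match_type == "contains":
        return expected in output
    if match_type == "exact":
        # Assumption: surrounding whitespace is ignored.
        return output.strip() == expected.strip()
    if match_type == "regex":
        # Assumption: the pattern may match anywhere in the output.
        return re.search(expected, output) is not None
    if match_type == "semantic":
        raise NotImplementedError("semantic matching needs an LLM judge (e.g. Gemini)")
    raise ValueError(f"unknown match_type: {match_type!r}")


print(matches("2 + 2 = 4", "4", "contains"))              # True
print(matches("4\n", "4", "exact"))                       # True
print(matches("Order #1234 shipped", r"#\d+", "regex"))   # True
```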
Trajectory assertions
Test not just what your agent returned, but how it got there.
```python
import requests

BASE = "https://test-thread-production.up.railway.app"  # or your self-hosted URL
# suite_id, case_id and run_id are IDs returned by earlier API calls

# Set assertions on a test case
requests.post(f"{BASE}/suites/{suite_id}/cases/{case_id}/assertions", json=[
    {"type": "tool_called", "value": "search"},
    {"type": "tool_not_called", "value": "delete_user"},
    {"type": "max_steps", "value": 5}
])

# After your agent runs, submit its trajectory
requests.post(f"{BASE}/trajectory", json={
    "run_id": run_id,
    "case_id": case_id,
    "steps": [
        {"tool": "search", "input": "query", "output": "results", "order": 1},
        {"tool": "summarize", "input": "results", "output": "summary", "order": 2}
    ]
})
```
Supported assertion types:

- `tool_called`: assert a tool was used
- `tool_not_called`: assert a tool was NOT used
- `max_steps`: assert agent completed in N steps or fewer
- `min_steps`: assert agent took at least N steps
- `tool_order`: assert tools were called in a specific order
- `action_called`: assert a specific action was performed
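This README does not document how these assertions are evaluated server-side. As a hypothetical sketch, the function below checks each listed assertion type against a trajectory shaped like the `steps` payload above, with `tool` and `order` keys. Some value formats are assumptions. `tool_order` takes a list of tool names that must appear in that relative order, and `action_called` is compared against an assumed `action` key on each step. `check_trajectory` is a hypothetical helper.

```python
def check_trajectory(steps: list[dict], assertions: list[dict]) -> list[tuple[dict, bool]]:
    """Evaluate trajectory assertions against agent steps (hypothetical semantics)."""
    ordered = sorted(steps, key=lambda s: s["order"])
    tools = [s.get("tool") for s in ordered]
    results = []
    for a in assertions:
        kind, value = a["type"], a["value"]
        if kind == "tool_called":
            ok = value in tools
        elif kind == "tool_not_called":
            ok = value not in tools
        elif kind == "max_steps":
            ok = len(ordered) <= value
        elif kind == "min_steps":
            ok = len(ordered) >= value
        elif kind == "tool_order":
            # Each listed tool must appear after the previous one; other tools may sit in between.
            it = iter(tools)
            ok = all(name in it for name in value)
        elif kind == "action_called":
            ok = any(s.get("action") == value for s in ordered)
        else:
            raise ValueError(f"unknown assertion type: {kind!r}")
        results.append((a, ok))
    return results


steps = [
    {"tool": "search", "input": "query", "output": "results", "order": 1},
    {"tool": "summarize", "input": "results", "output": "summary", "order": 2},
]
assertions = [
    {"type": "tool_called", "value": "search"},
    {"type": "tool_not_called", "value": "delete_user"},
    {"type": "max_steps", "value": 5},
    {"type": "tool_order", "value": ["search", "summarize"]},
]
for assertion, ok in check_trajectory(steps, assertions):
    print(assertion["type"], "PASS" if ok else "FAIL")  # all four print PASS
```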
CI/CD integration
Add one file to your repo and TestThread runs on every push:
- Add your suite ID to GitHub Secrets as `TESTTHREAD_SUITE_ID`.
- Copy `.github/workflows/testthread.yml` from this repo into your project.

Every push to main then runs your agent tests automatically and fails the build if tests regress.
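The workflow file itself isn't reproduced in this README, so its exact regression rule is unknown. The sketch below hypothetically shows the kind of gate it implies. It compares the current pass rate with the previous run's and returns a non-zero exit code, which fails a CI job, when the rate drops. `regressed`, `main`, the `tolerance` parameter, and the way the two rates are obtained are all assumptions.

```python
from typing import Optional


def regressed(prev_pass_rate: Optional[float], curr_pass_rate: float, tolerance: float = 0.0) -> bool:
    """True if the pass rate dropped by more than `tolerance` percentage points.

    With no previous run there is nothing to compare against, so it is not a regression.
    """
    if prev_pass_rate is None:
        return False
    return curr_pass_rate < prev_pass_rate - tolerance


def main(prev: Optional[float], curr: float) -> int:
    """Return a process exit code: 1 on regression, 0 otherwise."""
    if regressed(prev, curr):
        print(f"Regression: pass rate fell from {prev}% to {curr}%")
        return 1
    print(f"OK: pass rate {curr}%")
    return 0


# In CI you would end the script with: sys.exit(main(prev, curr))
print(main(prev=100.0, curr=90.0))  # prints the regression message, then 1
```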
Scheduled runs
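```python
import requests

BASE = "https://test-thread-production.up.railway.app"  # or your self-hosted URL

requests.post(f"{BASE}/suites/{suite_id}/schedule", json={
    "schedule": "daily",
    "schedule_enabled": True
})
```

Options: `hourly`, `daily`, `weekly`.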
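This README does not describe how the server decides when a scheduled suite is due. As a minimal sketch, the code below assumes each option maps to a fixed interval measured from the last run. `INTERVALS`, `next_run_at`, and `is_due` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping from the documented schedule options to fixed intervals.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_run_at(last_run: datetime, schedule: str) -> datetime:
    """Return when a suite with the given schedule should run next."""
    try:
        return last_run + INTERVALS[schedule]
    except KeyError:
        raise ValueError(f"schedule must be one of {sorted(INTERVALS)}") from None


def is_due(last_run: datetime, schedule: str, now: datetime) -> bool:
    """True once the interval since the last run has fully elapsed."""
    return now >= next_run_at(last_run, schedule)


last = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(next_run_at(last, "daily"))  # 2024-01-02 09:00:00+00:00
```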
Live dashboard
View all your test results at test-thread.lovable.app
API reference
Full docs at test-thread-production.up.railway.app/docs
Self-host
```bash
git clone https://github.com/eugene001dayne/test-thread.git
cd test-thread
pip install -r requirements.txt
uvicorn main:app --reload
```
Part of the Thread Suite
TestThread is one of a few open-source reliability tools for AI agents.
| Tool | What it does |
|---|---|
| Iron-Thread | Validates AI output structure before it hits your database |
| TestThread | Tests whether your agent behaves correctly across runs |
| PromptThread | Versions and tracks prompt performance over time |
| ChainThread | Agent handoff protocol and verification infrastructure |
License
Apache 2.0: free to use, modify, and distribute.
Built for developers who ship AI agents and need to know they work.
File details

Details for the file `testthread-0.12.0.tar.gz`.

File metadata

- Download URL: testthread-0.12.0.tar.gz
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e007ba2dc9ce6f1308e95d52f4af2b0d6192792dae97a0691180407cee03846f` |
| MD5 | `0840cefedee3d605e3d6fd9793d480d2` |
| BLAKE2b-256 | `56b7c035d30d66feb3fa0c889f8c6663b86423f4ddfb23de7954665ee1ea76f5` |
File details

Details for the file `testthread-0.12.0-py3-none-any.whl`.

File metadata

- Download URL: testthread-0.12.0-py3-none-any.whl
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `dbdc429492cd222bfed96782f805dd3e4ff28c7647fca2279826473fa5711627` |
| MD5 | `ce4278eec14d8804bcf1c76728ede7c9` |
| BLAKE2b-256 | `132e734d66242052c829057fbffa88b3604e617d65f58fce80530290a81d27a5` |