pytest for AI agents. Test, monitor, and catch behavioral drift.

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

TestThread

pytest for AI agents.

Open-source testing framework that tells you whether your AI agent actually works or just quietly breaks.

You build an AI agent. It passes your manual tests. You ship it.

Then it starts hallucinating. Returning wrong formats. Calling the wrong tools. Breaking your pipeline.

You find out when something downstream crashes, not before.

TestThread fixes that.

What it does

Tell TestThread what your agent should do. It runs the agent, checks the output, and tells you exactly what passed and what failed, plus an AI diagnosis explaining why.
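The loop described above can be sketched locally: run each case, check the output, and report what passed and what failed. This is not TestThread's implementation. It assumes the agent is a plain Python callable rather than an HTTP endpoint, and it applies only `contains` matching. `run_cases` and its result keys are hypothetical names chosen for illustration:

```python
import time
from typing import Callable


def run_cases(agent: Callable[[str], str], cases: list[dict]) -> dict:
    """Run each case through `agent`, apply a `contains` check, and time each call.

    Hypothetical helper: the agent is a local callable, not an HTTP endpoint.
    """
    results = []
    for case in cases:
        start = time.perf_counter()
        output = agent(case["input"])
        latency_ms = (time.perf_counter() - start) * 1000
        # `contains` match: the expected string must appear somewhere in the output.
        passed = case["expected_output"] in output
        results.append({"name": case["name"], "passed": passed, "latency_ms": latency_ms})

    total = len(results)
    n_passed = sum(r["passed"] for r in results)
    return {
        "passed": n_passed,
        "failed": total - n_passed,
        "pass_rate": 100 * n_passed / total if total else 0.0,
        "avg_latency_ms": sum(r["latency_ms"] for r in results) / total if total else 0.0,
        "results": results,
    }
```

Passing a stub function as `agent` lets you exercise the checks without any network calls.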

Features

- Test Suites: group test cases per agent
- 4 Match Types: contains, exact, regex, semantic (AI-powered)
- AI Diagnosis: when a test fails, it explains why and suggests a fix
- Regression Detection: flags when the pass rate drops compared with the previous run
- PII Detection: auto-fails a test if the agent leaks emails, keys, SSNs, or credit card numbers
- Latency Tracking: response time per test, average per run
- Cost Estimation: estimated token cost per run
- Trajectory Assertions: test the agent's steps, not just its output
- Scheduled Runs: run suites hourly, daily, or weekly
- CI/CD Integration: a GitHub Action runs tests on every push
- Webhook Alerts: get notified when tests fail or regress
- CSV Import: bulk-import test cases from a spreadsheet
- Rate Limiting: the API is protected from abuse
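One common approach to PII detection is pattern matching over the agent's output. Below is a minimal sketch for the four listed categories, not TestThread's actual rules. The email, SSN, and `sk-`-prefixed key patterns are illustrative assumptions. Candidate card numbers are confirmed with the Luhn checksum to reduce false positives. `find_pii` and `luhn_valid` are hypothetical names:

```python
import re

# Illustrative patterns only; TestThread's real detection rules are not documented here.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Assumed key shape: "sk-" followed by 20+ alphanumerics. Real key formats vary by provider.
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[str]:
    """Return the names of the PII categories detected in `text`."""
    found = [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            found.append("credit_card")
            break
    return found
```

Under this scheme, a test would auto-fail whenever `find_pii(output)` returns a non-empty list.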

Quick start

pip install testthread

from testthread import TestThread

tt = TestThread(gemini_key="your-gemini-key")

suite = tt.create_suite(
    name="My Agent Tests",
    agent_endpoint="https://your-agent.com/run"
)

tt.add_case(
    suite_id=suite["id"],
    name="Basic response check",
    input="What is 2 + 2?",
    expected_output="4",
    match_type="contains"
)

tt.add_case(
    suite_id=suite["id"],
    name="Semantic check",
    input="Say hello",
    expected_output="a friendly greeting",
    match_type="semantic"
)

result = tt.run_suite(suite["id"])
print(f"Passed: {result['passed']} | Failed: {result['failed']}")
print(f"Pass Rate: {result['curr_pass_rate']}%")
print(f"Estimated Cost: ${result['estimated_cost_usd']}")
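This README does not describe the column layout that CSV Import expects. As a local sketch, assume a spreadsheet exported with one column per `add_case` keyword used above (`name`, `input`, `expected_output`, `match_type`). Rows can then be read and validated before any API call is made. `load_cases` is a hypothetical helper, not part of the testthread package:

```python
import csv
import io

# Assumed layout: one column per add_case keyword argument.
REQUIRED_COLUMNS = ("name", "input", "expected_output", "match_type")
VALID_MATCH_TYPES = {"contains", "exact", "regex", "semantic"}


def load_cases(csv_file) -> list[dict]:
    """Read test cases from a CSV file object and validate each row."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    cases = []
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        case = {col: row[col].strip() for col in REQUIRED_COLUMNS}
        if case["match_type"] not in VALID_MATCH_TYPES:
            raise ValueError(f"row {row_no}: unknown match_type {case['match_type']!r}")
        cases.append(case)
    return cases


sample = io.StringIO(
    "name,input,expected_output,match_type\n"
    "Basic response check,What is 2 + 2?,4,contains\n"
    "Semantic check,Say hello,a friendly greeting,semantic\n"
)
cases = load_cases(sample)
```

Each returned dict could then be passed as `tt.add_case(suite_id=suite["id"], **case)`.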

Same thing in JavaScript:

npm install testthread

const TestThread = require("testthread");

// CommonJS modules don't allow top-level await, so run the calls inside an async function
async function main() {
  const tt = new TestThread("https://test-thread-production.up.railway.app", "your-gemini-key");

  const suite = await tt.createSuite("My Agent", "https://your-agent.com/run");
  await tt.addCase(suite.id, "Hello test", "Say hi", "hello", "contains");
  const result = await tt.runSuite(suite.id);
  console.log(`Passed: ${result.passed} | Failed: ${result.failed}`);
}

main().catch(console.error);

Match types

Type	What it does
`contains`	Output contains the expected string
`exact`	Output matches exactly
`regex`	Output matches a regex pattern
`semantic`	An AI model judges whether the meaning matches (requires a Gemini key)
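The three deterministic match types can be approximated locally, which helps when working out why a case failed. A minimal sketch with a hypothetical `check_output` helper. It assumes that `exact` compares the strings unchanged and that `regex` uses search semantics (the pattern may match anywhere in the output); this README specifies neither. `semantic` needs an LLM judge and is left unimplemented:

```python
import re


def check_output(output: str, expected: str, match_type: str) -> bool:
    """Local approximation of TestThread's non-AI match types (hypothetical helper)."""
    if match_type == "contains":
        return expected in output
    if match_type == "exact":
        # Assumed: no whitespace stripping or case folding.
        return output == expected
    if match_type == "regex":
        # Assumed: search semantics, so the pattern may match anywhere in the output.
        return re.search(expected, output) is not None
    if match_type == "semantic":
        raise NotImplementedError("semantic matching needs an LLM judge such as Gemini")
    raise ValueError(f"unknown match_type: {match_type!r}")
```

A `contains` case passes on `"2 + 2 = 4"` with expected `"4"`, while an `exact` case with the same values fails. That difference is worth checking first when a case fails unexpectedly.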

Trajectory assertions

Test not just what your agent returned, but how it got there.

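import requests

# Hosted API; point this at your own server if you self-host
BASE = "https://test-thread-production.up.railway.app"
suite_id = "your-suite-id"  # placeholder: ID of an existing suite
case_id = "your-case-id"    # placeholder: ID of a case in that suite
run_id = "your-run-id"      # placeholder: ID of the run being checked

# Set assertions on a test case
resp = requests.post(f"{BASE}/suites/{suite_id}/cases/{case_id}/assertions", json=[
    {"type": "tool_called", "value": "search"},
    {"type": "tool_not_called", "value": "delete_user"},
    {"type": "max_steps", "value": 5}
])
resp.raise_for_status()

# After your agent runs, submit its trajectory
resp = requests.post(f"{BASE}/trajectory", json={
    "run_id": run_id,
    "case_id": case_id,
    "steps": [
        {"tool": "search", "input": "query", "output": "results", "order": 1},
        {"tool": "summarize", "input": "results", "output": "summary", "order": 2}
    ]
})
resp.raise_for_status()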

Supported assertion types:

- tool_called: assert a tool was used
- tool_not_called: assert a tool was NOT used
- max_steps: assert the agent completed in N steps or fewer
- min_steps: assert the agent took at least N steps
- tool_order: assert tools were called in a specific order
- action_called: assert a specific action was performed
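The assertion types above can be checked against a submitted trajectory roughly as follows. This is a hypothetical local checker, not TestThread's server logic. It assumes that `tool_order` takes a list of tool names that must appear in that relative order, with other tools allowed in between. It does not handle `action_called`, whose matching rule isn't described here, and raises `ValueError` for it:

```python
def evaluate_assertions(steps: list[dict], assertions: list[dict]) -> list[tuple[str, bool]]:
    """Check trajectory assertions against steps shaped like the /trajectory payload above."""
    # Sort steps by their "order" field, then keep only the tool names.
    tools = [s["tool"] for s in sorted(steps, key=lambda s: s["order"])]
    results = []
    for assertion in assertions:
        kind, value = assertion["type"], assertion["value"]
        if kind == "tool_called":
            ok = value in tools
        elif kind == "tool_not_called":
            ok = value not in tools
        elif kind == "max_steps":
            ok = len(tools) <= value
        elif kind == "min_steps":
            ok = len(tools) >= value
        elif kind == "tool_order":
            # Assumed semantics: `value` is a subsequence of the tools actually called.
            remaining = iter(tools)
            ok = all(any(t == wanted for t in remaining) for wanted in value)
        else:
            raise ValueError(f"unsupported assertion type: {kind!r}")
        results.append((kind, ok))
    return results
```

Consuming a single iterator in the `tool_order` branch is what enforces the ordering: each expected tool must be found after the previous one.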

CI/CD integration

Add one workflow file to your repo and TestThread runs on every push:

1. Add your suite ID to GitHub Secrets as TESTTHREAD_SUITE_ID.
2. Copy .github/workflows/testthread.yml from this repo into your project.

Every push to main then runs your agent tests automatically, and the build fails if the tests regress.
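The regression rule behind a failing build can be expressed as a comparison between the current and previous pass rates. A minimal sketch, assuming pass rates in percent (like `curr_pass_rate` in the quick start) and an optional tolerance. This README doesn't show the workflow file's actual logic, and `is_regression` and `ci_exit_code` are hypothetical names:

```python
from typing import Optional


def is_regression(curr_pass_rate: float, prev_pass_rate: Optional[float], tolerance: float = 0.0) -> bool:
    """Return True if the pass rate dropped by more than `tolerance` percentage points.

    `prev_pass_rate` is None on a suite's first run, when there is nothing to compare.
    """
    if prev_pass_rate is None:
        return False
    return prev_pass_rate - curr_pass_rate > tolerance


def ci_exit_code(curr_pass_rate: float, prev_pass_rate: Optional[float], tolerance: float = 0.0) -> int:
    """Map the regression check to a process exit code: 1 fails the CI job, 0 passes it."""
    return 1 if is_regression(curr_pass_rate, prev_pass_rate, tolerance) else 0
```

In a CI step, `sys.exit(ci_exit_code(curr, prev))` would fail the job whenever a regression is detected.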

Scheduled runs

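import requests

BASE = "https://test-thread-production.up.railway.app"  # or your self-hosted server
suite_id = "your-suite-id"  # placeholder: ID of an existing suite

resp = requests.post(f"{BASE}/suites/{suite_id}/schedule", json={
    "schedule": "daily",  # one of: hourly, daily, weekly
    "schedule_enabled": True
})
resp.raise_for_status()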

Valid schedule values: hourly, daily, weekly.
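The following client-side sketch validates the schedule value before posting it and estimates when the next run would fall. It assumes each option is a fixed interval after the last run (one hour, one day, one week). The server's real timing rules aren't documented here, and `schedule_payload` and `next_run` are hypothetical helpers:

```python
from datetime import datetime, timedelta

# Assumed fixed intervals for the three documented schedule values.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def schedule_payload(schedule: str, enabled: bool = True) -> dict:
    """Build the JSON body for POST /suites/{suite_id}/schedule, rejecting unknown values."""
    if schedule not in INTERVALS:
        raise ValueError(f"schedule must be one of {sorted(INTERVALS)}, got {schedule!r}")
    return {"schedule": schedule, "schedule_enabled": enabled}


def next_run(last_run: datetime, schedule: str) -> datetime:
    """Estimate the next run time under the fixed-interval assumption (KeyError if unknown)."""
    return last_run + INTERVALS[schedule]
```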

Live dashboard

View all your test results at test-thread.lovable.app

API reference

Full docs at test-thread-production.up.railway.app/docs

Self-host

git clone https://github.com/eugene001dayne/test-thread.git
cd test-thread
pip install -r requirements.txt
uvicorn main:app --reload

Part of the Thread Suite

TestThread is one of several open-source reliability tools for AI agents in the Thread Suite.

Tool	What it does
Iron-Thread	Validates AI output structure before it hits your database
TestThread	Tests whether your agent behaves correctly across runs
PromptThread	Versions and tracks prompt performance over time
ChainThread	Provides an agent handoff protocol and verification infrastructure

License

Apache 2.0: free to use, modify, and distribute.

Built for developers who ship AI agents and need to know they work.

Release history

- 0.12.0 (this version): Apr 18, 2026
- 0.10.0: Mar 20, 2026
- 0.9.0: Mar 20, 2026
- 0.8.0: Mar 20, 2026
- 0.7.0: Mar 19, 2026
- 0.6.0: Mar 19, 2026
- 0.5.0: Mar 19, 2026
- 0.4.0: Mar 19, 2026
- 0.3.0: Mar 19, 2026
- 0.2.0: Mar 19, 2026
- 0.1.0: Mar 19, 2026

Download files

Source Distribution

testthread-0.12.0.tar.gz (5.5 kB)

Uploaded Apr 18, 2026 Source

Built Distribution

testthread-0.12.0-py3-none-any.whl (5.7 kB)

Uploaded Apr 18, 2026 Python 3

File details

Details for the file testthread-0.12.0.tar.gz.

File metadata

Download URL: testthread-0.12.0.tar.gz
Upload date: Apr 18, 2026
Size: 5.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for testthread-0.12.0.tar.gz
Algorithm	Hash digest
SHA256	`e007ba2dc9ce6f1308e95d52f4af2b0d6192792dae97a0691180407cee03846f`
MD5	`0840cefedee3d605e3d6fd9793d480d2`
BLAKE2b-256	`56b7c035d30d66feb3fa0c889f8c6663b86423f4ddfb23de7954665ee1ea76f5`

File details

Details for the file testthread-0.12.0-py3-none-any.whl.

File metadata

Download URL: testthread-0.12.0-py3-none-any.whl
Upload date: Apr 18, 2026
Size: 5.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for testthread-0.12.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dbdc429492cd222bfed96782f805dd3e4ff28c7647fca2279826473fa5711627`
MD5	`ce4278eec14d8804bcf1c76728ede7c9`
BLAKE2b-256	`132e734d66242052c829057fbffa88b3604e617d65f58fce80530290a81d27a5`
