TestThread 🧵

pytest for AI agents.

The open-source testing framework that tells you if your AI agent is actually working — or quietly breaking.


The Problem

You build an AI agent. It works in testing. You ship it.

Then it starts hallucinating. Returning wrong formats. Calling the wrong tools. Breaking your pipeline.

You find out when something downstream crashes — not before.

TestThread fixes that.


What TestThread Does

Define what your agent should do. TestThread runs it, checks the output, and tells you exactly what passed and what failed.

  • ✅ Define test suites per agent
  • ✅ Add test cases with expected outputs
  • ✅ Run suites against your live agent endpoint
  • ✅ Get pass/fail results with reasons
  • ✅ Track pass rate over time
  • ✅ Catch regressions before they hit production

Quick Start

pip install requests

import requests

BASE = "https://test-thread-production.up.railway.app"

# Create a test suite
suite = requests.post(f"{BASE}/suites", json={
    "name": "My Agent Tests",
    "description": "Testing my AI agent",
    "agent_endpoint": "https://your-agent.com/run"
}).json()

# Add a test case
requests.post(f"{BASE}/suites/{suite['id']}/cases", json={
    "name": "Basic response check",
    "input": "What is 2 + 2?",
    "expected_output": "4",
    "match_type": "contains"
})

# Run the suite
result = requests.post(f"{BASE}/suites/{suite['id']}/run").json()
print(f"Passed: {result['passed']} | Failed: {result['failed']}")
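
To catch regressions before they ship, the run result can gate a CI job. The sketch below assumes, as the print statement above suggests, that `passed` and `failed` are integer counts in the run response. The helper names (`summarize`, `exit_code_for`, `gate`) are hypothetical and not part of TestThread.

```python
import sys


def summarize(result: dict) -> str:
    """Format the counts the same way the Quick Start prints them."""
    return f"Passed: {result['passed']} | Failed: {result['failed']}"


def exit_code_for(result: dict) -> int:
    """Return 0 when no case failed, 1 otherwise (assumes integer counts)."""
    return 0 if result["failed"] == 0 else 1


def gate(result: dict) -> None:
    """Print the summary, then end the process with a CI-friendly exit code."""
    print(summarize(result))
    sys.exit(exit_code_for(result))
```

Calling `gate(result)` with the dictionary returned by the run request would end the script with a non-zero status whenever a case failed, which most CI systems treat as a failed step.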

Match Types

| Type | Description |
|------|-------------|
| contains | Output contains the expected string |
| exact | Output matches exactly |
| regex | Output matches a regex pattern |
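
To make the three match types concrete, here is a local sketch of how such checks could be evaluated. It is not TestThread's actual implementation. In particular, it assumes case-sensitive comparison, no whitespace trimming, and that `regex` means a pattern found anywhere in the output. The function name `matches` is hypothetical.

```python
import re


def matches(output: str, expected: str, match_type: str) -> bool:
    """Illustrative check only; the real server-side rules may differ."""
    if match_type == "contains":
        # Substring test, case-sensitive (assumption).
        return expected in output
    if match_type == "exact":
        # Whole-string equality, no trimming (assumption).
        return output == expected
    if match_type == "regex":
        # Pattern may match anywhere in the output (assumption).
        return re.search(expected, output) is not None
    raise ValueError(f"unknown match_type: {match_type!r}")
```

With the Quick Start case, an agent reply of "The answer is 4." would pass under `contains` but fail under `exact`, which is one reason to prefer `contains` or `regex` for free-form model output.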

Live Dashboard

View your test results visually at test-thread.lovable.app


API Reference

Full docs at test-thread-production.up.railway.app/docs

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | / | Health check |
| POST | /suites | Create test suite |
| GET | /suites | List all suites |
| POST | /suites/{id}/cases | Add test case |
| GET | /suites/{id}/cases | List test cases |
| POST | /suites/{id}/run | Run suite |
| GET | /runs | List all runs |
| GET | /runs/{id} | Get run details |
| GET | /dashboard/stats | Dashboard stats |
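
As a rough illustration of tracking pass rate over time from the run endpoints, the sketch below computes a rate per run. It assumes each run record carries the same `passed` and `failed` counts as the Quick Start run response; the README does not show the shape of the `GET /runs` response, so treat those field names as an assumption. The helper names are hypothetical.

```python
def pass_rate(run: dict) -> float:
    """Fraction of cases that passed in one run; 0.0 for an empty run."""
    total = run["passed"] + run["failed"]
    return run["passed"] / total if total else 0.0


def pass_rates(runs: list[dict]) -> list[float]:
    """Pass rate for each run, in the order the runs were given."""
    return [pass_rate(run) for run in runs]
```

If `GET /runs` does return a JSON list of such records, passing that list to `pass_rates` would give a series that can be plotted or compared against a threshold.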

Part of the Thread Suite

TestThread is part of a suite of open-source reliability tools for AI agents.

| Tool | What it does |
|------|--------------|
| Iron-Thread | Validates AI output structure before it hits your database |
| TestThread | Tests whether your agent behaves correctly across runs |
| PromptThread (coming soon) | Versions and tracks prompt performance over time |

Self-Host

git clone https://github.com/eugene001dayne/test-thread.git
cd test-thread
pip install -r requirements.txt
uvicorn main:app --reload
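
Once the server is running locally, the Quick Start's `BASE` can point at it instead of the hosted instance. The sketch below falls back to `http://127.0.0.1:8000`, which is uvicorn's default host and port when none are passed. The environment variable `TESTTHREAD_BASE_URL` and the function `resolve_base` are hypothetical names used only for illustration.

```python
import os

# uvicorn binds to 127.0.0.1:8000 unless --host/--port are given.
DEFAULT_LOCAL_BASE = "http://127.0.0.1:8000"


def resolve_base(env=None) -> str:
    """Pick the API base URL from a (hypothetical) env var, else the local default."""
    env = os.environ if env is None else env
    # Strip a trailing slash so f"{base}/suites" never produces a double slash.
    return env.get("TESTTHREAD_BASE_URL", DEFAULT_LOCAL_BASE).rstrip("/")
```

Setting `BASE = resolve_base()` in the Quick Start script would then let the same code target either a self-hosted or a hosted instance without edits.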

License

Apache 2.0 — free to use, modify, and distribute.


Built for developers who ship AI agents and need to know they work.
