traceix
Trajectory-based CI testing for AI agents.
traceix lets you declare which tools your agent should call — and in what order — as plain YAML, then run those assertions in CI the same way you'd run unit tests. No LLM-as-judge, no flaky eval pipelines: if the trajectory doesn't match, the build fails.
pip install traceix
Why traceix?
LLM-powered agents are non-deterministic. The same prompt might call search_flights then confirm_booking today, but skip straight to confirm_booking tomorrow. traceix makes that observable and enforceable:
- Declare the expected trajectory in YAML — which tools, in what order, with what args.
- Run it in CI — the agent still calls the real LLM; only the tool responses are mocked.
- Get a clear pass/fail — no prompt-engineering an evaluator, no statistical thresholds.
Quick start
pip install traceix
traceix init # detects your framework, scaffolds traceix.yaml + tests/example.yaml
Write a test (tests/book_flight.yaml):
name: book-flight-basic
input: "Book me the cheapest flight from NYC to SFO"
mocks:
  search_flights:
    return: { flights: [{ id: F1, price: 390 }, { id: F2, price: 420 }] }
  confirm_booking:
    return: { booking_id: BK-001, status: confirmed }
expected:
  trajectory:
    mode: contains  # these steps must appear in order (others allowed)
    steps:
      - tool: search_flights
        args: { origin: NYC, destination: SFO }
        arg_mode: partial  # only check the keys listed above
      - tool: confirm_booking
        arg_mode: ignore
  forbidden_tools: [cancel_booking]
Run it:
traceix run tests/ --handler mypackage.agent:run
✓ book-flight-basic 1/1 2 steps 142ms
──────────────────────────────────────────────
1 passed · 0 failed
Integration
tools= parameter (any framework)
Your agent handler accepts a tools list injected by traceix:
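# mypackage/agent.py
from langchain_core.messages import HumanMessage  # this example uses LangChain message types

def run(input: str, tools: list) -> str:
    # build_graph is your own factory; rebuild the graph with the injected (mocked) tools
    graph = build_graph(tools)
    result = graph.invoke({"messages": [HumanMessage(content=input)]})
    return result["messages"][-1].content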
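To make the injection idea concrete, here is a minimal sketch of what a mocked, call-recording tool could look like. The names RecordingTool and make_mock_tools are hypothetical and are not part of the traceix API; the objects traceix actually injects may look different.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingTool:
    """Hypothetical mocked tool: returns a canned value and logs every call."""
    name: str
    return_value: Any
    calls: list = field(default_factory=list)  # trajectory log, may be shared

    def __call__(self, **kwargs: Any) -> Any:
        self.calls.append((self.name, kwargs))
        return self.return_value


def make_mock_tools(mocks: dict, log: list) -> list:
    """Build one recording tool per entry of a `mocks:` mapping."""
    return [RecordingTool(name, spec["return"], log) for name, spec in mocks.items()]


log: list = []
tools = make_mock_tools(
    {
        "search_flights": {"return": {"flights": [{"id": "F1", "price": 390}]}},
        "confirm_booking": {"return": {"booking_id": "BK-001", "status": "confirmed"}},
    },
    log,
)
by_name = {t.name: t for t in tools}

# A real handler lets the LLM choose the calls; here they are made by hand.
by_name["search_flights"](origin="NYC", destination="SFO")
by_name["confirm_booking"](flight_id="F1")

trajectory = [name for name, _ in log]
```

Because every tool appends to the same log, the order of entries in log is the trajectory that the YAML assertions would be checked against.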
@traceix_tool decorator (LangChain / LangGraph)
No handler signature changes needed — just decorate your tools:
from langchain_core.tools import tool
from traceix import traceix_tool

@traceix_tool
@tool
def search_flights(origin: str, destination: str) -> dict:
    """Search available flights."""
    ...  # real implementation
traceix patches the mock in during test runs automatically.
CLI commands
| Command | What it does |
|---|---|
| traceix init | Detect framework, scaffold traceix.yaml + example test |
| traceix run tests/ | Run tests, exit 0 on pass / 1 on fail |
| traceix run tests/ --fixture record | Save real tool responses to .traceix/fixtures/ |
| traceix run tests/ --fixture replay | Replay recorded responses in CI |
| traceix snapshot tests/ | Save golden trajectory baselines |
| traceix check tests/ | Compare live run against saved baselines |
| traceix compare tests/ --a "model=X" --b "model=Y" | A/B test two model configs side by side |
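The record and replay fixture modes follow a familiar pattern: call the real tool once, save its response, and serve the saved response afterwards. The sketch below illustrates that pattern only; with_fixture, real_search, and the one-JSON-file-per-tool layout are assumptions made for the example, not the format traceix writes to .traceix/fixtures/.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def with_fixture(tool: Callable[..., dict], path: Path, mode: str) -> Callable[..., dict]:
    """Wrap `tool` so it records its response to `path`, or replays it from there."""
    def wrapper(**kwargs) -> dict:
        if mode == "replay":
            # Replay: never touch the real tool, just return what was saved.
            return json.loads(path.read_text())
        result = tool(**kwargs)  # record: call the real tool
        path.write_text(json.dumps(result))
        return result
    return wrapper


def real_search(origin: str, destination: str) -> dict:
    """Stands in for a real network call."""
    return {"flights": [{"id": "F1", "price": 390}]}


fixture = Path(tempfile.mkdtemp()) / "search_flights.json"
recorded = with_fixture(real_search, fixture, "record")(origin="NYC", destination="SFO")
replayed = with_fixture(real_search, fixture, "replay")(origin="NYC", destination="SFO")
```

This simplified version stores a single response per tool and ignores the arguments on replay; a fuller implementation would likely key recordings by arguments or call order.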
Trajectory modes
| Mode | Meaning |
|---|---|
| strict | Exact tool order and count |
| contains | Listed steps must appear in order (extra steps allowed) |
| unordered | All steps present, any order |
| within | Steps appear as a contiguous block |
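One way to read the four modes is as predicates over the list of tool names the agent called. The function below is an illustrative sketch, not traceix's implementation; trajectory_matches is a hypothetical name, and the sketch assumes that unordered and within both tolerate extra steps, which the table does not state.

```python
def trajectory_matches(mode: str, expected: list[str], actual: list[str]) -> bool:
    """Check called tool names against the expected steps (illustrative only)."""
    if mode == "strict":
        return actual == expected
    if mode == "contains":
        # In-order subsequence: each membership test consumes the iterator
        # up to and including the match.
        remaining = iter(actual)
        return all(step in remaining for step in expected)
    if mode == "unordered":
        pool = list(actual)
        for step in expected:
            if step not in pool:
                return False
            pool.remove(step)  # each expected step needs its own call
        return True
    if mode == "within":
        n = len(expected)
        return any(actual[i:i + n] == expected for i in range(len(actual) - n + 1))
    raise ValueError(f"unknown mode: {mode}")


actual = ["lookup_user", "search_flights", "confirm_booking"]
expected = ["search_flights", "confirm_booking"]

strict_ok = trajectory_matches("strict", expected, actual)      # False: extra first step
contains_ok = trajectory_matches("contains", expected, actual)  # True
within_ok = trajectory_matches("within", expected, actual)      # True: adjacent pair
```

The difference between contains and within shows up when an unrelated call lands between two expected steps: contains still passes, within does not.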
Arg modes
| Mode | Meaning |
|---|---|
| exact | Args must match exactly |
| partial | Listed keys must be present with matching values |
| ignore | Args not checked |
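The three arg modes can be sketched as a small comparison over dictionaries. This is an illustration of the semantics described in the table, assuming arguments arrive as flat keyword dictionaries; args_match is a hypothetical helper, not a traceix function.

```python
def args_match(arg_mode: str, expected: dict, actual: dict) -> bool:
    """Compare the args of one tool call under the given mode (illustrative only)."""
    if arg_mode == "exact":
        return actual == expected
    if arg_mode == "partial":
        # Only the listed keys are checked; extra keys in `actual` are fine.
        return all(key in actual and actual[key] == value for key, value in expected.items())
    if arg_mode == "ignore":
        return True
    raise ValueError(f"unknown arg_mode: {arg_mode}")


call_args = {"origin": "NYC", "destination": "SFO", "max_results": 5}
listed = {"origin": "NYC", "destination": "SFO"}

partial_ok = args_match("partial", listed, call_args)  # True: max_results is not checked
exact_ok = args_match("exact", listed, call_args)      # False: max_results is extra
```

partial is often the practical default for LLM-driven calls, since models tend to add optional arguments that a test does not care about.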
Framework support
| Framework | Integration |
|---|---|
| LangGraph | @traceix_tool decorator or tools= injection |
| CrewAI | tools= injection |
| Anthropic SDK | tools= injection |
| OpenAI SDK | tools= injection |
| Any other | tools= injection |
Configuration
Set defaults in traceix.yaml (or [tool.traceix] in pyproject.toml):
handler: mypackage.agent:run
runs: 3 # runs per test case (increase in CI for confidence)
tolerance: 0.67 # fraction of runs that must pass
fixture_mode: replay
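How runs and tolerance interact can be sketched as a pass fraction compared against a threshold. The comparison rule below (a plain >= on the fraction of passing runs) is an assumption; the source does not say how traceix rounds or compares, and meets_tolerance is a hypothetical name.

```python
def meets_tolerance(run_results: list[bool], tolerance: float) -> bool:
    """Return True if the fraction of passing runs reaches `tolerance` (assumed rule)."""
    if not run_results:
        raise ValueError("at least one run is required")
    pass_fraction = sum(run_results) / len(run_results)
    return pass_fraction >= tolerance


all_three = meets_tolerance([True, True, True], 0.67)
two_of_three = meets_tolerance([True, True, False], 0.67)
```

Under this assumed rule, two passing runs out of three give a fraction of about 0.667, which is just below 0.67, so it is worth checking how traceix treats that boundary before relying on a two-of-three policy.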
License
MIT