Trajectory-based CI testing for AI agents

traceix

traceix lets you declare which tools your agent should call — and in what order — as plain YAML, then run those assertions in CI the same way you'd run unit tests. No LLM-as-judge, no flaky eval pipelines: if the trajectory doesn't match, the build fails.

pip install traceix

Why traceix?

LLM-powered agents are non-deterministic. The same prompt might call search_flights then confirm_booking today, but skip straight to confirm_booking tomorrow. traceix makes that observable and enforceable:

  • Declare the expected trajectory in YAML — which tools, in what order, with what args.
  • Run it in CI — the agent still calls the real LLM; only the tool responses are mocked.
  • Get a clear pass/fail — no prompt-engineering an evaluator, no statistical thresholds.

Quick start

pip install traceix
traceix init          # detects your framework, scaffolds traceix.yaml + tests/example.yaml

Write a test (tests/book_flight.yaml):

name: book-flight-basic
input: "Book me the cheapest flight from NYC to SFO"

mocks:
  search_flights:
    return: { flights: [{ id: F1, price: 390 }, { id: F2, price: 420 }] }
  confirm_booking:
    return: { booking_id: BK-001, status: confirmed }

expected:
  trajectory:
    mode: contains        # these steps must appear in order (others allowed)
    steps:
      - tool: search_flights
        args: { origin: NYC, destination: SFO }
        arg_mode: partial  # only check the keys listed above
      - tool: confirm_booking
        arg_mode: ignore
  forbidden_tools: [cancel_booking]
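
To make the forbidden_tools line concrete, here is a minimal sketch of the check it implies. It assumes a recorded trajectory is a list of {"tool": ..., "args": ...} dicts; that shape and the name find_forbidden_calls are illustrative assumptions, not part of traceix's documented API.

```python
from typing import Any


def find_forbidden_calls(
    trajectory: list[dict[str, Any]], forbidden: list[str]
) -> list[int]:
    """Return the indices of steps that call a forbidden tool."""
    banned = set(forbidden)
    return [i for i, step in enumerate(trajectory) if step["tool"] in banned]


# Hypothetical recorded trajectory for the book-flight-basic test.
recorded = [
    {"tool": "search_flights", "args": {"origin": "NYC", "destination": "SFO"}},
    {"tool": "confirm_booking", "args": {"flight_id": "F1"}},
]

# An empty list means no forbidden tool was called.
violations = find_forbidden_calls(recorded, ["cancel_booking"])
```

Under this reading, a single call to cancel_booking anywhere in the run would fail the test, regardless of the trajectory mode.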

Run it:

traceix run tests/ --handler mypackage.agent:run
  ✓  book-flight-basic   1/1   2 steps   142ms
  ──────────────────────────────────────────────
  1 passed · 0 failed

Integration

tools= parameter (any framework)

Your agent handler accepts a tools list injected by traceix:

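# mypackage/agent.py
from langchain_core.messages import HumanMessage

# build_graph is your own function that builds the agent graph from a tool list.


def run(input: str, tools: list) -> str:
    graph = build_graph(tools)   # rebuild with injected (mocked) tools
    result = graph.invoke({"messages": [HumanMessage(content=input)]})
    return result["messages"][-1].content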
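
For intuition, the injected tools list can be pictured as stand-ins built from the mocks section of a test. The sketch below is framework-free and makes several assumptions: make_mock_tools is a hypothetical helper, tools are plain callables (real frameworks usually expect their own tool objects), and the toy run handler replaces the LLM with hard-coded logic.

```python
from typing import Any, Callable


def make_mock_tools(
    mocks: dict[str, dict[str, Any]], calls: list[dict[str, Any]]
) -> list[Callable[..., Any]]:
    """Build one stand-in callable per mocked tool; each records its call."""

    def make_one(name: str, canned: Any) -> Callable[..., Any]:
        def mock_tool(**kwargs: Any) -> Any:
            calls.append({"tool": name, "args": kwargs})
            return canned

        mock_tool.__name__ = name
        return mock_tool

    return [make_one(name, spec["return"]) for name, spec in mocks.items()]


def run(input: str, tools: list) -> str:
    """Toy handler standing in for a real agent; no LLM is involved here."""
    by_name = {t.__name__: t for t in tools}
    flights = by_name["search_flights"](origin="NYC", destination="SFO")["flights"]
    cheapest = min(flights, key=lambda f: f["price"])
    booking = by_name["confirm_booking"](flight_id=cheapest["id"])
    return f"Booked {cheapest['id']} as {booking['booking_id']}"


calls: list[dict[str, Any]] = []
mocks = {
    "search_flights": {
        "return": {"flights": [{"id": "F1", "price": 390}, {"id": "F2", "price": 420}]}
    },
    "confirm_booking": {"return": {"booking_id": "BK-001", "status": "confirmed"}},
}
reply = run("Book me the cheapest flight from NYC to SFO", make_mock_tools(mocks, calls))
```

After the run, calls holds the trajectory in call order, which is the data the expected section is checked against.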

@traceix_tool decorator (LangChain / LangGraph)

No handler signature changes needed — just decorate your tools:

from langchain_core.tools import tool
from traceix import traceix_tool

@traceix_tool
@tool
def search_flights(origin: str, destination: str) -> dict:
    """Search available flights."""
    ...  # real implementation

traceix patches the mock in during test runs automatically.


CLI commands

Command                                             What it does
traceix init                                        Detect framework, scaffold traceix.yaml + example test
traceix run tests/                                  Run tests, exit 0 on pass / 1 on fail
traceix run tests/ --fixture record                 Save real tool responses to .traceix/fixtures/
traceix run tests/ --fixture replay                 Replay recorded responses in CI
traceix snapshot tests/                             Save golden trajectory baselines
traceix check tests/                                Compare live run against saved baselines
traceix compare tests/ --a "model=X" --b "model=Y"  A/B test two model configs side by side
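
The record and replay fixture modes follow a familiar pattern: call the real tool once, save its response, then serve the saved response on later runs. A minimal sketch of that pattern follows. The one-JSON-file-per-tool layout and the call_with_fixture helper are assumptions made for illustration; the actual format under .traceix/fixtures/ is not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def call_with_fixture(
    name: str,
    func: Callable[..., Any],
    kwargs: dict[str, Any],
    fixture_dir: Path,
    mode: str,
) -> Any:
    """In 'replay' mode load the saved response; otherwise call the tool and save it."""
    path = fixture_dir / f"{name}.json"
    if mode == "replay":
        return json.loads(path.read_text())
    result = func(**kwargs)
    fixture_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result


def search_flights(origin: str, destination: str) -> dict:
    """Stand-in for a real tool that would hit a live API."""
    return {"flights": [{"id": "F1", "price": 390}]}


with tempfile.TemporaryDirectory() as tmp:
    fixtures = Path(tmp) / "fixtures"
    args = {"origin": "NYC", "destination": "SFO"}
    recorded = call_with_fixture("search_flights", search_flights, args, fixtures, "record")
    replayed = call_with_fixture("search_flights", search_flights, args, fixtures, "replay")
```

This sketch keys fixtures by tool name only, so two calls to the same tool with different arguments would overwrite each other; a real implementation would need a richer key.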

Trajectory modes

Mode       Meaning
strict     Exact tool order and count
contains   Listed steps must appear in order (extra steps allowed)
unordered  All steps present, any order
within     Steps appear as a contiguous block
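
The four modes can be read as four different list comparisons on tool names. The sketch below is one interpretation of the table, not traceix's matcher: trajectory_matches is a hypothetical name, and it assumes that unordered allows extra steps, which the table does not state.

```python
from collections import Counter


def trajectory_matches(mode: str, expected: list[str], actual: list[str]) -> bool:
    """Compare expected tool names against the tools actually called."""
    if mode == "strict":
        # Same tools, same order, same count.
        return actual == expected
    if mode == "contains":
        # Ordered subsequence: each expected tool is found after the previous one.
        remaining = iter(actual)
        return all(tool in remaining for tool in expected)
    if mode == "unordered":
        # Every expected tool appears at least as often as listed (extras assumed OK).
        return not (Counter(expected) - Counter(actual))
    if mode == "within":
        # Expected steps appear back to back somewhere in the actual run.
        n = len(expected)
        return any(actual[i:i + n] == expected for i in range(len(actual) - n + 1))
    raise ValueError(f"unknown trajectory mode: {mode}")


expected = ["search_flights", "confirm_booking"]
actual = ["search_flights", "get_weather", "confirm_booking"]  # get_weather is made up

results = {m: trajectory_matches(m, expected, actual)
           for m in ("strict", "contains", "unordered", "within")}
```

With one extra call in the middle, contains and unordered accept the run while strict and within reject it, which shows how much slack each mode gives the agent.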

Arg modes

Mode     Meaning
exact    Args must match exactly
partial  Listed keys must be present with matching values
ignore   Args not checked
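
The arg modes can be sketched the same way, as three comparisons between the args listed in the test and the args the agent actually passed. The function name args_match and the extra date argument are illustrative assumptions.

```python
from typing import Any


def args_match(mode: str, expected: dict[str, Any], actual: dict[str, Any]) -> bool:
    """Compare the args declared in a step against the args of the recorded call."""
    if mode == "ignore":
        return True
    if mode == "exact":
        return actual == expected
    if mode == "partial":
        # Only the listed keys are checked; extra keys in the call are allowed.
        return all(key in actual and actual[key] == value
                   for key, value in expected.items())
    raise ValueError(f"unknown arg mode: {mode}")


expected_args = {"origin": "NYC", "destination": "SFO"}
actual_args = {"origin": "NYC", "destination": "SFO", "date": "2025-01-01"}

outcomes = {m: args_match(m, expected_args, actual_args)
            for m in ("exact", "partial", "ignore")}
```

Because LLMs often add optional arguments on their own, partial tends to be the more forgiving choice for steps where only a few keys matter.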

Framework support

Framework      Integration
LangGraph      @traceix_tool decorator or tools= injection
CrewAI         tools= injection
Anthropic SDK  tools= injection
OpenAI SDK     tools= injection
Any other      tools= injection

Configuration

Set defaults in traceix.yaml (or [tool.traceix] in pyproject.toml):

handler: mypackage.agent:run
runs: 3          # runs per test case (increase in CI for confidence)
tolerance: 0.67  # fraction of runs that must pass
fixture_mode: replay
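
How runs and tolerance combine is not spelled out above. A minimal sketch, assuming a test case passes when the fraction of passing runs is at least tolerance (the comparison and the name case_passes are assumptions; traceix may round or compare differently):

```python
def case_passes(run_results: list[bool], tolerance: float) -> bool:
    """Pass when the fraction of successful runs reaches the tolerance."""
    if not run_results:
        raise ValueError("at least one run is required")
    return sum(run_results) / len(run_results) >= tolerance


two_of_three = case_passes([True, True, False], tolerance=0.67)
three_of_three = case_passes([True, True, True], tolerance=0.67)
two_of_three_looser = case_passes([True, True, False], tolerance=0.66)
```

Under this reading, 2 of 3 passing runs gives roughly 0.667, which falls just short of 0.67. If the intent is to allow one failure in three, it is worth confirming how traceix compares the fraction, or setting tolerance to 0.66.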

License

MIT
