Trajex
The open format for AI agent execution traces.
Assert what your agent did, not just what it said.
The Problem
Every AI agent framework emits execution data in a completely different shape. LangChain traces look nothing like OpenAI Agents traces. CrewAI output is incompatible with LangGraph tooling. A developer who switches frameworks loses all their traces. There is no standard.
This is the exact problem the infrastructure world had before OpenTelemetry.
The Solution
Trajex defines a canonical Trace format — the OpenTelemetry of agent trajectories — with a testing layer on top.
- A spec — the canonical Trace format, versioned like a protocol
- Emitters — one-line integrations for LangChain, OpenAI Agents, CrewAI, Pydantic AI, and any custom agent
- Assertions — behavioral tests that run against any trace, any framework
- CLI — auto-detects failures in a trace file without writing any tests
- Viewer — local HTML trace explorer, no cloud, no login
Findings
See FINDINGS.md for the field-study findings.
Demo
$ trajex scan tests/fixtures/traces/loop_notification.json --no-color
Trajex v0.2.0 - 5 step(s) . 4 tool call(s) . 'Send a welcome notification to user 99'
FAIL Loop detected: 'send_notification' called 4 times consecutively
'send_notification' was called 4 times in a row.
Consecutive repeated tool calls almost always indicate a logic loop.
Steps involved: [0, 1, 2, 3]
-> fix: no_loop('send_notification', max_calls=1)
WARN 'send_notification' called twice in a row with identical inputs
Steps 0 and 1 are identical calls to 'send_notification'.
Input: {'user_id': 99, 'message': 'Welcome!'}
-> fix: no_loop('send_notification', max_calls=1)
--------------------------------------------------
1 silent failure(s). These pass all current tests and will corrupt production.
Run: trajex init -> generate test file that catches them
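The loop detection shown in the demo can be sketched with `itertools.groupby`: flag any consecutive run of identical tool calls longer than the allowed maximum. The function name here is illustrative, not a Trajex internal.

```python
from itertools import groupby

def consecutive_loops(tool_calls, max_calls=1):
    """(tool, run_length) for every consecutive run longer than max_calls."""
    findings = []
    for name, run in groupby(tool_calls):
        length = sum(1 for _ in run)
        if length > max_calls:
            findings.append((name, length))
    return findings

calls = ["send_notification"] * 4 + ["finish"]
print(consecutive_loops(calls))  # [('send_notification', 4)]
```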
Install
pip install trajex
Zero mandatory dependencies. Works offline. No cloud, no API key.
5-Minute Quickstart
With LangChain
from langchain.agents import AgentExecutor
from trajex.emitters.langchain import TrajexCallbackHandler
from trajex import assert_trajectory
from trajex.assertions import sequence, never_before
handler = TrajexCallbackHandler(prompt="Delete account for user 42")
# Pass the handler to your agent
agent.invoke({"input": "Delete account for user 42"}, callbacks=[handler])
trace = handler.build_trace()
assert_trajectory(trace, [
    sequence("verify_permissions", "confirm_user", "delete_account"),
    never_before("delete_account", "verify_permissions"),
])
With OpenAI Agents SDK
from trajex.emitters.openai import trace_from_openai_run
from trajex import assert_trajectory, scan
from trajex.assertions import tool_called, no_loop
# result = await Runner.run(agent, prompt)
trace = trace_from_openai_run(prompt, result)
scan_report = scan(trace)
print(scan_report.suggested_assertions())
assert_trajectory(trace, [
    tool_called("get_weather"),
    no_loop("get_weather", max_calls=3),
])
With Raw JSON
from trajex import Trace, assert_trajectory, scan
from trajex.assertions import sequence, max_steps, tool_called
trace = Trace.from_json("trace.json")
# Structural scan — no keywords, no config
report = scan(trace)
print(report.suggested_assertions()) # copy-paste these into your test file
# Behavioral assertions
assert_trajectory(trace, [
    sequence("verify_permissions", "delete_account"),
    max_steps(10),
    tool_called("verify_permissions"),
])
Auto-generate a test file
trajex init trace.json --out tests/test_agent.py
Generates a valid pytest file from scan findings. Review and commit.
Why Not DeepEval / Langfuse?
| | Trajex | DeepEval | Langfuse |
|---|---|---|---|
| Open format / spec | Yes | No | No |
| Works offline | Yes | Partial | No (cloud) |
| Zero dependencies | Yes | No | No |
| Framework-agnostic | Yes | Partial | Yes |
| Behavioral assertions | Yes | LLM-based | No |
| Structural loop detection | Yes | No | No |
| CI-native (exit codes) | Yes | Partial | No |
| No account required | Yes | Yes | No |
Trajex is not a competitor to Langfuse. Langfuse is a SaaS observability product. Trajex is a wire format and testing library — the layer under everything else.
Assertions Reference
sequence(*tools)
Asserts that the given tools were called in this order (gaps allowed).
sequence("verify_permissions", "confirm_user", "delete_account")
Fails if any tool in the sequence is missing or appears out of order.
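The gaps-allowed semantics is a subsequence check. A minimal sketch (the helper name is hypothetical, not the Trajex implementation):

```python
def sequence_holds(trace_tools, expected):
    """True if `expected` appears in order within `trace_tools`, gaps allowed."""
    it = iter(trace_tools)
    # `tool in it` advances the iterator past each match, so order is enforced.
    return all(tool in it for tool in expected)

calls = ["verify_permissions", "lookup", "confirm_user", "delete_account"]
print(sequence_holds(calls, ["verify_permissions", "confirm_user", "delete_account"]))  # True
print(sequence_holds(calls, ["confirm_user", "verify_permissions"]))  # False
```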
never_before(tool_a, tool_b)
Asserts that tool_a must never run before tool_b has run.
never_before("delete_account", "verify_permissions")
# delete_account must not run before verify_permissions
Pass cases:
- tool_a was never called
- tool_b was called before tool_a

Fail cases:
- tool_a called but tool_b never called (silent bypass)
- tool_a called at an earlier step than tool_b
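The pass/fail cases above can be sketched over an ordered list of tool names (the helper name is illustrative):

```python
def never_before_holds(trace_tools, tool_a, tool_b):
    """tool_a must not run unless tool_b has already run."""
    if tool_a not in trace_tools:
        return True   # tool_a never called: pass
    if tool_b not in trace_tools:
        return False  # tool_a ran but the guard never did: silent bypass
    return trace_tools.index(tool_b) < trace_tools.index(tool_a)

print(never_before_holds(["verify_permissions", "delete_account"],
                         "delete_account", "verify_permissions"))  # True
print(never_before_holds(["delete_account"],
                         "delete_account", "verify_permissions"))  # False
```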
no_loop(tool, max_calls=1)
Asserts a tool is not called more than max_calls times.
no_loop("send_email", max_calls=1)
no_loop("search", max_calls=3)
Includes scale impact in failure message: 3x calls per user. At 1,000 users: 3,000 invocations.
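A sketch of the count check and the scale projection in the failure message, assuming a flat list of tool names and a hypothetical projected user count (names are mine, not Trajex's):

```python
from collections import Counter

def no_loop_message(trace_tools, tool, max_calls=1, projected_users=1000):
    """None when within budget; otherwise a failure message with scale impact."""
    calls = Counter(trace_tools)[tool]
    if calls <= max_calls:
        return None
    return (f"{tool} called {calls}x (max {max_calls}). "
            f"At {projected_users:,} users: {calls * projected_users:,} invocations.")

print(no_loop_message(["search"] * 3, "search"))
# search called 3x (max 1). At 1,000 users: 3,000 invocations.
```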
max_steps(limit)
Asserts the total step count does not exceed limit (counts ALL steps, not just tool calls).
max_steps(15)
tool_called(tool) / tool_never_called(tool)
tool_called("verify_permissions") # must have been called
tool_never_called("drop_table") # must never have been called
Emitters Reference
LangChain — live capture
from trajex.emitters.langchain import TrajexCallbackHandler
handler = TrajexCallbackHandler(prompt="...")
agent.invoke({"input": "..."}, callbacks=[handler])
trace = handler.build_trace()
LangChain — from intermediate_steps
from trajex.emitters.langchain import trace_from_intermediate_steps
result = agent.invoke({"input": "..."}, return_intermediate_steps=True)
trace = trace_from_intermediate_steps(
    prompt="...",
    steps=result["intermediate_steps"],
    output=result["output"],
)
LangGraph
from trajex.emitters.langchain import trace_from_langgraph_result
result = graph.invoke({"messages": [...]})
trace = trace_from_langgraph_result(prompt="...", result=result)
OpenAI Agents SDK
from trajex.emitters.openai import trace_from_openai_run
result = await Runner.run(agent, prompt)
trace = trace_from_openai_run(prompt, result)
OpenAI raw messages
from trajex.emitters.openai import trace_from_openai_messages
trace = trace_from_openai_messages(prompt, messages, final_output=output)
CrewAI
from trajex.emitters.crewai import trace_from_crew_output
output = crew.kickoff(inputs={"prompt": "..."})
trace = trace_from_crew_output(prompt="...", crew_output=output)
Pydantic AI
from trajex.emitters.pydantic_ai import trace_from_pydantic_run
result = await agent.run(prompt)
trace = trace_from_pydantic_run(prompt, result)
Any custom agent
from trajex.emitters.generic import capture_trace, record_tool_call
@record_tool_call
def my_tool(query: str) -> str:
    return search(query)

@capture_trace(prompt="my task")
def run_agent(input: str) -> str:
    result = my_tool(input)
    return result
run_agent("find users")
trace = run_agent.last_trace
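The two decorators above can be approximated in plain Python with a module-level step list. This is an illustrative sketch of the capture pattern, not Trajex's actual implementation:

```python
import functools

_current_steps = []  # steps recorded during the current run

def record_tool_call(fn):
    """Append a tool_call step each time the wrapped function runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        _current_steps.append({
            "step_type": "tool_call",
            "name": fn.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": out,
        })
        return out
    return wrapper

def capture_trace(prompt):
    """Collect the steps recorded while the wrapped agent function runs."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            _current_steps.clear()
            result = fn(*args, **kwargs)
            wrapper.last_trace = {
                "prompt": prompt,
                "steps": list(_current_steps),
                "final_output": result,
            }
            return result
        return wrapper
    return deco

@record_tool_call
def my_tool(query: str) -> str:
    return query.upper()

@capture_trace(prompt="demo task")
def run_agent(text: str) -> str:
    return my_tool(text)

run_agent("find users")
print(run_agent.last_trace["steps"][0]["name"])  # my_tool
```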
Behavioral learning (new in 0.3.0)
Trajex can learn what correct behavior looks like from your passing traces — no rules to write.
import trajex
# Step 1: Learn from your passing traces
baseline = trajex.learn("tests/fixtures/passing_traces/")
# Saved baseline 'baseline-20260418' (ID: a3f1c2b8)
# Step 2: Check new traces against the baseline
from trajex import Trace
trace = Trace.from_json("new_run.json")
findings = trajex.check_anomalies(trace, baseline)
for f in findings:
    print(f"[{f.severity}] {f.title}")
    print(f"  Expected: {f.expected}")
    print(f"  Observed: {f.observed}")
    print(f"  Confidence: {f.confidence:.0%}")
[HIGH] New tool appeared: 'drop_database'
Expected: never seen in 47 baseline traces
Observed: called at step 2
Confidence: 100%
[HIGH] Ordering reversal: 'delete_account' before 'confirm_user'
Expected: confirm_user before delete_account (94% of traces)
Observed: delete_account at step 0, confirm_user at step 2
Confidence: 94%
[MEDIUM] 'send_notification' called 4x -- unusually high
Expected: 1.1 +/- 0.3 calls per trace
Observed: 4 calls (9.7 standard deviations above normal)
Confidence: 91%
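The frequency finding above amounts to a z-score against baseline call counts. Here is a sketch using the stdlib statistics module; the threshold and function name are assumptions, not Trajex's actual values:

```python
from statistics import mean, stdev

def tool_frequency_spike(baseline_counts, observed, threshold=2.0):
    """Flag when the observed call count sits more than `threshold`
    standard deviations above the baseline mean."""
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)  # needs at least 2 baseline traces
    if sigma == 0:
        return observed > mu
    return (observed - mu) / sigma > threshold

baseline = [1, 1, 1, 2, 1]  # send_notification calls per baseline trace
print(tool_frequency_spike(baseline, 4))  # True
```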
CLI:
# Learn from a directory of traces
trajex learn tests/fixtures/passing_traces/ --name "my-agent-v2"
# Check a new trace against the baseline
trajex check new_run.json --baseline "my-agent-v2"
# List all saved baselines
trajex baseline list
# Remove a baseline
trajex baseline delete my-agent-v2
Baselines are stored in ~/.trajex/baselines.db (SQLite, stdlib only — zero new dependencies).
Six anomaly checks run automatically:
| Check | Fires when |
|---|---|
| new_tool_appeared | A tool is called that never appeared in baseline traces |
| tool_disappeared | A tool present in 95%+ of baselines is absent |
| ordering_violation | A strong ordering learned from baselines is reversed |
| tool_frequency_spike | A tool is called significantly more than the baseline mean |
| step_count_anomaly | Total steps deviate > 2 standard deviations from baseline |
| unexpected_first_tool | First tool called appears as first step in < 5% of baselines |
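As an illustration, the new_tool_appeared check reduces to a set difference between the baseline tool vocabulary and the tools seen in the new trace (names here are mine, not Trajex internals):

```python
def new_tools(baseline_toolsets, trace_tools):
    """Tools in the new trace never seen in any baseline trace."""
    seen = set().union(*baseline_toolsets) if baseline_toolsets else set()
    return sorted(set(trace_tools) - seen)

baselines = [{"verify_permissions", "confirm_user"}, {"verify_permissions"}]
print(new_tools(baselines, ["verify_permissions", "drop_database"]))
# ['drop_database']
```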
Real-time interception (LangGraph)
from trajex.guard import TrajexGuardNode
from trajex import BaselineModel
baseline = BaselineModel.load("my-agent-v2")
guard = TrajexGuardNode(
    baseline=baseline,
    tools=[search_tool, write_tool, commit_tool],
    on_anomaly="interrupt",  # pause for human review
)
# Drop into your LangGraph graph
graph.add_node("tools", guard)
graph.add_edge("agent", "tools")
When an anomaly is detected before tool execution:
"interrupt"— pauses the graph, waits for human approval vialanggraph.types.interrupt"warn"— adds warning to state undertrajex_warnings, continues running"block"— raisesValueError, stops execution immediately
Requires pip install trajex[langchain]. The guard module fails gracefully with a clear
ImportError message when LangGraph is not installed.
CLI Reference
trajex scan <trace.json> [--schema schema.json] [--no-color]
Scans for structural and behavioral anomalies. Exits 1 if failures found.
trajex init <trace.json> [--out test_agent.py]
Generates a pytest test file from scan findings.
trajex view <trace.json>
Opens a self-contained HTML trace viewer in your browser. No server. No login.
trajex check <trace.json> [--schema schema.json]
CI mode — silent scan, exits 1 on failures.
trajex info <trace.json>
Prints trace summary (ID, prompt, steps, tools, duration, framework, model).
Schema file (for name-aware checks)
{
  "destructive_tools": ["delete_user", "drop_table"],
  "guard_tools": ["confirm_action", "verify_permissions"],
  "financial_tools": ["charge_card", "transfer_funds"],
  "notification_tools": ["send_email", "send_sms"]
}
Without a schema, the scanner uses structural analysis only (no keyword guessing).
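One way a name-aware check could use this schema, sketched here as an assumption about behavior rather than Trajex's actual rule: flag any destructive tool call that happens before any guard tool has run.

```python
def unguarded_destructive_calls(trace_tools, schema):
    """Steps where a destructive tool ran before any guard tool had run."""
    destructive = set(schema["destructive_tools"])
    guards = set(schema["guard_tools"])
    findings, guard_seen = [], False
    for i, name in enumerate(trace_tools):
        if name in guards:
            guard_seen = True
        elif name in destructive and not guard_seen:
            findings.append((i, name))
    return findings

schema = {"destructive_tools": ["delete_user", "drop_table"],
          "guard_tools": ["confirm_action", "verify_permissions"]}
print(unguarded_destructive_calls(["drop_table", "confirm_action"], schema))
# [(0, 'drop_table')]
```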
The Trace Format
Trajex defines a versioned, open trace format. Any framework can emit it. Any tool can consume it.
See spec/TRACE_FORMAT.md for the full specification.
{
  "trajex_version": "1",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "prompt": "Delete account for user 42",
  "status": "success",
  "steps": [
    {
      "index": 0,
      "step_type": "tool_call",
      "name": "verify_permissions",
      "input": {"user_id": 42},
      "output": {"allowed": true}
    }
  ]
}
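Because the format is plain JSON, consuming it needs only the stdlib. A sketch of loading, minimally validating, and extracting tool calls; the field names follow the example above, while the helper names are mine:

```python
import json

REQUIRED = {"trajex_version", "id", "prompt", "status", "steps"}

def load_trace(text: str) -> dict:
    """Parse a trace and check the top-level required fields."""
    data = json.loads(text)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"trace missing fields: {sorted(missing)}")
    return data

def tool_calls(trace: dict) -> list:
    """Names of tool_call steps, in order -- what assertions operate on."""
    return [s["name"] for s in trace["steps"] if s["step_type"] == "tool_call"]

sample = """{"trajex_version": "1", "id": "demo", "prompt": "Delete account for user 42",
  "status": "success", "steps": [{"index": 0, "step_type": "tool_call",
  "name": "verify_permissions", "input": {"user_id": 42}, "output": {"allowed": true}}]}"""
print(tool_calls(load_trace(sample)))  # ['verify_permissions']
```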
Contributing
Adding an emitter for a new framework
- Create trajex/emitters/<framework>.py
- Add an import guard at the top (try: import framework; _AVAILABLE = True)
- Implement a trace_from_<framework>_result(prompt, result) -> Trace function
- Map framework-specific objects to Step objects with appropriate step_type
- Set metadata["framework"] to your framework name
- Add tests in tests/test_emitters.py
- Add an example in examples/
- Update this README's Emitters Reference section
The key rule: tool_call steps are what assertions operate on. Make sure your emitter maps the framework's tool calls to StepType.TOOL_CALL.
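A minimal custom emitter following that rule might look like this; the event shape and all names are hypothetical stand-ins for your framework's objects:

```python
def trace_from_events(prompt: str, events: list) -> dict:
    """Map generic framework events to Trajex-style steps.

    Each event is assumed to look like {"type", "name", "args", "result"}.
    Tool events become tool_call steps; everything else becomes a message step.
    """
    steps = []
    for i, ev in enumerate(events):
        step_type = "tool_call" if ev["type"] == "tool" else "message"
        steps.append({
            "index": i,
            "step_type": step_type,
            "name": ev.get("name", ""),
            "input": ev.get("args"),
            "output": ev.get("result"),
        })
    return {
        "trajex_version": "1",
        "prompt": prompt,
        "steps": steps,
        "metadata": {"framework": "my_framework"},
    }

events = [
    {"type": "tool", "name": "search", "args": {"q": "users"}, "result": "3 hits"},
    {"type": "llm", "result": "done"},
]
print(trace_from_events("find users", events)["steps"][0]["step_type"])  # tool_call
```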
Running tests
pip install -e ".[dev]"
pytest tests/ -v
License
MIT — see LICENSE.