
Trajex

The open format for AI agent execution traces.

Assert what your agent did, not just what it said.


The Problem

Every AI agent framework emits execution data in a completely different shape. LangChain traces look nothing like OpenAI Agents traces. CrewAI output is incompatible with LangGraph tooling. A developer who switches frameworks loses all their traces. There is no standard.

This is the exact problem the infrastructure world had before OpenTelemetry.

The Solution

Trajex defines a canonical Trace format — the OpenTelemetry of agent trajectories — with a testing layer on top.

  1. A spec — the canonical Trace format, versioned like a protocol
  2. Emitters — one-line integrations for LangChain, OpenAI Agents, CrewAI, Pydantic AI, and any custom agent
  3. Assertions — behavioral tests that run against any trace, any framework
  4. CLI — auto-detects failures in a trace file without writing any tests
  5. Viewer — local HTML trace explorer, no cloud, no login

Findings

See FINDINGS.md for the field-study findings.

Demo

$ trajex scan tests/fixtures/traces/loop_notification.json --no-color

Trajex v0.2.0  ·  5 step(s) · 4 tool call(s) · 'Send a welcome notification to user 99'

  FAIL  Loop detected: 'send_notification' called 4 times consecutively
        'send_notification' was called 4 times in a row.
        Consecutive repeated tool calls almost always indicate a logic loop.
        Steps involved: [0, 1, 2, 3]
        -> fix: no_loop('send_notification', max_calls=1)

  WARN  'send_notification' called twice in a row with identical inputs
        Steps 0 and 1 are identical calls to 'send_notification'.
        Input: {'user_id': 99, 'message': 'Welcome!'}
        -> fix: no_loop('send_notification', max_calls=1)

  --------------------------------------------------
  1 silent failure(s). These pass all current tests and will corrupt production.
  Run: trajex init  ->  generate test file that catches them

Install

pip install trajex

Zero mandatory dependencies. Works offline. No cloud, no API key.


5-Minute Quickstart

With LangChain

from langchain.agents import AgentExecutor
from trajex.emitters.langchain import TrajexCallbackHandler
from trajex import assert_trajectory
from trajex.assertions import sequence, never_before

handler = TrajexCallbackHandler(prompt="Delete account for user 42")

# Pass the handler to your agent
agent.invoke({"input": "Delete account for user 42"}, callbacks=[handler])

trace = handler.build_trace()

assert_trajectory(trace, [
    sequence("verify_permissions", "confirm_user", "delete_account"),
    never_before("delete_account", "verify_permissions"),
])

With OpenAI Agents SDK

from trajex.emitters.openai import trace_from_openai_run
from trajex import assert_trajectory, scan
from trajex.assertions import tool_called, no_loop

# result = await Runner.run(agent, prompt)
trace = trace_from_openai_run(prompt, result)

scan_report = scan(trace)
print(scan_report.suggested_assertions())

assert_trajectory(trace, [
    tool_called("get_weather"),
    no_loop("get_weather", max_calls=3),
])

With Raw JSON

from trajex import Trace, assert_trajectory, scan
from trajex.assertions import sequence, max_steps, tool_called

trace = Trace.from_json("trace.json")

# Structural scan — no keywords, no config
report = scan(trace)
print(report.suggested_assertions())  # copy-paste these into your test file

# Behavioral assertions
assert_trajectory(trace, [
    sequence("verify_permissions", "delete_account"),
    max_steps(10),
    tool_called("verify_permissions"),
])

Auto-generate a test file

trajex init trace.json --out tests/test_agent.py

Generates a valid pytest file from scan findings. Review and commit.
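
The generated file is plain pytest. Its exact contents depend on what the scan finds; a minimal sketch of the kind of test it produces for the demo trace above (names and thresholds are illustrative, not the tool's verbatim output):

# tests/test_agent.py -- illustrative sketch; the real file comes from `trajex init`
from trajex import Trace, assert_trajectory
from trajex.assertions import no_loop, max_steps

def test_agent_trajectory():
    trace = Trace.from_json("tests/fixtures/traces/loop_notification.json")
    assert_trajectory(trace, [
        no_loop("send_notification", max_calls=1),   # suggested by the scan finding above
        max_steps(10),                               # illustrative bound, tune for your agent
    ])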


Why Not DeepEval / Langfuse?

                              Trajex    DeepEval     Langfuse
  Open format / spec          Yes       No           No
  Works offline               Yes       Partial      No (cloud)
  Zero dependencies           Yes       No           No
  Framework-agnostic          Yes       Partial      Yes
  Behavioral assertions       Yes       LLM-based    No
  Structural loop detection   Yes       No           No
  CI-native (exit codes)      Yes       Partial      No
  No account required         Yes       Yes          No

Trajex is not a competitor to Langfuse. Langfuse is a SaaS observability product. Trajex is a wire format and testing library — the layer under everything else.


Assertions Reference

sequence(*tools)

Asserts that the given tools were called in this order (gaps allowed).

sequence("verify_permissions", "confirm_user", "delete_account")

Fails if any tool in the sequence is missing or appears out of order.

never_before(tool_a, tool_b)

Asserts that tool_a never runs before tool_b has run.

never_before("delete_account", "verify_permissions")
# delete_account must not run before verify_permissions

Pass cases:

  • tool_a was never called
  • tool_b was called before tool_a

Fail cases:

  • tool_a called but tool_b never called (silent bypass)
  • tool_a called at earlier step than tool_b

no_loop(tool, max_calls=1)

Asserts a tool is not called more than max_calls times.

no_loop("send_email", max_calls=1)
no_loop("search", max_calls=3)

The failure message includes the projected scale impact, e.g. "3x calls per user. At 1,000 users: 3,000 invocations."

max_steps(limit)

Asserts the total step count does not exceed limit (counts ALL steps, not just tool calls).

max_steps(15)

tool_called(tool) / tool_never_called(tool)

tool_called("verify_permissions")       # must have been called
tool_never_called("drop_table")         # must never have been called

Emitters Reference

LangChain — live capture

from trajex.emitters.langchain import TrajexCallbackHandler

handler = TrajexCallbackHandler(prompt="...")
agent.invoke({"input": "..."}, callbacks=[handler])
trace = handler.build_trace()

LangChain — from intermediate_steps

from trajex.emitters.langchain import trace_from_intermediate_steps

result = agent.invoke({"input": "..."}, return_intermediate_steps=True)
trace = trace_from_intermediate_steps(
    prompt="...",
    steps=result["intermediate_steps"],
    output=result["output"],
)

LangGraph

from trajex.emitters.langchain import trace_from_langgraph_result

result = graph.invoke({"messages": [...]})
trace = trace_from_langgraph_result(prompt="...", result=result)

OpenAI Agents SDK

from trajex.emitters.openai import trace_from_openai_run

result = await Runner.run(agent, prompt)
trace = trace_from_openai_run(prompt, result)

OpenAI raw messages

from trajex.emitters.openai import trace_from_openai_messages

trace = trace_from_openai_messages(prompt, messages, final_output=output)

CrewAI

from trajex.emitters.crewai import trace_from_crew_output

output = crew.kickoff(inputs={"prompt": "..."})
trace = trace_from_crew_output(prompt="...", crew_output=output)

Pydantic AI

from trajex.emitters.pydantic_ai import trace_from_pydantic_run

result = await agent.run(prompt)
trace = trace_from_pydantic_run(prompt, result)

Any custom agent

from trajex.emitters.generic import capture_trace, record_tool_call

@record_tool_call
def my_tool(query: str) -> str:
    return search(query)

@capture_trace(prompt="my task")
def run_agent(input: str) -> str:
    result = my_tool(input)
    return result

run_agent("find users")
trace = run_agent.last_trace
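
The captured trace is an ordinary Trace, so the same assertions apply. For example, building on the snippet above:

from trajex import assert_trajectory
from trajex.assertions import tool_called, max_steps

# `trace` is the run_agent.last_trace captured above
assert_trajectory(trace, [
    tool_called("my_tool"),
    max_steps(5),
])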

Behavioral learning (new in 0.3.0)

Trajex can learn what correct behavior looks like from your passing traces — no rules to write.

import trajex

# Step 1: Learn from your passing traces
baseline = trajex.learn("tests/fixtures/passing_traces/")
# Saved baseline 'baseline-20260418' (ID: a3f1c2b8)

# Step 2: Check new traces against the baseline
from trajex import Trace
trace = Trace.from_json("new_run.json")
findings = trajex.check_anomalies(trace, baseline)

for f in findings:
    print(f"[{f.severity}] {f.title}")
    print(f"  Expected: {f.expected}")
    print(f"  Observed: {f.observed}")
    print(f"  Confidence: {f.confidence:.0%}")
[HIGH]   New tool appeared: 'drop_database'
         Expected: never seen in 47 baseline traces
         Observed: called at step 2
         Confidence: 100%

[HIGH]   Ordering reversal: 'delete_account' before 'confirm_user'
         Expected: confirm_user before delete_account (94% of traces)
         Observed: delete_account at step 0, confirm_user at step 2
         Confidence: 94%

[MEDIUM] 'send_notification' called 4x -- unusually high
         Expected: 1.1 +/- 0.3 calls per trace
         Observed: 4 calls (9.7 standard deviations above normal)
         Confidence: 91%
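
In CI you will typically want the run to fail when high-severity anomalies appear. A minimal sketch using only the finding fields shown above (treating the severity as comparable to the string printed in the demo output is an assumption):

# Fail the test run on any high-severity anomaly
high = [f for f in findings if "HIGH" in str(f.severity)]   # assumption: severity renders as "HIGH"
assert not high, f"{len(high)} high-severity anomalies, e.g.: {high[0].title}"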

CLI:

# Learn from a directory of traces
trajex learn tests/fixtures/passing_traces/ --name "my-agent-v2"

# Check a new trace against the baseline
trajex check new_run.json --baseline "my-agent-v2"

# List all saved baselines
trajex baseline list

# Remove a baseline
trajex baseline delete my-agent-v2

Baselines are stored in ~/.trajex/baselines.db (SQLite, stdlib only — zero new dependencies).

Six anomaly checks run automatically:

  Check                   Fires when
  new_tool_appeared       A tool is called that never appeared in baseline traces
  tool_disappeared        A tool present in 95%+ of baselines is absent
  ordering_violation      A strong ordering learned from baselines is reversed
  tool_frequency_spike    A tool is called significantly more than baseline mean
  step_count_anomaly      Total steps deviate > 2 standard deviations from baseline
  unexpected_first_tool   First tool called appears as first step in < 5% of baselines

Real-time interception (LangGraph)

from trajex.guard import TrajexGuardNode
from trajex import BaselineModel

baseline = BaselineModel.load("my-agent-v2")
guard = TrajexGuardNode(
    baseline=baseline,
    tools=[search_tool, write_tool, commit_tool],
    on_anomaly="interrupt",   # pause for human review
)

# Drop into your LangGraph graph
graph.add_node("tools", guard)
graph.add_edge("agent", "tools")

When an anomaly is detected before tool execution:

  • "interrupt" — pauses the graph, waits for human approval via langgraph.types.interrupt
  • "warn" — adds warning to state under trajex_warnings, continues running
  • "block" — raises ValueError, stops execution immediately

Requires pip install trajex[langchain]. The guard module fails gracefully with a clear ImportError message when LangGraph is not installed.
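
On the LangGraph side, an "interrupt" is resumed the standard way: compile the graph with a checkpointer, invoke it, and when the guard pauses the run, invoke again with a Command(resume=...). A sketch of that flow; the resume payload the guard expects is not documented here, so treat the value as a placeholder:

from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command

app = graph.compile(checkpointer=MemorySaver())            # interrupts require a checkpointer
config = {"configurable": {"thread_id": "run-1"}}

result = app.invoke({"messages": [...]}, config=config)    # pauses here if the guard interrupts

# After a human reviews the flagged tool call, resume the paused run:
result = app.invoke(Command(resume="approved"), config=config)   # placeholder resume value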


CLI Reference

trajex scan  <trace.json> [--schema schema.json] [--no-color]

Scans for structural and behavioral anomalies. Exits 1 if failures found.

trajex init  <trace.json> [--out test_agent.py]

Generates a pytest test file from scan findings.

trajex view  <trace.json>

Opens a self-contained HTML trace viewer in your browser. No server. No login.

trajex check <trace.json> [--schema schema.json]

CI mode — silent scan, exits 1 on failures.

trajex info  <trace.json>

Prints trace summary (ID, prompt, steps, tools, duration, framework, model).

Schema file (for name-aware checks)

{
  "destructive_tools": ["delete_user", "drop_table"],
  "guard_tools": ["confirm_action", "verify_permissions"],
  "financial_tools": ["charge_card", "transfer_funds"],
  "notification_tools": ["send_email", "send_sms"]
}

Without a schema, the scanner uses structural analysis only (no keyword guessing).


The Trace Format

Trajex defines a versioned, open trace format. Any framework can emit it. Any tool can consume it.

See spec/TRACE_FORMAT.md for the full specification.

{
  "trajex_version": "1",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "prompt": "Delete account for user 42",
  "status": "success",
  "steps": [
    {
      "index": 0,
      "step_type": "tool_call",
      "name": "verify_permissions",
      "input": {"user_id": 42},
      "output": {"allowed": true}
    }
  ]
}
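
Because the format is plain JSON, any tool can emit it directly and hand the file to the Python API. A minimal sketch that writes a conforming trace and loads it back with Trace.from_json (values taken from the example above):

import json
from trajex import Trace, assert_trajectory
from trajex.assertions import tool_called

# Emit a conforming trace from any source...
payload = {
    "trajex_version": "1",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "prompt": "Delete account for user 42",
    "status": "success",
    "steps": [
        {"index": 0, "step_type": "tool_call", "name": "verify_permissions",
         "input": {"user_id": 42}, "output": {"allowed": True}},
    ],
}
with open("trace.json", "w") as f:
    json.dump(payload, f)

# ...and consume it like any other trace.
trace = Trace.from_json("trace.json")
assert_trajectory(trace, [tool_called("verify_permissions")])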

Contributing

Adding an emitter for a new framework

  1. Create trajex/emitters/<framework>.py
  2. Add an import guard at the top (try: import framework; _AVAILABLE = True)
  3. Implement a trace_from_<framework>_result(prompt, result) -> Trace function
  4. Map framework-specific objects to Step objects with appropriate step_type
  5. Set metadata["framework"] to your framework name
  6. Add tests in tests/test_emitters.py
  7. Add an example in examples/
  8. Update this README's Emitters Reference section

The key rule: tool_call steps are what assertions operate on. Make sure your emitter maps the framework's tool calls to StepType.TOOL_CALL.
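
A rough skeleton of such an emitter, assuming Trace, Step, and StepType take keyword arguments matching the spec fields (an assumption; check the actual classes in the trajex package before copying):

# trajex/emitters/myframework.py -- illustrative skeleton only
try:
    import myframework  # hypothetical framework
    _AVAILABLE = True
except ImportError:
    _AVAILABLE = False

from trajex import Trace, Step, StepType   # assumption: importable from the top-level package


def trace_from_myframework_result(prompt, result) -> Trace:
    steps = []
    for i, call in enumerate(result.tool_calls):        # framework-specific iteration
        steps.append(Step(
            index=i,
            step_type=StepType.TOOL_CALL,               # assertions only operate on tool_call steps
            name=call.name,
            input=call.args,
            output=call.output,
        ))
    return Trace(
        prompt=prompt,
        status="success",
        steps=steps,
        metadata={"framework": "myframework"},
    )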

Running tests

pip install -e ".[dev]"
pytest tests/ -v

License

MIT — see LICENSE.
