AgentTrace

Open-source runtime tracing and diagnostics for AI agent execution flows.

AgentTrace helps you understand what your agent actually did at runtime — not just whether the final answer looks good.

It is built for people who want to answer questions like:

  • Why did the agent call this tool twice?
  • Where did the latency actually come from?
  • Which fallback path was triggered?
  • What did the LLM see before it made this decision?
  • Was the execution flow correct, redundant, or suspicious?

If you want something closer to a pprof-style profiler combined with tracing and agent diagnostics, AgentTrace is designed for that.


Why AgentTrace

Most agent tooling focuses on one of two things:

  • output evaluation — “was the answer good?”
  • framework abstraction — “how do I build the agent?”

AgentTrace focuses on a different question:

What exactly happened during execution, and why did the agent behave that way?

That makes it especially useful for:

  • debugging execution flow
  • diagnosing redundancy and fallback behavior
  • inspecting LLM prompts / responses in context
  • understanding tool usage patterns
  • tracing runtime state across a run

Core capabilities

  • Trace LLM / Tool / Skill execution flows
  • Capture parallel, retry, fallback, and repeated-call patterns
  • Record Prompt / Response / Context / Plan / Execution snapshots
  • Persist runs locally and inspect them in a built-in dashboard
  • Review runs with an LLM after execution
  • Generate structured diagnostics: critical path, recovery chains, redundant calls, suspicious decisions

What you get

Execution tracing

AgentTrace records a runtime trace for each run, including:

  • span type
  • start / end time
  • latency
  • status
  • input parameters
  • grouping and parent-child relationships
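
The fields above can be pictured as one record per span. The following is a minimal sketch only: the class name `Span`, its attribute names, and the helper `children_of` are hypothetical and are not taken from AgentTrace's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Span:
    """Hypothetical span record; names are illustrative, not AgentTrace's schema."""
    span_id: str
    span_type: str                      # e.g. "llm", "tool", "skill"
    start: float                        # start time in seconds
    end: float                          # end time in seconds
    status: str = "ok"                  # e.g. "ok" or "error"
    inputs: Dict[str, Any] = field(default_factory=dict)
    parent_id: Optional[str] = None     # parent-child relationship
    group: Optional[str] = None         # grouping label, e.g. a parallel batch

    @property
    def latency(self) -> float:
        # Latency is derived from the recorded start and end times.
        return self.end - self.start


def children_of(spans: List[Span], parent_id: str) -> List[Span]:
    """Return the direct children of a span, in recorded order."""
    return [s for s in spans if s.parent_id == parent_id]


spans = [
    Span("s1", "llm", 0.0, 1.2),
    Span("s2", "tool", 1.2, 1.5, inputs={"city": "Beijing"}, parent_id="s1"),
    Span("s3", "tool", 1.2, 2.0, status="error", parent_id="s1"),
]
print([s.span_id for s in children_of(spans, "s1")])
```

Deriving latency from start and end time, rather than storing it separately, keeps the three values from drifting apart in this sketch.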

Structured state snapshots

For LLM spans, AgentTrace can capture:

  • ContextSnapshot
  • MemorySnapshot
  • PlanSnapshot
  • DecisionSnapshot
  • ResumeSnapshot
  • ExecutionSnapshot

Diagnostics

AgentTrace builds a diagnostics layer on top of the raw trace:

  • critical path
  • failed tool calls
  • recovery chains
  • redundant tool clusters
  • suspicious decisions
  • filtered review findings
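
To make two of these diagnostics concrete, here is a rough sketch of how a critical path and redundant tool clusters could be derived from raw span data. It is not AgentTrace's implementation: the dictionary keys, the function names, and the heuristics (follow the child that finishes last; treat calls with identical tool name and inputs as redundant) are all assumptions chosen for illustration.

```python
import json
from collections import defaultdict


def critical_path(spans, root_id):
    """Walk from the root, always following the child that ends latest.

    A simple heuristic, assumed here for illustration only.
    """
    by_parent = defaultdict(list)
    for span in spans:
        by_parent[span["parent"]].append(span)

    path = [root_id]
    current = root_id
    while by_parent.get(current):
        latest = max(by_parent[current], key=lambda s: s["end"])
        path.append(latest["id"])
        current = latest["id"]
    return path


def redundant_clusters(calls):
    """Group tool calls by (tool name, canonical inputs); keep groups of size > 1."""
    groups = defaultdict(list)
    for index, call in enumerate(calls):
        key = (call["tool"], json.dumps(call["inputs"], sort_keys=True))
        groups[key].append(index)
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}


spans = [
    {"id": "run", "parent": None, "end": 5.0},
    {"id": "llm", "parent": "run", "end": 1.0},
    {"id": "weather_a", "parent": "run", "end": 2.0},
    {"id": "weather_b", "parent": "run", "end": 4.0},
    {"id": "retry", "parent": "weather_b", "end": 4.0},
]
calls = [
    {"tool": "get_weather", "inputs": {"city": "Beijing"}},
    {"tool": "calculate", "inputs": {"expr": "1+2"}},
    {"tool": "get_weather", "inputs": {"city": "Beijing"}},
]
print(critical_path(spans, "run"))
print(list(redundant_clusters(calls).values()))
```

Serializing inputs with `sort_keys=True` makes two calls compare equal regardless of the order in which their arguments were recorded.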

LLM review

After each run, AgentTrace can ask an LLM to review the recorded execution flow and flag:

  • redundant tool calls
  • wrong tool choices
  • suspicious fallback behavior
  • unnecessary skill execution
  • likely execution-flow issues

Review strictness is configurable:

  • review_level=1 → tolerant
  • review_level=2 → balanced (default)
  • review_level=3 → strict

At review_level=1 or review_level=2, the UI hides low-severity findings by default. At review_level=3, all findings are shown.
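
The visibility rule can be sketched as a small filter. This is an illustration under assumptions: the function name `visible_findings`, the finding dictionaries, and the severity labels other than "low" are hypothetical, and the real dashboard may apply the rule differently.

```python
def visible_findings(findings, review_level):
    """Apply the default visibility rule described above.

    Levels 1 and 2 hide low-severity findings; level 3 shows everything.
    """
    if review_level not in (1, 2, 3):
        raise ValueError("review_level must be 1, 2, or 3")
    if review_level == 3:
        return list(findings)
    return [f for f in findings if f["severity"] != "low"]


# Hypothetical findings; "medium" and "high" are assumed labels.
findings = [
    {"severity": "low", "message": "tool called twice with the same input"},
    {"severity": "medium", "message": "fallback triggered without a failure"},
    {"severity": "high", "message": "wrong tool chosen for the request"},
]
print(len(visible_findings(findings, 2)), len(visible_findings(findings, 3)))
```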


Dashboard

AgentTrace includes a local dashboard at:

  • http://localhost:3500

Current UI features include:

  • session list
  • execution timeline
  • parallel-lane view
  • collapsed repeated-tool clusters
  • prompt / response modal for LLM spans
  • execution-state tabs
  • diagnostics panel
  • LLM review panel
  • collapsible final agent output

Quick start

1. Install from source

git clone https://github.com/happli-sys/AgentTrace.git
cd AgentTrace
pip install -e .

2. Patch once, trace every run

import agenttrace
from my_agent import run

agenttrace.patch(
    "my_agent.tools",
    "my_agent.skills",
    "my_agent.llm",
    llm_modules=["my_agent.llm"],
    skill_modules=["my_agent.skills"],
    review_level=2,
)

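# The same prompt is passed to session(...) and to the wrapped run function.
prompt = "Check the weather in Beijing and compute 1+2"
output = agenttrace.session(prompt)(run)(prompt)
print(output)
print(agenttrace.last_result().summary())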

3. Start the dashboard

from agenttrace.dashboard.server import start_server

start_server(port=3500)

Open:

  • http://localhost:3500

Demo agent

This repo includes a demo agent that intentionally exercises multiple tracing scenarios:

  • bash
  • read
  • grep
  • calculate
  • get_weather
  • flaky_weather
  • weather_report_skill
  • parallel weather queries
  • fallback to stable tools

Run it:

python examples/demo_agent/main.py

Stress prompt:

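Analyze the project in the current directory; bash pwd; read examples/demo_agent/tools.py; grep calculate examples/demo_agent; check the weather in Beijing and Xi'an, and compute 1123123123+1283123; generate a Beijing weather report; finally, summarize.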

Integration model

AgentTrace works best for:

  • custom Python agents with source code
  • local development environments
  • CLI / hook-based agents
  • runtime debugging and diagnostics workflows

The default integration style is intentionally lightweight:

  • patch modules once
  • wrap runs with session(...)
  • inspect results locally

Project scope

AgentTrace is currently optimized as:

  • a runtime tracing tool
  • a local-first diagnostics tool
  • a developer-facing execution inspector

It is not currently focused on being:

  • a hosted eval platform
  • a benchmark leaderboard
  • a dataset management system
  • a full SaaS observability suite

Who this is for

AgentTrace is especially useful for:

  • engineers building custom agents
  • teams debugging real runtime behavior
  • people who need local-first execution visibility
  • anyone who wants to inspect agent decisions beyond final output quality

Roadmap direction

Current direction is intentionally focused:

  • stronger execution tracing
  • better diagnostics and issue localization
  • cleaner runtime state modeling
  • broader integration patterns for source-based agents
  • more production-friendly export / observability hooks

The goal is to keep AgentTrace useful as a general execution-flow listener, not to turn it into a bloated all-in-one platform too early.


Still useful for objective metrics

Although AgentTrace centers on tracing and diagnostics, it still retains objective runtime metrics such as:

  • total latency
  • avg / p95 step latency
  • tool success rate
  • token usage
  • estimated cost
  • step efficiency
  • correctness (if expected_output is provided)
  • regression tracking
  • comparison helpers
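
A few of these metrics are simple to compute from step records. The sketch below is only an approximation of the idea: the step dictionaries, the function names, and the nearest-rank percentile method are assumptions, and AgentTrace may define each metric differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def summarize(steps):
    """Compute a handful of latency and success metrics from step records."""
    latencies = [s["latency"] for s in steps]
    tools = [s for s in steps if s["type"] == "tool"]
    succeeded = sum(1 for s in tools if s["status"] == "ok")
    return {
        # Sum of step latencies; this overstates wall-clock time for parallel steps.
        "total_latency": sum(latencies),
        "avg_latency": sum(latencies) / len(latencies),
        "p95_latency": percentile(latencies, 95),
        "tool_success_rate": succeeded / len(tools) if tools else None,
    }


# Hypothetical step records for illustration.
steps = [
    {"type": "llm", "latency": 1.0, "status": "ok"},
    {"type": "tool", "latency": 0.5, "status": "ok"},
    {"type": "tool", "latency": 0.25, "status": "error"},
    {"type": "tool", "latency": 0.25, "status": "ok"},
]
summary = summarize(steps)
print(summary)
```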

Contributing

Contributions are welcome — especially around:

  • new agent integrations
  • richer diagnostics
  • runtime state capture
  • dashboard usability
  • packaging and release polish

For local development:

git clone https://github.com/happli-sys/AgentTrace.git
cd AgentTrace
pip install -e ".[dev]"
pytest tests/

If you want to contribute, small focused improvements are preferred over large platform-style expansions.


License

MIT
