TraceAI

Chrome DevTools for AI agents.
Trace prompts, tool calls, memory, and reasoning in a visual execution timeline — locally, with zero infrastructure.



The Problem

AI agents are quickly becoming complex software systems — but developers still debug them as black boxes.

Traditional debugging tools show stack traces, variable states, and execution flow. AI systems add entirely new layers that those tools cannot inspect:

  • Dynamic prompts that change at runtime
  • Model reasoning steps that are opaque by nature
  • Tool calls with side effects scattered across the trace
  • Memory retrieval from vector stores and caches
  • Multi-agent coordination across async tasks

When something goes wrong, the typical debugging session looks like this:

print(prompt)
print(response)

The equivalent of debugging a distributed system with printf statements.

As AI agents grow more complex, this becomes nearly impossible. Developers cannot easily answer basic questions:

  • What exact prompt did the model receive?
  • Why did the agent choose this tool?
  • Which memory entries were retrieved?
  • What changed between two runs?

TraceAI is the missing primitive — a universal debugging layer for AI agents that brings DevTools-grade observability to your local development environment.


What is TraceAI?

TraceAI automatically captures every LLM call, tool invocation, memory access, and agent decision in a structured, searchable, visual timeline — stored locally in SQLite with no cloud dependency.

Your Agent
   │
   ├── 🤖 llm_call  [gpt-4o]     125ms   $0.0012   ← prompt, response, tokens
   ├── 🔧 tool_call [search]      340ms             ← args, result
   ├── 📖 memory_read             12ms              ← query, retrieved docs
   └── 🤖 llm_call  [gpt-4o]      98ms   $0.0009   ← final synthesis

Run traceai open and every run above becomes a clickable, inspectable, replayable trace.


How TraceAI is Different

Most observability tools are built for production monitoring and require infrastructure, cloud accounts, and team setup. TraceAI is built for the individual developer who needs to understand what their agent just did.

Where TraceAI stands apart from LangSmith, Langfuse, Arize Phoenix, and Braintrust:

  • pip install → working in 30 seconds
  • Zero cloud / no account required
  • Zero infrastructure (no Docker)
  • Framework-agnostic (only partially true of some of the tools above)
  • Model replay & arbitrage built-in
  • Side-by-side trace comparison
  • Multi-provider experiments
  • Streaming capture
  • Head sampling
  • Fully open source (only partially true of some of the tools above)
  • Local-first: data never leaves your machine (elsewhere this typically requires self-hosting)


Features

Auto-Instrumentation

Drop two lines into any existing codebase. TraceAI patches the OpenAI and Anthropic SDKs transparently — no changes to your agent code.

Visual Dashboard

A React + TypeScript dashboard served from traceai open. Browse traces, inspect span trees, view raw inputs/outputs, and navigate the full execution timeline.

Model Replay & Arbitrage

Re-run any LLM span with a different model. Compare cost, token usage, and output quality side-by-side. Answer "should I switch from GPT-4o to Claude Haiku?" with real data from your own workload.

Experiments

Group multiple runs into a named experiment with traceai.experiment("name"). Compare any two traces from the same experiment in the dashboard — even across providers.

Streaming Support

stream=True calls are fully captured. Token counts and content are aggregated from the streamed chunks and recorded exactly as they are for non-streaming calls.

Head Sampling

Control trace volume with traceai.configure(sample_rate=0.1). Sampled-out traces are silent no-ops — zero overhead from span creation or DB writes.

OTel-Compatible Metadata

Spans carry standard OpenTelemetry GenAI semantic conventions: gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.response.finish_reason, and more.
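
For orientation, the metadata on a single LLM span might then look like the sketch below (the gen_ai.* attribute names are the documented conventions; the dict shape and values are illustrative assumptions, not a schema):

# Illustrative only: the gen_ai.* keys follow the conventions listed above;
# the surrounding dict shape and values are assumptions, not a schema.
span_metadata = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 42,
    "gen_ai.usage.output_tokens": 118,
    "gen_ai.response.finish_reason": "stop",
}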

CLI

Inspect traces without opening a browser: traceai list, traceai inspect <id>, traceai export <id>.


Quick Start

pip install traceai-sdk

Auto-instrument OpenAI in 2 lines

import traceai
traceai.instrument("openai")

# Your existing code — unchanged
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain gradient descent in two sentences."}],
)
print(response.choices[0].message.content)

Then launch the dashboard:

traceai open   # launches dashboard at http://localhost:7474

Every call is captured automatically: model, messages, response, token counts, cost, duration, finish reason.

Auto-instrument Anthropic

import traceai
traceai.instrument("anthropic")

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Explain gradient descent in two sentences."}],
)
print(message.content[0].text)

Manual instrumentation

Use @tracer.trace and tracer.span() to instrument any Python code — no LLM framework required:

from traceai import tracer

# vector_store, build_prompt, and call_llm stand in for your own components
@tracer.trace
def my_agent(question: str) -> str:
    with tracer.span("retrieve-context", kind="retrieval") as span:
        span.set_input({"query": question})
        docs = vector_store.search(question)
        span.set_output({"doc_count": len(docs)})

    with tracer.span("llm-synthesis", kind="llm_call") as span:
        prompt = build_prompt(question, docs)
        span.set_input({"messages": [{"role": "user", "content": prompt}]})
        answer = call_llm(prompt)
        span.set_output({"content": answer})

    return answer
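
A single call to my_agent(...) then shows up as one trace with two child spans (retrieve-context and llm-synthesis), each carrying the inputs and outputs set above.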

Streaming

Streaming calls are captured transparently — content is buffered from delta events and token counts are extracted from the final chunk:

import traceai
traceai.instrument("openai")

from openai import OpenAI
client = OpenAI()

# stream=True is fully supported — nothing extra needed
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()

The trace will show the full aggregated response, token usage, and gen_ai.streaming: true in metadata.

Async streaming works identically:

import asyncio, traceai
traceai.instrument("openai")

from openai import AsyncOpenAI
client = AsyncOpenAI()

async def main():
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Count from 1 to 5."}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()

asyncio.run(main())

Sampling

Control trace volume without changing your agent code:

import traceai

# Capture only 10% of traces globally
traceai.configure(sample_rate=0.1)

# Or set per-function (overrides global rate)
@traceai.tracer.trace(name="high-volume-agent", sample_rate=0.05)
def process_request(text: str) -> str:
    ...

Sampled-out traces are silent no-ops — the function still executes and returns its value normally; TraceAI simply does not record anything. There is zero overhead from span creation or DB writes.
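
A quick way to see the no-op behaviour for yourself, using only the configure and trace APIs shown above (sample_rate=0.0 assumes the edge value is accepted):

import traceai

traceai.configure(sample_rate=0.0)   # sample out every trace

@traceai.tracer.trace
def double(x: int) -> int:
    return x * 2

print(double(21))   # → 42: runs and returns normally, records nothing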


Experiments & Model Arbitrage

Group multiple runs into a named experiment, then compare them side-by-side in the dashboard:

import traceai

traceai.instrument("openai")
traceai.instrument("anthropic")

import openai, anthropic

PROMPT = "Summarise the Python GIL in one sentence."

with traceai.experiment("python-gil-summary"):
    # Run the same task on two providers
    openai.OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT}],
    )
    anthropic.Anthropic().messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=128,
        messages=[{"role": "user", "content": PROMPT}],
    )

Then open the dashboard:

traceai open
# → both traces show ⇄ python-gil-summary badge
# → select both → Compare → side-by-side diff with cost breakdown

This is the model arbitrage pattern: identical task, different providers, instant cost and quality comparison from your own workload data.

Replay any trace with a different model

In the dashboard, select any trace → click ↺ Replay All LLM Calls → pick a target model. TraceAI re-runs every llm_call span with the new model and saves the result as a linked trace. The comparison banner shows:

↺ Replayed  gpt-4o → claude-haiku-4-5-20251001
Original: $0.042 · 1,200 tok     Replay: $0.003 · 1,100 tok
Cost savings: 93%  ↓   Token delta: −8%

Dashboard

traceai open            # opens http://localhost:7474
traceai open --port 8080 --no-browser   # headless / custom port

The dashboard provides:

  • Trace list — browse all traces with duration, cost, token count, status
  • Span tree — visual waterfall showing parent/child span relationships
  • Span detail — full inputs, outputs, metadata, and error details for any span
  • Compare view — side-by-side diff of any two traces from the same experiment or replay family
  • Replay panel — re-run spans or entire traces with a different model
  • Experiments tab — aggregate stats (total cost, tokens, run count) per experiment

CLI Reference

traceai list [--limit N] [--status ok|error|pending] [--db PATH]

List recent traces in a table. Columns: ID, name, status, duration, tokens, cost, timestamp.

traceai inspect <TRACE_ID> [--db PATH]

Render the full span tree for a trace, with inputs/outputs for each node.

traceai export <TRACE_ID> [--db PATH]

Print the trace and all its spans as JSON (pipe-friendly).
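
Because the output is plain JSON, it post-processes cleanly in a script. A minimal sketch; the "spans" list and metadata key names below are assumptions about the export schema, not documented fields:

import json
import subprocess

# Assumed schema: a top-level "spans" list whose entries carry the OTel
# gen_ai.* metadata keys. Adjust to whatever `traceai export` actually emits.
raw = subprocess.run(
    ["traceai", "export", "TRACE_ID"],
    capture_output=True, text=True, check=True,
).stdout
trace = json.loads(raw)
total_input_tokens = sum(
    span.get("metadata", {}).get("gen_ai.usage.input_tokens", 0)
    for span in trace.get("spans", [])
)
print(f"input tokens across trace: {total_input_tokens}")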

traceai delete <TRACE_ID> [--db PATH]

Delete a trace and all its spans.

traceai open [--port PORT] [--host HOST] [--no-browser] [--db PATH]

Start the dashboard server and open a browser tab.

traceai config show
traceai config set <KEY> <VALUE>
traceai config get <KEY>

Manage configuration stored at ~/.traceai/config.toml.


Architecture

┌─────────────────────────────────────────────────────────┐
│                       Your Agent                        │
│                                                         │
│  @tracer.trace    tracer.span()    instrument("openai") │
└────────────────────┬────────────────────────────────────┘
                     │ async-safe ContextVar propagation
                     ▼
┌─────────────────────────────────────────────────────────┐
│                      TraceAI Core                       │
│                                                         │
│   Tracer  ──►  Span  ──►  Trace  ──►  TraceStore        │
│   (sampling, experiments, context)    (aiosqlite WAL)   │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│             FastAPI Server + React Dashboard            │
│                                                         │
│   REST API  ──►  SQLite  ──►  Vite/React/Tailwind UI    │
│   /api/traces   /api/spans   /api/experiments           │
└─────────────────────────────────────────────────────────┘

Storage: SQLite with WAL mode at ~/.traceai/traces.db. All data stays on your machine.

Async safety: ContextVar-based propagation ensures traces are correctly nested even when using asyncio.gather() or concurrent tasks.
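
In practice that means sibling tasks spawned with asyncio.gather() each get their own correctly parented span. A sketch using only the tracer API shown earlier (assuming @tracer.trace also accepts async functions, which the async-safety guarantee implies):

import asyncio
from traceai import tracer

async def worker(i: int) -> int:
    # Each concurrent task opens its own span; ContextVar propagation keeps
    # it parented under the enclosing trace rather than under a sibling task.
    with tracer.span(f"worker-{i}", kind="custom") as span:
        await asyncio.sleep(0.01)
        span.set_output({"result": i * i})
        return i * i

@tracer.trace
async def fan_out() -> list[int]:
    return await asyncio.gather(*(worker(i) for i in range(3)))

asyncio.run(fan_out())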

Provider integrations: Thin monkey-patch wrappers over the official SDKs. No forking, no subclassing — just __wrapped__ attribute preservation and transparent proxy objects for streaming.
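
As a rough illustration of that wrapping pattern (a generic sketch of SDK patching, not TraceAI's actual internals):

import functools

def patch_method(owner: object, name: str) -> None:
    """Replace a method in place, recording around the original call.
    functools.wraps copies metadata and sets wrapper.__wrapped__, so the
    original stays reachable for introspection or unpatching."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # ... open a span / capture inputs here ...
        result = original(*args, **kwargs)
        # ... capture outputs and timing here ...
        return result

    setattr(owner, name, wrapper)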


Span Kinds

Kind          Description
llm_call      A call to a language model
tool_call     A tool or function invocation
agent_step    A single reasoning step in an agent loop
retrieval     A vector store or search query
memory_read   Reading from a memory store
memory_write  Writing to a memory store
embedding     An embedding generation call
custom        Anything else
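
These kind strings plug straight into tracer.span(). For example (embed and memory_store below are hypothetical stand-ins for your own code):

from traceai import tracer

# embed() and memory_store are hypothetical stand-ins for your own code.
with tracer.span("embed-query", kind="embedding") as span:
    span.set_input({"text": "hello world"})
    vector = embed("hello world")
    span.set_output({"dims": len(vector)})

with tracer.span("persist-vector", kind="memory_write") as span:
    span.set_input({"key": "greeting"})
    memory_store.put("greeting", vector)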

Installation

# Core (tracer + dashboard + CLI)
pip install traceai-sdk

# With OpenAI auto-instrumentation
pip install "traceai-sdk[openai]"

# With Anthropic auto-instrumentation
pip install "traceai-sdk[anthropic]"

# Both providers
pip install "traceai-sdk[openai,anthropic]"

# Development (tests, linting, type checking)
pip install "traceai-sdk[dev]"

Requires Python 3.11+.


Requirements

  • Python 3.11 or higher
  • No cloud account, no Docker, no server setup
  • OpenAI and/or Anthropic API keys, needed only for auto-instrumented provider calls

Contributing

Contributions are welcome. See CONTRIBUTING.md for development setup, coding standards, and the PR process.

Bug reports and feature requests: open an issue.


Security

To report a vulnerability, see SECURITY.md.


License

MIT — free to use, modify, and distribute.


Built with the belief that AI systems deserve the same debugging experience as traditional software.
