Local-first LLM agent observability — decorator + OpenTelemetry ingestion, traces grouped by execution pattern, cost per call, search, replay, shareable HTML snapshots.

Maintainers

harrywinter06

clustertrace

A local-first debugger for LLM agents. Drop in a decorator, run your agent a few hundred times, then look at where it's failing by execution pattern rather than by individual trace.

I built it because I had a multi-step agent that was failing about 20% of the time in random-looking ways, and the only trace-tooling I had was a flat list view that didn't help. Clustering the traces by execution signature turned it from "scroll 80 stack traces" into "two patterns explain ten of the twelve failures, here they are". That's the entire pitch.

[Screenshot: clustertrace clusters page]

In the bundled 60-trace demo, two clusters cover 10 of the 12 failures (83%). Eighteen distinct execution patterns total. You can reproduce the numbers in 30 seconds.

Try it without an API key

pip install clustertrace
clustertrace demo

Loads 60 pre-recorded traces of three agents (research, RAG, tool-use) and boots a dashboard on localhost:7777, with no API spend. (Pre-PyPI install: pip install "clustertrace @ git+<repo-url>" once the repo's pushed; both URLs go live on first public push.)

If clustertrace demo works on your machine, the rest of this README is just feature surface.

Using it on a real agent

There are three ways to feed traces in. Pick one.

A bare decorator on the functions you want traced:

import clustertrace

@clustertrace.trace(tags={"agent": "research"})
async def plan(query): ...

# query and hits below stand for values from your own agent code
with clustertrace.span("retrieval", k=5):
    ...

clustertrace.tool_call("web_search", args={"q": query}, result=hits)
clustertrace.tag("user_tier", "pro")
clustertrace.metric("score", 0.85)        # numeric, aggregated to a sparkline

Async-safe. Concurrent asyncio.gather calls produce separate traces; nesting tracks the parent via contextvars.
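
For readers who want to see why contextvars gives that behaviour, here is a minimal sketch. It is not clustertrace's internal code: the `span` class, `_current_span`, and the `recorded` list are hypothetical names made up for the illustration. It relies only on the documented asyncio behaviour that each task created by `asyncio.gather` runs in its own copy of the context.

```python
import asyncio
import contextvars
import itertools

# Hypothetical names for illustration; not clustertrace internals.
_current_span = contextvars.ContextVar("current_span", default=None)
_ids = itertools.count(1)
recorded = []  # (span_id, parent_id, name)


class span:
    """Context manager that records its parent from the current context."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        self.id = next(_ids)
        recorded.append((self.id, _current_span.get(), self.name))
        self._token = _current_span.set(self.id)
        return self

    def __exit__(self, *exc):
        _current_span.reset(self._token)
        return False


async def step(name):
    with span(name):
        await asyncio.sleep(0)          # yield so the two tasks interleave
        with span(name + ".child"):
            await asyncio.sleep(0)


async def main():
    # gather wraps each coroutine in a task with its own context copy
    await asyncio.gather(step("a"), step("b"))


asyncio.run(main())
```

Even though the two tasks interleave, each child span sees only its own task's parent, which is the property the paragraph above describes.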

Or wrap your SDK client, no decorator on your code:

from anthropic import Anthropic, AnthropicBedrock, AnthropicVertex
from openai import OpenAI
import clustertrace

client  = clustertrace.wrap_anthropic(Anthropic())          # direct API
bedrock = clustertrace.wrap_anthropic(AnthropicBedrock())   # AWS Bedrock
vertex  = clustertrace.wrap_anthropic(AnthropicVertex())    # Google Vertex
oai     = clustertrace.wrap_openai(OpenAI())                # OpenAI

It's an explicit wrap, no global monkey-patching. Async clients (AsyncAnthropic, AsyncOpenAI) are detected automatically.

Or just point your existing OpenTelemetry setup at clustertrace as an exporter:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from clustertrace.otel import ClustertraceSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ClustertraceSpanExporter()))

If you already have OTel wired through LangChain, LlamaIndex, Bedrock auto-instrumentation, or your own custom spans, this picks them up. gen_ai.* and llm.* attribute conventions are mapped onto clustertrace's schema, so cost and clustering work on OTel-sourced traces too.
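
As a rough picture of what "mapped onto clustertrace's schema" could look like, here is a sketch of an attribute normaliser. The target field names (`model`, `input_tokens`, `output_tokens`) and the `ALIASES` table are assumptions for illustration; which attribute keys clustertrace actually reads is not stated here. The source keys shown are ones I believe are used by the OpenTelemetry GenAI semantic conventions and by OpenInference-style `llm.*` instrumentation.

```python
# Hypothetical normaliser; the target schema is invented for this sketch.
ALIASES = {
    "model": ("gen_ai.request.model", "llm.model_name"),
    "input_tokens": ("gen_ai.usage.input_tokens", "llm.token_count.prompt"),
    "output_tokens": ("gen_ai.usage.output_tokens", "llm.token_count.completion"),
}


def normalise(attributes):
    """Return a flat dict using the first alias present for each field."""
    out = {}
    for field, keys in ALIASES.items():
        for key in keys:
            if key in attributes:
                out[field] = attributes[key]
                break
    return out


gen_ai_span = {
    "gen_ai.request.model": "some-model",
    "gen_ai.usage.input_tokens": 120,
    "gen_ai.usage.output_tokens": 30,
}
llm_span = {
    "llm.model_name": "some-model",
    "llm.token_count.prompt": 120,
    "llm.token_count.completion": 30,
}
```

Both spans normalise to the same record, which is what lets cost and clustering treat them alike.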

What the dashboard shows

| page | content |
| --- | --- |
| `/` | filterable trace list with status/tag/name filters, live polling, per-trace cost |
| `/clusters` | distinct execution patterns: count, failure rate, sample trace, longest common failure prefix, top failing nodes |
| `/search` | FTS5 search across span name, input, output, error message; supports phrases, OR, NEAR |
| `/metrics` | per-metric aggregates and rolling sparklines for whatever you passed to `clustertrace.metric()` |
| `/failures` | per-span error-rate bars, step-of-failure histogram, force-directed call graph |
| `/trace/<id>` | Gantt timeline, expandable I/O, tags, metrics, per-span cost |
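
If you haven't used SQLite FTS5 query syntax before, the three forms mentioned for `/search` look like this. The sketch uses Python's standard `sqlite3` module against a throwaway in-memory table; the table name `spans_fts` and its columns are assumptions, not clustertrace's real schema, and it needs a Python build whose SQLite includes FTS5 (most do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; clustertrace's actual schema may differ.
conn.execute("CREATE VIRTUAL TABLE spans_fts USING fts5(name, input, output, error)")
conn.executemany(
    "INSERT INTO spans_fts VALUES (?, ?, ?, ?)",
    [
        ("web_search", "q=weather in paris", "3 hits", ""),
        ("llm_call", "summarise hits", "", "rate limit exceeded after retry"),
        ("retrieval", "k=5", "", "connection timeout while fetching index"),
    ],
)


def search(query):
    """Return the span names matching an FTS5 query, in insertion order."""
    cur = conn.execute(
        "SELECT name FROM spans_fts WHERE spans_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in cur]


phrase_hits = search('"rate limit"')              # exact phrase
or_hits = search("timeout OR retry")              # either term
near_hits = search("NEAR(connection index, 5)")   # terms within 5 tokens
```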

examples/sample-trace.html shows a 16 KB self-contained shareable snapshot: data and renderer embedded in one file, no external assets.
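
The "data and renderer in one file" idea can be sketched in a few lines. This is an illustration of the general technique, not the code behind `clustertrace snapshot`; `render_snapshot` and the trace shape (`id`, `spans`, `name`, `ms`) are hypothetical.

```python
import json


def render_snapshot(trace):
    """Return one HTML string with the trace JSON and a tiny renderer inlined."""
    # Escape "</" so span text containing "</script>" cannot close the tag early.
    payload = json.dumps(trace).replace("</", "<\\/")
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        "<title>trace</title></head><body><pre id='out'></pre>"
        f"<script id='data' type='application/json'>{payload}</script>"
        "<script>"
        "const t = JSON.parse(document.getElementById('data').textContent);"
        "document.getElementById('out').textContent = "
        "t.spans.map(s => s.name + ' ' + s.ms + 'ms').join('\\n');"
        "</script></body></html>"
    )


html = render_snapshot(
    {"id": "t1", "spans": [{"name": "plan", "ms": 12}, {"name": "</script>", "ms": 3}]}
)
```

Because nothing is fetched at view time, the file can be attached to an issue or sent by email and still render.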

CLI

clustertrace demo                                      # one-step trial with bundled data
clustertrace dashboard                                 # launch local server
clustertrace stats                                     # one-screen DB summary
clustertrace inspect --latest                          # terminal Gantt of last trace
clustertrace inspect --failed                          # most-recent failed trace
clustertrace inspect <trace_id> --expand <span_id>     # dump that span's I/O
clustertrace mcp                                       # MCP server for AI editors (stdio)
clustertrace mcp install --target claude-code          # wire into your editor
clustertrace backfill-cost                             # compute $ for every LLM call retroactively
clustertrace backfill-signatures                       # signatures for older traces
clustertrace snapshot <trace_id> -o trace.html         # self-contained shareable HTML
clustertrace export <trace_id>                         # JSONL to stdout
clustertrace export --all > backup.jsonl               # everything
clustertrace import < backup.jsonl                     # merge (skips existing IDs)
clustertrace replay <trace_id> --entry mod:fn          # re-run with captured args
clustertrace db-path                                   # print SQLite path
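
The `--entry mod:fn` form follows the common "module:function" entry-point notation. As a sketch of how such a spec can be resolved and re-invoked with stored arguments (the `replay` helper and the `{"args": ..., "kwargs": ...}` shape are assumptions; how clustertrace stores captured args isn't described here):

```python
import importlib


def resolve_entry(spec):
    """Turn 'package.module:function' into the callable it names."""
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    return getattr(importlib.import_module(module_name), attr)


def replay(spec, captured):
    """Call the entry point again with previously captured arguments."""
    fn = resolve_entry(spec)
    return fn(*captured.get("args", []), **captured.get("kwargs", {}))


# Stand-in entry point from the standard library, just to show the mechanics.
result = replay(
    "json:dumps",
    {"args": [{"b": 1, "a": 2}], "kwargs": {"sort_keys": True}},
)
```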

Use with Claude Code

Pipe Claude Code's OpenTelemetry traces into clustertrace and you get cluster, failure-pattern, and cost analysis of your actual Claude Code usage. The dashboard ships both an OTLP/JSON and an OTLP/protobuf receiver on the same port; Claude Code POSTs spans straight to it.

One-time setup (Windows PowerShell shown; bash works too):

# 1. install the SessionStart hook so the dashboard spins up automatically
#    when you open Claude Code, then self-exits 15 minutes after you close it
clustertrace claude-code-hook install

# 2. print the env vars that tell Claude Code where to POST
clustertrace claude-code                  # metadata only (token counts, tool names)
clustertrace claude-code --content        # also capture prompt text + tool I/O

# 3. paste those env vars into $PROFILE (PowerShell) or ~/.bashrc / ~/.zshrc
#    so every Claude Code session inherits them.

# 4. restart Claude Code. Traces start landing.

After this, http://localhost:7777 is your dashboard whenever Claude Code is running (or has been recently). No 24/7 daemon, no Task Scheduler entry — the receiver lives only while you're using it.

To check the wiring: clustertrace claude-code-hook status reports whether the hook is installed and whether the dashboard is currently up.

Every claude_code.interaction becomes a trace; child claude_code.llm_request and claude_code.tool spans land as nested function calls with model, input/output tokens, cache hits, stop reason, tool name, and duration. Data stays on your machine (local SQLite). The receiver enforces a 16 MiB body cap (CLUSTERTRACE_OTLP_MAX_BYTES to override).
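
A body cap like this is usually a size check before parsing. The sketch below shows the arithmetic and the override; `max_body_bytes` and `status_for` are hypothetical helpers, and treating a body of exactly the cap as accepted is an assumption.

```python
import os

DEFAULT_MAX = 16 * 1024 * 1024  # 16 MiB = 16777216 bytes


def max_body_bytes(env=os.environ):
    """Read the cap from CLUSTERTRACE_OTLP_MAX_BYTES, falling back to 16 MiB."""
    return int(env.get("CLUSTERTRACE_OTLP_MAX_BYTES", DEFAULT_MAX))


def status_for(content_length, env=os.environ):
    """Return 413 (Payload Too Large) when the body exceeds the cap, else 200."""
    return 413 if content_length > max_body_bytes(env) else 200
```

Passing `env` explicitly keeps the check easy to test without touching the real environment.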

Prompt help

Open /prompts for three things on one page:

Patterns from your data — clustertrace extracts the prompts you actually wrote, splits them into "succeeded" vs "dead-ended" sessions, and reports which heuristics fire differently between the two. The biggest-delta rows are the habits to keep / drop.
Critique a draft — paste a prompt and get a lint pass: vague verbs, missing file paths, no acceptance criteria, 3+ connectives, hedging, etc. Ctrl+Enter to run.
Templates — CRUD library for the prompts you keep reusing. Copy, edit, tag, delete. The "critique" action pipes any saved template through the linter.
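
To give a feel for what a local lint pass of this kind involves, here is a toy version. Every word list, regex, and finding name in it is made up for the illustration; clustertrace's actual heuristics are not documented here and will differ.

```python
import re

# Illustrative word lists only; not clustertrace's real rules.
VAGUE_VERBS = {"fix", "improve", "handle", "clean", "update"}
CONNECTIVES = {"and", "then", "also", "plus"}
HEDGES = ("maybe", "perhaps", "i think", "sort of", "if possible")


def lint(prompt):
    """Return a list of finding labels for a draft prompt."""
    text = prompt.lower()
    words = re.findall(r"[a-z']+", text)
    findings = []
    if any(w in VAGUE_VERBS for w in words):
        findings.append("vague-verb")
    if not re.search(r"[\w./-]+\.[a-z]{1,4}\b", text):      # crude file-path check
        findings.append("no-file-path")
    if not re.search(r"\b(should|must|expect|passes?)\b", text):
        findings.append("no-acceptance-criteria")
    if sum(w in CONNECTIVES for w in words) >= 3:
        findings.append("too-many-connectives")
    if any(h in text for h in HEDGES):
        findings.append("hedging")
    return findings


weak = lint("Maybe fix the bug and clean it up and also update docs then ship")
strong = lint(
    "In src/parser.py, make parse_date reject empty strings; "
    "tests/test_parser.py must pass."
)
```

The weak draft trips all five checks and the specific one trips none, which is roughly the contrast a critique pass is meant to surface.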

Each tab has a "Deepen with Claude" button that calls Anthropic's API (your ANTHROPIC_API_KEY, Haiku by default) for a sharper read when the local heuristics aren't enough. Without the key set, the button surfaces a clear hint instead of a 500.

Weekly review

Open /review for a fifteen-minute Sunday loop over your last seven days of usage. Five questions answered with SQL over your own data: top expensive sessions, cache hit rate by pattern, the pattern you ran most often, sessions that dead-ended on max_tokens / refusal / errors, and a free-text "one change for next week" you persist and mark kept / partial / missed the following Sunday.

The point is not to optimise — it's to pick one specific change per week and check whether you actually made it. Run for four weeks and you have a number on whether your prompting got more efficient.

MCP server

clustertrace mcp exposes traces, clusters, and search through the Model Context Protocol, so any MCP-capable editor (Claude Code, Cursor, Continue) can ask "show me a failing trace of this pattern" or "diff this trace against a successful one" as a single command.

pip install "clustertrace[mcp]"
clustertrace mcp install --target claude-code   # or cursor, or continue
# restart your editor; the tools appear

Six read-only tools are exposed:

| tool | what |
| --- | --- |
| `list_clusters` | distinct execution patterns with count + failure rate |
| `get_trace` | full record (trace + spans + tags) for one trace id |
| `search` | FTS5 search over span name + I/O + error messages |
| `failure_summary` | aggregate failure-pattern view, optionally grouped by tag |
| `recent_failed` | the N most recent traces with status=error |
| `compare_traces` | structured diff (insert/delete/equal) of two traces' spans |
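
An insert/delete/equal diff over two span sequences can be produced with the standard library's `difflib`. This is a sketch of the idea, not the `compare_traces` implementation; `diff_spans` is a hypothetical helper, and comparing spans by name alone is an assumption.

```python
from difflib import SequenceMatcher


def diff_spans(a, b):
    """Diff two lists of span names into (op, name) pairs."""
    ops = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops += [("equal", name) for name in a[i1:i2]]
        else:
            # "delete", "insert", and "replace" all reduce to deletes then inserts
            ops += [("delete", name) for name in a[i1:i2]]
            ops += [("insert", name) for name in b[j1:j2]]
    return ops


ok_trace = ["plan", "retrieve", "rerank", "answer"]
failed_trace = ["plan", "retrieve", "answer", "retry"]
diff = diff_spans(ok_trace, failed_trace)
```

Here the diff shows the failing run skipped `rerank` and added a `retry` at the end.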

Without --target, clustertrace mcp install prints the JSON snippet for you to paste into your editor's config:

{
  "clustertrace": {
    "command": "clustertrace",
    "args": ["mcp"]
  }
}

v0.9 ships read-only tools only. Annotate/assert mutation tools are slated for v1.0 once I see how the read-only surface gets used in practice.

Configuration

| var | default | purpose |
| --- | --- | --- |
| `CLUSTERTRACE_DB` | `~/.clustertrace/traces.db` | SQLite file path |
| `CLUSTERTRACE_MAX_PAYLOAD_BYTES` | `32768` | per-field cap on serialized span I/O |
| `CLUSTERTRACE_PRICING_JSON` | (none) | override or extend the model price table |
| `CLUSTERTRACE_OTLP_MAX_BYTES` | `16777216` | body cap on `POST /v1/traces`; 413 on overflow |
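
A per-field cap on serialized I/O amounts to "serialize, then truncate to N bytes". The sketch below shows one way to do that without producing invalid UTF-8. `cap_field` is a hypothetical helper, and whether clustertrace truncates, drops, or marks oversized fields is not stated here.

```python
import json
import os


def max_payload_bytes(env=os.environ):
    """Read CLUSTERTRACE_MAX_PAYLOAD_BYTES, falling back to 32768."""
    return int(env.get("CLUSTERTRACE_MAX_PAYLOAD_BYTES", 32768))


def cap_field(value, limit):
    """Serialize one span field to JSON and cut it to at most `limit` bytes."""
    raw = json.dumps(value, default=str, ensure_ascii=False).encode("utf-8")
    if len(raw) <= limit:
        return raw.decode("utf-8")
    # errors="ignore" drops a multi-byte character cut in half by the slice
    return raw[:limit].decode("utf-8", errors="ignore")
```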

Case study

Maintainer dogfood self-study: a synthetic research agent went from 40% failure to 15% failure after a four-line fix that the cluster page surfaced in about five seconds. Reproducible from examples/case_study_research_agent.py. The doc opens with what it does not prove (no real customer numbers yet); it's a dogfood report, not a testimonial.

Where clustertrace doesn't fit

Production multi-tenant observability with teams, retention policies, PII redaction, and a managed dashboard is a different problem. clustertrace is a debug tool on a single laptop with a single SQLite file. Single-user, no auth, no persistence-tiering. It's intentionally simpler.

FAQ

Why cluster traces instead of just listing them. Even at 60 traces (the bundled demo) the list view doesn't surface the pattern. Clustering collapses them into 18 distinct execution patterns and tells you that 2 patterns cover 10 of the 12 failures (83%). At production volumes that's the difference between reading 1000 traces and reading 2.

Why local-only with no auth. Trade-off: keeps the package small and the trial frictionless. Single-user is the right default for a debug tool, not a production tracing service.

Does it work with LangChain / LlamaIndex / DSPy. Yes, via OpenTelemetry. Anything emitting OTel spans flows in. gen_ai.* and llm.* attribute conventions are mapped onto the clustertrace schema, so cost and clustering still work.

Streaming responses. The span is logged on completion. Chunk-by-chunk capture isn't implemented yet; v0.5 target.

How deep is the clustering algorithm. Cluster signatures use exact-string equality on a normalised, run-length-collapsed span sequence. Reorderings split clusters today: A→B→C and A→C→B end up as two clusters. Reorder-insensitive matching via set-of-edges or tree-edit-distance is the next algorithmic move. See ARCHITECTURE.md for the full design notes.
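
A minimal sketch of that kind of signature, to make the reordering limitation concrete. The normalisation rule here (lowercasing and replacing digit runs with `N`) and the `>` separator are assumptions for illustration; ARCHITECTURE.md is the reference for what clustertrace really does.

```python
import re
from collections import defaultdict
from itertools import groupby


def signature(span_names):
    """Normalise names, collapse consecutive repeats, join into one string."""
    # Assumed normalisation: lowercase, digit runs become "N".
    normalised = [re.sub(r"\d+", "N", name.lower()) for name in span_names]
    collapsed = [name for name, _ in groupby(normalised)]  # run-length collapse
    return ">".join(collapsed)


def cluster(traces):
    """Group trace ids whose signatures are exactly equal."""
    groups = defaultdict(list)
    for trace_id, names in traces.items():
        groups[signature(names)].append(trace_id)
    return dict(groups)


clusters = cluster(
    {
        "t1": ["plan", "fetch_1", "fetch_2", "answer"],
        "t2": ["plan", "fetch_7", "answer"],
        "t3": ["plan", "answer", "fetch_1"],  # same steps, different order
    }
)
```

`t1` and `t2` land together because repeats collapse, while `t3` gets its own cluster purely because of ordering.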

How much does the demo cost. Zero. The bundled 60 traces are pre-recorded. The full reproduction script (examples/generate_demo_data.py, 240 traces) costs about $2-3 in Haiku.

Overhead

@clustertrace.trace adds low-microsecond decorator overhead (≈35 µs of pure-Python wrapping on modern hardware), but the SQLite write is the real per-call cost: about 5 ms on Linux/macOS, about 30 ms on Windows NTFS. End-to-end traced-call latency in examples/benchmark.py is dominated by the disk write, not the decorator. For a debug tool on a laptop that's fine; you don't trace 100/sec.

For production:

@clustertrace.trace(sample=0.01)   # log 1% of calls
def hot_path(): ...

@clustertrace.trace(skip=True)     # zero overhead; returns the function unwrapped
def loop_body(): ...

Run python examples/benchmark.py to see the numbers on your hardware.
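
For intuition about why `skip=True` costs nothing and `sample=0.01` costs very little, here is a stand-alone decorator with the same two options. It is a sketch, not clustertrace's decorator: the `sink` and `rng` parameters are hypothetical additions so the behaviour can be observed and tested deterministically.

```python
import functools
import random


def trace(sample=1.0, skip=False, sink=None, rng=random.random):
    """Record a fraction of calls; with skip=True return the function untouched."""

    def decorate(fn):
        if skip:
            return fn  # no wrapper at all, so no per-call overhead

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if rng() >= sample:
                return fn(*args, **kwargs)   # unsampled call: one random draw only
            result = fn(*args, **kwargs)
            if sink is not None:
                sink.append((fn.__name__, args, result))  # stand-in for the DB write
            return result

        return wrapper

    return decorate


def add_one(x):
    return x + 1


log = []
draws = iter([0.1, 0.9, 0.4])  # fixed "random" values for a repeatable demo
sampled = trace(sample=0.5, sink=log, rng=lambda: next(draws))(add_one)
results = [sampled(1), sampled(2), sampled(3)]
```

Only the calls whose draw falls below `sample` pay for the write, which is where the real per-call cost sits.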

Known limitations

Streaming responses are logged on completion only, not chunk-by-chunk. The streaming: true attribute is recorded so you can filter on it, but intermediate chunks aren't captured. v0.5 target.

Replay with prompt diff is half-built: clustertrace replay re-runs with captured args, but modifying the prompt before re-invocation isn't yet exposed. v0.5.

Native wrappers only for Anthropic and OpenAI. Bedrock and Vertex work through wrap_anthropic (they share the .messages.create interface). Gemini works through OpenTelemetry.

Single-user, no auth. The dashboard is intended for 127.0.0.1. See SECURITY.md.

Contributing

ARCHITECTURE.md has the design choices; CONTRIBUTING.md has setup and a step-by-step recipe for adding a new SDK wrapper. Real gaps that would meaningfully help users are listed at the bottom of CONTRIBUTING.md.

License

MIT. See LICENSE.

Project details

Release history

This version

0.14.3

May 25, 2026

0.14.2

May 25, 2026

0.14.1

May 25, 2026

0.14.0

May 25, 2026

0.13.2

May 25, 2026

0.13.1

May 25, 2026

0.13.0

May 25, 2026

0.12.1

May 25, 2026

0.12.0

May 25, 2026

0.11.0

May 25, 2026

0.10.1

May 25, 2026

0.10.0

May 25, 2026

0.9.1

May 23, 2026

0.9.0

May 21, 2026

0.8.0

May 21, 2026

0.7.1

May 21, 2026

0.7.0

May 21, 2026

0.6.0

May 21, 2026

0.5.1

May 21, 2026

0.5.0

May 21, 2026

Download files

Source Distribution

clustertrace-0.14.3.tar.gz (300.2 kB view details)

Uploaded May 25, 2026 Source

Built Distribution

clustertrace-0.14.3-py3-none-any.whl (213.1 kB view details)

Uploaded May 25, 2026 Python 3

File details

Details for the file clustertrace-0.14.3.tar.gz.

File metadata

Download URL: clustertrace-0.14.3.tar.gz
Upload date: May 25, 2026
Size: 300.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clustertrace-0.14.3.tar.gz
Algorithm	Hash digest
SHA256	`aaff2bf30adc35036d0e9fc88bc2ea51ca1eae4ec3b6f6a1e31681d140a1c335`
MD5	`10431031fd2fdc709e27d07967d3702a`
BLAKE2b-256	`fef1de30e7ad20b187e98cdb8f4d344deaedff69a42e0885b770f854f80df757`

Provenance

The following attestation bundles were made for clustertrace-0.14.3.tar.gz:

Publisher: publish.yml on harrywinter06-code/clustertrace

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: clustertrace-0.14.3.tar.gz
- Subject digest: aaff2bf30adc35036d0e9fc88bc2ea51ca1eae4ec3b6f6a1e31681d140a1c335
- Sigstore transparency entry: 1629262186
- Sigstore integration time: May 25, 2026
Source repository:
- Permalink: harrywinter06-code/clustertrace@30ad046fc4879125e1eb1d3bddb82b4147a0c1de
- Branch / Tag: refs/tags/v0.14.3
- Owner: https://github.com/harrywinter06-code
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@30ad046fc4879125e1eb1d3bddb82b4147a0c1de
- Trigger Event: push

File details

Details for the file clustertrace-0.14.3-py3-none-any.whl.

File metadata

Download URL: clustertrace-0.14.3-py3-none-any.whl
Upload date: May 25, 2026
Size: 213.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clustertrace-0.14.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8538d4eef9cc7d4b50f78271b546c083dcdfeb767848aaf1de4066d285ce9c75`
MD5	`7d850acc4fc03e5492908787819504c1`
BLAKE2b-256	`06bd3a8f7d164d8c8c4876cf9b7d614254e0f62e5fc0e36a388b2a196490eb63`

Provenance

The following attestation bundles were made for clustertrace-0.14.3-py3-none-any.whl:

Publisher: publish.yml on harrywinter06-code/clustertrace

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: clustertrace-0.14.3-py3-none-any.whl
- Subject digest: 8538d4eef9cc7d4b50f78271b546c083dcdfeb767848aaf1de4066d285ce9c75
- Sigstore transparency entry: 1629262191
- Sigstore integration time: May 25, 2026
Source repository:
- Permalink: harrywinter06-code/clustertrace@30ad046fc4879125e1eb1d3bddb82b4147a0c1de
- Branch / Tag: refs/tags/v0.14.3
- Owner: https://github.com/harrywinter06-code
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@30ad046fc4879125e1eb1d3bddb82b4147a0c1de
- Trigger Event: push
