Local-first LLM agent observability — decorator + OpenTelemetry ingestion, traces grouped by execution pattern, cost per call, search, replay, shareable HTML snapshots.
clustertrace
Local-first LLM agent observability that tells you which clusters of traces are failing — not which individual ones.
Drop in a decorator, an SDK wrapper, or your existing OpenTelemetry setup. Get traces grouped by execution pattern, cost per call, full-text search, and replay of failing runs — all running off a single SQLite file on your laptop.
Two clusters explain 87% of all failures in the bundled demo. That's the kind of diagnosis the clusters page hands you in one screen instead of 47 stack traces.
30-second trial — no API key needed
pip install clustertrace
clustertrace demo
60 pre-recorded traces of three agents (research, RAG, tool-use), dashboard auto-launches, no API spend. Pre-PyPI: pip install "clustertrace @ git+https://github.com/harrywinter06-code/clustertrace".
When you're ready to use it for real
1. Native decorator
import clustertrace
@clustertrace.trace(tags={"agent": "research"})
async def plan(query): ...
with clustertrace.span("retrieval", k=5):
    ...
clustertrace.tool_call("web_search", args={"q": query}, result=hits)
clustertrace.tag("user_tier", "pro")
clustertrace.metric("score", 0.85) # numeric — aggregated to a time-series chart
Async-safe — concurrent asyncio.gather calls produce separate traces; nesting tracks the parent via contextvars.
2. SDK wrappers (no decorator needed)
from anthropic import Anthropic, AnthropicBedrock, AnthropicVertex
from openai import OpenAI
import clustertrace
client = clustertrace.wrap_anthropic(Anthropic()) # direct API
bedrock = clustertrace.wrap_anthropic(AnthropicBedrock()) # AWS Bedrock
vertex = clustertrace.wrap_anthropic(AnthropicVertex()) # Google Vertex
oai = clustertrace.wrap_openai(OpenAI()) # OpenAI
Explicit wrap — no global monkey-patching. Async clients (AsyncAnthropic, AsyncOpenAI) are detected automatically.
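As a rough illustration of what "explicit wrap, no global monkey-patching" can mean, the sketch below wraps one client instance in a proxy and picks a sync or async wrapper by inspecting the method. TracedClient, the fake clients, and the log format are all hypothetical; how wrap_anthropic and wrap_openai detect async clients internally is not documented here.

```python
import asyncio
import inspect
import time


class TracedClient:
    """Hypothetical proxy: records calls to one method on one instance only."""

    def __init__(self, client, method_name, log):
        self._client = client
        self._method_name = method_name
        self._log = log

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        attr = getattr(self._client, name)
        if name != self._method_name:
            return attr
        if inspect.iscoroutinefunction(attr):
            async def traced_async(*args, **kwargs):
                start = time.perf_counter()
                result = await attr(*args, **kwargs)
                self._log.append((name, "async", time.perf_counter() - start))
                return result
            return traced_async

        def traced_sync(*args, **kwargs):
            start = time.perf_counter()
            result = attr(*args, **kwargs)
            self._log.append((name, "sync", time.perf_counter() - start))
            return result
        return traced_sync


class FakeSyncClient:
    def create(self, prompt):
        return prompt.upper()


class FakeAsyncClient:
    async def create(self, prompt):
        return prompt.upper()


log = []
sync_client = TracedClient(FakeSyncClient(), "create", log)
async_client = TracedClient(FakeAsyncClient(), "create", log)
```

Because only the wrapped instance is proxied, any other client created from the same class stays untraced, which is the practical difference from patching the class globally.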
3. OpenTelemetry exporter (use your existing instrumentation)
If you already have OTel set up — LangChain, LlamaIndex, Bedrock auto-instrumentation, your own custom spans — add clustertrace as an exporter:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from clustertrace.otel import ClustertraceSpanExporter
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ClustertraceSpanExporter()))
The clusters page, cost view, and search work on OTel-sourced traces too. gen_ai.* and llm.* attribute conventions are mapped onto clustertrace's schema.
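The attribute mapping can be pictured as a small alias table. The sketch below is an assumption-laden illustration: the source keys are names from the OpenTelemetry GenAI and OpenInference conventions as I understand them, and the target field names (model, input_tokens, output_tokens) are hypothetical rather than clustertrace's real schema.

```python
# Hypothetical target fields mapped from two attribute conventions.
# First matching key wins, so gen_ai.* takes priority over llm.* here.
ATTRIBUTE_ALIASES = {
    "model": ("gen_ai.request.model", "llm.model_name"),
    "input_tokens": ("gen_ai.usage.input_tokens", "llm.token_count.prompt"),
    "output_tokens": ("gen_ai.usage.output_tokens", "llm.token_count.completion"),
}


def normalize_attributes(attrs):
    """Return only the recognized fields, renamed to the hypothetical schema."""
    out = {}
    for field, candidates in ATTRIBUTE_ALIASES.items():
        for key in candidates:
            if key in attrs:
                out[field] = attrs[key]
                break
    return out
```

Spans that carry neither convention simply produce an empty dict, so unrelated attributes pass through unmapped.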
What you get
| Page | What |
|---|---|
| / | filterable trace list (status, tag, name search) with live polling and per-trace cost |
| /clusters | distinct execution patterns — count, failure rate, sample trace, longest common failure prefix, top failing nodes |
| /search | FTS5 search across span name + input + output + error_message; supports phrases, OR, NEAR |
| /metrics | per-metric aggregates + rolling-mean sparklines for everything you've passed to clustertrace.metric() |
| /failures | per-span error-rate bars, step-of-failure histogram, force-directed call graph |
| /trace/<id> | Gantt timeline + expandable I/O + tags + metrics + per-span cost |
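For readers unfamiliar with the FTS5 query forms the /search row mentions, here is a small sqlite3 sketch of phrase, OR, and NEAR queries. The table name, columns, and rows are invented for illustration and are not clustertrace's schema; it also assumes your Python's SQLite build has FTS5 compiled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; column names mirror the fields listed for /search.
conn.execute(
    "CREATE VIRTUAL TABLE span_fts USING fts5(name, input, output, error_message)"
)
conn.executemany(
    "INSERT INTO span_fts VALUES (?, ?, ?, ?)",
    [
        ("web_search", "q=llm pricing", "3 hits", ""),
        ("llm_call", "summarize hits", "", "rate limit exceeded"),
        ("retrieval", "k=5", "", "timeout after retry"),
        ("llm_call", "plan", "", "retry budget spent before the upstream timeout"),
    ],
)


def search(query):
    """Return matching rowids for an FTS5 query string."""
    cur = conn.execute(
        "SELECT rowid FROM span_fts WHERE span_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in cur]


phrase_hits = search('"rate limit"')           # exact phrase
either_hits = search("pricing OR timeout")     # boolean OR
near_hits = search("NEAR(timeout retry, 2)")   # terms at most 2 tokens apart
```

The NEAR query matches only the row where the two terms sit close together, which is handy when an error message mentions both words but in unrelated clauses.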
See examples/sample-trace.html for a self-contained shareable snapshot of one failing trace — 16 KB single file with embedded data and renderer, no external assets.
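A self-contained snapshot of this kind can be built by embedding the trace as JSON next to a tiny inline renderer. The sketch below shows one way to do that with the stdlib; render_snapshot and the trace shape are hypothetical, and the real snapshot file is richer than this.

```python
import html
import json


def render_snapshot(trace):
    """Return one HTML string with data and renderer inline (no external assets)."""
    # Escape "</" so a "</script>" inside the data cannot close the script tag.
    payload = json.dumps(trace).replace("</", "<\\/")
    title = html.escape(trace["name"])
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<title>{title}</title></head><body><pre id='out'></pre>"
        f"<script type='application/json' id='trace'>{payload}</script>"
        "<script>"
        "const t = JSON.parse(document.getElementById('trace').textContent);"
        "document.getElementById('out').textContent = "
        "t.spans.map(s => s.name + ' ' + s.status).join('\\n');"
        "</script></body></html>"
    )


example = {
    "name": "plan <x>",
    "spans": [{"name": "llm_call", "status": "error", "output": "</script>"}],
}
doc = render_snapshot(example)
```

Writing doc to a file gives something you can attach to an issue or email, since the browser needs nothing beyond the file itself.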
CLI
clustertrace demo # one-step trial with bundled data
clustertrace dashboard # launch local server
clustertrace stats # one-screen DB summary
clustertrace backfill-cost # compute $ for every LLM call
clustertrace backfill-signatures # signatures for older traces
clustertrace snapshot <trace_id> -o trace.html # self-contained shareable HTML
clustertrace export <trace_id> # JSONL to stdout
clustertrace export --all > backup.jsonl # everything
clustertrace import < backup.jsonl # merge (skips existing IDs)
clustertrace replay <trace_id> --entry mod:fn # re-run with captured args
clustertrace db-path # print SQLite path
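The export and import commands describe a JSONL round trip where import merges and skips existing IDs. A minimal sketch of that behavior, assuming a hypothetical single-table schema (id, name, status) rather than clustertrace's real one, could use INSERT OR IGNORE:

```python
import io
import json
import sqlite3


def open_db():
    # Hypothetical schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE traces (id TEXT PRIMARY KEY, name TEXT, status TEXT)")
    return conn


def export_jsonl(conn, out):
    """Write one JSON object per trace, one per line."""
    query = "SELECT id, name, status FROM traces ORDER BY id"
    for id_, name, status in conn.execute(query):
        out.write(json.dumps({"id": id_, "name": name, "status": status}) + "\n")


def import_jsonl(conn, lines):
    """Merge rows; rows whose id already exists are left untouched."""
    inserted = 0
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        cur = conn.execute(
            "INSERT OR IGNORE INTO traces VALUES (:id, :name, :status)", row
        )
        inserted += cur.rowcount  # 1 if inserted, 0 if the id already existed
    conn.commit()
    return inserted
```

Skipping rather than overwriting means re-importing the same backup is harmless, at the cost of never updating a trace that changed after export.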
Configuration
| Var | Default | Purpose |
|---|---|---|
| CLUSTERTRACE_DB | ~/.clustertrace/traces.db | SQLite file path |
| CLUSTERTRACE_MAX_PAYLOAD_BYTES | 32768 | Per-field cap on serialized span I/O |
| CLUSTERTRACE_PRICING_JSON | (none) | Override or extend the model price table |
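To show how the first two variables might be consumed, here is a hedged sketch that reads them with the documented defaults and applies a per-field byte cap. load_config and cap_payload are hypothetical helpers; the actual truncation strategy is not specified above, and CLUSTERTRACE_PRICING_JSON is left out because its format is not described.

```python
import json
import os


def load_config(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "db_path": os.path.expanduser(
            env.get("CLUSTERTRACE_DB", "~/.clustertrace/traces.db")
        ),
        "max_payload_bytes": int(
            env.get("CLUSTERTRACE_MAX_PAYLOAD_BYTES", "32768")
        ),
    }


def cap_payload(value, max_bytes):
    """Serialize a span field and cut it to at most max_bytes of UTF-8.

    Assumption: plain truncation. A partial multi-byte character at the
    cut point is dropped rather than raising.
    """
    data = json.dumps(value, default=str).encode("utf-8")
    if len(data) <= max_bytes:
        return data.decode("utf-8")
    return data[:max_bytes].decode("utf-8", errors="ignore")
```

For example, exporting CLUSTERTRACE_MAX_PAYLOAD_BYTES=8 before launching would, under this sketch, store only the first 8 bytes of each serialized input or output.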
How does this compare to Langfuse / Phoenix / LangSmith?
| | clustertrace | Langfuse OSS | Arize Phoenix | LangSmith |
|---|---|---|---|---|
| Local-first (one binary / SQLite) | yes | no (Postgres + worker) | yes (in-memory or Postgres) | no (SaaS) |
| Clusters traces by execution pattern | yes — the differentiator | no | partial (groupings by ID, not signature) | no |
| Longest common failure prefix | yes | no | no | no |
| OpenTelemetry ingestion | yes (exporter) | yes | yes | partial |
| Cost tracking | yes (built-in pricing) | yes | yes | yes |
| Full-text search | yes (FTS5) | yes | yes | yes |
| Replay with captured args | yes | partial | partial | yes |
| Self-contained shareable trace HTML | yes (no other tool ships this) | no | no | no |
| Decorator + OTel + SDK wrappers | all three | OTel + wrappers | OTel | wrappers |
| Single-file install, no server setup | yes | no | yes (for in-mem) | n/a |
| Multi-user / teams | no | yes | yes | yes |
| Production retention / sampling | no | yes | yes | yes |
Pick clustertrace when: you're debugging a single agent or running a small eval suite on your laptop, you want clustering + failure-prefix mining as a first-class view, and you'd rather pip install than docker compose up.
Pick Langfuse / Phoenix / LangSmith when: you're running in production, need teams, need retention policies, need PII redaction, or want a managed dashboard. clustertrace is intentionally simpler.
FAQ
Why not just use Langfuse OSS? Langfuse is more capable for production deployment: multi-user, Postgres-backed, fully featured. It's also a four-container Docker stack that needs a worker process and a separate web service. clustertrace is one Python package and one SQLite file. If you want to debug an agent on your laptop tonight, clustertrace is faster to set up; if you want to deploy a tracing service for a team, Langfuse is the right answer.
Why "clustering" instead of just listing traces? Because at 200+ traces, eyeballing the list doesn't find the pattern. The demo data has 29 distinct execution patterns; the top 2 account for 87% of all failures. That's the kind of structural signal you can't see from a list — and it's the diagnosis that points you at the actual fix.
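The "top clusters explain most failures" figure is a simple aggregate once traces carry a cluster signature. The sketch below computes that share on invented toy data (not the bundled demo), and adds one plausible reading of "longest common failure prefix": the span names every failing trace in a cluster shares from the start. Both helpers are hypothetical.

```python
from collections import Counter


def failure_share(traces, top_n=2):
    """traces: iterable of (signature, failed). Share of failures in the top_n clusters."""
    failures = Counter(sig for sig, failed in traces if failed)
    total = sum(failures.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in failures.most_common(top_n))
    return top / total


def common_prefix(sequences):
    """Longest run of leading span names shared by every sequence."""
    prefix = []
    for column in zip(*sequences):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


# Invented toy data: 10 failures spread over 4 clusters, plus 4 successes.
toy = (
    [("A>B", True)] * 5
    + [("A>C", True)] * 3
    + [("A>D", True), ("B>C", True)]
    + [("A>B", False)] * 4
)
share = failure_share(toy)
prefix = common_prefix(
    [["plan", "search", "llm"], ["plan", "search", "tool"], ["plan", "search"]]
)
```

On this toy set the top two clusters hold 8 of 10 failures, and the shared prefix ends at the last step all failing runs had in common, which is usually where to start looking.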
Why local-only / no auth? Trade-off: keeps the binary small and the trial frictionless. Single-user is the right default for a debug tool. The README is explicit that production observability with retention and teams is a different tool's job.
Does it work with LangChain / LlamaIndex / DSPy? Yes, via the OpenTelemetry path. Anything emitting OTel spans flows into clustertrace. We map gen_ai.* / llm.* attribute conventions onto our schema so cost and clustering still work.
Does it support streaming? The span is logged on completion. Chunk-by-chunk capture isn't implemented yet (v0.4 target).
What's the algorithmic depth? Cluster signatures use exact-string equality on a normalized, run-length-collapsed span sequence. Reorderings split clusters today (A→B→C and A→C→B are two clusters). Reorder-insensitive matching via set-of-edges or tree-edit-distance is the v0.4 algorithmic move. The README doesn't oversell the implementation — see ARCHITECTURE.md for the full design trade-offs.
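A minimal sketch of a signature along these lines follows. The run-length collapse and exact-string comparison follow the description above; the normalization step (lowercasing and masking digit runs) is my assumption, since the actual rules live in ARCHITECTURE.md.

```python
import re
from itertools import groupby


def normalize(name):
    # Assumed normalization: trim, lowercase, mask digit runs.
    return re.sub(r"\d+", "#", name.strip().lower())


def signature(span_names):
    """Collapse consecutive repeats, then join into one comparable string."""
    collapsed = [key for key, _ in groupby(normalize(n) for n in span_names)]
    return " > ".join(collapsed)


retry_run = signature(["plan", "llm_call", "llm_call", "llm_call", "tool_7"])
abc = signature(["A", "B", "C"])
acb = signature(["A", "C", "B"])
```

Three retries of llm_call collapse into one step, so retry counts do not split clusters, while abc and acb remain different strings and therefore different clusters, matching the reordering limitation described above.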
How much does the demo cost? $0. The bundled 60 traces are pre-recorded. The full reproduction script (examples/generate_demo_data.py, 240 traces) costs ~$2-3 in Haiku.
Overhead
@clustertrace.trace adds ~35 µs of pure-Python overhead per call on modern hardware; the SQLite write that follows is the real cost (~5 ms on Linux/macOS, ~30 ms on Windows NTFS). For a debug tool on a laptop this is fine, since you are unlikely to trace 100 calls per second. For production:
@clustertrace.trace(sample=0.01) # log 1% of calls
def hot_path(): ...
@clustertrace.trace(skip=True) # zero overhead — returns the function unwrapped
def loop_body(): ...
Run python examples/benchmark.py to see the numbers on your hardware.
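To see why skip=True can be free and sample=0.01 nearly so, here is a sketch of a decorator with the same two options. sketch_trace is a hypothetical stand-in written for this illustration; it appends to a list instead of writing to SQLite and is not clustertrace's implementation.

```python
import functools
import random


def sketch_trace(sample=1.0, skip=False, sink=None, rng=random.random):
    """Hypothetical decorator illustrating sampling and skip semantics."""
    def decorate(fn):
        if skip:
            return fn  # original function back: no wrapper frame at call time

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if rng() >= sample:
                return fn(*args, **kwargs)  # not sampled: one random draw only
            result = fn(*args, **kwargs)
            if sink is not None:
                sink.append(fn.__name__)  # stand-in for the expensive write
            return result
        return wrapper
    return decorate


def hot_path():
    return 1


logged = []
seeded = random.Random(0).random  # seeded so the sampled count is repeatable
sampled = sketch_trace(sample=0.01, sink=logged, rng=seeded)(hot_path)
for _ in range(10_000):
    sampled()
unwrapped = sketch_trace(skip=True)(hot_path)
```

With sampling, the per-call cost for unsampled calls is one random draw and a comparison; with skip, the decorator hands back the very same function object.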
Known limitations
- Streaming responses are logged on completion only, not chunk-by-chunk. The streaming: true attribute is recorded so you can filter, but the intermediate chunks aren't captured. v0.5 target.
- Replay with prompt diff is half-built: clustertrace replay re-runs with captured args; modifying the prompt before re-invocation is not yet exposed. v0.5.
- Native wrappers only for Anthropic and OpenAI. Bedrock + Vertex work through wrap_anthropic (shared .messages.create interface). Gemini works through OpenTelemetry.
- Single-user, no auth. Dashboard is intended for 127.0.0.1. See SECURITY.md.
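Replay with captured args boils down to resolving a mod:fn entry point and calling it with the stored arguments. The sketch below shows that core step with importlib; resolve_entry, replay, and the {"args": ..., "kwargs": ...} capture shape are hypothetical, and the real command does more (loading the trace from SQLite, for one).

```python
import importlib


def resolve_entry(entry):
    """Turn a 'module:function' string into the callable it names."""
    module_name, sep, func_name = entry.partition(":")
    if not sep or not module_name or not func_name:
        raise ValueError(f"expected 'module:function', got {entry!r}")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def replay(entry, captured):
    """Re-invoke the entry point with previously captured arguments."""
    fn = resolve_entry(entry)
    return fn(*captured.get("args", []), **captured.get("kwargs", {}))


# Stdlib functions stand in for an agent entry point here.
replayed = replay("json:dumps", {"args": [{"a": 1}], "kwargs": {"sort_keys": True}})
```

Exposing prompt modification would mean editing the captured dict between the load and the call, which is the piece the limitation above says is not yet available.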
Contributing
Read ARCHITECTURE.md for the design choices, CONTRIBUTING.md for the setup and the step-by-step recipe for adding a new SDK wrapper. Real gaps that would meaningfully help users are listed at the bottom of CONTRIBUTING.md.
License
MIT.