Local-first LLM agent observability — decorator + OpenTelemetry ingestion, traces grouped by execution pattern, cost per call, search, replay, shareable HTML snapshots.

clustertrace

Local-first LLM agent observability that tells you which clusters of traces are failing — not which individual ones.

Drop in a decorator, an SDK wrapper, or your existing OpenTelemetry setup. Get traces grouped by execution pattern, cost per call, full-text search, and replay of failing runs — all running off a single SQLite file on your laptop.

Two clusters explain 87% of all failures in the bundled demo. That's the kind of diagnosis the clusters page hands you in one screen instead of 47 stack traces.


30-second trial — no API key needed

pip install clustertrace
clustertrace demo

The demo loads 60 pre-recorded traces from three agents (research, RAG, tool-use) and launches the dashboard automatically, with no API spend. Pre-PyPI: pip install "clustertrace @ git+https://github.com/harrywinter06-code/clustertrace".


When you're ready to use it for real

1. Native decorator

import clustertrace

@clustertrace.trace(tags={"agent": "research"})
async def plan(query): ...

with clustertrace.span("retrieval", k=5):
    ...

clustertrace.tool_call("web_search", args={"q": query}, result=hits)
clustertrace.tag("user_tier", "pro")
clustertrace.metric("score", 0.85)        # numeric — aggregated to a time-series chart

Async-safe — concurrent asyncio.gather calls produce separate traces; nesting tracks the parent via contextvars.
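To see why contextvars keeps concurrent traces apart, here is a minimal stdlib-only sketch. It is not clustertrace's implementation: `_current_span`, `traced`, and `run_pair` are hypothetical names used only for illustration.

```python
import asyncio
import contextvars

# Hypothetical stand-in for a tracer's "current span" slot; not clustertrace's internals.
_current_span = contextvars.ContextVar("current_span", default=None)


async def traced(name, inner=None):
    """Record which span was active when this one started, then run `inner`."""
    parent = _current_span.get()
    token = _current_span.set(name)
    try:
        await asyncio.sleep(0)  # yield so the sibling task gets a chance to interleave
        child = await inner() if inner else None
        return {"name": name, "parent": parent, "child": child}
    finally:
        _current_span.reset(token)


async def run_pair():
    # gather() wraps each coroutine in a Task, and each Task runs in its own copy of the context.
    return await asyncio.gather(
        traced("a", lambda: traced("a.child")),
        traced("b", lambda: traced("b.child")),
    )


if __name__ == "__main__":
    for record in asyncio.run(run_pair()):
        print(record)
```

Each top-level call sees no parent, while each nested call sees only its own caller, even though both tasks interleave.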

2. SDK wrappers (no decorator needed)

from anthropic import Anthropic, AnthropicBedrock, AnthropicVertex
from openai import OpenAI
import clustertrace

client  = clustertrace.wrap_anthropic(Anthropic())          # direct API
bedrock = clustertrace.wrap_anthropic(AnthropicBedrock())   # AWS Bedrock
vertex  = clustertrace.wrap_anthropic(AnthropicVertex())    # Google Vertex
oai     = clustertrace.wrap_openai(OpenAI())                # OpenAI

Explicit wrap — no global monkey-patching. Async clients (AsyncAnthropic, AsyncOpenAI) are detected automatically.

3. OpenTelemetry exporter (use your existing instrumentation)

If you already have OTel set up — LangChain, LlamaIndex, Bedrock auto-instrumentation, your own custom spans — add clustertrace as an exporter:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from clustertrace.otel import ClustertraceSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ClustertraceSpanExporter()))

The clusters page, cost view, and search work on OTel-sourced traces too. gen_ai.* and llm.* attribute conventions are mapped onto clustertrace's schema.
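As a rough illustration of that mapping, the sketch below folds two common attribute families into one flat record. The gen_ai.* keys follow the OpenTelemetry GenAI semantic conventions and the llm.* keys follow OpenInference naming; which keys clustertrace actually recognizes, and the output field names, are assumptions made for this example.

```python
def normalize_llm_attributes(attrs):
    """Map gen_ai.* / llm.* span attributes onto one flat record (hypothetical schema)."""

    def first(*keys):
        # Return the value of the first key present in the span attributes.
        for key in keys:
            if key in attrs:
                return attrs[key]
        return None

    return {
        "model": first("gen_ai.request.model", "llm.model_name"),
        "input_tokens": first("gen_ai.usage.input_tokens", "llm.token_count.prompt"),
        "output_tokens": first("gen_ai.usage.output_tokens", "llm.token_count.completion"),
    }


if __name__ == "__main__":
    print(normalize_llm_attributes({"llm.model_name": "example-model", "llm.token_count.prompt": 12}))
```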


What you get

Page | What
/ | filterable trace list (status, tag, name search) with live polling and per-trace cost
/clusters | distinct execution patterns: count, failure rate, sample trace, longest common failure prefix, top failing nodes
/search | FTS5 search across span name + input + output + error_message; supports phrases, OR, NEAR
/metrics | per-metric aggregates + rolling-mean sparklines for everything you've passed to clustertrace.metric()
/failures | per-span error-rate bars, step-of-failure histogram, force-directed call graph
/trace/<id> | Gantt timeline + expandable I/O + tags + metrics + per-span cost
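The "longest common failure prefix" on /clusters can be pictured as follows. This is a sketch under the assumption that each failing trace is reduced to an ordered list of span names; the function name and the sample data are illustrative, not taken from the package.

```python
def longest_common_prefix(sequences):
    """Return the longest leading run of span names shared by every sequence."""
    if not sequences:
        return []
    prefix = list(sequences[0])
    for seq in sequences[1:]:
        n = 0
        while n < len(prefix) and n < len(seq) and prefix[n] == seq[n]:
            n += 1
        prefix = prefix[:n]  # shrink to the part shared so far
    return prefix


if __name__ == "__main__":
    failing = [
        ["plan", "retrieve", "rerank", "answer"],
        ["plan", "retrieve", "rerank"],
        ["plan", "retrieve", "web_search"],
    ]
    print(longest_common_prefix(failing))
```

The shared prefix shows how far the failing runs agree before they diverge, which is usually where to start looking.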

See examples/sample-trace.html for a self-contained shareable snapshot of one failing trace — 16 KB single file with embedded data and renderer, no external assets.

CLI

clustertrace demo                                      # one-step trial with bundled data
clustertrace dashboard                                 # launch local server
clustertrace stats                                     # one-screen DB summary
clustertrace backfill-cost                             # compute $ for every LLM call
clustertrace backfill-signatures                       # signatures for older traces
clustertrace snapshot <trace_id> -o trace.html         # self-contained shareable HTML
clustertrace export <trace_id>                         # JSONL to stdout
clustertrace export --all > backup.jsonl               # everything
clustertrace import < backup.jsonl                     # merge (skips existing IDs)
clustertrace replay <trace_id> --entry mod:fn          # re-run with captured args
clustertrace db-path                                   # print SQLite path
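The "merge (skips existing IDs)" behaviour of import can be sketched like this. The export schema is not documented here, so the `trace_id` field name and the `merge_jsonl` helper are assumptions for illustration only.

```python
import json


def merge_jsonl(existing_ids, lines):
    """Return records whose trace_id is not yet known; adds new IDs to `existing_ids`."""
    added = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the backup file
        record = json.loads(line)
        if record["trace_id"] in existing_ids:  # hypothetical field name
            continue
        existing_ids.add(record["trace_id"])
        added.append(record)
    return added


if __name__ == "__main__":
    known = {"t1"}
    backup = ['{"trace_id": "t1"}', '{"trace_id": "t2"}']
    print(merge_jsonl(known, backup))
```

Because known IDs are skipped, running the same import twice leaves the store unchanged the second time.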

Configuration

Var | Default | Purpose
CLUSTERTRACE_DB | ~/.clustertrace/traces.db | SQLite file path
CLUSTERTRACE_MAX_PAYLOAD_BYTES | 32768 | Per-field cap on serialized span I/O
CLUSTERTRACE_PRICING_JSON | (none) | Override or extend the model price table
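A per-field byte cap like CLUSTERTRACE_MAX_PAYLOAD_BYTES might work roughly as below. This is a sketch only: how clustertrace serializes and truncates payloads is not specified above, so `max_payload_bytes` and `cap_field` are hypothetical helpers.

```python
import json
import os

DEFAULT_MAX_PAYLOAD_BYTES = 32768  # default from the table above


def max_payload_bytes(env=os.environ):
    """Read the cap from the environment, falling back to the documented default."""
    return int(env.get("CLUSTERTRACE_MAX_PAYLOAD_BYTES", DEFAULT_MAX_PAYLOAD_BYTES))


def cap_field(value, limit):
    """Serialize one span field to JSON and truncate it to at most `limit` bytes."""
    data = json.dumps(value, default=str).encode("utf-8")
    if len(data) <= limit:
        return data.decode("utf-8")
    # errors="ignore" drops a trailing partial character if the cut lands mid-sequence.
    return data[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    print(cap_field({"prompt": "x" * 100}, max_payload_bytes({"CLUSTERTRACE_MAX_PAYLOAD_BYTES": "32"})))
```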

How does this compare to Langfuse / Phoenix / LangSmith?

Feature | clustertrace | Langfuse OSS | Arize Phoenix | LangSmith
Local-first (one binary / SQLite) | yes | no (Postgres + worker) | yes (in-memory or Postgres) | no (SaaS)
Clusters traces by execution pattern | yes (the differentiator) | no | partial (groupings by ID, not signature) | no
Longest common failure prefix | yes | no | no | no
OpenTelemetry ingestion | yes (exporter) | yes | yes | partial
Cost tracking | yes (built-in pricing) | yes | yes | yes
Full-text search | yes (FTS5) | yes | yes | yes
Replay with captured args | yes | partial | partial | yes
Self-contained shareable trace HTML | yes (no other tool ships this) | no | no | no
Decorator + OTel + SDK wrappers | all three | OTel + wrappers | OTel | wrappers
Single-file install, no server setup | yes | no | yes (for in-mem) | n/a
Multi-user / teams | no | yes | yes | yes
Production retention / sampling | no | yes | yes | yes

Pick clustertrace when: you're debugging a single agent or running a small eval suite on your laptop, you want clustering + failure-prefix mining as a first-class view, and you'd rather pip install than docker compose up.

Pick Langfuse / Phoenix / LangSmith when: you're running in production, need teams, need retention policies, need PII redaction, or want a managed dashboard. clustertrace is intentionally simpler.


FAQ

Why not just use Langfuse OSS? Langfuse is more capable for production deployment: multi-user, Postgres-backed, fully featured. It's also a four-container Docker stack that needs a worker process and a separate web service. clustertrace is one Python package and one SQLite file. If you want to debug an agent on your laptop tonight, clustertrace is faster to set up; if you want to deploy a tracing service for a team, Langfuse is the right answer.

Why "clustering" instead of just listing traces? Because at 200+ traces, eyeballing the list doesn't find the pattern. The demo data has 29 distinct execution patterns; the top 2 account for 87% of all failures. That's the kind of structural signal you can't see from a list — and it's the diagnosis that points you at the actual fix.
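The "top clusters account for most failures" figure is a simple share computation. The sketch below assumes each trace is reduced to a (signature, failed) pair; the function name and the sample numbers are made up for illustration and are not the demo data.

```python
from collections import Counter


def failure_share(traces, top_k=2):
    """Fraction of all failures that fall in the `top_k` most-failing clusters."""
    failures = Counter(sig for sig, failed in traces if failed)
    total = sum(failures.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in failures.most_common(top_k))
    return top / total


if __name__ == "__main__":
    # Illustrative data: three clusters with 5, 3, and 2 failures, plus some passing runs.
    traces = [("A", True)] * 5 + [("B", True)] * 3 + [("C", True)] * 2 + [("A", False)] * 4
    print(failure_share(traces))
```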

Why local-only / no auth? Trade-off: keeps the binary small and the trial frictionless. Single-user is the right default for a debug tool. The README is explicit that production observability with retention and teams is a different tool's job.

Does it work with LangChain / LlamaIndex / DSPy? Yes, via the OpenTelemetry path. Anything emitting OTel spans flows into clustertrace. We map gen_ai.* / llm.* attribute conventions onto our schema so cost and clustering still work.

Does it support streaming? The span is logged on completion. Chunk-by-chunk capture isn't implemented yet (v0.4 target).

What's the algorithmic depth? Cluster signatures use exact-string equality on a normalized, run-length-collapsed span sequence. Reorderings split clusters today (A→B→C and A→C→B are two clusters). Reorder-insensitive matching via set-of-edges or tree-edit-distance is the v0.4 algorithmic move. The README doesn't oversell the implementation — see ARCHITECTURE.md for the full design trade-offs.
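A signature of that shape could be computed roughly as below. This is a sketch, not the code in the package: the normalization step (strip and lowercase) and the separator are assumptions, and ARCHITECTURE.md remains the reference for the real rules.

```python
from itertools import groupby


def signature(span_names):
    """Normalize span names, collapse consecutive repeats, and join into one string."""
    normalized = [name.strip().lower() for name in span_names]  # assumed normalization
    collapsed = [name for name, _ in groupby(normalized)]  # run-length collapse
    return "→".join(collapsed)


if __name__ == "__main__":
    print(signature(["plan", "search", "search", "search", "answer"]))
    # Reordered spans produce different strings, so they land in different clusters.
    print(signature(["A", "B", "C"]) == signature(["A", "C", "B"]))
```

Because clusters are keyed on the exact string, a retry loop of any length collapses into one step, while a swapped step order yields a new cluster.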

How much does the demo cost? $0. The bundled 60 traces are pre-recorded. The full reproduction script (examples/generate_demo_data.py, 240 traces) costs ~$2-3 in Haiku.


Overhead

@clustertrace.trace adds ~35 µs of pure-Python overhead per call on modern hardware; the SQLite write that follows is the real cost (~5 ms on Linux/macOS, ~30 ms on Windows NTFS). For a debug tool on a laptop this is fine, since you rarely trace 100 calls/sec. For hot paths or production-like load:

@clustertrace.trace(sample=0.01)   # log 1% of calls
def hot_path(): ...

@clustertrace.trace(skip=True)     # zero overhead — returns the function unwrapped
def loop_body(): ...

Run python examples/benchmark.py to see the numbers on your hardware.
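For intuition about what sample= and skip= buy you, here is a minimal decorator sketch. It is not the library's implementation: the `record` and `rng` parameters are hypothetical hooks added so the example is self-contained and deterministic.

```python
import functools
import random


def trace(sample=1.0, skip=False, record=None, rng=random.random):
    """Sketch of a sampling decorator; `record` and `rng` are illustrative hooks."""

    def decorator(fn):
        if skip:
            return fn  # no wrapper at all, so no per-call overhead

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            # Only a `sample` fraction of calls pay the cost of being recorded.
            if record is not None and rng() < sample:
                record(fn.__name__)
            return result

        return wrapper

    return decorator


if __name__ == "__main__":
    seen = []

    @trace(sample=0.01, record=seen.append)
    def hot_path():
        return 42

    for _ in range(1000):
        hot_path()
    print(len(seen), "of 1000 calls recorded")
```

Returning the original function when skip is true is what makes that path free: the decorated name points at the undecorated function object.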

Known limitations

  • Streaming responses are logged on completion only, not chunk-by-chunk. The streaming: true attribute is recorded so you can filter — but the intermediate chunks aren't captured. v0.5 target.
  • Replay with prompt diff is half-built: clustertrace replay re-runs with captured args, but modifying the prompt before re-invocation is not yet exposed. v0.5.
  • Native wrappers only for Anthropic and OpenAI. Bedrock + Vertex work through wrap_anthropic (shared .messages.create interface). Gemini works through OpenTelemetry.
  • Single-user, no auth. Dashboard is intended for 127.0.0.1. See SECURITY.md.

Contributing

Read ARCHITECTURE.md for the design choices, and CONTRIBUTING.md for the setup and the step-by-step recipe for adding a new SDK wrapper. Real gaps that would meaningfully help users are listed at the bottom of CONTRIBUTING.md.

License

MIT.
