Observability-platform-agnostic triage runtime for LLM agent traces

Maintainers

wczaja

Project description

docket

An observability-platform-agnostic triage runtime for LLM agent traces.

docket reads traces from your existing observability backend (Phoenix, Langfuse, LangSmith), classifies each one against a YAML failure-mode taxonomy you write, clusters similar failures together, and drafts issues into your tracker (Jira, Linear, GitHub Issues). It is not a new observability backend, an eval framework, or a web UI — it's a thin agent that sits above what you already have.

Human-in-the-loop is the default: drafts queue locally or open in your $EDITOR for review before they post. Auto-posting requires an explicit opt-in (auto_post_threshold).

Quickstart (5 minutes)

The fastest path to a working setup: a local Phoenix backend + a GitHub-Issues tracker.

1. Install

pip install docket-runtime
# or:  uv pip install docket-runtime

2. Bring up Phoenix

docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest

Send your agent's traces to http://localhost:6006 via the OpenInference instrumentation of your choice (any OTLP-compatible instrumentation works — see docs/local-phoenix.md for ingestion recipes).

3. Configure credentials

export ANTHROPIC_API_KEY="sk-ant-..."         # for the llm_judge detectors
export OPENAI_API_KEY="sk-..."                # for clustering embeddings
                                              # (required even with an
                                              # Anthropic classifier)
export GITHUB_TOKEN="ghp_..."                 # PAT with Issues write

4. Run

docket run \
  --backend phoenix \
  --phoenix-url http://localhost:6006 \
  --tracker github \
  --github-owner YOUR_GH_USER \
  --github-repo docket-issues \
  --rubric docket.dev/builtin/agents/v1 \
  --since 1h

That's it. The pipeline:

Pulls the last hour of traces from Phoenix.
Runs each one through the agents/v1 failure-mode rubric.
Clusters positive classifications per mode.
Drafts one issue per cluster into ~/.docket/queued-issues/<run-id>/.
Looks at your GitHub repo for matching open issues (dedup by labels + embedded provenance) and comments on existing issues that grew, or leaves new ones in the local queue for --review.
Prints a markdown report.

Add --review to walk each queued draft through $EDITOR + accept/reject/post. Add --auto-post-threshold high to auto-post critical- and high-severity drafts. Add --dry-run to price a window before committing to it.

For scheduled triage, swap run for the daemon:

docket serve --interval 1h ...   # same flags as run

Each tick processes exactly the window since the last successful tick — no gaps, no overlap — and a failed tick retries its window instead of dropping it. (Plain cron + docket run works too; serve just does the window bookkeeping for you.)
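
The window bookkeeping described above can be pictured roughly as follows. This is a minimal sketch, not docket's actual implementation: `next_window`, `tick`, and the one-hour default lookback are hypothetical names and values chosen for illustration, and how the real daemon persists its checkpoint is not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

Window = Tuple[datetime, datetime]


def next_window(last_success_end: Optional[datetime], now: datetime,
                default_lookback: timedelta) -> Window:
    """Return the half-open window [start, end) the next tick should process."""
    if last_success_end is None:
        # First tick ever: fall back to a fixed lookback (an assumption).
        return now - default_lookback, now
    return last_success_end, now


def tick(last_success_end: Optional[datetime], now: datetime,
         process: Callable[[datetime, datetime], None],
         default_lookback: timedelta = timedelta(hours=1)) -> Optional[datetime]:
    """Run one tick and return the checkpoint to use for the next one."""
    start, end = next_window(last_success_end, now, default_lookback)
    try:
        process(start, end)
    except Exception:
        # Failed tick: keep the old checkpoint so the next tick starts
        # from the same point and covers the missed window.
        return last_success_end
    # Successful tick: the next window starts exactly where this one ended.
    return end
```

Because the checkpoint only advances on success, consecutive windows share an endpoint (no gap, no overlap) and a failure simply widens the next window.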

For other backends and trackers, see docs/quickstart.md (full matrix: Phoenix/Langfuse/LangSmith × Jira/Linear/GitHub).

What it does

docket runs a small pipeline of LLM-driven subagents over your existing traces:

┌──────────────────────────┐
│ Phoenix / Langfuse /     │
│ LangSmith trace backend  │  <- you already have this
└────────────┬─────────────┘
             │ trace fetch (read-only by default)
             ▼
   ┌─────────────────────┐
   │ classifier subagent │  rubric: YAML failure-mode taxonomy
   └──────────┬──────────┘     (built-in or your own)
              ▼
   ┌─────────────────────┐
   │ clusterer subagent  │  embeddings + HDBSCAN per mode
   └──────────┬──────────┘
              ▼
   ┌─────────────────────┐
   │ drafter subagent    │  one IssueDraft per cluster, with
   └──────────┬──────────┘     embedded provenance for dedup
              ▼
   ┌─────────────────────┐
   │ poster subagent     │  dedup against tracker, then
   └──────────┬──────────┘     comment / create / queue
              ▼
┌──────────────────────────┐
│ Jira / Linear / GitHub   │
└──────────────────────────┘

Read-only by default. Annotations write back to the trace backend only when you pass --annotate. Issues post to the tracker only when their severity meets auto_post_threshold (default: never) or when you opt in via --review.
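
As an illustration of the threshold rule, the comparison could look something like the sketch below. The function name and the full severity ladder are assumptions: this page only names the "high" and "critical" levels, so "low" and "medium" are placeholders.

```python
from typing import Optional

# Assumed ordering, lowest to highest. Only "high" and "critical" are
# named in the README; the other two levels are placeholders.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def should_auto_post(severity: str, threshold: Optional[str]) -> bool:
    """True if a draft of this severity meets the auto-post threshold.

    threshold=None models the default, where nothing is auto-posted.
    """
    if threshold is None:
        return False
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)
```

With a threshold of "high", both "high" and "critical" drafts qualify, which matches the behavior described for --auto-post-threshold high.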

Bounded by default. Every run is capped by max_traces_per_run (default 1000, measured after sampling and checkpoint subtraction); exceeding the cap aborts loudly before any trace is fetched — never a silent truncation. An optional max_estimated_cost_usd adds a dollar ceiling on the pre-flight cost estimate. --dry-run reports both gates and exits non-zero iff the real run would abort, so CI can use it as a preflight check. For production-scale windows, --sample N bounds the work with --strategy uniform, --strategy errors-only (root-errored traces, filter pushed down to the backend), or --strategy stratified --stratify-by status|latency_bucket|tag:<key> (equal allocation so rare strata — errors, small tenants, tail latencies — get seen). Adapters flag truncated listings — trace and tracker alike — instead of silently stopping at their pagination ceiling; when the open-issue listing is truncated during dedup, drafts are queued for review instead of auto-posted, since "no duplicate found" was not proven.
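
To make "equal allocation" concrete, here is one way a stratified sampler could be written. It is a sketch under stated assumptions rather than docket's code: `stratified_sample` is a hypothetical helper, traces are plain dicts, and redistributing the unused quota of small strata is a design choice this page does not confirm.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(items: Sequence[T], key: Callable[[T], Hashable],
                      n: int, seed: int = 0) -> List[T]:
    """Pick up to n items, giving each stratum an equal share.

    Strata smaller than their share are taken whole, and the unused
    quota is redistributed to the remaining strata (an assumption).
    """
    rng = random.Random(seed)
    strata: Dict[Hashable, List[T]] = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    picked: List[T] = []
    remaining = n
    # Visit the smallest strata first so leftover quota flows to larger ones.
    ordered = sorted(strata.values(), key=len)
    for i, members in enumerate(ordered):
        share = remaining // (len(ordered) - i)
        take = min(share, len(members))
        picked.extend(rng.sample(members, take))
        remaining -= take
    return picked
```

With 2 errored traces among 100 and a sample size of 10, both errors are kept and the other 8 slots go to successful traces, whereas a uniform sample of 10 would often contain no errors at all.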

State lives in the backends, not here. docket doesn't own a database. Annotations key off (trace_id, run_id, rubric_version, mode_id) in the observability backend; issues key off labels + HTML-comment provenance in the tracker. Re-running the same window is idempotent.
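
The two keys mentioned above might be modeled as in the sketch below. The `AnnotationKey` tuple mirrors the four fields named in the text; the `docket-provenance` comment marker and both helper functions are hypothetical, since the real provenance layout is not specified on this page.

```python
import json
import re
from typing import NamedTuple, Optional


class AnnotationKey(NamedTuple):
    """Identity of one annotation; equal keys mean 'already written'."""
    trace_id: str
    run_id: str
    rubric_version: str
    mode_id: str


# Hypothetical marker format. The payload must not itself contain "-->".
_PROVENANCE_RE = re.compile(r"<!--\s*docket-provenance:\s*(.*?)\s*-->", re.DOTALL)


def embed_provenance(body: str, provenance: dict) -> str:
    """Append provenance as an HTML comment, hidden in rendered markdown."""
    payload = json.dumps(provenance, sort_keys=True)
    return f"{body}\n\n<!-- docket-provenance: {payload} -->"


def extract_provenance(body: str) -> Optional[dict]:
    """Recover the provenance dict from an issue body, or None if absent."""
    match = _PROVENANCE_RE.search(body)
    return json.loads(match.group(1)) if match else None
```

A re-run over the same window produces the same keys and the same embedded provenance, so an existing annotation or issue can be recognized instead of duplicated.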

Built-in rubrics

Four reference rubrics ship with the package; each is a starting point intended to be imported into a domain-specific rubric you maintain.

URI	Modes
`docket.dev/builtin/agents/v1`	6 — hallucination, infinite loop, premature termination, unsafe tool call, refusal leakage, bad handoff
`docket.dev/builtin/rag/v1`	4 — off-corpus answer, missing citation, stale retrieval, context overflow
`docket.dev/builtin/routing/v1`	4 — wrong-skill routing, capability mismatch, dead-end transfer, oscillation
`docket.dev/builtin/multi-agent/v1`	4 — handoff context loss, conflicting instructions, role drift, shared-memory corruption

Reference them by URI on the CLI (--rubric docket.dev/builtin/rag/v1) or import them into your own rubric:

apiVersion: docket.dev/v1
kind: Rubric
metadata:
  name: my-prod-agents
  version: 1.0.0
imports:
  - docket.dev/builtin/agents/v1
  - docket.dev/builtin/rag/v1
modes:
  - id: refund-without-confirmation
    severity: critical
    detection:
      type: tool_call
      tool_calls: [process_refund]
    # ... your modes go here

Validate with docket validate ./my-rubric.yaml. Smoke-test the examples with docket self-test ./my-rubric.yaml.
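
For intuition, a `tool_call` detection like the one in the rubric above could be evaluated along these lines. This is only a sketch: `matched_tool_calls` is a hypothetical helper, and the span dicts with a `tool_name` field are a simplified stand-in for real OpenInference spans. It checks tool presence only; whatever extra conditions a mode such as refund-without-confirmation needs are not modeled.

```python
from typing import Iterable, List, Mapping, Optional


def matched_tool_calls(spans: Iterable[Mapping[str, Optional[str]]],
                       tool_calls: List[str]) -> List[str]:
    """Return, in trace order, the watched tool names that were invoked."""
    watched = set(tool_calls)
    return [s["tool_name"] for s in spans if s.get("tool_name") in watched]


# Simplified example trace: two tool spans and one non-tool span.
example_spans = [
    {"tool_name": "lookup_order"},
    {"tool_name": "process_refund"},
    {"name": "llm_completion"},
]
hits = matched_tool_calls(example_spans, ["process_refund"])
```

A non-empty result would mark the trace as a candidate for the mode; an empty one means the detector does not fire.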

Architecture overview

OpenInference is the canonical trace schema. Adapters normalize to it; the runtime never sees backend-specific shapes.
MCP is the integration protocol for both trace backends and trackers. The CLI ships one MCP server binary per adapter (docket-adapter-phoenix, docket-adapter-jira, …) that you can run standalone or invoke through docket run.
deepagents is the agent harness; we don't reimplement planning, virtual filesystems, or subagent delegation.
Stateless runtime. Annotations live in the backend; issues live in the tracker. No local database.
Pydantic v2 + httpx + asyncio throughout. No bespoke SDK dependency per backend — every adapter is plain HTTP.

Execution modes

docket ships two execution modes over the same six pipeline stages (list_traces → classify_traces → annotate_classifications → cluster_classifications → draft_issues → write_report):

Deterministic pipeline (default). Stages run in a fixed order from plain Python. Predictable cost, reproducible across runs, easy to debug. Use this for batch / cron / CI, anywhere SLOs and cost forecasting matter.
deepagents harness (--agent). Same six stages exposed as tools to a top-level planning LLM. Use this for exploratory / debugging runs today; the harness is the substrate the project commits to for future interactive surfaces (chat-driven triage, incident investigation, rubric authoring). The tools and entry points for those surfaces are post-v1.0 work — see docs/design.md §4.2 and §7 (Phases 14–15).

Both modes share the same subagents, the same run_id, and the same annotation idempotency, so investments in one benefit the other.
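
The deterministic mode's fixed stage order can be sketched as a plain loop. The stage names come from the list above; everything else (the `run_pipeline` helper, the shared state dict, the pass-through stage bodies) is a hypothetical illustration and not the signature of docket's Python API.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]

STAGE_NAMES = [
    "list_traces", "classify_traces", "annotate_classifications",
    "cluster_classifications", "draft_issues", "write_report",
]


def run_pipeline(stages: List[Stage], state: Dict[str, Any]) -> Dict[str, Any]:
    """Run stages in list order, threading one state dict through them."""
    for name, stage in stages:
        state = stage(state)
        # Record progress so a failure can be attributed to a stage.
        state.setdefault("completed", []).append(name)
    return state


# Placeholder stage bodies: each passes the state through unchanged.
stages: List[Stage] = [(name, lambda s: s) for name in STAGE_NAMES]
result = run_pipeline(stages, {"run_id": "example-run"})
```

In the agent mode the same callables would instead be offered to a planning LLM as tools, so the order is chosen at run time rather than fixed in code.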

For the full design, see docs/design.md. Per-backend and per-tracker setup guides:

Phoenix
Langfuse
LangSmith
Jira — Cloud + Data Center
Linear
GitHub Issues

Documentation

Start at the docs index.

Guides

Quickstart — every backend × tracker pair
Concepts — the vocabulary in five minutes
Adapters — the integration contracts + how to add a backend or tracker
Benchmarks — wall time and cost for a 1000-trace run
Design document — every architectural decision, with rationale

API reference

CLI — every command, flag, and exit code for run, serve, validate, self-test, and the adapter binaries
Configuration — docket.yaml schema, all env vars, precedence rules, defaults
Python API — embed the pipeline as a library: run_triage_pipeline, adapters, providers, models, errors
MCP servers — tool contracts for driving the adapters from any MCP client
Rubric DSL — the complete taxonomy spec, with a worked example rubric

Status

v1.0. Three trace-backend adapters and three tracker adapters at parity, four built-in rubrics, deterministic + agent-harness execution modes, daemon mode, budget guardrails and sampling. The changelog has the full feature list. Post-1.0 roadmap (streaming, sharding, interactive surfaces) lives in docs/design.md §7.

Contributing

Rubrics and adapters are the highest-leverage contributions, and both have step-by-step guides: see CONTRIBUTING.md. Bug reports and adapter proposals have issue templates. Security issues go through SECURITY.md — never a public issue.

License

Apache 2.0. See LICENSE.

Release history

This version

1.0.0

Jun 18, 2026

Download files

Source Distribution

docket_runtime-1.0.0.tar.gz (122.5 kB)

Uploaded Jun 18, 2026 Source

Built Distribution

docket_runtime-1.0.0-py3-none-any.whl (168.2 kB)

Uploaded Jun 18, 2026 Python 3

File details

Details for the file docket_runtime-1.0.0.tar.gz.

File metadata

Download URL: docket_runtime-1.0.0.tar.gz
Upload date: Jun 18, 2026
Size: 122.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docket_runtime-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`4554127971180179c7a719fa8d87fc4a941dc88a5b9b8a319e321d33488e6e64`
MD5	`448aac80c918011a37cdb22a3ee5cdb7`
BLAKE2b-256	`b8e59ec84ade12f50cf59b3b4e6beaa944a52699df4ce59da001b2cce3f93030`
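
A downloaded file can be checked against the SHA256 digest listed above with the Python standard library. This is a generic sketch: the helper names are illustrative and not part of docket, and the local file path is whatever location you downloaded to.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 with the published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_digest(Path("docket_runtime-1.0.0.tar.gz"), "<SHA256 from the table>")` should return True for an intact download.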

Provenance

The following attestation bundles were made for docket_runtime-1.0.0.tar.gz:

Publisher: release.yml on wczaja/docket

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docket_runtime-1.0.0.tar.gz
- Subject digest: 4554127971180179c7a719fa8d87fc4a941dc88a5b9b8a319e321d33488e6e64
- Sigstore transparency entry: 1855750059
- Sigstore integration time: Jun 18, 2026
Source repository:
- Permalink: wczaja/docket@b275a941cafae84c0051d77b86db9dedb5fcb4b9
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/wczaja
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b275a941cafae84c0051d77b86db9dedb5fcb4b9
- Trigger Event: push

File details

Details for the file docket_runtime-1.0.0-py3-none-any.whl.

File metadata

Download URL: docket_runtime-1.0.0-py3-none-any.whl
Upload date: Jun 18, 2026
Size: 168.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docket_runtime-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4917485799bca86c8f06b2ff9bacca2f65d247af8fbd8605114feb994fde6bc7`
MD5	`301dacdfe98387ba6f1ec1e5df294f97`
BLAKE2b-256	`6d3a1930ad9e0a50a306d7dfe1657b11465dfa160c1f7da91a189cce2d0d430b`

Provenance

The following attestation bundles were made for docket_runtime-1.0.0-py3-none-any.whl:

Publisher: release.yml on wczaja/docket

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docket_runtime-1.0.0-py3-none-any.whl
- Subject digest: 4917485799bca86c8f06b2ff9bacca2f65d247af8fbd8605114feb994fde6bc7
- Sigstore transparency entry: 1855750086
- Sigstore integration time: Jun 18, 2026
Source repository:
- Permalink: wczaja/docket@b275a941cafae84c0051d77b86db9dedb5fcb4b9
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/wczaja
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b275a941cafae84c0051d77b86db9dedb5fcb4b9
- Trigger Event: push
