Observability and reliability platform for agentic AI systems


Anjor


AI agents fail silently. A tool times out, a schema drifts, the context window fills up — and you find out from a user complaint, not a dashboard.

Anjor fixes that. It gives you full visibility into every LLM call, tool use, and MCP server interaction — latency, token usage, context window growth, schema drift, prompt changes — without changing a single line of your agent code. Beyond passive logging, it surfaces actionable analysis: failure pattern clustering, token optimization suggestions, and per-tool quality scores (A–F).

One-line install. No cloud. No account required.


Dashboard

Overview

Platform-wide summary: tool health, LLM cost by model, MCP server status, top failure patterns, and drift alerts at a glance.

More screenshots

LLM Usage — token consumption and cost by model with daily trend and cache savings

MCP Servers — per-server call volume, success rates, and tool breakdown

Intelligence — failure clusters, token optimization opportunities, and quality scores

Tools — latency percentiles, drift detection, and per-tool drill-down

Traces — multi-agent span trees and cross-agent attribution


Install

Recommended (macOS / Homebrew Python):

brew install pipx && pipx ensurepath
pipx install anjor           # collector + dashboard + CLI
pipx install "anjor[mcp]"    # add MCP server support (Claude Code / Gemini CLI)

Then open a new terminal tab so $PATH picks up the new command.

Or with pip (inside a virtualenv):

pip install anjor
pip install "anjor[mcp]"

Note: anjor watch-transcripts and anjor mcp require v0.8.0+. anjor status requires v0.9.0+. anjor report, anjor diff, and OTel export require v1.0.0+.


Quickstart

Tracking Claude Code (or Gemini CLI)

Best for users of Claude Code or Gemini CLI who want a visual dashboard of their sessions. No changes to your workflow needed.

Option A: Via MCP (recommended — auto-starts with Claude Code)

Add to .mcp.json in your project root (or ~/.claude/.mcp.json for global):

{
  "mcpServers": {
    "anjor": {
      "command": "anjor",
      "args": ["mcp", "--watch-transcripts"]
    }
  }
}

Anjor auto-starts the collector, ingests your Claude Code session transcripts, and exposes anjor_status as a tool Claude Code can call mid-session. It returns a time-windowed summary with actionable insights — failure rates, context utilisation, estimated cost — silently suppressed when everything is healthy.

For Gemini CLI, add to .gemini/settings.json (or ~/.gemini/settings.json):

{
  "mcpServers": {
    "anjor": {
      "command": "anjor",
      "args": ["mcp", "--watch-transcripts", "--providers", "gemini"]
    }
  }
}

Option B: Standalone (no MCP required)

anjor start --watch-transcripts

One command starts the collector, dashboard, and transcript watcher together. Open http://localhost:7843/ui/ to see your sessions.

To watch specific agents or adjust the polling interval:

anjor start --watch-transcripts --providers claude,gemini --poll-interval 5.0

Run anjor watch-transcripts --list-providers to see which agents are detected on your machine.

Terminal health check (no browser needed)

anjor status                          # last 2h summary
anjor status --since-minutes 30       # last 30 minutes
anjor status --project myapp          # filter to a specific project

Prints a compact one-line summary with any actionable warnings below it:

last 2h: 47 calls · 6% failure · $0.08 · 74% ctx
⚠  web_search has a 30% failure rate (3/10 calls)
⚠  Context at 74%

Silent when everything is healthy. Exits with code 2 if the collector is not running.
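The compact summary line has a fixed shape, so it is easy to reproduce or parse in scripts. A minimal sketch of how such a line might be assembled (illustrative only, not Anjor's actual formatting code):

```python
def format_summary(window: str, calls: int, failure_rate: float,
                   cost_usd: float, ctx_utilisation: float) -> str:
    """Assemble a compact one-line health summary in the style shown above."""
    return (f"last {window}: {calls} calls · {failure_rate:.0%} failure"
            f" · ${cost_usd:.2f} · {ctx_utilisation:.0%} ctx")
```

With the numbers from the example, `format_summary("2h", 47, 0.06, 0.08, 0.74)` reproduces the first line.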


CI/CD Quality Gates

Use anjor report and anjor diff in CI pipelines — both read SQLite directly, so no running collector is needed.

# After your test run, gate on quality metrics
anjor report --assert "success_rate >= 0.95" --assert "p95_latency_ms <= 3000"
# exit 0 = all assertions pass; exit 1 = assertion failed; exit 2 = no data

# Compare last 24h vs prior 24h — catch regressions before they ship
anjor diff --window 24h

# Full report as Markdown for CI artifacts
anjor report --format markdown --since 2h > report.md

Supported assertion metrics: success_rate, p95_latency_ms, failure_count, total_cost_usd
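Each `--assert` expression is just a metric name, a comparison operator, and a numeric threshold. A sketch of how such expressions could be evaluated against a dict of computed metrics (the evaluation logic is an assumption, not Anjor's internals):

```python
import operator
import re

# Order matters: two-character operators must be tried before one-character ones.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}

def check_assertion(expr: str, metrics: dict) -> bool:
    """Evaluate one assertion like 'success_rate >= 0.95' against metrics."""
    m = re.fullmatch(r"\s*(\w+)\s*(>=|<=|==|>|<)\s*([\d.]+)\s*", expr)
    if not m:
        raise ValueError(f"malformed assertion: {expr!r}")
    name, op, value = m.groups()
    return OPS[op](metrics[name], float(value))
```

A gate would then exit 1 if any assertion returns `False`, and 2 if the metrics dict is empty.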

Diff windows: 30m, 2h, 24h, 7d — any combination of minutes, hours, days.
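A window spec combines an integer with a unit suffix. A small parser sketch covering the listed formats (`m`, `h`, `d`):

```python
import re
from datetime import timedelta

UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def parse_window(spec: str) -> timedelta:
    """Parse a diff window such as '30m', '2h', '24h', or '7d'."""
    m = re.fullmatch(r"(\d+)([mhd])", spec.strip())
    if not m:
        raise ValueError(f"unrecognised window: {spec!r}")
    value, unit = int(m.group(1)), m.group(2)
    return timedelta(**{UNITS[unit]: value})
```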


Alerting and Budgeting

Configure threshold alerts in .anjor.toml — Anjor fires a webhook whenever a condition is breached. Silent by default; you only hear from it when something matters.

[[alerts]]
name = "high_failure_rate"
condition = "failure_rate > 0.20"
window_calls = 10                            # rolling window of last N tool calls
webhook = "https://hooks.slack.com/services/..."

[[alerts]]
name = "context_warning"
condition = "context_utilisation > 0.80"
webhook = "https://hooks.slack.com/services/..."

[[alerts]]
name = "daily_budget"
condition = "daily_cost_usd > 5.00"
webhook = "https://example.com/webhook"

Supported conditions:

| Condition | Triggers on |
|---|---|
| `failure_rate > N` | Rolling window of tool calls exceeds N (0–1) |
| `p95_latency > N` | p95 latency in rolling window exceeds N ms |
| `context_utilisation > N` | Any LLM call where context used exceeds N (0–1) |
| `daily_cost_usd > N` | Cumulative estimated cost today exceeds $N |
| `session_cost_usd > N` | Cumulative cost since collector start exceeds $N |
| `error_type == "timeout"` | Tool call fails with the specified error type |
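Numeric conditions compare a metric against a threshold, while `error_type` compares against a quoted string. A sketch of how one condition string could be checked (the metric names come from the table; the parsing and evaluation are assumptions):

```python
import operator

def breached(condition: str, current: dict) -> bool:
    """Check one alert condition, e.g. 'failure_rate > 0.20' or
    'error_type == "timeout"', against current metric values."""
    name, op, raw = condition.split(None, 2)
    # Quoted right-hand sides are string comparisons; everything else is numeric.
    threshold = raw.strip('"') if raw.startswith('"') else float(raw)
    ops = {">": operator.gt, "==": operator.eq}  # only the operators the table uses
    return ops[op](current[name], threshold)
```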

Webhook payload:

{"alert": "daily_budget", "value": 5.21, "threshold": 5.00, "timestamp": "2026-04-17T..."}

When the URL contains hooks.slack.com, the payload is auto-formatted as {"text": "anjor alert: ..."} for Slack compatibility.
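Given the payload format and the Slack special case above, the body-building step might look like this (the exact Slack message wording is an assumption):

```python
import json

def build_payload(alert: str, value: float, threshold: float,
                  timestamp: str, url: str) -> str:
    """Render the webhook body; Slack URLs get the {"text": ...} wrapper."""
    if "hooks.slack.com" in url:
        return json.dumps({"text": f"anjor alert: {alert} "
                                   f"(value {value}, threshold {threshold})"})
    return json.dumps({"alert": alert, "value": value,
                       "threshold": threshold, "timestamp": timestamp})
```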


Tracking Your Own Agent (API Patching)

Best for developers building custom AI agents who want real-time telemetry.

1. Start the collector and dashboard:

anjor start

2. Patch your agent (one line):

import anjor
anjor.patch()  # instrument httpx automatically

import anthropic
client = anthropic.Anthropic()
# All calls are now captured — no other changes needed.

What it captures

| Signal | Details |
|---|---|
| Tool calls | Name, status (success/failure), failure type, latency |
| MCP servers | Per-server call volume, success rate, latency — parsed from `mcp__server__tool` naming |
| Schema fingerprints | SHA-256 structural hash of tool input/output shape |
| Schema drift | Field-level diff against the baseline for each tool |
| LLM calls | Model, latency, finish reason — Anthropic, OpenAI, and Gemini |
| Token usage | Input + output + cache_read + cache_write tokens per call |
| Context window | Tokens used vs model limit, utilisation %, per-trace growth |
| Cache savings | Prompt cache hit rate and estimated cost savings |
| Context hogs | Per-tool average output size, % of context consumed |
| System prompt drift | SHA-256 per agent — alerts when prompt changes between calls |
| Failure patterns | Clustered failure analysis with descriptions and fix suggestions |
| Token optimization | Tools consuming >5% of context window, cost savings estimates |
| Quality scores | Per-tool reliability/schema-stability/latency grade (A–F) |
| Run quality | Per-trace context efficiency, failure recovery, diversity grade |
| Multi-agent spans | Parent/child span linking across agent boundaries |
| Trace graphs | DAG reconstruction, topological order, cycle detection |
| Cross-agent attribution | Token usage and failure rate broken down per agent |
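The schema fingerprint row describes a structural hash: two payloads with the same keys and value types fingerprint identically even when the values differ, so a changed fingerprint signals drift. A stdlib sketch of that idea (not Anjor's exact canonicalisation):

```python
import hashlib
import json

def schema_shape(value):
    """Reduce a payload to its structural shape: keys and types, no values."""
    if isinstance(value, dict):
        return {k: schema_shape(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        return [schema_shape(value[0])] if value else []
    return type(value).__name__

def schema_fingerprint(payload) -> str:
    """SHA-256 over the canonicalised shape of a tool input/output."""
    canonical = json.dumps(schema_shape(payload), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Under this scheme `{"id": 1, "name": "a"}` and `{"name": "b", "id": 2}` share a fingerprint, while changing a field's type changes it.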

Supported providers

| Provider | SDK | Intercepted endpoint |
|---|---|---|
| Anthropic | `anthropic` | `api.anthropic.com/v1/messages` |
| OpenAI | `openai` | `api.openai.com/v1/chat/completions` |
| Google Gemini | `google-generativeai` | `generativelanguage.googleapis.com/.../generateContent` |

All three providers are auto-detected — no configuration required.


AI Coding Agents (Transcript Watchers)

Anjor can ingest and visualize history from agents that write local session transcripts, acting as a post-hoc observability layer even for agents you didn't build.

| Agent | Source tag | Discovery path | MCP support | Message capture |
|---|---|---|---|---|
| Claude Code | `claude_code` | `~/.claude/projects/**/*.jsonl` | Yes — `.mcp.json` | Yes |
| Gemini CLI | `gemini_cli` | `~/.gemini/tmp/**/*.json` | Yes — `.gemini/settings.json` | Yes |
| OpenAI Codex | `openai_codex` | `~/.codex/sessions/**/*.jsonl` | Coming soon | Yes |

Message capture is on by default — Session Replay works out of the box. To disable:

# .anjor.toml
capture_messages = false

Or pass --no-capture-messages to anjor start / anjor mcp / anjor watch-transcripts.

Note: AntiGravity was removed from the watcher list — it is an IDE (VS Code fork), not an AI coding agent, and writes no session transcripts.

One-shot ingestion

anjor watch-transcripts --providers claude,gemini   # specific agents
anjor watch-transcripts                             # auto-detect all

Real-time watching (standalone)

Keep the watcher running in the background — it polls every 2 seconds:

anjor watch-transcripts --providers claude          # Claude Code sessions
anjor watch-transcripts --poll-interval 5.0         # custom interval

Or use anjor start --watch-transcripts to run the collector and watcher together in one process.

List detected agents

anjor watch-transcripts --list-providers

MCP Server Support

MCP tools are automatically identified by their naming convention — no extra configuration needed. Any tool whose name follows mcp__<server>__<tool> is grouped by server in the MCP dashboard:

mcp__github__create_pull_request   →  server: github,     tool: create_pull_request
mcp__filesystem__read_file         →  server: filesystem, tool: read_file
mcp__brave_search__web_search      →  server: brave_search, tool: web_search

The /mcp endpoint returns per-server and per-tool aggregates and supports a ?days=N filter.
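Since the convention is purely lexical, the grouping rule reduces to a string split. A sketch of the parsing described above:

```python
def parse_mcp_name(tool_name: str):
    """Split mcp__<server>__<tool> into (server, tool); None if not MCP-named."""
    if not tool_name.startswith("mcp__"):
        return None
    # Split at the first '__' after the prefix, so tool names may contain '_'.
    server, _, tool = tool_name[len("mcp__"):].partition("__")
    return (server, tool) if tool else None
```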


API endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/events` | Ingest a tool/LLM/span event |
| POST | `/flush` | Force-flush pending batch writes; returns `{"flushed": N}` |
| GET | `/tools` | All tools with summary stats (`?since_minutes=N`, `?project=`) |
| GET | `/tools/{name}` | Tool detail (latency percentiles, drift) (`?since_minutes=N`) |
| GET | `/mcp` | MCP server and tool aggregates (`?days=N`) |
| GET | `/llm` | LLM call summary by model (`?days=N`, `?since_minutes=N`, `?project=`) |
| GET | `/llm/usage/daily` | Daily token usage by model (`?days=N`, `?project=`) |
| GET | `/calls` | Paginated raw event log |
| GET | `/traces` | Trace list (newest first) |
| GET | `/traces/{id}/graph` | DAG graph for a single trace |
| GET | `/sessions` | Sessions list (`?archived=true/false`) |
| GET | `/sessions/{id}/replay` | Chronological turn timeline (messages + tool calls) |
| POST | `/sessions/{id}/archive` | Archive a session |
| POST | `/sessions/{id}/unarchive` | Restore an archived session |
| DELETE | `/sessions/{id}` | Permanently delete a session and all its events |
| PATCH | `/sessions/{id}/project` | Re-tag a session's project (propagates to all events) |
| GET | `/health` | Uptime, queue depth, db path |
| GET | `/intelligence/failures` | Failure clusters sorted by rate |
| GET | `/intelligence/optimization` | Token hog tools + savings estimates |
| GET | `/intelligence/quality/tools` | Per-tool quality scores + grade |
| GET | `/intelligence/quality/runs` | Per-trace run quality scores + grade |
| GET | `/intelligence/attribution` | Per-agent token and failure attribution |
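The `/traces/{id}/graph` endpoint returns a DAG, and the capture table mentions topological order and cycle detection. Both fall out of Kahn's algorithm over parent-to-child span edges; a sketch of that standard technique (not Anjor's implementation):

```python
from collections import defaultdict, deque

def topological_order(edges):
    """Kahn's algorithm over (parent, child) span edges.
    Returns a topological ordering, or None if the graph has a cycle."""
    children, indegree, nodes = defaultdict(list), defaultdict(int), set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for c in children[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # If some node never reached indegree 0, the remaining edges form a cycle.
    return order if len(order) == len(nodes) else None
```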

Configuration

Via environment variables:

ANJOR_DB_PATH=./my_project.db python my_agent.py
ANJOR_BATCH_SIZE=1 ANJOR_BATCH_INTERVAL_MS=100 python my_agent.py
ANJOR_LOG_LEVEL=DEBUG python my_agent.py
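The variables above map one-to-one onto config fields. A sketch of how they might be read, with the field names mirroring the TOML keys; the fallback values here are illustrative, not Anjor's documented defaults:

```python
import os

def load_env_config(env=os.environ) -> dict:
    """Read ANJOR_* overrides with fallbacks (defaults are illustrative)."""
    return {
        "db_path": env.get("ANJOR_DB_PATH", "anjor.db"),
        "batch_size": int(env.get("ANJOR_BATCH_SIZE", "10")),
        "batch_interval_ms": int(env.get("ANJOR_BATCH_INTERVAL_MS", "200")),
        "log_level": env.get("ANJOR_LOG_LEVEL", "INFO"),
    }
```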

Via .anjor.toml in your project root:

db_path = "my_project.db"
batch_size = 10
batch_interval_ms = 200
log_level = "DEBUG"

# OTel export — ship spans to any OTel-compatible endpoint (Jaeger, Tempo, Datadog Agent)
[export]
otlp_endpoint = "http://localhost:4318"
# otlp_headers = { "x-api-key" = "..." }   # optional auth

# Conversation capture — on by default; stores first 500 chars of each turn locally
# capture_messages = false   # uncomment to disable

Via code:

import anjor
from anjor.core.config import AnjorConfig

anjor.patch(config=AnjorConfig(db_path="my_project.db", batch_size=1))

Programmatic Access

Query your agent's history directly from Python — no running collector required:

import anjor
from anjor.models import ToolSummary, FailurePattern, ToolQualityScore

with anjor.Client("anjor.db") as client:
    # Per-tool summary stats
    for tool in client.tools():
        print(f"{tool.tool_name:30s}  calls={tool.call_count}  ok={tool.success_rate:.0%}")

    # Single tool detail (latency percentiles)
    t = client.tool("web_search")
    if t:
        print(f"p95={t.p95_latency_ms:.0f}ms  p99={t.p99_latency_ms:.0f}ms")

    # Raw call records (filterable)
    failures = client.calls(status="failure", limit=20)

    # Intelligence layer
    patterns  = client.intelligence.failures()      # list[FailurePattern]
    quality   = client.intelligence.quality()       # list[ToolQualityScore]
    runs      = client.intelligence.run_quality()   # list[RunQualityScore]
    opts      = client.intelligence.optimization()  # list[OptimizationSuggestion]

All return types are frozen Pydantic models importable from anjor.models:

from anjor.models import (
    ToolSummary,
    ToolCallRecord,
    FailurePattern,
    OptimizationSuggestion,
    ToolQualityScore,
    RunQualityScore,
)

The Client reads SQLite directly — no HTTP calls, no collector process needed. It opens the connection lazily on first query and is safe to use as a context manager.
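The lazy-open, context-manager pattern described here is straightforward to sketch with `sqlite3` (a generic illustration of the pattern, not Anjor's actual `Client`):

```python
import sqlite3

class LazyDB:
    """DB handle that opens SQLite on first query and closes on exit."""

    def __init__(self, path: str):
        self.path = path
        self._conn = None  # no connection until the first query

    @property
    def conn(self) -> sqlite3.Connection:
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
        return self._conn

    def query(self, sql: str, params=()):
        return self.conn.execute(sql, params).fetchall()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        if self._conn is not None:
            self._conn.close()
```

Construction is free; the connection cost is paid only if a query actually runs.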


Limitations

  • Streaming calls — captured only when the response stream is fully consumed. If your agent reads the stream partially or exits early (e.g. stops iterating a generator before the final chunk), that call is not recorded.
  • Quality scores — computed from three measurable signals: reliability (failure rate), schema stability (drift rate), and latency consistency (coefficient of variation). They use fixed weights, not ML. They surface patterns worth investigating; they don't identify root causes.
  • Cost estimates — the price table in the dashboard is maintained manually and will drift as providers update their pricing. Token counts from transcripts are exact; dollar figures are approximate.
  • No cloud sync, authentication, or team features — Anjor is local-only. All data stays on your machine.
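As a concrete illustration of the fixed-weight scheme described under quality scores, here is one way three normalised signals could be folded into a letter grade. The weights and grade boundaries below are invented for illustration; Anjor's actual values may differ:

```python
def quality_grade(failure_rate: float, drift_rate: float,
                  latency_cv: float) -> str:
    """Fold three signals (each normalised so 0 is best, 1 is worst)
    into an A-F grade using fixed, illustrative weights."""
    score = 1.0 - (0.5 * min(failure_rate, 1.0)
                   + 0.3 * min(drift_rate, 1.0)
                   + 0.2 * min(latency_cv, 1.0))
    for grade, floor in (("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.6)):
        if score >= floor:
            return grade
    return "F"
```

The point of a fixed-weight score is predictability: the same inputs always produce the same grade, so a grade change always traces back to a measurable signal.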

Development

git clone https://github.com/anjor-labs/anjor.git
cd anjor
pip install -e ".[dev]"
pytest --cov=anjor --cov-fail-under=95 -q
ruff check anjor/ tests/
mypy anjor/
anjor start

See CONTRIBUTING.md for full guidelines.



License

MIT © Anjor Labs

Download files

Download the file for your platform.

Source Distribution

anjor-1.2.0.tar.gz (151.1 kB)

Uploaded Source

Built Distribution


anjor-1.2.0-py3-none-any.whl (196.4 kB)

Uploaded Python 3

File details

Details for the file anjor-1.2.0.tar.gz.

File metadata

  • Download URL: anjor-1.2.0.tar.gz
  • Size: 151.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for anjor-1.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c3974769977db550d078c4279d5499252b9a27e4a1be371b60712556d0842467` |
| MD5 | `0f928be0303af7c57e7b79358c0ac3c2` |
| BLAKE2b-256 | `55ccd72b0fab229f987a37fa9b0c13571c25daf2ccc1f3821ed2bd20247de92a` |


Provenance

The following attestation bundles were made for anjor-1.2.0.tar.gz:

Publisher: publish.yml on anjor-labs/anjor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file anjor-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: anjor-1.2.0-py3-none-any.whl
  • Size: 196.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for anjor-1.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `de18ad95962a5b4aab1c509859ae92d2dd6bf44477bcc1a0ebdd3d55a52a4796` |
| MD5 | `cca0979cc2b9f48b99ee3b3f956cda99` |
| BLAKE2b-256 | `84b67ca1e1c1fab0af8f8b0c86a44e95c3d05d241ad3aebb3de89ca11f644178` |


Provenance

The following attestation bundles were made for anjor-1.2.0-py3-none-any.whl:

Publisher: publish.yml on anjor-labs/anjor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
