# AgentFluent
Local-first agent analytics with behavior-to-improvement diagnostics. The tools that exist tell you what your agent did — AgentFluent tells you how to make it better.
AI agents are in production at 57% of organizations, and quality is the single top barrier to deployment. When an agent misbehaves — wrong tool choice, retry loops, hallucinated outputs — developers iterate on prompts blind. Existing observability platforms show what happened: traces, latency, token counts. They don't tell you why the agent misbehaved or what in its configuration to change.
AgentFluent reads your local Claude Code and Claude Agent SDK session JSONL, extracts agent invocations and tool patterns, scores each agent's configuration against a best-practice rubric, and correlates observed behavior back to specific fixes — a prompt gap, a missing tool constraint, or a stale model selection. No cloud services, no API keys, no data leaves your machine.
Born from CodeFluent research that identified the agent-quality gap in 2026. See `docs/AGENT_ANALYTICS_RESEARCH.md` for additional market analysis.
## How It Compares
The agent observability space is crowded — several tools capture what agents do. None diagnose why they misbehave or what to change from locally-persisted session data. In the table below, "What's missing" is what the tool does not do (not what it provides):
| Tool | What it measures | What's missing |
|---|---|---|
| Langfuse / LangSmith / Arize Phoenix | Production traces, latency, token counts, errors | Behavior-to-prompt diagnosis; local agent config audit |
| Braintrust / Galileo / DeepEval | LLM-as-judge scoring against rubrics | Requires cloud instrumentation and author-provided test sets; no local agent config audit |
| ccusage / claude-code-analytics / agents-observe | Usage stats, token counts, subagent trees | Quality scoring; actionable config recommendations |
| claude-code-otel | OpenTelemetry export of Claude Code sessions | Analysis itself — it's a bridge to other tools |
| Anthropic Console | Per-request cost, rate-limit tracking | Session-level diagnostics; agent config recommendations |
Where AgentFluent fits. AgentFluent reads the session JSONL your agent already produced, scores each agent's configuration against a best-practice rubric, and correlates observed behavior back to the specific config line that most likely explains it. It complements the tools above rather than replacing them — use Langfuse/Phoenix for production traces, Braintrust for test-set evals, ccusage for usage dashboards, and AgentFluent for what in the agent's config to change. The question "my Agent SDK agent ran 500 sessions last week — were any of them actually good, and how can I update my agent's configuration to make it better?" has no answer from the tools above. AgentFluent is built to answer it.
## Why This Is Different
- **Research-grounded.** Every diagnostic maps to a specific gap in the agent's prompt, tool list, or model selection — not vibes. See the research doc for the feasibility and positioning analysis.
- **Behavior-to-improvement, not just traces.** When the agent retries Bash 40% of the time, AgentFluent tells you which prompt clause is missing — not just that the retry happened.
- **The config is the agent.** In interactive sessions, the human course-corrects. In programmatic agents, the prompt and tool setup *are* the agent — a flaw compounds at scale. AgentFluent scores four dimensions of that config today — description, tools (`allowed_tools`/`disallowedTools`), model, and prompt — with hook, MCP, and cross-agent coverage on the roadmap.
- **Local-first and private.** All analysis runs on your machine. Zero outbound network calls. No API key required.
- **CLI-native.** `agentfluent analyze --format json | jq ...` fits agent developer workflows (terminal, CI/CD, PR checks) without a web dashboard dependency.
- **JSON output envelope is a contract.** A stable `{version, command, data}` schema lets you build PR gates, trend dashboards, and regression detectors on top without tracking AgentFluent's internal refactors.
- **Correct cost accounting.** Distinguishes the pay-per-token API rate from subscription-plan flat cost, with per-model pricing that AgentFluent actively maintains (#80 will add per-session historical pricing).
- **CodeFluent sibling.** Shares the JSONL parsing heritage but asks a different question. CodeFluent scores human AI fluency in interactive sessions; AgentFluent scores agent quality and tells you what configuration to change. Not forked — two products with a common data source.
## AgentFluent vs CodeFluent

Both read `~/.claude/projects/` session JSONL. They answer different questions:
| | CodeFluent | AgentFluent |
|---|---|---|
| Unit of analysis | Conversations in interactive sessions, plus the supporting `.claude/` config (CLAUDE.md, rules, hooks, commands) | Agent definitions + their observed behavior |
| Scoring target | Developer's AI collaboration fluency and project-config maturity | Agent's prompt, tools, model, hooks |
| Feedback loop | Coaches the human to interact with Claude Code better | Tells the developer what config to change |
| Delivery | VS Code extension + web app | CLI-first (dashboard deferred) |
| API calls | Anthropic API for LLM-as-judge scoring | None — fully local |
If you write your own prompts each session, use CodeFluent. If your prompts live in `ClaudeAgentOptions`, `AgentDefinition`, or `.claude/agents/*.md` files, use AgentFluent.
## Screenshots

*(Four screenshots accompany the README: Project Discovery, Execution Analytics, Behavior Diagnostics, and Config Assessment — images not reproduced here.)*
## Getting Started

### Prerequisites

- **Python 3.12 or newer.** Check with `python --version`.
- **Claude Code or Agent SDK session data.** Generated automatically at `~/.claude/projects/` whenever you use Claude Code or run an Agent SDK script — nothing to configure.
- **Platforms: Linux, macOS, Windows.** Pure-Python package; the path handling resolves `~/.claude/` on every platform.
### Install

```shell
# Preferred — isolated tool install via uv (https://docs.astral.sh/uv/)
uv tool install agentfluent

# Fallback — pip into a venv of your choice
pip install agentfluent

# Zero-install one-shot
uvx agentfluent list
```
### First run

```shell
# Discover which projects have session data
agentfluent list

# Analyze agent behavior + cost in a specific project
agentfluent analyze --project myproject

# Score your agent definitions against the config rubric
agentfluent config-check
```
## Commands

### `agentfluent list` — discover projects and sessions

```shell
agentfluent list                        # All projects
agentfluent list --project codefluent   # Sessions in one project
agentfluent list --format json | jq '.data.projects[].name'
```

Lists every Claude Code / Agent SDK project found under `~/.claude/projects/`, with session counts, total size, and last-modified timestamp. Pass `--project` to drill into one project and list its individual session files.
### `agentfluent analyze` — token, cost, and behavior metrics

```shell
agentfluent analyze --project codefluent                # Full project analysis
agentfluent analyze --project codefluent --agent pm     # Filter to one subagent
agentfluent analyze --project codefluent --latest 5     # Last 5 sessions only
agentfluent analyze --project codefluent --diagnostics  # Show behavior diagnostics
agentfluent analyze --project codefluent --format json | jq '.data.token_metrics.total_cost'
```

Produces a token-usage table, per-model cost breakdown (labeled as API rate — subscription plans differ), tool usage concentration, and an Agent Invocations table summarizing each subagent's token, duration, and tool-use count. `--diagnostics` surfaces behavior signals (tool errors, token-per-tool-use outliers, duration outliers) with a pointer to the configuration gap most likely responsible.

Cost numbers reflect current per-token pricing; historical sessions are priced at today's rates until #80 (time-series pricing) lands.
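The rate-application step can be sketched in a few lines. The per-MTok rates and model names below are illustrative placeholders, not AgentFluent's maintained pricing table, and the function is a simplification of what `analytics/pricing.py` actually does:

```python
# Hypothetical per-model rates: (input $/MTok, output $/MTok).
# These numbers are invented for illustration only.
ASSUMED_RATES_PER_MTOK: dict[str, tuple[float, float]] = {
    "example-model-large": (15.0, 75.0),
    "example-model-small": (3.0, 15.0),
}

def api_rate_cost(usage: dict[str, dict[str, int]]) -> float:
    """Sum the pay-per-token-equivalent cost across models."""
    total = 0.0
    for model, counts in usage.items():
        in_rate, out_rate = ASSUMED_RATES_PER_MTOK[model]
        total += counts["input_tokens"] / 1_000_000 * in_rate
        total += counts["output_tokens"] / 1_000_000 * out_rate
    return total

usage = {"example-model-large": {"input_tokens": 2_000_000, "output_tokens": 100_000}}
print(round(api_rate_cost(usage), 2))  # 37.5
```

The real tool applies the same idea per session, with cache-token categories and the `<synthetic>` sentinel filtering described later.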
### `agentfluent config-check` — score agent definitions

```shell
agentfluent config-check                    # All user + project agents
agentfluent config-check --scope user       # Only ~/.claude/agents/
agentfluent config-check --agent pm --verbose   # One agent with detailed recs
agentfluent config-check --format json | jq '.data.scores[] | select(.overall_score < 60)'
```

Walks `~/.claude/agents/*.md` and `./.claude/agents/*.md`, parses each agent's YAML frontmatter and body, and scores against a 4-dimension rubric (description trigger quality, tool access appropriateness, model selection, prompt completeness). Outputs a score per agent plus ranked recommendations — e.g. "Prompt body doesn't mention error handling."
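For intuition, here is a toy version of the parse-and-score step. The rubric checks and the 25-points-per-dimension weighting are invented for illustration; the real scanner uses `yaml.safe_load` rather than the simple line split shown here, and the real rubric in `config/scoring.py` is more nuanced:

```python
# A hypothetical agent definition file, in the frontmatter+body shape
# that .claude/agents/*.md files use.
AGENT_MD = """\
---
name: pm
description: Use for project-management tasks like triaging issues
tools: Read, Grep
model: sonnet
---
You are a project-management subagent. On error, report and stop.
"""

def parse_agent(md: str) -> tuple[dict[str, str], str]:
    """Split an agent .md into frontmatter dict and prompt body.
    (AgentFluent parses the frontmatter with yaml.safe_load.)"""
    _, fm, body = md.split("---", 2)
    meta = {}
    for line in fm.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

def score(frontmatter: dict[str, str], body: str) -> int:
    """Toy 4-dimension rubric: 25 points per satisfied dimension."""
    checks = [
        "use" in frontmatter.get("description", "").lower(),  # trigger quality
        bool(frontmatter.get("tools")),                       # tool access declared
        bool(frontmatter.get("model")),                       # explicit model choice
        "error" in body.lower(),                              # prompt covers errors
    ]
    return 25 * sum(checks)

fm, body = parse_agent(AGENT_MD)
print(score(fm, body))  # 100
```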
## Configuration

AgentFluent's "configuration" is CLI flags — no config file, no environment variables beyond the defaults. Sensible defaults keep most invocations flagless.

| Flag | Default | What it controls |
|---|---|---|
| `--project` | (required on `analyze`) | Filter to a specific project slug or display name |
| `--scope` | `all` | `config-check` scope: `user`, `project`, or `all` |
| `--agent` | (none) | Filter `analyze` or `config-check` to one subagent type |
| `--latest N` | (all sessions) | `analyze` only the N most recent sessions |
| `--diagnostics` | off | `analyze`: show behavior-correlation signals |
| `--format` | `table` | Output format: `table` (Rich) or `json` (envelope) |
| `--verbose` | off | Extra detail (per-session breakdown, per-invocation detail) |
| `--quiet` | off | Suppress non-essential output (useful in CI) |
## Output formats

**Default (`table`):** Rich-rendered tables in the terminal, designed to be readable at a glance. Colors auto-adapt to terminal theme.

**JSON envelope (`--format json`):** Stable schema `{version, command, data}` intended as a contract — pipe to `jq`, integrate with CI, build regression gates on top. Example:

```json
{
  "version": 1,
  "command": "analyze",
  "data": {
    "token_metrics": { "total_cost": 15.42, "total_tokens": 82940115, ... },
    "by_model": { "claude-opus-4-7": {...}, "claude-sonnet-4-6": {...} },
    "tool_usage": [...],
    "agent_invocations": [...]
  }
}
```

No ANSI escapes in JSON output, guaranteed. The key `total_cost` is the pay-per-token equivalent; subscribers on Pro/Max/Team/Enterprise plans see a flat monthly charge regardless.
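Because the envelope is a contract, a CI cost gate can be a few lines. This sketch uses the field names from the example envelope above; the budget threshold and exit-code convention are choices for your own pipeline, not anything AgentFluent prescribes:

```python
import json

def gate(envelope: dict, max_cost: float = 20.0) -> int:
    """Return a shell-style exit code: 0 if under budget, 1 if over."""
    if envelope.get("version") != 1:  # check the contract before reading data
        raise ValueError("unsupported envelope version")
    cost = envelope["data"]["token_metrics"]["total_cost"]
    return 0 if cost <= max_cost else 1

# In CI you might pipe the JSON in, e.g.:
#   agentfluent analyze --project myproject --format json | python gate.py
# where gate.py runs: sys.exit(gate(json.load(sys.stdin)))
envelope = json.loads(
    '{"version": 1, "command": "analyze",'
    ' "data": {"token_metrics": {"total_cost": 15.42}}}'
)
print(gate(envelope))                  # 0: under the default $20 budget
print(gate(envelope, max_cost=10.0))  # 1: over budget
```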
## How It Works

```mermaid
flowchart LR
    subgraph Local["Local filesystem — nothing leaves this boundary"]
        S["Session JSONL<br/>~/.claude/projects/"]
        A["Agent definitions<br/>~/.claude/agents/<br/>.claude/agents/"]
    end
    S --> P[Parser]
    P --> X[Agent Extractor]
    P --> TM[Token & Cost<br/>Metrics]
    P --> TU[Tool Usage<br/>Patterns]
    A --> CS[Config Scanner]
    CS --> SC[Config Scorer]
    X --> D[Diagnostics<br/>Correlator]
    TM --> D
    TU --> D
    SC --> D
    D --> OUT["Rich tables<br/>or JSON envelope"]
    TM --> OUT
    TU --> OUT
    SC --> OUT
```
Step by step:

1. **Parse JSONL** — `core/parser.py` reads each session file into typed `SessionMessage` objects. Handles streaming snapshot deduplication, plain-string vs. array content shapes, and Claude Code's real `toolUseResult` format (see `CLAUDE.md` for the format spec).
2. **Discover projects and sessions** — `core/discovery.py` enumerates `~/.claude/projects/` and surfaces friendly display names.
3. **Extract agent invocations** — `agents/extractor.py` walks messages, pairs Agent `tool_use` blocks with their `tool_result` content blocks, and pulls per-invocation metadata (tokens, duration, tool-use count) from the containing user message's `toolUseResult` sibling.
4. **Compute token and cost metrics** — `analytics/tokens.py` aggregates usage per model with `<synthetic>` sentinel filtering; `analytics/pricing.py` applies per-token rates labeled as API rate.
5. **Score agent configurations** — `config/scanner.py` parses YAML frontmatter from each `.md` in `.claude/agents/` and `~/.claude/agents/`; `config/scoring.py` scores description, tools, model, and prompt on a 4-dimension rubric.
6. **Diagnose behavior** — `diagnostics/signals.py` correlates observed patterns (retry loops, tool errors, zero-invocation agents) with likely configuration root causes and attaches a recommendation.
7. **Render** — `cli/formatters/table.py` emits Rich tables; `cli/formatters/json.py` emits the stable JSON envelope. Format is selected by `--format`.
Everything runs locally. No outbound network calls, ever. No API key needed.
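The skip-and-continue parsing behavior from step 1 can be sketched as follows, assuming a simplified one-JSON-object-per-line format. The real parser handles far more shape variation (streaming snapshots, string vs. array content, the `toolUseResult` sibling):

```python
import json
from typing import Iterator

def iter_messages(jsonl_text: str) -> Iterator[dict]:
    """Yield parsed message dicts, skipping malformed lines
    (mirroring the parser's warn-and-continue behavior)."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # corrupted line (e.g. a killed mid-write session): skip it

sample = (
    '{"type": "assistant", "message": {"usage": {"output_tokens": 42}}}\n'
    '<corrupt line left by an interrupted write>\n'
)
msgs = list(iter_messages(sample))
print(len(msgs))  # 1 — the corrupt line is skipped, analytics continue
```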
## Features

- **Project and Session Discovery** — Enumerates `~/.claude/projects/`, groups sessions by project, shows per-project session count, total size, and last-modified timestamp. Handles Claude Code subagent sidechain files and Agent SDK sessions uniformly.
- **Execution Analytics** — Token usage, API-rate cost, cache efficiency, per-model breakdown, tool-call concentration, and per-agent invocation metrics (tokens, duration, tool-use count). Cache creation and cache read tokens are tracked separately so you can see where your prompt caching is working.
- **Agent Config Assessment** — 4-dimension rubric (description, tools, model, prompt) applied to every `.md` file in `~/.claude/agents/` and `./.claude/agents/`. Produces a 0–100 score plus ranked, specific recommendations ("Prompt body doesn't mention error handling"). Catches agents that are technically valid but miss well-known best practices.
- **Diagnostics Preview** — `--diagnostics` correlates three behavior signals to configuration gaps: tool errors (caught by keywords like `blocked`, `failed`, `error`) suggesting missing error-handling instructions or over-broad tools; per-tool-use token outliers suggesting an agent that's exploring too broadly or needs a tighter prompt; duration outliers flagging unusually slow invocations. Each signal carries a severity level and a specific recommendation.
- **JSON Output Envelope** — Stable `{version, command, data}` schema. No ANSI escapes. Intended as a programmatic contract for CI integration, PR gates, and regression tracking.
- **Quiet and Verbose Modes** — `--quiet` for CI-friendly one-line summaries; `--verbose` for per-session breakdown and per-invocation detail tables. Defaults target interactive humans.
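The token-outlier signal can be pictured as a simple deviation test over per-invocation tokens-per-tool-use. The two-sigma threshold and population-standard-deviation choice here are assumptions for illustration, not AgentFluent's actual heuristic:

```python
from statistics import mean, pstdev

def token_outliers(per_invocation: list[float], z: float = 2.0) -> list[int]:
    """Indices of invocations whose tokens-per-tool-use deviate from
    the mean by more than z population standard deviations."""
    mu = mean(per_invocation)
    sigma = pstdev(per_invocation)
    if sigma == 0:  # all invocations identical: nothing to flag
        return []
    return [i for i, v in enumerate(per_invocation) if (v - mu) / sigma > z]

# Nine typical invocations and one that burned 10x the tokens.
print(token_outliers([1000] * 9 + [10000]))  # [9]
```

A flagged index would then be correlated back to a config gap, e.g. a prompt that lets the agent explore too broadly.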
## Privacy and Security
AgentFluent is designed so data stays on your machine. The attack surface is small by construction — no web server, no HTML rendering, no webview, no outbound network calls — but this table summarizes the layers that protect it anyway:
| Layer | Mechanism | Protects Against |
|---|---|---|
| Zero network calls | No outbound connections — all analysis is local | Data exfiltration |
| Path handling | All paths resolved within `~/.claude/` | Path traversal |
| Input validation | Pydantic models with strict type constraints | Malformed JSONL crashing the parser |
| Safe YAML loading | `yaml.safe_load` only | Arbitrary code execution via frontmatter |
| CI security review | Claude-powered review on every PR | New vulnerabilities |
| Automated testing | 270+ unit tests incl. security-focused cases | Regressions |
### Secrets handling

Claude Code persists every tool output to `~/.claude/projects/<slug>/*.jsonl` — including any `.env`, `credentials.json`, or shell rc file that Claude ever read. `.gitignore` does not protect against this. AgentFluent itself emits only aggregate metrics, so it cannot leak secrets that weren't already on disk — but because the tool reads that data, contributors working on AgentFluent risk re-leaking while they work.

This repo ships two Claude Code hooks in `.claude/settings.json` to reduce that risk:
- **PreToolUse block** (`.claude/hooks/block_secret_reads.py`) — denies reads of `.env*`, `.envrc`, `credentials.json`, `secrets.{yaml,yml,json}`, `*.pem`, SSH private keys, and shell rc files. Blocks before execution, so the file's contents never enter the session transcript.
- **PostToolUse detect** (`.claude/hooks/detect_secrets_in_output.py`) — scans tool output for `sk-ant-*`, `sk-proj-*`, `ghp_*`, `github_pat_*`, `AKIA*`, or `AIza*` patterns. If a match is found, blocks Claude from echoing or summarizing it. The raw value is already on disk at this point, so treat any caught value as compromised and rotate.
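A hedged sketch of what such a pattern scan can look like. The regex below is an assumption for illustration using the prefixes listed above, not the hook's actual implementation:

```python
import re

# Illustrative secret-prefix patterns: Anthropic keys, OpenAI project keys,
# GitHub tokens, AWS access key IDs, Google API keys.
SECRET_PATTERNS = re.compile(
    r"(sk-ant-|sk-proj-|ghp_|github_pat_|AKIA|AIza)[A-Za-z0-9_\-]+"
)

def contains_secret(tool_output: str) -> bool:
    """True if the tool output appears to contain a known secret prefix."""
    return SECRET_PATTERNS.search(tool_output) is not None

print(contains_secret("token=ghp_abc123"))       # True
print(contains_secret("ordinary build output"))  # False
```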
Any future AgentFluent feature that surfaces raw session content (diff viewers, prompt excerpts, recommendation snippets that quote session text) must re-apply secret-pattern redaction at the display layer — historical JSONL on users' machines may still contain pre-hook leaks.
See docs/SECURITY.md for the full policy: leak vector, defense architecture, discipline rules, historical-leak audit one-liner, user-scope deployment, and the bypass surface the hooks do not cover.
## Tech Stack

- Python 3.12+
- Typer + Rich — CLI framework and terminal formatting
- Pydantic v2 — data models across module boundaries
- PyYAML — agent definition frontmatter parsing (`safe_load` only)
- pytest + pytest-cov — 270+ tests
- mypy strict mode — full type coverage
- ruff — linting and formatting
- uv — package and dependency management
## Project Structure

```text
src/agentfluent/
├── cli/          # Typer app, commands, formatters (table + JSON envelope)
├── core/         # JSONL parser, session models, project/session discovery
├── agents/       # Agent invocation extraction and AgentInvocation model
├── analytics/    # Token/cost metrics, tool patterns, model pricing
├── config/       # Agent definition scanner and scoring rubric
└── diagnostics/  # Behavior signals, correlation, recommendations
```
Full architecture and conventions are documented in CLAUDE.md.
## Development

```shell
git clone https://github.com/frederick-douglas-pearce/agentfluent.git
cd agentfluent
uv sync
uv run agentfluent --help
```
### Testing

```shell
uv run pytest -m "not integration"   # 270+ unit tests (CI default)
uv run pytest                        # Full suite incl. integration tests against your real ~/.claude/projects/
uv run pytest --cov=agentfluent      # With coverage
```

Integration tests (`tests/integration/`) are skipped in CI because they require real session data — they pass on contributor machines with populated `~/.claude/projects/`.
### Lint and type check

```shell
uv run ruff check src/ tests/
uv run mypy src/agentfluent/
```
Both must pass cleanly before a PR merges.
## CI/CD

Five GitHub Actions workflows run automatically:

- **CI** (`ci.yml`) — Every PR: ruff, mypy strict, full unit-test suite. Must pass to merge.
- **Security Review** (`security-review.yml`) — Claude-powered security review of code-changing PRs (markdown and image changes skip it).
- **Claude Code Review** (`claude-review.yml`) — AI-powered PR review, triggered by the `needs-review` label or `@claude` mentions.
- **Release Please** (`release-please.yml`) — Auto-generates release PRs with changelog and version bumps from Conventional Commits.
- **Dependabot Auto-Merge** (`dependabot-auto-merge.yml`) — Auto-merges dependabot PRs once CI passes.
## Roadmap

**v0.2 (next release):**

- Parser fix for real Claude Code `toolUseResult` shape (#84 — merged)
- Cost label clarity for subscription-plan users (#76 — merged)
- Pricing data correction + opus-4-7 + synthetic filter (#75 — merged)

**v0.3+:**

- Time-series pricing data structure (#80)
- Session-timestamp-aware cost calculation (#81)
- Automated pricing-update service (#82)
- `--claude-config-dir` flag for non-default session paths (#90)
- Delegation pattern recognition — cluster `general-purpose` invocations and recommend custom subagents (#92)
- Deeper diagnostics with per-tool-call evidence
- Subagent trace parsing (`~/.claude/projects/<session>/subagents/`)
- Prompt regression detection across agent config versions
- Retry-pattern and zero-invocation-agent signals (completing the diagnostics surface, which currently covers tool errors and outliers)
- Hook and MCP-server coverage in the config rubric

**Future:**

- Webapp dashboard for trend visualization
- `agentfluent diff` — side-by-side comparison of behavior before/after a prompt change
- MCP server configuration assessment
- Closed-loop self-improvement — use AgentFluent's diagnostic output as a feedback signal the agent itself consumes to propose config edits against its own past sessions
- Agent ROI reporting — roll up cost, usage, and task-completion signals over time so a business can evaluate whether an optimized agent is worth continuing to run
Browse open issues for the full backlog.
## Troubleshooting

| Problem | Solution |
|---|---|
| No projects found | Verify `~/.claude/projects/` exists and contains per-project subdirectories with `.jsonl` session files. Claude Code creates these automatically the first time you use it. |
| No agent invocations | Agent invocation rows require the session to actually call a subagent (Agent `tool_use` with a `subagent_type`). A session that never delegated has no agent data to analyze — this is not an error. |
| Zero tokens / dashes in Agent Invocations | If you're on AgentFluent ≤ 0.1.0, this is the #84 parser bug — upgrade with `uv tool upgrade agentfluent`. |
| Python version error | AgentFluent requires Python 3.12+. Check with `python --version` and upgrade if needed. |
| Non-default session path | If `~/.claude/` is stored somewhere unusual, AgentFluent currently uses the default path only. Custom path support is planned. |
| `Malformed JSON at <file>:<line>` warning | A session file has a corrupted line — usually null bytes left behind when Claude Code was killed mid-write. The parser skips the line and continues; analytics are unaffected. Safe to ignore, or delete the line with `sed -i '<line>d' <file>` to silence the warning. |
| Stale tool install after local build | If `uv tool install --from <path> agentfluent` seems to reuse cached code, run `uv tool uninstall agentfluent && uv cache clean agentfluent` before reinstalling. |
## Research Foundations

AgentFluent's behavior-to-improvement approach is grounded in research on agent quality, observability gaps, and production failure modes:

- `docs/AGENT_ANALYTICS_RESEARCH.md` — Full market analysis, competitive landscape (Langfuse, LangSmith, Arize, Braintrust, DeepEval, etc.), and technical feasibility study. This is the document that motivated AgentFluent's existence as a separate product from CodeFluent.
- LangChain 2026 State of AI Agents — 57% of orgs have agents in production; quality is the top blocker.
- Anthropic Claude Agent SDK docs — Agent configuration surface and best practices.
- Anthropic Claude Code subagents docs — Subagent definition format and delegation mechanics.
## Contributing

Contributions welcome. Start by reading `CONTRIBUTING.md` for dev setup, conventions, and the PR checklist. The architecture overview in `CLAUDE.md` is the canonical reference for package layout, naming, and the JSONL format.

Branching: `feature/<issue>-description` for features, `fix/<issue>-description` for bugs. Commit messages follow Conventional Commits — release-please uses them to cut versions and write the changelog automatically.
## License