CLI for tracking AI agent task metrics: token cost, retry pressure, and outcome quality.
ai-agents-metrics
Measure whether AI-assisted development is working.
AI is writing more of your code. You still don't know:
- How many attempts each task actually takes
- Where the process breaks down and why
- Whether your workflow is getting faster or generating more rework
ai-agents-metrics tracks goals, attempts, outcomes, and token cost for every AI coding session — structured data to reason about your workflow, not just your model.
Why this exists
AI coding tools optimize for code generation. That is not the same as optimizing the development system around AI.
A coding agent can succeed at the individual task while the overall workflow degrades — more attempts per goal, more correction passes, more cost per shipped unit.
This project tracks the signals that matter at the workflow level:
- how many attempts goals require,
- where retries and failures cluster,
- whether outcomes are matching the requested result,
- whether cost is trending in the right direction.
It is not a benchmark, an eval framework, or a model comparison tool. It is a local ledger for real engineering work done with AI.
Core Concepts
| Concept | Meaning |
|---|---|
| goal | One requested outcome. Stored in the event log; `task` is the legacy name still used by CLI commands and flags such as `start-task` and `--task-id`. |
| attempt | One implementation pass or retry for a goal. Multiple attempts per goal are normal when corrections are needed. |
| session | One continuous AI agent interaction (e.g. a single Claude Code or Codex thread). Maps to one or more attempts. |
| outcome | The final result of a closed goal: success or fail. |
| failure reason | The primary cause when an attempt does not succeed: model_mistake, unclear_task, validation_failed, environment_issue, scope_too_large, missing_context, tooling_issue, or other. |
| cost | Token spend mapped to USD for a goal or attempt. Sourced from local agent telemetry when available. |
| retry pressure | How many passes a goal required before closure. High retry pressure signals friction in the task or the workflow. |
| result fit | Quality label for closed product goals: exact_fit, partial_fit, or miss. Separate from outcome — a goal can succeed but still be a partial fit. |
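To make the relationships in the table concrete, here is a minimal Python sketch of the concepts as plain data structures. It assumes nothing about the package internals: the class and field names (`Goal`, `Attempt`, `retry_pressure`, `known_cost_usd`) are hypothetical and are not the schema of `metrics/events.ndjson`. Only the failure-reason and result-fit labels come from the table.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FailureReason(str, Enum):
    MODEL_MISTAKE = "model_mistake"
    UNCLEAR_TASK = "unclear_task"
    VALIDATION_FAILED = "validation_failed"
    ENVIRONMENT_ISSUE = "environment_issue"
    SCOPE_TOO_LARGE = "scope_too_large"
    MISSING_CONTEXT = "missing_context"
    TOOLING_ISSUE = "tooling_issue"
    OTHER = "other"


class ResultFit(str, Enum):
    EXACT_FIT = "exact_fit"
    PARTIAL_FIT = "partial_fit"
    MISS = "miss"


@dataclass
class Attempt:
    succeeded: bool
    failure_reason: Optional[FailureReason] = None
    cost_usd: Optional[float] = None  # None when no telemetry was found


@dataclass
class Goal:
    title: str
    goal_type: str  # "product", "retro", or "meta"
    attempts: list[Attempt] = field(default_factory=list)
    outcome: Optional[str] = None  # "success" or "fail" once closed
    result_fit: Optional[ResultFit] = None  # quality label, product goals only

    @property
    def retry_pressure(self) -> int:
        # Passes needed before closure; 1 means the first attempt was enough.
        return len(self.attempts)

    @property
    def known_cost_usd(self) -> Optional[float]:
        # Sum only attempts with known cost; None if no cost is known at all.
        costs = [a.cost_usd for a in self.attempts if a.cost_usd is not None]
        return sum(costs) if costs else None


# A goal can succeed (outcome) and still only partially fit (result fit).
goal = Goal("add login endpoint", "product")
goal.attempts.append(Attempt(False, FailureReason.MODEL_MISTAKE, 0.40))
goal.attempts.append(Attempt(True, cost_usd=0.25))
goal.outcome = "success"
goal.result_fit = ResultFit.PARTIAL_FIT
print(goal.retry_pressure)  # 2
```

Keeping `outcome` and `result_fit` as separate fields is what lets a goal be recorded as a success that only partially fit the request.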
What It Tracks
- Goals and attempts — what was requested, how many passes it took, and whether each pass succeeded
- Token cost — input, output, and cached-input tokens per session, mapped to USD
- Retry pressure — how often attempts fail or require correction
- Model usage — which model ran each session and what it cost
- Outcome quality — result-fit labels for product goals
- History analysis — reconstruct past sessions from agent conversation transcripts
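As an illustration of how per-session token counts can be mapped to USD, the sketch below prices input, cached-input, and output tokens separately, since providers typically bill cached input at a different rate. The model name and prices are made up for the example; the package's own price source and rounding are not described here.

```python
from dataclasses import dataclass

# Hypothetical USD prices per million tokens. Real prices depend on the model
# and provider; these numbers are NOT taken from ai-agents-metrics.
PRICES_PER_MTOK = {
    "example-model": {"input": 3.00, "cached_input": 0.30, "output": 15.00},
}


@dataclass
class TokenUsage:
    input_tokens: int
    cached_tokens: int
    output_tokens: int


def usage_to_usd(model: str, usage: TokenUsage) -> float:
    """Map one session's token counts to USD using a per-model price table."""
    if model not in PRICES_PER_MTOK:
        raise KeyError(f"no price configured for model {model!r}")
    price = PRICES_PER_MTOK[model]
    total = (
        usage.input_tokens * price["input"]
        + usage.cached_tokens * price["cached_input"]
        + usage.output_tokens * price["output"]
    )
    return total / 1_000_000


print(usage_to_usd("example-model", TokenUsage(1_000_000, 0, 0)))  # 3.0
```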
Capabilities
| Capability | Status |
|---|---|
| Append-only local event log (NDJSON) | Available |
| Goal and attempt lifecycle CLI | Available |
| Retry and failure visibility | Available |
| Cost and token tracking | Available |
| Automatic cost sync from Claude Code telemetry | Available |
| Automatic cost sync from Codex telemetry | Available |
| History ingestion from agent transcripts | Available |
| Before/after workflow comparison | Available |
| Shell completion (bash, zsh) | Available |
| Standalone binary packaging | Available |
| Repository bootstrap (`bootstrap` command) | Available |
| Optional markdown report export | Available |
| Hosted multi-user dashboards | Not planned |
| Centralized team analytics | Not planned |
Quick Start
```bash
pip install ai-agents-metrics

# Bootstrap tracking into a repository
ai-agents-metrics bootstrap --target-dir /path/to/repo

# Start a goal, do the work, close it
ai-agents-metrics start-task --title "add login endpoint" --task-type product
ai-agents-metrics finish-task --task-id 2026-04-09-001 --status success --result-fit exact_fit

# See what it cost and how many tries it took
ai-agents-metrics show
```
Example Output
```text
$ ai-agents-metrics show
Codex Metrics Summary
Operational summary:
Closed goals: 8
Successes: 8
Fails: 0
Total attempts: 8
Success Rate: 100.00%
Attempts per Closed Goal: 1.00
Known total cost (USD): 9.27
Known total tokens: 26,337,605
input: 260
cached: 26,088,225
output: 44,883
Known Cost per Success (USD): 1.32
Known Cost per Success (Tokens): 3,762,515
Model coverage: 7/8 closed goals with an unambiguous model
By model:
claude-sonnet-4-6: 7 closed, 7 successes, 0 fails
Closed entries: 8
Entry successes: 8
Entry fails: 0
Entry Success Rate: 100.00%
```
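The headline figures in the summary follow from simple ratios over closed goals. The Python sketch below recomputes them from hypothetical `ClosedGoal` records. How `show` treats goals without cost data is an assumption here: the sample output appears to divide by the seven successes with known cost rather than all eight (9.27 / 7 ≈ 1.32 USD, and 26,337,605 / 7 = 3,762,515 tokens), so the sketch does the same.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClosedGoal:
    status: str  # "success" or "fail"
    attempts: int
    cost_usd: Optional[float] = None  # None when no telemetry was found


def summarize(goals: list[ClosedGoal]) -> dict:
    closed = len(goals)
    successes = [g for g in goals if g.status == "success"]
    costed_successes = [g for g in successes if g.cost_usd is not None]
    known_cost = sum(g.cost_usd for g in goals if g.cost_usd is not None)
    return {
        "closed_goals": closed,
        "successes": len(successes),
        "fails": closed - len(successes),
        "success_rate": len(successes) / closed if closed else 0.0,
        "attempts_per_closed_goal": (
            sum(g.attempts for g in goals) / closed if closed else 0.0
        ),
        "known_total_cost_usd": known_cost,
        # Assumption: divide by successes whose cost is known, not by all successes.
        "known_cost_per_success_usd": (
            known_cost / len(costed_successes) if costed_successes else None
        ),
    }
```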
Install
Install from PyPI:
```bash
pip install ai-agents-metrics
```
Install from source:
```bash
python -m pip install -e .
```
Install the standalone binary:
```bash
make package-standalone
./dist/standalone/ai-agents-metrics install-self
```
Bootstrap a Repository
Run once to scaffold ai-agents-metrics into any repository. Creates the event log, installs the policy document, and injects an agent instructions block:
```bash
ai-agents-metrics bootstrap --target-dir /path/to/repo --dry-run
ai-agents-metrics bootstrap --target-dir /path/to/repo
```
Safe to rerun on a partially initialized repository. Use --dry-run to preview what will be written without making changes.
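Rerunning safely on a partially initialized repository usually means updating a clearly delimited block instead of appending a new one each time. A minimal sketch of that idempotent pattern in Python, assuming HTML-comment markers and a hypothetical `sync_file` helper; the actual `bootstrap` command may detect existing content differently:

```python
from pathlib import Path

# Hypothetical markers: the real bootstrap command's markers and target files
# are not documented in this README.
BEGIN = "<!-- ai-agents-metrics:begin -->"
END = "<!-- ai-agents-metrics:end -->"


def upsert_block(text: str, body: str) -> str:
    """Insert the managed block, or replace it in place if it already exists."""
    block = f"{BEGIN}\n{body.strip()}\n{END}\n"
    start = text.find(BEGIN)
    end = text.find(END, start) if start != -1 else -1
    if start != -1 and end != -1:
        after = text[end + len(END):]
        if after.startswith("\n"):
            after = after[1:]  # the new block already ends with a newline
        return text[:start] + block + after
    if text and not text.endswith("\n"):
        text += "\n"
    return text + ("\n" if text else "") + block


def sync_file(path: Path, body: str, dry_run: bool = False) -> bool:
    """Return True if the file needs changes; write them unless dry_run is set."""
    old = path.read_text(encoding="utf-8") if path.exists() else ""
    new = upsert_block(old, body)
    if new != old and not dry_run:
        path.write_text(new, encoding="utf-8")
    return new != old
```

Here `dry_run=True` only reports whether a write would happen, which mirrors what `--dry-run` is described as doing.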
Track a Session
Start a goal
Record a new goal before implementation begins:
```bash
ai-agents-metrics start-task --title "implement login endpoint" --task-type product
```
Goal types: product for delivery work, retro for retrospective writeups, meta for bookkeeping and tooling work.
Record a correction pass
If the first attempt needed correction, record the retry:
```bash
ai-agents-metrics continue-task --task-id 2026-04-08-001 --failure-reason model_mistake
```
Close the goal
When the goal is complete, close it with an outcome and optional quality label:
```bash
ai-agents-metrics finish-task --task-id 2026-04-08-001 --status success --result-fit exact_fit
ai-agents-metrics finish-task --task-id 2026-04-08-001 --status fail --failure-reason unclear_task
```
Ensure bookkeeping is in place
If work has already started without an active goal, run this command to detect the gap and create a recovery draft:
```bash
ai-agents-metrics ensure-active-task
```
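The commands in this section describe a small state machine: a goal is opened, may gain correction passes, and is closed exactly once. The in-memory Python sketch below models that lifecycle. The `Ledger` class, its method names, and the rule that `start` opens the first attempt are assumptions for illustration, not the tool's implementation, which records its state in the append-only event log.

```python
from dataclasses import dataclass, field


class LifecycleError(Exception):
    """Raised when a command targets a goal that is not open."""


@dataclass
class GoalRecord:
    goal_id: str
    title: str
    status: str = "in_progress"  # becomes "success" or "fail" when closed
    attempts: int = 1  # assumption: starting a goal opens its first attempt
    failure_reasons: list[str] = field(default_factory=list)


class Ledger:
    def __init__(self) -> None:
        self.goals: dict[str, GoalRecord] = {}

    def start(self, goal_id: str, title: str) -> GoalRecord:
        if goal_id in self.goals:
            raise LifecycleError(f"{goal_id} already exists")
        self.goals[goal_id] = GoalRecord(goal_id, title)
        return self.goals[goal_id]

    def continue_(self, goal_id: str, failure_reason: str) -> GoalRecord:
        # The previous pass needed correction: record why, then open a new attempt.
        goal = self._open(goal_id)
        goal.failure_reasons.append(failure_reason)
        goal.attempts += 1
        return goal

    def finish(self, goal_id: str, status: str, failure_reason: str = "") -> GoalRecord:
        if status not in ("success", "fail"):
            raise ValueError(f"unknown status {status!r}")
        goal = self._open(goal_id)
        if failure_reason:
            goal.failure_reasons.append(failure_reason)
        goal.status = status
        return goal

    def ensure_active(self) -> GoalRecord:
        # Reuse an open goal if there is one; otherwise create a recovery draft.
        for goal in self.goals.values():
            if goal.status == "in_progress":
                return goal
        return self.start(f"recovery-{len(self.goals) + 1}", "recovery draft")

    def _open(self, goal_id: str) -> GoalRecord:
        goal = self.goals.get(goal_id)
        if goal is None or goal.status != "in_progress":
            raise LifecycleError(f"{goal_id} is not an open goal")
        return goal
```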
Inspect Metrics
Print a summary of all goals, costs, and retry pressure:
```bash
ai-agents-metrics show
```
Audit goal history for likely misses, stale in-progress goals, and low cost coverage:
```bash
ai-agents-metrics history-audit
```
Explain missing cost coverage and check whether it is recoverable from local agent logs:
```bash
ai-agents-metrics audit-cost-coverage
```
Regenerate the optional markdown report:
```bash
ai-agents-metrics render-report
```
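Cost coverage and stale-goal checks reduce to a few filters over goal records. The sketch below shows one way to compute them; the `GoalRow` shape and the two-day staleness threshold are hypothetical, and `history-audit` or `audit-cost-coverage` may apply different rules (for example, also checking whether missing cost is recoverable from local agent logs).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class GoalRow:
    goal_id: str
    status: str  # "in_progress", "success", or "fail"
    started_at: datetime
    cost_usd: Optional[float] = None


def audit(goals: list[GoalRow], now: datetime,
          stale_after: timedelta = timedelta(days=2)) -> dict:
    closed = [g for g in goals if g.status != "in_progress"]
    missing = [g.goal_id for g in closed if g.cost_usd is None]
    return {
        # Share of closed goals whose cost is known (1.0 when nothing is closed).
        "cost_coverage": 1 - len(missing) / len(closed) if closed else 1.0,
        "missing_cost": missing,
        # Open goals older than the threshold are likely forgotten bookkeeping.
        "stale_in_progress": [
            g.goal_id for g in goals
            if g.status == "in_progress" and now - g.started_at > stale_after
        ],
    }
```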
Sync Cost Data
Backfill token and cost data from local agent telemetry into existing goal records. Supports Claude Code and Codex automatically — no provider flag required:
```bash
ai-agents-metrics sync-usage
```
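Backfilling can be sketched as a merge keyed by session: telemetry from any supported agent is collected into one session-ID-to-USD mapping, and only goals with no recorded cost are filled. The data shapes and the rule of skipping goals with partial telemetry are assumptions of this sketch, not documented behavior of `sync-usage`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GoalCost:
    goal_id: str
    session_ids: list[str] = field(default_factory=list)
    cost_usd: Optional[float] = None


def backfill(goals: list[GoalCost], telemetry: dict[str, float]) -> int:
    """Fill missing goal costs from per-session telemetry; return goals updated."""
    updated = 0
    for goal in goals:
        if goal.cost_usd is not None:
            continue  # never overwrite a cost that is already recorded
        if goal.session_ids and all(s in telemetry for s in goal.session_ids):
            goal.cost_usd = sum(telemetry[s] for s in goal.session_ids)
            updated += 1
        # Goals with partial telemetry stay unknown rather than under-counted.
    return updated
```

Because recorded costs are never overwritten, running the backfill twice is harmless: the second pass simply updates nothing.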
Analyze History
Reconstruct session history from local agent transcripts. Run the three pipeline stages in order:
```bash
ai-agents-metrics history-ingest
ai-agents-metrics history-normalize
ai-agents-metrics history-derive
```
For Claude Code sessions, add --source claude:
```bash
ai-agents-metrics history-ingest --source claude
```
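Splitting history reconstruction into ingest, normalize, and derive keeps raw transcripts, a uniform intermediate form, and computed metrics separate. The toy Python pipeline below illustrates that shape. The transcript keys (`session_id`, `thread_id`, `role`, `tokens`) are invented for the example; real Claude Code and Codex transcript formats differ and are not described in this README.

```python
import json
from collections import defaultdict


def ingest(lines: list[str], source: str) -> list[dict]:
    # Stage 1: parse raw payloads without interpreting them, tagged by source.
    return [{"source": source, "raw": json.loads(line)} for line in lines if line.strip()]


def normalize(raw_records: list[dict]) -> list[dict]:
    # Stage 2: map source-specific keys onto one shape (hypothetical key names).
    out = []
    for rec in raw_records:
        raw = rec["raw"]
        out.append({
            "source": rec["source"],
            "session": raw.get("session_id") or raw.get("thread_id"),
            "role": raw.get("role", "unknown"),
            "tokens": int(raw.get("tokens", 0)),
        })
    return out


def derive(messages: list[dict]) -> dict[str, dict]:
    # Stage 3: aggregate normalized messages into per-session metrics.
    sessions: dict[str, dict] = defaultdict(lambda: {"messages": 0, "tokens": 0})
    for msg in messages:
        stats = sessions[msg["session"]]
        stats["messages"] += 1
        stats["tokens"] += msg["tokens"]
    return dict(sessions)
```

One practical benefit of the split is that `derive` can be rerun on normalized data without re-reading every transcript.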
Compare the structured event log against reconstructed history to find gaps:
```bash
ai-agents-metrics history-compare
```
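Gap detection between the event log and reconstructed history can be thought of as set differences over session identifiers. A minimal sketch, assuming each logged goal lists the session IDs it covered (a hypothetical shape; `history-compare` may match records differently):

```python
def find_gaps(goal_sessions: dict[str, set[str]], transcript_sessions: set[str]) -> dict:
    # Every session referenced by any logged goal.
    logged = set().union(*goal_sessions.values())
    return {
        # Seen in transcripts but never tied to a goal in the event log.
        "untracked_sessions": sorted(transcript_sessions - logged),
        # Logged goals with no matching transcript at all.
        "goals_without_transcripts": sorted(
            goal for goal, sessions in goal_sessions.items()
            if not sessions & transcript_sessions
        ),
    }
```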
Analyze before/after product metrics around each retrospective event:
```bash
ai-agents-metrics derive-retro-timeline
```
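One simple before/after comparison takes a fixed window of closed goals on each side of a retrospective and compares mean attempts per goal. The function name, the window size, and the choice of attempts as the metric are assumptions; `derive-retro-timeline` may use other product metrics and windows.

```python
from datetime import datetime


def before_after(closed: list[tuple[datetime, int]], retro_at: datetime,
                 window: int = 5) -> tuple[float, float]:
    """Mean attempts per goal for up to `window` goals closed on each side of a retro."""
    ordered = sorted(closed)  # (closed_at, attempts) pairs, oldest first
    before = [n for t, n in ordered if t < retro_at][-window:]
    after = [n for t, n in ordered if t >= retro_at][:window]
    if not before or not after:
        raise ValueError("need closed goals on both sides of the retro")
    return sum(before) / len(before), sum(after) / len(after)
```

A drop from the "before" mean to the "after" mean would suggest the retrospective reduced retry pressure, though small windows are noisy.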
Privacy and Storage
All data stays local. ai-agents-metrics writes only to:
- `metrics/events.ndjson`: the append-only event log (source of truth)
- `docs/ai-agents-metrics.md`: an optional markdown export, regenerated on demand
- `.ai-agents-metrics/warehouse.db`: a local SQLite cache used by the history pipeline
No data is sent to any remote service. The event log is a plain NDJSON file you can read, audit, and version-control yourself.
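Because the log is plain NDJSON (one JSON object per line), the standard library is enough to read or append to it. A minimal sketch; the example event keys are hypothetical, since this README does not document the event schema:

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one event per non-blank NDJSON line, failing loudly on bad JSON."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON") from exc


def read_events(path: Path) -> list[dict]:
    with path.open(encoding="utf-8") as fh:
        return list(parse_events(fh))


def to_ndjson_line(event: dict) -> str:
    # Sorted keys keep diffs stable when the log is version-controlled.
    return json.dumps(event, sort_keys=True, ensure_ascii=False) + "\n"


def append_event(path: Path, event: dict) -> None:
    # Append-only: open in "a" mode so earlier lines are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(to_ndjson_line(event))
```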
Verify Your Install
```bash
make verify
```
Runs lint, security scan, typecheck, tests, and the public boundary check.
Public Boundary
This repository contains the public-safe core only. Private retrospectives, internal audits, and local metrics history are kept in a separate private overlay. The boundary is enforced automatically:
```bash
make verify-public-boundary
```
Repository
github.com/sg4tech/ai-agents-metrics
Contributing
Read CONTRIBUTING.md. In short: keep changes public-safe, run make verify, include tests for behavior changes.
Security
See SECURITY.md for how to report potential private-data leaks or security issues.
Changelog
Notable public changes are tracked in CHANGELOG.md.