CLI for tracking AI agent task metrics: token cost, retry pressure, and outcome quality.

ai-agents-metrics

Measure whether AI-assisted development is working.

AI is writing more of your code. You still don't know:

- How many attempts each task actually takes
- Where the process breaks down and why
- Whether your workflow is getting faster or generating more rework

ai-agents-metrics tracks goals, attempts, outcomes, and token cost for every AI coding session — structured data to reason about your workflow, not just your model.

Why this exists

AI coding tools optimize for code generation. That is not the same as optimizing the development system around AI.

A coding agent can succeed at the individual task while the overall workflow degrades — more attempts per goal, more correction passes, more cost per shipped unit.

This project tracks the signals that matter at the workflow level:

- how many attempts goals require,
- where retries and failures cluster,
- whether outcomes match the requested result,
- whether cost is trending in the right direction.

It is not a benchmark, an eval framework, or a model comparison tool. It is a local ledger for real engineering work done with AI.

Core Concepts

Concept	Meaning
goal	One requested outcome. Stored in the event log; `task` is a legacy alias used in CLI flags.
attempt	One implementation pass or retry for a goal. Multiple attempts per goal are normal when corrections are needed.
session	One continuous AI agent interaction (e.g. a single Claude Code or Codex thread). Maps to one or more attempts.
outcome	The final result of a closed goal: `success` or `fail`.
failure reason	The primary cause when an attempt does not succeed: `model_mistake`, `unclear_task`, `validation_failed`, `environment_issue`, `scope_too_large`, `missing_context`, `tooling_issue`, or `other`.
cost	Token spend mapped to USD for a goal or attempt. Sourced from local agent telemetry when available.
retry pressure	How many passes a goal required before closure. High retry pressure signals friction in the task or the workflow.
result fit	Quality label for closed product goals: `exact_fit`, `partial_fit`, or `miss`. Separate from outcome — a goal can succeed but still be a partial fit.
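
The vocabulary in the table can be pictured as a small data model. This is only a sketch for orientation: the class and field names (`Goal`, `goal_id`, `attempts`) are hypothetical and are not taken from the package; only the label values come from the table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    SUCCESS = "success"
    FAIL = "fail"


class ResultFit(str, Enum):
    EXACT_FIT = "exact_fit"
    PARTIAL_FIT = "partial_fit"
    MISS = "miss"


class FailureReason(str, Enum):
    MODEL_MISTAKE = "model_mistake"
    UNCLEAR_TASK = "unclear_task"
    VALIDATION_FAILED = "validation_failed"
    ENVIRONMENT_ISSUE = "environment_issue"
    SCOPE_TOO_LARGE = "scope_too_large"
    MISSING_CONTEXT = "missing_context"
    TOOLING_ISSUE = "tooling_issue"
    OTHER = "other"


@dataclass
class Goal:
    """Hypothetical in-memory view of one goal (not the package's schema)."""

    goal_id: str
    title: str
    attempts: int = 1                      # passes recorded so far
    outcome: Optional[Outcome] = None      # set when the goal is closed
    result_fit: Optional[ResultFit] = None # quality label, separate from outcome


# Outcome and result fit are independent: a goal can succeed as a partial fit.
goal = Goal("2026-04-08-001", "implement login endpoint", attempts=2,
            outcome=Outcome.SUCCESS, result_fit=ResultFit.PARTIAL_FIT)
print(goal.outcome.value, goal.result_fit.value, goal.attempts)
```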

What It Tracks

- Goals and attempts: what was requested, how many passes it took, and whether each pass succeeded
- Token cost: input, output, and cached-input tokens per session, mapped to USD
- Retry pressure: how often attempts fail or require correction
- Model usage: which model ran each session and what it cost
- Outcome quality: result-fit labels for product goals
- History analysis: reconstruct past sessions from agent conversation transcripts
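
To make "mapped to USD" concrete, here is a minimal sketch of how three token counts could be turned into a dollar figure. The types `TokenUsage` and `Pricing` and the prices used below are placeholders invented for illustration; the package's actual pricing tables and rounding rules are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    input_tokens: int
    cached_input_tokens: int
    output_tokens: int


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens. Values are placeholders, not real prices."""

    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def cost_usd(usage: TokenUsage, pricing: Pricing) -> float:
    """Price each token class separately, then sum."""
    total = (
        usage.input_tokens * pricing.input_per_mtok
        + usage.cached_input_tokens * pricing.cached_input_per_mtok
        + usage.output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


example_usage = TokenUsage(input_tokens=1_000_000,
                           cached_input_tokens=2_000_000,
                           output_tokens=500_000)
example_pricing = Pricing(input_per_mtok=2.0,
                          cached_input_per_mtok=0.5,
                          output_per_mtok=10.0)
print(cost_usd(example_usage, example_pricing))  # 2.0 + 1.0 + 5.0 = 8.0
```

Pricing cached input separately matters in practice: in the sample summary further down, cached tokens make up almost all of the token total.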

Capabilities

Capability	Status
Append-only local event log (NDJSON)	Available
Goal and attempt lifecycle CLI	Available
Retry and failure visibility	Available
Cost and token tracking	Available
Automatic cost sync from Claude Code telemetry	Available
Automatic cost sync from Codex telemetry	Available
History ingestion from agent transcripts	Available
Before/after workflow comparison	Available
Shell completion (bash, zsh)	Available
Standalone binary packaging	Available
Repository bootstrap (`bootstrap` command)	Available
Optional markdown report export	Available
Hosted multi-user dashboards	Not planned
Centralized team analytics	Not planned

Quick Start

pip install ai-agents-metrics

# Bootstrap tracking into a repository
ai-agents-metrics bootstrap --target-dir /path/to/repo

# Start a goal, do the work, close it
ai-agents-metrics start-task --title "add login endpoint" --task-type product
ai-agents-metrics finish-task --task-id 2026-04-09-001 --status success --result-fit exact_fit

# See what it cost and how many tries it took
ai-agents-metrics show

Example Output

$ ai-agents-metrics show

Codex Metrics Summary

Operational summary:
Closed goals:                    8
Successes:                       8
Fails:                           0
Total attempts:                  8
Success Rate:                    100.00%
Attempts per Closed Goal:        1.00

Known total cost (USD):          9.27
Known total tokens:              26,337,605
  input:                         260
  cached:                        26,088,225
  output:                        44,883

Known Cost per Success (USD):    1.32
Known Cost per Success (Tokens): 3,762,515

Model coverage: 7/8 closed goals with an unambiguous model
By model:
  claude-sonnet-4-6: 7 closed, 7 successes, 0 fails

Closed entries:     8
Entry successes:    8
Entry fails:        0
Entry Success Rate: 100.00%
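
A note on reading the per-success figures in this sample: they do not equal the totals divided by the 8 successes. They are consistent with dividing by 7, which may mean that one goal has no known cost. That is an inference from the numbers shown, not documented behavior. The arithmetic:

```python
total_cost_usd = 9.27
total_tokens = 26_337_605
successes = 8
assumed_costed_successes = 7  # assumption inferred from the sample, not documented

# Dividing by all 8 successes does not reproduce the summary line.
print(round(total_cost_usd / successes, 2))                 # 1.16

# Dividing by 7 reproduces both per-success figures.
print(round(total_cost_usd / assumed_costed_successes, 2))  # 1.32
print(total_tokens // assumed_costed_successes)             # 3762515
```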

Install

Install from PyPI:

pip install ai-agents-metrics

Install from source:

python -m pip install -e .

Build and install the standalone binary (from a source checkout):

make package-standalone
./dist/standalone/ai-agents-metrics install-self

Bootstrap a Repository

Run once to scaffold ai-agents-metrics into any repository. Creates the event log, installs the policy document, and injects an agent instructions block:

ai-agents-metrics bootstrap --target-dir /path/to/repo --dry-run
ai-agents-metrics bootstrap --target-dir /path/to/repo

Safe to rerun on a partially initialized repository. Use --dry-run to preview what will be written without making changes.

Track a Session

Start a goal

Record a new goal before implementation begins:

ai-agents-metrics start-task --title "implement login endpoint" --task-type product

Goal types: product for delivery work, retro for retrospective writeups, meta for bookkeeping and tooling work.

Record a correction pass

If the first attempt needed correction, record the retry:

ai-agents-metrics continue-task --task-id 2026-04-08-001 --failure-reason model_mistake

Close the goal

When the goal is complete, close it with an outcome and an optional quality label. Run one of the following, depending on how the goal ended:

# Goal succeeded
ai-agents-metrics finish-task --task-id 2026-04-08-001 --status success --result-fit exact_fit
# Goal failed
ai-agents-metrics finish-task --task-id 2026-04-08-001 --status fail --failure-reason unclear_task
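
If you drive the lifecycle from a script rather than by hand, one option is to build the argument lists in Python and pass them to `subprocess.run`. The helper functions below are hypothetical wrappers written for this example; they use only the subcommands and flags shown in the commands above.

```python
import shlex
from typing import List, Optional

CLI = "ai-agents-metrics"


def start_task_argv(title: str, task_type: str) -> List[str]:
    return [CLI, "start-task", "--title", title, "--task-type", task_type]


def continue_task_argv(task_id: str, failure_reason: str) -> List[str]:
    return [CLI, "continue-task", "--task-id", task_id,
            "--failure-reason", failure_reason]


def finish_task_argv(task_id: str, status: str,
                     result_fit: Optional[str] = None,
                     failure_reason: Optional[str] = None) -> List[str]:
    argv = [CLI, "finish-task", "--task-id", task_id, "--status", status]
    if result_fit is not None:
        argv += ["--result-fit", result_fit]
    if failure_reason is not None:
        argv += ["--failure-reason", failure_reason]
    return argv


# Print the commands instead of running them; to execute one, pass the list
# to subprocess.run(argv, check=True).
for argv in (
    start_task_argv("implement login endpoint", "product"),
    continue_task_argv("2026-04-08-001", "model_mistake"),
    finish_task_argv("2026-04-08-001", "success", result_fit="exact_fit"),
):
    print(shlex.join(argv))
```

Passing a list instead of a shell string avoids quoting problems with titles that contain spaces.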

Ensure bookkeeping is in place

If work has already started without an active goal, this command detects that and creates a recovery draft:

ai-agents-metrics ensure-active-task

Inspect Metrics

Print a summary of all goals, costs, and retry pressure:

ai-agents-metrics show

Audit goal history for likely misses, stale in-progress goals, and low cost coverage:

ai-agents-metrics history-audit

Explain missing cost coverage and check whether it is recoverable from local agent logs:

ai-agents-metrics audit-cost-coverage

Regenerate the optional markdown report:

ai-agents-metrics render-report

Sync Cost Data

Backfill token and cost data from local agent telemetry into existing goal records. Supports Claude Code and Codex automatically — no provider flag required:

ai-agents-metrics sync-usage

Analyze History

Reconstruct session history from local agent transcripts. Run the three pipeline stages in order:

ai-agents-metrics history-ingest
ai-agents-metrics history-normalize
ai-agents-metrics history-derive

For Claude Code sessions, add --source claude:

ai-agents-metrics history-ingest --source claude
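
Because the stages depend on each other, a wrapper should stop at the first stage that fails. The sketch below shows one way to do that. `pipeline_commands` and `run_pipeline` are hypothetical helpers, and `--source` is attached only to `history-ingest` because that is the only stage this README shows it with; whether the other stages accept it is not assumed.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

STAGES = ("history-ingest", "history-normalize", "history-derive")


def pipeline_commands(source: Optional[str] = None) -> List[List[str]]:
    """Build the three stage commands in the documented order."""
    commands = [["ai-agents-metrics", stage] for stage in STAGES]
    if source is not None:
        commands[0] += ["--source", source]  # ingest stage only
    return commands


def run_pipeline(source: Optional[str] = None,
                 runner: Optional[Callable[[Sequence[str]], int]] = None) -> None:
    """Run each stage in order and stop at the first non-zero exit status."""
    if runner is None:
        runner = lambda argv: subprocess.run(list(argv)).returncode
    for argv in pipeline_commands(source):
        status = runner(argv)
        if status != 0:
            raise RuntimeError(f"{' '.join(argv)} exited with status {status}")
```

Calling `run_pipeline("claude")` would run the Claude Code variant. The `runner` parameter exists so the ordering logic can be exercised without the CLI installed.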

Compare the structured event log against reconstructed history to find gaps:

ai-agents-metrics history-compare

Analyze before/after product metrics around each retrospective event:

ai-agents-metrics derive-retro-timeline
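
The idea behind a before/after comparison can be illustrated with a few lines of Python. This is not how `derive-retro-timeline` is implemented; it is a sketch that assumes each closed product goal is reduced to a hypothetical `(closed_on, attempts)` pair and compares mean attempts on either side of a retrospective date.

```python
from datetime import date
from statistics import mean
from typing import Dict, List, Optional, Tuple


def attempts_before_after(
    goals: List[Tuple[date, int]], retro_day: date
) -> Dict[str, Optional[float]]:
    """Mean attempts per goal closed before vs. on or after the retro date."""
    before = [attempts for closed_on, attempts in goals if closed_on < retro_day]
    after = [attempts for closed_on, attempts in goals if closed_on >= retro_day]
    return {
        "before": mean(before) if before else None,
        "after": mean(after) if after else None,
    }


# Made-up sample data for illustration only.
sample_goals = [
    (date(2026, 4, 1), 3),
    (date(2026, 4, 2), 1),
    (date(2026, 4, 10), 1),
    (date(2026, 4, 11), 1),
]
print(attempts_before_after(sample_goals, date(2026, 4, 5)))
```

A drop in mean attempts after the retrospective suggests the process change reduced rework, although small samples like this one should be read with caution.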

Privacy and Storage

All data stays local. ai-agents-metrics writes only to:

- metrics/events.ndjson: the append-only event log (source of truth)
- docs/ai-agents-metrics.md: an optional markdown export (regenerated on demand)
- .ai-agents-metrics/warehouse.db: a local SQLite cache used by the history pipeline

No data is sent to any remote service. The event log is a plain NDJSON file you can read, audit, and version-control yourself.
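
Since NDJSON is one JSON object per line, the log can be inspected with the standard library alone. The sketch below assumes nothing about the event schema: `read_ndjson` and `count_by_key` are hypothetical helpers, and the key you count by is whatever field your own log actually contains.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def read_ndjson(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-blank line."""
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc


def count_by_key(path: Path, key: str) -> Dict[str, int]:
    """Count events by the value of one top-level key (assumes JSON objects)."""
    counts: Dict[str, int] = {}
    for event in read_ndjson(path):
        value = str(event.get(key, "<missing>"))
        counts[value] = counts.get(value, 0) + 1
    return counts
```

For example, `count_by_key(Path("metrics/events.ndjson"), "some_field")` tallies the values of `some_field`, where the field name is a placeholder to replace after looking at a few lines of your own log.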

Verify Your Install

make verify

This target runs lint, a security scan, type checks, tests, and the public boundary check. It is a Makefile target, so run it from a source checkout.

Public Boundary

This repository contains the public-safe core only. Private retrospectives, internal audits, and local metrics history are kept in a separate private overlay. The boundary is enforced automatically:

make verify-public-boundary

Repository

github.com/sg4tech/ai-agents-metrics

Contributing

Read CONTRIBUTING.md. In short: keep changes public-safe, run make verify, include tests for behavior changes.

Security

See SECURITY.md for how to report potential private-data leaks or security issues.

Changelog

Notable public changes are tracked in CHANGELOG.md.

Release history

- 0.2.2: Apr 21, 2026
- 0.2.1: Apr 20, 2026
- 0.2.0: Apr 20, 2026
- 0.1.5: Apr 13, 2026
- 0.1.4: Apr 12, 2026
- 0.1.3: Apr 12, 2026
- 0.1.2: Apr 12, 2026
- 0.1.1: Apr 10, 2026 (this version)
- 0.1.0: Apr 9, 2026
- 0.0.0.dev0 (pre-release): Apr 9, 2026

Download files

Source Distribution

ai_agents_metrics-0.1.1.tar.gz (203.2 kB)

Uploaded Apr 10, 2026 Source

Built Distribution

ai_agents_metrics-0.1.1-py3-none-any.whl (118.2 kB)

Uploaded Apr 10, 2026 Python 3

File details

Details for the file ai_agents_metrics-0.1.1.tar.gz.

File metadata

Download URL: ai_agents_metrics-0.1.1.tar.gz
Upload date: Apr 10, 2026
Size: 203.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_agents_metrics-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`10747408000a9bbcd4a5a7b0deabe9a3e2636a51a5f03c41f312dc443d5b0ae4`
MD5	`8457357ec31ffb75394ac19ba04e49e7`
BLAKE2b-256	`75134e7396452d9ba47e31c80f37aa3808a3c7f654a6f0b015638bd4fa4cea47`
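
To check a downloaded file against the published digest, you can compute its SHA256 locally with `hashlib`. The sketch below is one way to do it: the function names are hypothetical, and the expected values are the SHA256 digests listed on this page for the 0.1.1 source distribution and wheel.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published on this page for release 0.1.1.
EXPECTED_SHA256 = {
    "ai_agents_metrics-0.1.1.tar.gz":
        "10747408000a9bbcd4a5a7b0deabe9a3e2636a51a5f03c41f312dc443d5b0ae4",
    "ai_agents_metrics-0.1.1-py3-none-any.whl":
        "13250035b8bd10996df607ffc355d83be9f6ec06556572770980ebdc9b1d4f00",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path) -> bool:
    """Compare a downloaded file against the digest listed for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest known for {path.name}")
    return sha256_of(path) == expected
```

`pip` can also enforce digests at install time through hash-checking mode with a requirements file, which is usually more convenient than a manual check.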

Provenance

The following attestation bundles were made for ai_agents_metrics-0.1.1.tar.gz:

Publisher: publish.yml on sg4tech/ai-agents-metrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_agents_metrics-0.1.1.tar.gz
- Subject digest: 10747408000a9bbcd4a5a7b0deabe9a3e2636a51a5f03c41f312dc443d5b0ae4
- Sigstore transparency entry: 1273639207
- Sigstore integration time: Apr 10, 2026
Source repository:
- Permalink: sg4tech/ai-agents-metrics@e928107ae48939b533d61454f244e1985983da79
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/sg4tech
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e928107ae48939b533d61454f244e1985983da79
- Trigger Event: push

File details

Details for the file ai_agents_metrics-0.1.1-py3-none-any.whl.

File metadata

Download URL: ai_agents_metrics-0.1.1-py3-none-any.whl
Upload date: Apr 10, 2026
Size: 118.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_agents_metrics-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`13250035b8bd10996df607ffc355d83be9f6ec06556572770980ebdc9b1d4f00`
MD5	`d9680813a06f0f54b61fbb86b29b50a9`
BLAKE2b-256	`c6e4f6e3c72e66d5ded429ae21ef934cee0f05c37c6ab363acef3e03302ff0b8`

Provenance

The following attestation bundles were made for ai_agents_metrics-0.1.1-py3-none-any.whl:

Publisher: publish.yml on sg4tech/ai-agents-metrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_agents_metrics-0.1.1-py3-none-any.whl
- Subject digest: 13250035b8bd10996df607ffc355d83be9f6ec06556572770980ebdc9b1d4f00
- Sigstore transparency entry: 1273639362
- Sigstore integration time: Apr 10, 2026
Source repository:
- Permalink: sg4tech/ai-agents-metrics@e928107ae48939b533d61454f244e1985983da79
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/sg4tech
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e928107ae48939b533d61454f244e1985983da79
- Trigger Event: push
