
CLI for tracking AI agent task metrics: token cost, retry pressure, and outcome quality.

Project description

ai-agents-metrics — track AI agent token cost and retry pressure


Measure the real cost and effectiveness of AI-assisted engineering work.

ai-agents-metrics is a local CLI tool that records goals, attempts, token spend, and retry patterns for every AI coding session — so you can see which workflows are productive and which are burning tokens on rework.

Why

AI coding agents (Claude Code, Codex, and similar) generate real costs and vary widely in effectiveness. Without a record of each session, questions like these are hard to answer:

  • "How much did my Claude Code session cost?"
  • "How do I track AI agent retries across tasks?"
  • "What is my token spend per task?"
  • "Did this workflow change actually improve anything?"
  • "Which model is more cost-effective for my work?"

ai-agents-metrics gives you a lightweight, local ledger to answer all of these from real data.

When to use this

  • You use Claude Code, Codex, or another AI coding agent and want to know what each task actually cost
  • You suspect certain types of tasks require too many correction passes and want the numbers to confirm it
  • You changed a prompt strategy or workflow and want to verify it improved outcome quality or reduced cost
  • You run AI agents as part of a paid engineering workflow and need to track whether AI cost is eating into project margins
  • You want an AI agent to analyze your workflow history and recommend what to change next

What It Tracks

  • Goals and attempts — what you asked the agent to do, how many tries it took
  • Token cost — input, output, and cached-input tokens per session, mapped to USD
  • Retry pressure — how often attempts fail or require correction
  • Model usage — which model ran each session and what it cost
  • History analysis — parse conversation transcripts to reconstruct past sessions
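Mapping token counts to USD comes down to multiplying each token category by its price and summing. A minimal sketch in Python, assuming input, cached-input, and output tokens are each priced separately per million tokens. The `Rates` class, the `session_cost_usd` function, and the numbers in `EXAMPLE_RATES` are hypothetical placeholders, not this tool's API or its price table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD prices per one million tokens, one rate per category."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def session_cost_usd(input_tokens: int, cached_tokens: int,
                     output_tokens: int, rates: Rates) -> float:
    """Price each token category at its own rate and sum the results.

    Assumes cached input is counted separately from regular input,
    as in the input/cached/output breakdown shown by `show`.
    """
    tokens_per_unit = 1_000_000
    return (
        input_tokens * rates.input_per_m
        + cached_tokens * rates.cached_input_per_m
        + output_tokens * rates.output_per_m
    ) / tokens_per_unit


# Illustrative placeholder rates; substitute the published rates for your model.
EXAMPLE_RATES = Rates(input_per_m=1.0, cached_input_per_m=0.1, output_per_m=5.0)

if __name__ == "__main__":
    # Token counts from the example output below.
    print(round(session_cost_usd(260, 26_088_225, 44_883, EXAMPLE_RATES), 4))
```

Pricing cached input separately matters in practice: in the example output, cached tokens make up almost all of the total, so billing them at the regular input rate (typically higher) would overstate the cost.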

Example output

$ ai-agents-metrics show

Codex Metrics Summary

Operational summary:
Closed goals:                    8
Successes:                       8
Fails:                           0
Total attempts:                  8
Success Rate:                    100.00%
Attempts per Closed Goal:        1.00

Known total cost (USD):          9.27
Known total tokens:              26,337,605
  input:                         260
  cached:                        26,088,225
  output:                        44,883

Known Cost per Success (USD):    1.32
Known Cost per Success (Tokens): 3,762,515

Model coverage: 7/8 closed goals with an unambiguous model
By model:
  claude-sonnet-4-6: 7 closed, 7 successes, 0 fails

Closed entries:     8
Entry successes:    8
Entry fails:        0
Entry Success Rate: 100.00%
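The ratio lines in this summary can be reproduced from the raw counts. A minimal sketch, assuming the two "Known ... per Success" figures divide only by the successes that have cost data. The sample's 9.27 USD and 26,337,605 tokens yield 1.32 and 3,762,515 when divided by 7 rather than 8, which is consistent with the 7/8 model coverage line. The `summarize` function and its parameter names are hypothetical.

```python
from typing import Dict, Optional


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize(closed_goals: int, successes: int, total_attempts: int,
              known_cost_usd: float, known_tokens: int,
              successes_with_known_cost: int) -> Dict[str, Optional[float]]:
    """Derive summary ratios from raw ledger counts (hypothetical helper)."""
    rate = _ratio(successes, closed_goals)
    return {
        "success_rate_pct": None if rate is None else 100.0 * rate,
        "attempts_per_closed_goal": _ratio(total_attempts, closed_goals),
        # Assumption: "known" ratios divide only by successes with cost data.
        "known_cost_per_success_usd": _ratio(known_cost_usd, successes_with_known_cost),
        "known_tokens_per_success": _ratio(known_tokens, successes_with_known_cost),
    }


if __name__ == "__main__":
    # Counts taken from the example output above.
    print(summarize(8, 8, 8, 9.27, 26_337_605, 7))
```

Attempts per Closed Goal is the retry-pressure signal here: every extra attempt recorded with `continue-task` pushes it above 1.00.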

Install

From a checkout of the repository (editable install):

python -m pip install -e .

Or build and install the standalone binary:

make package-standalone
./dist/standalone/codex-metrics install-self

Quick Start

Bootstrap a project:

ai-agents-metrics bootstrap

Start tracking a goal:

ai-agents-metrics start-task --title "implement login endpoint" --task-type product

Record another attempt if the agent needed a correction:

ai-agents-metrics continue-task --task-id 2026-04-08-001 --failure-reason wrong_scope

Close it when done:

ai-agents-metrics finish-task --task-id 2026-04-08-001 --outcome success --result-fit exact_fit

Show current metrics:

ai-agents-metrics show
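The three task commands describe a simple lifecycle. Assuming `start-task` opens a goal with its first attempt, `continue-task` logs a failure reason and adds an attempt, and `finish-task` closes the goal with an outcome and a result fit, the bookkeeping can be modeled as below. The `Goal` class and its methods are a hypothetical in-memory illustration mirroring the CLI flags; they do not reflect the tool's actual storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Goal:
    """Hypothetical in-memory record of one tracked goal (not the real schema)."""
    task_id: str
    title: str                      # --title
    task_type: str                  # --task-type
    attempts: int = 1               # start-task opens the first attempt
    failure_reasons: List[str] = field(default_factory=list)
    outcome: Optional[str] = None   # --outcome, set when the goal is closed
    result_fit: Optional[str] = None  # --result-fit, set when the goal is closed

    @property
    def closed(self) -> bool:
        return self.outcome is not None

    def continue_task(self, failure_reason: str) -> None:
        """Record why the previous attempt fell short and open a new attempt."""
        if self.closed:
            raise ValueError(f"{self.task_id} is already closed")
        self.failure_reasons.append(failure_reason)
        self.attempts += 1

    def finish_task(self, outcome: str, result_fit: str) -> None:
        """Close the goal; further attempts or closes are rejected."""
        if self.closed:
            raise ValueError(f"{self.task_id} is already closed")
        self.outcome = outcome
        self.result_fit = result_fit
```

Replaying the Quick Start, `Goal("2026-04-08-001", "implement login endpoint", "product")` followed by `continue_task("wrong_scope")` and `finish_task("success", "exact_fit")` leaves one closed success with two attempts, which is what feeds Attempts per Closed Goal in `show`.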

Verify Your Install

make verify

This runs linting, a security scan, type checking, the test suite, and the public boundary check.

Public Boundary

This repository contains the public-safe core only. Private retrospectives, internal audits, and local metrics history are kept in a separate private overlay. The boundary is enforced automatically:

make verify-public-boundary

Repository

github.com/sg4tech/codex-metrics-public

Contributing

Read CONTRIBUTING.md. In short: keep changes public-safe, run make verify, include tests for behavior changes.

Security

See SECURITY.md for how to report potential private-data leaks or security issues.

Changelog

Notable public changes are tracked in CHANGELOG.md.



Download files


Source Distribution

ai_agents_metrics-0.1.0.tar.gz (185.7 kB)

Uploaded Source

Built Distribution


ai_agents_metrics-0.1.0-py3-none-any.whl (110.8 kB)

Uploaded Python 3

File details

Details for the file ai_agents_metrics-0.1.0.tar.gz.

File metadata

  • Download URL: ai_agents_metrics-0.1.0.tar.gz
  • Upload date:
  • Size: 185.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_agents_metrics-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        bbdcf62de33138c715542d2f42fcd128f8e0e1a0c9e114df32a4e382d8a6ea21
MD5           9a00f0c924392619838fe677eef2b6e5
BLAKE2b-256   18d605a638c9136a1e986a693c05b9d7c184cb755d272fdf7d65105694ee14e3
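To check a downloaded archive against the SHA256 digest listed above, stream the file through Python's `hashlib`. A minimal sketch, assuming the sdist sits in the current directory under its published name; `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for the 0.1.0 source distribution.
EXPECTED_SDIST_SHA256 = "bbdcf62de33138c715542d2f42fcd128f8e0e1a0c9e114df32a4e382d8a6ea21"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large archives never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against the expected hex digest, ignoring letter case."""
    return sha256_of(path) == expected.lower()
```

Calling `verify(Path("ai_agents_metrics-0.1.0.tar.gz"), EXPECTED_SDIST_SHA256)` returns `True` only if the local bytes match the published release. pip can enforce the same kind of check at install time through its hash-checking (`--require-hashes`) mode.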


Provenance

The following attestation bundles were made for ai_agents_metrics-0.1.0.tar.gz:

Publisher: publish.yml on sg4tech/ai-agents-metrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ai_agents_metrics-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for ai_agents_metrics-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        778631aa0d53317218f556bf990846f0c9f616c69f1fabef2ec1efe3a8910d78
MD5           fae7431fdc88fdf32ef1558bbaa8048d
BLAKE2b-256   9254bab10dc83fd4f1056793da653223f9f22b66c9775975e638bd01fa8784c7


Provenance

The following attestation bundles were made for ai_agents_metrics-0.1.0-py3-none-any.whl:

Publisher: publish.yml on sg4tech/ai-agents-metrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
