Skip to main content

How Am I Doing — local-only self-audit & coaching for Claude Code sessions

Project description

How Am I Doing (HAID)

Score, visualize, and coach your Claude Code sessions — from the transcripts already on your machine.

HAID turns your own Claude Code session transcripts into one number you can track and rank, a visualization of where the work and the tokens actually went, and a coaching report on what to do differently next time. It runs locally, against the ~/.claude/projects/ transcripts you already have — no instrumentation, no account, nothing leaves your machine unless you explicitly opt in.

pip install haid

Then, inside Claude Code, just ask: "how am I doing?"


🏆 The leaderboard

Every analyzed window of work gets a single value score:

value  =  Σ achievement  ÷  Σ normalized tokens

how much you got done, per token spent. That one number is what you track run over run, and it's what the opt-in community leaderboard ranks. The board is:

  • Default-off and local-first. Viewing uploads nothing: haid rank downloads the public distribution and computes your percentile client-side.
  • Summary-only when you do submit. haid submit posts a signed score summary — per-axis positions, the value figure, token totals, ladder/config version hashes. Never your transcripts, diffs, or prompts (a leak check refuses anything path- or title-shaped).
  • Zero-backend and tamper-evident. Submitting opens a GitHub PR against a separate data-only repo (dv-hart/haid-benchmark); identity comes free from the PR author, and the merged, append-only log is rendered by GitHub Pages → the public board.

It's deliberately trust-but-verify, low-stakes — a self-reported board, labeled as such, with a plausibility check as the only gate. See ADR-0005.

The big idea: an achievement ladder, not a token counter

A token count tells you what you spent. It says nothing about what you got. HAID's core idea is to measure achievement relatively, by placing your work against a fixed, calibrated reference ladder of real code changes — then dividing by what it cost.

achievement  =  volume^0.5  ·  Difficulty  ·  Cleanliness
value        =  achievement  ÷  normalized-token cost
  • Volume — deterministic surviving lines in the final diff, weighted by file kind (hand-written logic > config > generated). Sub-linear, so a small excellent change beats a big mediocre one.
  • Difficultynot lines of code. An LLM judge places your diff on a reference ladder by asking "what fraction of working engineers could produce THIS correctly?" — explicitly ignoring size and surface sophistication. Calibration-validated (Spearman ρ ≈ 0.87 vs. an expensive full pairwise sort). See difficulty-ladder.md.
  • Cleanliness — a parsimony placement against its own ladder ("achieves the purpose with less unnecessary complexity?"), a steep penalty that stops LOC-padding from buying score. See cleanliness-ladder.md.
  • Cost — tokens weighted by type and model tier into a single normalized-token unit (relative effort, not a dollar bill), always reported with the full per-type/per-tier breakdown.

Achievement is scored on the final artifact, judged as if a human handed it to you cold — it has nothing to do with how many tokens it took. That decoupling is the whole design: a flawless-looking session that burned a fortune on a tiny, unoriginal change is a bad ratio, and HAID will say so. The axes are reported separately and never collapsed, so the score is always auditable. Method and calibration: scoring-rubric.md.

The visualization

The score's companion is a self-contained HTML visualization of the window — a time-layered bus diagram with the agent spine down the middle, files in the gutters (left = reads in, right = writes out), color by file, width by token volume, and per-episode achievement badges from the same scoring run. Opens in any browser, no server:

haid viz --project ~/path/to/project --days 30 --scores out/report/scores.json

Same substrate, two views: where the work went and where the tokens went. Design in visualization.md.

The coaching report

The score tells you where you stand; the report tells you what to change. HAID runs two orthogonal passes over the session graph:

  1. User-anchored pass — catches misalignment. Works backward from your messages; corrections are ground truth ("no, I meant…", "that's wrong").
  2. Signature-scanning pass — catches silent inefficiency: redundant re-reads, retry loops, re-touched lines, unused context — objective, reasoning-free waste signatures.

Flagged hotspots are investigated by per-anchor agents that cite their evidence and hedge their remedies, then matched against a vetted treatment catalog. The guiding constraint everywhere: a tool that confidently misdiagnoses is worse than nothing, so the bar is cite-or-say-unknown (trust-discipline.md). The two passes are complementary — one finds where the agent did the wrong thing, the other where it did the right thing wastefully (architecture.md).

Install

HAID is on PyPI — stdlib-only, no dependencies, Python ≥ 3.10:

pip install haid
haid --help

# the quickest, fully-deterministic smoke test (no model calls):
haid metrics --project ~/path/to/some/project --days 30

On Ubuntu/WSL without a venv, a user install works fine (ensure ~/.local/bin is on PATH):

python3 -m pip install --user haid

Where to install: HAID reads transcripts from ~/.claude/projects/ on the machine where the sessions ran. If you use Claude Code inside WSL, install HAID inside WSL too. (A Windows-side install can still reach WSL transcripts via UNC --session paths like //wsl.localhost/Ubuntu/home/<user>/.claude/projects/<slug>/*.jsonl, but --project discovery won't cross the boundary.)

Activating in Claude Code

The haid CLI never calls a model itself. The full pipeline (metrics → tag → episodes → score → why → report) is driven by Claude Code through the haid-report skill: the CLI writes a job manifest at each model boundary, and the skill tells Claude how to fulfill it with subagents and resume.

  1. Install the CLI (above) so haid is on the PATH where Claude Code runs.

  2. Copy the skill into Claude Code's skills directory:

    # available in every project:
    mkdir -p ~/.claude/skills/haid-report
    cp .claude/skills/haid-report/SKILL.md ~/.claude/skills/haid-report/
    
    # …or scoped to a single project:
    mkdir -p <project>/.claude/skills/haid-report
    cp .claude/skills/haid-report/SKILL.md <project>/.claude/skills/haid-report/
    

    (Working inside this repo, the skill is already active — it's a project skill here.)

  3. Start a new session and ask "how am I doing?", or invoke /haid-report. Claude runs the chain and presents the report. For a zero-cost, fully deterministic answer, ask for the --digest-only report or just the waste metrics.

How it works under the hood: one session graph

Everything above is computed from a single data structure: a graph of the session(s). Turns and tool-calls are nodes; edges capture responds-to, reads, and produces relationships. Build the graph once and every feature falls out as an operation on it — "why did you do X?" is a backward traversal to X's trigger; "where did the tokens go?" is a weighting over the same nodes; achievement is a placement of the net diff it produces. The pipeline is stdlib-only and deterministic up to the model-judgment boundaries:

  • Session parsing (src/haid/session/) — forest-aware JSONL parsing: dedup, branch/rewind classification, subagent stitching, overflow resolution, SQLite parse-cache.
  • Session graph (src/haid/graph/) — L0 spine + L1 action/IO graph (reads/produces/edits from structuredPatch), signatures, per-timeline scoping.
  • Waste metrics (src/haid/metrics/) — rereads, retries, retouched, unused_context; one rule each, as benchmarkable token-rates vs. a per-scope baseline.
  • Bridge (src/haid/bridge/) — reconstructs a window's net code diff from the transcript alone (replay, no git) plus its normalized-token cost.
  • Scoring (src/haid/scoring/) — the relative achievement/cost value scorer (difficulty + cleanliness placement, volume, cost, value combiner), calibration-validated.
  • Intent / Episodes / Why / Report (src/haid/{intent,episodes,why,report}/) — message tagging, the git-free PR-proxy grouping, the cited investigation pass, and the compositor.

Design: session-graph-design.md. The whole chain is validated on real transcripts (python -m pytest); see plans/roadmap.md.

What this is not

Not another token counter. Raw usage accounting is already well covered by ccusage and similar. HAID's entire value lives one layer up — in scoring, diagnosis, and coaching: telling you not what you spent, but how much you achieved per token and how to get better.

Documentation map

Doc What's in it
docs/vision.md The full concept, goals, and the canonical test case
docs/architecture.md The two-pass method and how the pieces fit
docs/scoring-rubric.md Achievement vs. cost — the relative value verdict
docs/difficulty-ladder.md The validated difficulty scorer (reference ladder + placement)
docs/cleanliness-ladder.md The cleanliness/parsimony scorer (reference ladder + placement)
docs/axis-calibration-playbook.md Self-contained recipe to calibrate a new scoring axis
docs/treatments.md The remedy catalog matched mechanically in haid report
docs/visualization.md The time-layered bus diagram (left-in/right-out, bundled)
docs/session-graph-design.md Node/edge taxonomy, episodes, the two core operations
docs/detectors.md Detector catalog + waste metrics as graph queries
docs/intent-taxonomy.md Two-axis message classification + purpose timeline + drift
docs/metrics-output-schema.md The haid metrics --json contract
docs/claude-code-data-format.md Verified Claude Code on-disk data reference
docs/trust-discipline.md Cite-or-unknown, hedging, no-traceable-origin
docs/decisions/ Architecture Decision Records (ADRs)
plans/roadmap.md Phased delivery plan
plans/community-benchmark.md The opt-in self-reported leaderboard design (ADR-0005)

Repository layout

HAID/
├── README.md                 # you are here
├── docs/                     # design & reference documentation (decisions/ = ADRs)
├── plans/                    # roadmap + active design notes (shipped build-logs in plans/archive/)
├── src/haid/                 # implementation
│   ├── session/              #   parse: forest model, subagents, overflow, cache
│   ├── graph/                #   L0 spine + L1 IO graph (incl. Bash read/write parsing)
│   ├── metrics/              #   the four waste metrics + baseline + `haid metrics`
│   ├── window.py             #   the multi-session analysis window
│   ├── bridge/               #   transcript→(diff, usage) reconstruction
│   ├── scoring/              #   relative value scorer (difficulty/cleanliness/volume/cost/value)
│   ├── intent/               #   move × work-type message tagging (`haid tag`)
│   ├── episodes/             #   session→episode grouping + per-episode scoring
│   ├── why/                  #   per-anchor investigation agents (`haid why`)
│   ├── report/               #   digest + composed report + benchmark payload (`haid report`)
│   └── viz/                  #   self-contained HTML render (`haid viz`)
├── tests/                    # session/ graph/ metrics/ scoring/ bridge/ intent/ episodes/ why/ report/
└── scripts/                  # baseline/benchmark-pin regeneration

The one-time scoring-axis calibration harness and the raw research probes that seeded the docs live on the archive/experiments branch — their validated output already ships in src/haid/data/, so they're kept for provenance rather than on main.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

haid-0.0.13.tar.gz (210.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

haid-0.0.13-py3-none-any.whl (237.6 kB view details)

Uploaded Python 3

File details

Details for the file haid-0.0.13.tar.gz.

File metadata

  • Download URL: haid-0.0.13.tar.gz
  • Upload date:
  • Size: 210.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for haid-0.0.13.tar.gz
Algorithm Hash digest
SHA256 b53678d78a184287fee01892c767815b98499a09259da5cf4b979ca33e774d64
MD5 e91528217e7ec7322b95ac028b748f52
BLAKE2b-256 35297db5b7c542379b27a6c6753dd3fce8a5100d30fe442e8b7da1ef4f5bbbe9

See more details on using hashes here.

Provenance

The following attestation bundles were made for haid-0.0.13.tar.gz:

Publisher: publish.yml on dv-hart/haid

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file haid-0.0.13-py3-none-any.whl.

File metadata

  • Download URL: haid-0.0.13-py3-none-any.whl
  • Upload date:
  • Size: 237.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for haid-0.0.13-py3-none-any.whl
Algorithm Hash digest
SHA256 62ba2a2bc88838d5c69b918b3aaba577f49dac581d9e61c7eeadbf14f1d6069b
MD5 ea790fcd6dbd297dc37fb9878faa4cea
BLAKE2b-256 716bd564482f220cafd29d7caf2f543ad53e12da33a0712feefd8a9930f8cd96

See more details on using hashes here.

Provenance

The following attestation bundles were made for haid-0.0.13-py3-none-any.whl:

Publisher: publish.yml on dv-hart/haid

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page