How Am I Doing — local-only self-audit & coaching for Claude Code sessions
Project description
How Am I Doing (HAID)
Score, visualize, and coach your Claude Code sessions — from the transcripts already on your machine.
HAID turns your own Claude Code session transcripts into one number you can track and
rank, a visualization of where the work and the tokens actually went, and a
coaching report on what to do differently next time. It runs locally, against the
~/.claude/projects/ transcripts you already have — no instrumentation, no account, nothing
leaves your machine unless you explicitly opt in.
pip install haid
Then, inside Claude Code, just ask: "how am I doing?"
🏆 The leaderboard
Every analyzed window of work gets a single value score:
value = Σ achievement ÷ Σ normalized tokens
— how much you got done, per token spent. That one number is what you track run over run, and it's what the opt-in community leaderboard ranks. The board is:
- Default-off and local-first. Viewing uploads nothing:
haid rankdownloads the public distribution and computes your percentile client-side. - Summary-only when you do submit.
haid submitposts a signed score summary — per-axis positions, the value figure, token totals, ladder/config version hashes. Never your transcripts, diffs, or prompts (a leak check refuses anything path- or title-shaped). - Zero-backend and tamper-evident. Submitting opens a GitHub PR against a separate
data-only repo (
dv-hart/haid-benchmark); identity comes free from the PR author, and the merged, append-only log is rendered by GitHub Pages → the public board.
It's deliberately trust-but-verify, low-stakes — a self-reported board, labeled as such, with a plausibility check as the only gate. See ADR-0005.
The big idea: an achievement ladder, not a token counter
A token count tells you what you spent. It says nothing about what you got. HAID's core idea is to measure achievement relatively, by placing your work against a fixed, calibrated reference ladder of real code changes — then dividing by what it cost.
achievement = volume^0.5 · Difficulty · Cleanliness
value = achievement ÷ normalized-token cost
- Volume — deterministic surviving lines in the final diff, weighted by file kind (hand-written logic > config > generated). Sub-linear, so a small excellent change beats a big mediocre one.
- Difficulty — not lines of code. An LLM judge places your diff on a reference ladder by asking "what fraction of working engineers could produce THIS correctly?" — explicitly ignoring size and surface sophistication. Calibration-validated (Spearman ρ ≈ 0.87 vs. an expensive full pairwise sort). See difficulty-ladder.md.
- Cleanliness — a parsimony placement against its own ladder ("achieves the purpose with less unnecessary complexity?"), a steep penalty that stops LOC-padding from buying score. See cleanliness-ladder.md.
- Cost — tokens weighted by type and model tier into a single normalized-token unit (relative effort, not a dollar bill), always reported with the full per-type/per-tier breakdown.
Achievement is scored on the final artifact, judged as if a human handed it to you cold — it has nothing to do with how many tokens it took. That decoupling is the whole design: a flawless-looking session that burned a fortune on a tiny, unoriginal change is a bad ratio, and HAID will say so. The axes are reported separately and never collapsed, so the score is always auditable. Method and calibration: scoring-rubric.md.
The visualization
The score's companion is a self-contained HTML visualization of the window — a time-layered bus diagram with the agent spine down the middle, files in the gutters (left = reads in, right = writes out), color by file, width by token volume, and per-episode achievement badges from the same scoring run. Opens in any browser, no server:
haid viz --project ~/path/to/project --days 30 --scores out/report/scores.json
Same substrate, two views: where the work went and where the tokens went. Design in visualization.md.
The coaching report
The score tells you where you stand; the report tells you what to change. HAID runs two orthogonal passes over the session graph:
- User-anchored pass — catches misalignment. Works backward from your messages; corrections are ground truth ("no, I meant…", "that's wrong").
- Signature-scanning pass — catches silent inefficiency: redundant re-reads, retry loops, re-touched lines, unused context — objective, reasoning-free waste signatures.
Flagged hotspots are investigated by per-anchor agents that cite their evidence and hedge their remedies, then matched against a vetted treatment catalog. The guiding constraint everywhere: a tool that confidently misdiagnoses is worse than nothing, so the bar is cite-or-say-unknown (trust-discipline.md). The two passes are complementary — one finds where the agent did the wrong thing, the other where it did the right thing wastefully (architecture.md).
Install
HAID is on PyPI — stdlib-only, no dependencies, Python ≥ 3.10:
pip install haid
haid --help
# the quickest, fully-deterministic smoke test (no model calls):
haid metrics --project ~/path/to/some/project --days 30
On Ubuntu/WSL without a venv, a user install works fine (ensure ~/.local/bin is on PATH):
python3 -m pip install --user haid
Where to install: HAID reads transcripts from ~/.claude/projects/ on the machine where
the sessions ran. If you use Claude Code inside WSL, install HAID inside WSL too. (A
Windows-side install can still reach WSL transcripts via UNC --session paths like
//wsl.localhost/Ubuntu/home/<user>/.claude/projects/<slug>/*.jsonl, but --project
discovery won't cross the boundary.)
Activating in Claude Code
The haid CLI never calls a model itself. The full pipeline
(metrics → tag → episodes → score → why → report) is driven by Claude Code through the
haid-report skill: the CLI writes a job manifest at
each model boundary, and the skill tells Claude how to fulfill it with subagents and resume.
-
Install the CLI (above) so
haidis on the PATH where Claude Code runs. -
Copy the skill into Claude Code's skills directory:
# available in every project: mkdir -p ~/.claude/skills/haid-report cp .claude/skills/haid-report/SKILL.md ~/.claude/skills/haid-report/ # …or scoped to a single project: mkdir -p <project>/.claude/skills/haid-report cp .claude/skills/haid-report/SKILL.md <project>/.claude/skills/haid-report/
(Working inside this repo, the skill is already active — it's a project skill here.)
-
Start a new session and ask "how am I doing?", or invoke
/haid-report. Claude runs the chain and presents the report. For a zero-cost, fully deterministic answer, ask for the--digest-onlyreport or just the waste metrics.
How it works under the hood: one session graph
Everything above is computed from a single data structure: a graph of the session(s). Turns and tool-calls are nodes; edges capture responds-to, reads, and produces relationships. Build the graph once and every feature falls out as an operation on it — "why did you do X?" is a backward traversal to X's trigger; "where did the tokens go?" is a weighting over the same nodes; achievement is a placement of the net diff it produces. The pipeline is stdlib-only and deterministic up to the model-judgment boundaries:
- Session parsing (
src/haid/session/) — forest-aware JSONL parsing: dedup, branch/rewind classification, subagent stitching, overflow resolution, SQLite parse-cache. - Session graph (
src/haid/graph/) — L0 spine + L1 action/IO graph (reads/produces/edits fromstructuredPatch), signatures, per-timeline scoping. - Waste metrics (
src/haid/metrics/) —rereads,retries,retouched,unused_context; one rule each, as benchmarkable token-rates vs. a per-scope baseline. - Bridge (
src/haid/bridge/) — reconstructs a window's net code diff from the transcript alone (replay, no git) plus its normalized-token cost. - Scoring (
src/haid/scoring/) — the relative achievement/cost value scorer (difficulty + cleanliness placement, volume, cost, value combiner), calibration-validated. - Intent / Episodes / Why / Report (
src/haid/{intent,episodes,why,report}/) — message tagging, the git-free PR-proxy grouping, the cited investigation pass, and the compositor.
Design: session-graph-design.md. The whole chain is validated
on real transcripts (python -m pytest); see plans/roadmap.md.
What this is not
Not another token counter. Raw usage accounting is already well covered by ccusage and similar. HAID's entire value lives one layer up — in scoring, diagnosis, and coaching: telling you not what you spent, but how much you achieved per token and how to get better.
Documentation map
| Doc | What's in it |
|---|---|
| docs/vision.md | The full concept, goals, and the canonical test case |
| docs/architecture.md | The two-pass method and how the pieces fit |
| docs/scoring-rubric.md | Achievement vs. cost — the relative value verdict |
| docs/difficulty-ladder.md | The validated difficulty scorer (reference ladder + placement) |
| docs/cleanliness-ladder.md | The cleanliness/parsimony scorer (reference ladder + placement) |
| docs/axis-calibration-playbook.md | Self-contained recipe to calibrate a new scoring axis |
| docs/treatments.md | The remedy catalog matched mechanically in haid report |
| docs/visualization.md | The time-layered bus diagram (left-in/right-out, bundled) |
| docs/session-graph-design.md | Node/edge taxonomy, episodes, the two core operations |
| docs/detectors.md | Detector catalog + waste metrics as graph queries |
| docs/intent-taxonomy.md | Two-axis message classification + purpose timeline + drift |
| docs/metrics-output-schema.md | The haid metrics --json contract |
| docs/claude-code-data-format.md | Verified Claude Code on-disk data reference |
| docs/trust-discipline.md | Cite-or-unknown, hedging, no-traceable-origin |
| docs/decisions/ | Architecture Decision Records (ADRs) |
| plans/roadmap.md | Phased delivery plan |
| plans/community-benchmark.md | The opt-in self-reported leaderboard design (ADR-0005) |
Repository layout
HAID/
├── README.md # you are here
├── docs/ # design & reference documentation (decisions/ = ADRs)
├── plans/ # roadmap + active design notes (shipped build-logs in plans/archive/)
├── src/haid/ # implementation
│ ├── session/ # parse: forest model, subagents, overflow, cache
│ ├── graph/ # L0 spine + L1 IO graph (incl. Bash read/write parsing)
│ ├── metrics/ # the four waste metrics + baseline + `haid metrics`
│ ├── window.py # the multi-session analysis window
│ ├── bridge/ # transcript→(diff, usage) reconstruction
│ ├── scoring/ # relative value scorer (difficulty/cleanliness/volume/cost/value)
│ ├── intent/ # move × work-type message tagging (`haid tag`)
│ ├── episodes/ # session→episode grouping + per-episode scoring
│ ├── why/ # per-anchor investigation agents (`haid why`)
│ ├── report/ # digest + composed report + benchmark payload (`haid report`)
│ └── viz/ # self-contained HTML render (`haid viz`)
├── tests/ # session/ graph/ metrics/ scoring/ bridge/ intent/ episodes/ why/ report/
└── scripts/ # baseline/benchmark-pin regeneration
The one-time scoring-axis calibration harness and the raw research probes that seeded the docs live on the
archive/experimentsbranch — their validated output already ships insrc/haid/data/, so they're kept for provenance rather than onmain.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file haid-0.0.10.tar.gz.
File metadata
- Download URL: haid-0.0.10.tar.gz
- Upload date:
- Size: 221.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ede608e9135d9867ba9cf5c87e452f985f970ec4fd9671a23c2f8356f627b53a
|
|
| MD5 |
f4ede143b8fa5463ae5ee8a075555d9e
|
|
| BLAKE2b-256 |
9f3dbae08a3e585c4f38c6822e85e3dbc69873a42acf0c9b60737ecb39e05b04
|
Provenance
The following attestation bundles were made for haid-0.0.10.tar.gz:
Publisher:
publish.yml on dv-hart/haid
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
haid-0.0.10.tar.gz -
Subject digest:
ede608e9135d9867ba9cf5c87e452f985f970ec4fd9671a23c2f8356f627b53a - Sigstore transparency entry: 1976284934
- Sigstore integration time:
-
Permalink:
dv-hart/haid@53716e0dee594b9bad42e111beea0aa17b54c638 -
Branch / Tag:
refs/tags/v0.0.10 - Owner: https://github.com/dv-hart
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@53716e0dee594b9bad42e111beea0aa17b54c638 -
Trigger Event:
push
-
Statement type:
File details
Details for the file haid-0.0.10-py3-none-any.whl.
File metadata
- Download URL: haid-0.0.10-py3-none-any.whl
- Upload date:
- Size: 251.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
31413e4d7223f17bb4f7ccd02f4e3652866fd31a961963e00a17607477af1cbc
|
|
| MD5 |
1e4411eac89397221cb09379a615e599
|
|
| BLAKE2b-256 |
c7895b1d52bb35ba5433a9676c2ab2ab75a7214ae7acc090c8c3888dc29a39bf
|
Provenance
The following attestation bundles were made for haid-0.0.10-py3-none-any.whl:
Publisher:
publish.yml on dv-hart/haid
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
haid-0.0.10-py3-none-any.whl -
Subject digest:
31413e4d7223f17bb4f7ccd02f4e3652866fd31a961963e00a17607477af1cbc - Sigstore transparency entry: 1976285074
- Sigstore integration time:
-
Permalink:
dv-hart/haid@53716e0dee594b9bad42e111beea0aa17b54c638 -
Branch / Tag:
refs/tags/v0.0.10 - Owner: https://github.com/dv-hart
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@53716e0dee594b9bad42e111beea0aa17b54c638 -
Trigger Event:
push
-
Statement type: