Pre-execution cost estimation for LLM agent workflows with calibration learning
tokencast
A Claude Code skill that estimates Anthropic API cost for planned agent tasks, then learns from actual usage to improve estimates over time.
Install once per project. It auto-estimates after plans are created and auto-learns at session end. Zero ongoing friction.
Setup (one time per project)
# Clone the repo (anywhere — it doesn't need to live inside your project)
git clone https://github.com/krulewis/tokencast.git
# Install into your project (quote paths with spaces)
bash tokencast/scripts/install-hooks.sh "/path/to/your-project"
Paths with spaces: Always wrap the project path in quotes. Without them the install script will fail on paths like /Volumes/Macintosh HD2/....
This does three things:
- Symlinks the skill into <project>/.claude/skills/tokencast/
- Adds a Stop hook for auto-learning at session end
- Adds a PostToolUse hook to nudge estimation after planning agents
Every Claude Code session in that project now has tokencast active.
What Happens Automatically
After a plan is created
tokencast detects the plan in conversation context, infers size, files, complexity, project type, and language, then outputs a cost table:
## tokencast estimate
Change: size=M, files=5, complexity=medium
Calibration: 1.12x from 8 prior runs
| Step | Model | Optimistic | Expected | Pessimistic |
|-----------------------|--------|------------|----------|-------------|
| Research Agent | Sonnet | $0.60 | $1.17 | $4.47 |
| Architect Agent | Opus | $0.67 | $1.18 | $3.97 |
| ... | ... | ... | ... | ... |
| TOTAL | | $3.37 | $6.26 | $22.64 |
At session end
The learning hook silently:
- Reads the session's JSONL log
- Computes actual token cost (including cache write tokens)
- Compares to the estimate
- Updates calibration factors
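The cost computation in the steps above could look roughly like the sketch below. It is not the code in sum-session-tokens.py. The record layout (a message.usage object with Anthropic-style token counters), the function name session_cost, and the prices are all assumptions made for illustration; real prices belong in references/pricing.md.

```python
import json
from typing import Dict, Iterable

# Hypothetical USD prices per million tokens (not taken from the project).
PRICES: Dict[str, float] = {
    "input": 3.00,
    "output": 15.00,
    "cache_write": 3.75,
    "cache_read": 0.30,
}

# Assumed usage field name -> price key. Cache writes are billed separately.
FIELD_TO_PRICE = {
    "input_tokens": "input",
    "output_tokens": "output",
    "cache_creation_input_tokens": "cache_write",
    "cache_read_input_tokens": "cache_read",
}


def session_cost(lines: Iterable[str], prices: Dict[str, float] = PRICES) -> float:
    """Sum the dollar cost of every usage record found in JSONL lines."""
    total = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the hook
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # records without usage data contribute nothing
        for field, price_key in FIELD_TO_PRICE.items():
            tokens = usage.get(field) or 0
            total += tokens * prices[price_key] / 1_000_000
    return total
```

Dividing the result by the recorded Expected estimate gives the actual/expected ratio that calibration consumes.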
Next session
Future estimates use learned correction factors. More sessions = better accuracy.
Manual Invocation
You can also invoke explicitly with overrides:
/tokencast size=L files=12 complexity=high
/tokencast steps=implement,test,qa
/tokencast review_cycles=3
/tokencast review_cycles=0
Use review_cycles=N to set the number of expected PR review cycles. Use review_cycles=0 to suppress the PR Review Loop row.
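To give a feel for what review_cycles changes, here is a minimal sketch of a geometric-decay loop cost, the model that How It Works names for the PR Review Loop. The decay value of 0.6, the first_cycle_cost input, and both function names are hypothetical; the actual formula and constants are in references/heuristics.md and may differ.

```python
from typing import Dict


def review_loop_cost(first_cycle_cost: float, cycles: int, decay: float = 0.6) -> float:
    """Total cost of `cycles` review rounds, each costing `decay` times the previous one."""
    return sum(first_cycle_cost * decay ** i for i in range(cycles))


def review_loop_bands(first_cycle_cost: float, review_cycles: int,
                      decay: float = 0.6) -> Dict[str, float]:
    """Per-band loop cost using Optimistic=1, Expected=N, Pessimistic=N*2 cycles."""
    if review_cycles <= 0:
        return {}  # review_cycles=0 suppresses the PR Review Loop row
    cycles_per_band = {
        "optimistic": 1,
        "expected": review_cycles,
        "pessimistic": review_cycles * 2,
    }
    return {
        band: review_loop_cost(first_cycle_cost, n, decay)
        for band, n in cycles_per_band.items()
    }
```

Because each round is cheaper than the last, doubling the cycle count in the Pessimistic band less than doubles the loop cost.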
How It Works
- Infers size, file count, complexity from the plan in conversation
- Reads reference files for pricing and token heuristics
- Loads learned calibration factors (if any exist)
- Computes per-step token estimates using activity decomposition
- Applies complexity multiplier, context accumulation (K+1)/2, and cache rates
- Splits into Optimistic / Expected / Pessimistic bands
- If PR Review Loop is in scope, computes loop cost using geometric decay across N review cycles (Optimistic=1, Expected=N, Pessimistic=N×2)
- Applies calibration correction to Expected band (individual steps re-anchor; PR Review Loop scales each band independently)
- Records the estimate for later comparison with actuals
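One reading of the (K+1)/2 term in the steps above: if a step runs for K turns and each turn re-reads everything before it, turn k carries about k units of context, so the average over the step is (K+1)/2. The sketch below encodes only that reading. The function names and the way the multiplier is combined with the base budget are assumptions, not the project's actual formula.

```python
def context_accumulation(turns: int) -> float:
    """Average context multiple over K turns, assuming context grows by one unit per turn."""
    if turns < 1:
        raise ValueError("turns must be >= 1")
    return (turns + 1) / 2


def step_input_tokens(base_tokens: float, turns: int, complexity_multiplier: float) -> float:
    """Scale a base token budget by complexity and by context accumulation."""
    return base_tokens * complexity_multiplier * context_accumulation(turns)


if __name__ == "__main__":
    K = 5
    # Turn k carries k units of context, so the mean over K turns is (K+1)/2.
    assert sum(range(1, K + 1)) / K == context_accumulation(K)
    print(step_input_tokens(10_000, turns=K, complexity_multiplier=1.5))  # prints 45000.0
```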
Overrides
| Override | Effect |
|---|---|
| size=M | Set size class explicitly |
| files=5 | Set file count explicitly |
| complexity=high | Set complexity explicitly |
| steps=implement,test,qa | Estimate only those pipeline steps |
| project_type=migration | Set project type explicitly |
| language=go | Set primary language explicitly |
| review_cycles=3 | Set PR review cycle count (0 = disable) |
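The key=value overrides above are simple enough to parse mechanically. The skill itself interprets them from the slash-command text, so the parser below is only a hypothetical illustration of the syntax: parse_overrides and its type handling are assumptions, and only the key names come from the table.

```python
from typing import Dict, List, Union

# Key names come from the Overrides table; the value types are assumed.
INT_KEYS = {"files", "review_cycles"}
LIST_KEYS = {"steps"}
KNOWN_KEYS = INT_KEYS | LIST_KEYS | {"size", "complexity", "project_type", "language"}

OverrideValue = Union[int, str, List[str]]


def parse_overrides(args: str) -> Dict[str, OverrideValue]:
    """Parse whitespace-separated key=value tokens into a dict."""
    overrides: Dict[str, OverrideValue] = {}
    for token in args.split():
        key, sep, value = token.partition("=")
        if not sep or key not in KNOWN_KEYS:
            raise ValueError(f"unrecognized override: {token!r}")
        if key in INT_KEYS:
            overrides[key] = int(value)  # raises ValueError on non-numeric input
        elif key in LIST_KEYS:
            overrides[key] = value.split(",")
        else:
            overrides[key] = value
    return overrides
```

For example, parse_overrides("size=L files=12 complexity=high") yields a dict with size "L", files 12, and complexity "high".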
Confidence Bands
| Band | Cache Hit | Multiplier | Meaning |
|---|---|---|---|
| Optimistic | 60% | 0.6x | Best case — focused agent work |
| Expected | 50% | 1.0x | Typical run |
| Pessimistic | 30% | 3.0x | With rework loops, debugging, retries |
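As a rough illustration of how the two columns interact, the sketch below prices one step under each band. The cache hit rates and multipliers are copied from the table; the prices, the band_costs name, and the assumption that the multiplier is applied to the whole step cost are hypothetical, so treat the numbers as shape rather than truth.

```python
from typing import Dict, Tuple

# Band -> (cache hit rate, cost multiplier), copied from the table above.
BANDS: Dict[str, Tuple[float, float]] = {
    "optimistic": (0.60, 0.6),
    "expected": (0.50, 1.0),
    "pessimistic": (0.30, 3.0),
}

# Hypothetical USD prices per million tokens (see references/pricing.md for real ones).
PRICE_INPUT = 3.00
PRICE_CACHE_READ = 0.30
PRICE_OUTPUT = 15.00


def band_costs(input_tokens: float, output_tokens: float) -> Dict[str, float]:
    """Price one step under each band's cache hit rate, then apply the band multiplier."""
    costs: Dict[str, float] = {}
    for band, (hit_rate, multiplier) in BANDS.items():
        cached = input_tokens * hit_rate   # billed at the cheaper cache-read price
        fresh = input_tokens - cached      # billed at the full input price
        base = (
            fresh * PRICE_INPUT
            + cached * PRICE_CACHE_READ
            + output_tokens * PRICE_OUTPUT
        ) / 1_000_000
        costs[band] = base * multiplier
    return costs
```

A lower cache hit rate and a higher multiplier compound, which is why the Pessimistic column in the sample estimate sits several times above Expected.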
Calibration
Calibration is fully automatic:
- 0-2 sessions: No correction applied. "Collecting data" status.
- 3-10 sessions: Global correction factor via trimmed mean of actual/expected ratios (trim_fraction=0.1).
- 10+ sessions: EWMA with recency weighting. Per-size-class factors activate when a class has 3+ samples.
- Outlier filtering: Sessions with actual/expected ratio >3.0x or <0.2x are excluded from calibration and logged for inspection.
Calibration data lives in calibration/ (gitignored, local to each user).
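The tiers above can be sketched as follows. This is not update-factors.py: the EWMA smoothing constant alpha, the choice to count sessions after outlier filtering, and the use of the trimmed mean at exactly 10 sessions are assumptions, and per-size-class factors are left out. The detailed rules are in references/calibration-algorithm.md.

```python
from typing import List, Optional


def filter_outliers(ratios: List[float], low: float = 0.2, high: float = 3.0) -> List[float]:
    """Drop actual/expected ratios outside [low, high]."""
    return [r for r in ratios if low <= r <= high]


def trimmed_mean(values: List[float], trim_fraction: float = 0.1) -> float:
    """Mean after dropping trim_fraction of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def ewma(values: List[float], alpha: float = 0.3) -> float:
    """Exponentially weighted average; values are oldest first, newest weighs most."""
    average = values[0]
    for value in values[1:]:
        average = alpha * value + (1 - alpha) * average
    return average


def calibration_factor(ratios: List[float], alpha: float = 0.3) -> Optional[float]:
    """Pick the correction factor for the current amount of history."""
    clean = filter_outliers(ratios)
    if len(clean) < 3:
        return None  # "Collecting data": no correction applied
    if len(clean) <= 10:
        return trimmed_mean(clean)
    return ewma(clean, alpha)
```

With trim_fraction=0.1 and integer truncation, nothing is trimmed until 10 ratios are available, so in this sketch the trimmed mean behaves like a plain mean for 3 to 9 sessions.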
Disabling
bash /path/to/tokencast/scripts/disable.sh "/path/to/your-project"
Removes the skill and hooks. Preserves calibration data for reuse.
Files
SKILL.md: Skill definition (auto-trigger, algorithm)
references/pricing.md: Model prices, cache rates, step→model map
references/heuristics.md: Token budgets, pipeline decompositions, multipliers
references/examples.md: Worked examples with arithmetic
references/calibration-algorithm.md: Detailed calibration algorithm reference
commands/
  tokencast-version.md: /tokencast-version slash command
scripts/
  install-hooks.sh: One-time project setup
  disable.sh: Remove from project
  tokencast-learn.sh: Stop hook, auto-captures actuals
  tokencast-track.sh: PostToolUse hook, nudges estimation after plans
  sum-session-tokens.py: Parses session JSONL for actual costs
  update-factors.py: Computes calibration factors from history
calibration/: Per-user local data (gitignored)
  history.jsonl: Estimate vs actual records
  factors.json: Learned correction factors
  active-estimate.json: Transient marker for current estimate
v1.1 Changes
- Trimmed mean replaces median for faster convergence with small samples
- Outlier flagging — extreme ratios (>3.0x or <0.2x) excluded from calibration, logged for inspection
- Richer data — project type, language, pipeline signature, and step count captured per session
- Baseline subtraction — tokens spent before the estimate are excluded from actuals
- Security hardening — path injection fixes, consolidated parsing, safe handling of paths with spaces
- Version markers: a version: 1.1.0 field in SKILL.md and a --version flag on the learn script
v1.2 Changes
- PR Review Loop modeling — geometric-decay cost model for review-fix-re-review cycles
- New override: review_cycles=N to set expected cycle count (0 = disable)
- Per-band calibration: PR Review Loop applies calibration independently per band (not re-anchored)
- New schema fields: review_cycles_estimated and review_cycles_actual in active-estimate.json
Limitations
- Pipeline step names reflect a default workflow; map your own steps to the closest defaults. Formulas are pipeline-agnostic (see references/heuristics.md)
- Heuristics assume typical 150-300 line source files
- Does not model parallel agent execution
- Calibration requires 3+ completed sessions before corrections activate
- Pricing data embedded; check last_updated in references/pricing.md
- Multi-session tasks only capture the session containing the estimate
License
MIT