Know what an AI task will cost before you run it
agent-estimate
Know before you build.
PERT estimates for AI-agent tasks — how long, which model's reliable enough, and the human-equivalent cost. In one command.
Why
AI agents can write the code — but how long will the task actually take? Manual estimation is slow and biased toward optimism; no estimate means scope creep and missed deadlines. The gap between "agents can do it" and "we know when it'll be done" is where projects break down.
agent-estimate closes that gap in one command: a three-point PERT timeline calibrated on real agent runs, plus a human-speed comparison so you see the compression before you spend the compute. It sizes the task, picks a tier, routes it to a model, and flags when the work runs past that model's reliability horizon — calibrated forecasts in seconds, not meetings.
Multi-model matters because the models aren't interchangeable. Opus 4.7, GPT-5.5, and Gemini 3.1 Pro differ in reliability horizon (METR p80) and in cost per turn: the default horizons run from 90 minutes for Opus 4.7 and GPT-5.5 down to 45 minutes for Gemini 3.1 Pro. A 40-minute job that sits safely inside one model's horizon runs past another's (Sonnet 4.6 defaults to 30 minutes). agent-estimate models the whole fleet, not a single agent, so the number reflects who actually runs the work.
Quick Start
First estimate: 30 seconds to install. Every one after: instant.
With your agent (recommended)
Paste this into your Claude Code or Codex session:
Install the agent-estimate plugin (https://github.com/kiloloop/agent-estimate) and
estimate this task for me: "Implement OAuth 2.0 flow (Google + GitHub)". Tell me the
expected time, the human-speed equivalent, and the compression ratio.
Your agent installs the tool, runs the estimate, and reads back the numbers. Nothing to memorize — describe the task in plain English and let the agent translate to flags.
For a whole backlog:
Estimate every open issue in this repo with agent-estimate, group them into parallel
waves, and tell me the total wall-clock time for a 3-agent fleet versus doing them
sequentially myself.
Manual
pip install agent-estimate
agent-estimate estimate "your task description here"
No config required — sensible defaults for a 3-agent fleet (Claude, Codex, Gemini). Point it at a file or GitHub issues when you're ready:
agent-estimate estimate --file tasks.txt
agent-estimate estimate --repo myorg/myrepo --issues 11,12,14
agent-estimate session --agents 3 --rounds 2 --type review
How It Works
agent-estimate produces three-point PERT estimates calibrated for agents, not humans:
- Tier classification — auto-sizes tasks XS→XL from complexity signals
- PERT math — optimistic / most-likely / pessimistic, weighted to an expected value
- Human comparison — a per-task-type multiplier, so you see the compression
- METR thresholds — warns when an estimate exceeds a model's p80 reliability horizon
- Wave planning — schedules independent tasks in parallel across the fleet
- Review overhead — models review cycles as additive cost (standard, complex, 3-round)
- Modifiers — --spec-clarity, --warm-context, --agent-fit tune the estimate
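The PERT step in the list above can be sketched in a few lines. This is a minimal illustration that assumes the classic PERT weighting, (O + 4M + P) / 6; the source does not state the exact weights agent-estimate uses, and the `ThreePoint` class and its input values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """Optimistic, most-likely, and pessimistic durations in minutes."""

    optimistic: float
    most_likely: float
    pessimistic: float

    def expected(self) -> float:
        # Classic PERT weighting: the most-likely value counts four times.
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6

    def std_dev(self) -> float:
        # Classic PERT spread: one sixth of the optimistic-to-pessimistic range.
        return (self.pessimistic - self.optimistic) / 6


# Hypothetical inputs, chosen only to illustrate the arithmetic.
task = ThreePoint(optimistic=20, most_likely=30, pessimistic=58)
print(f"expected={task.expected():.1f}m std_dev={task.std_dev():.1f}m")
```

The long pessimistic tail pulls the expected value (33.0 minutes) above the most-likely value (30 minutes), which is the point of weighting three points instead of quoting one.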
Task types
| Type | Flag | Covers |
|---|---|---|
| Coding | (default) | Feature work, fixes, refactors |
| Research | --type research | Audits, investigations, analysis |
| Documentation | --type documentation | API docs, guides, changelogs |
| Brainstorm | --type brainstorm | Ideation, spikes, design exploration |
| Config/SRE | --type config | Deploys, infra, CI/CD |
| Frontend/UI | --type frontend | Content patches vs. component builds |
| App dev | --type app_dev | App shells, desktop/mobile builds |
METR thresholds (defaults)
| Model | p80 threshold |
|---|---|
| Opus 4.7 | 90 min |
| GPT-5.5 | 90 min |
| GPT-5.4 | 60 min |
| Gemini 3.1 Pro | 45 min |
| Sonnet 4.6 | 30 min |
| Haiku 4.5 | 15 min |
opus_4_x is a forward-compatible alias that resolves to the current Opus threshold. Legacy keys (opus_4_6, GPT-5/5.2/5.3, Gemini 3 Pro, Sonnet) stay supported. Estimates are calibrated against Claude Code (Opus 4.7, high thinking) and Codex (GPT-5.4/5.5, extra-high) — shift with --spec-clarity and --warm-context for other setups.
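A threshold check of this kind might look like the sketch below. It is not the tool's implementation: only `gpt_5_4` and `gemini_3_1_pro` appear as keys in the tool's printed output, so the other key spellings are assumed, and falling back to 45 minutes for an unknown model is a guess based on the `metr_fallback_threshold` setting in the fleet config.

```python
from typing import Optional

# Default p80 thresholds in minutes, copied from the table above.
# Key spellings other than gpt_5_4 and gemini_3_1_pro are assumptions.
P80_MINUTES = {
    "opus_4_7": 90,
    "gpt_5_5": 90,
    "gpt_5_4": 60,
    "gemini_3_1_pro": 45,
    "sonnet_4_6": 30,
    "haiku_4_5": 15,
}
FALLBACK_MINUTES = 45.0  # assumed to apply when a model has no entry


def metr_warning(task: str, model: str, expected_minutes: float) -> Optional[str]:
    """Return a warning string when the estimate exceeds the model's p80 horizon."""
    threshold = P80_MINUTES.get(model, FALLBACK_MINUTES)
    if expected_minutes <= threshold:
        return None
    return (
        f"{task}: Estimate ({expected_minutes:.0f}m) exceeds {model} "
        f"p80 threshold ({threshold:.0f}m). Consider splitting the task."
    )


print(metr_warning("Write quickstart guide", "gemini_3_1_pro", 75))
print(metr_warning("Write quickstart guide", "opus_4_7", 75))  # None: within horizon
```

The same 75-minute task is flagged for one model and passes for another, which is why routing and estimation are done together.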
Examples
Real estimates from production use — including the misses.
The tool, estimating its own docs. We sized this v0.7.0 skill-and-README refresh at ~30 minutes. It took 28.
An honest over-estimate. We pre-registered a UI mockup build at ~95 minutes with no prior app-dev data. Two agents did it in parallel in 12 and 25 minutes — a 4–8x over-estimate. agent-estimate now ships an app_dev prior shaped by that result. The miss stays in the README because calibration means showing where you were wrong.
Three tasks, three agents, in parallel — what the tool prints, including the METR reliability flags. Input is the three-task tasks.txt from examples/multi-agent.md; the output below is captured from a real run, trimmed to the timeline and warnings (the full report — per-task PERT table, wave plan, agent loads — is in that example):
$ agent-estimate estimate --file tasks.txt
## Timeline Summary
| Metric | Value |
| --- | --- |
| Best case | 44.7m |
| Expected case | 75.4m |
| Worst case | 117.2m |
| Human-speed equivalent | 608.2m |
| Compression ratio | 8.07x |
| Review overhead (per-task, pre-amortization) | 45m |
## METR Warnings
- **Add known_debt.md as standard protocol memory file**: Estimate (75m) exceeds gpt_5_4 p80 threshold (60m). Consider splitting the task.
- **Write quickstart guide with protocol comparison table**: Estimate (75m) exceeds gemini_3_1_pro p80 threshold (45m). Consider splitting the task.
~75 minutes wall-clock versus ~10.1 hours of sequential human work, at an estimated $3.51 fleet cost — plus two flags that the Codex- and Gemini-assigned tasks run past their models' p80 reliability horizons, so you split them or add a checkpoint before dispatching. The same three tasks were later run by real agents; the retro is in the example file. More in examples/ — coding S/M, research, documentation, multi-agent.
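The headline numbers follow directly from two rows of the report. A short check, using only values printed above:

```python
expected_minutes = 75.4  # "Expected case" from the report
human_minutes = 608.2    # "Human-speed equivalent" from the report

compression = human_minutes / expected_minutes
human_hours = human_minutes / 60

print(f"compression ratio: {compression:.2f}x")
print(f"human-equivalent: {human_hours:.1f} h")
```

This reproduces the 8.07x ratio in the table and the roughly 10.1 hours quoted in the summary.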
Integrations
Claude Code plugin
/plugin marketplace add kiloloop/agent-estimate
/plugin install agent-estimate@agent-estimate-marketplace
/estimate Add a login page with OAuth
/estimate --file spec.md
/estimate --issues 1,2,3 --repo myorg/myrepo
/estimate validate observation.yaml
/estimate calibrate
GitHub Action
- uses: kiloloop/agent-estimate@v0
  with:
    issues: '11,12,14'
Full workflow example
name: Estimate
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  contents: read
  issues: read
  pull-requests: write
jobs:
  estimate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: kiloloop/agent-estimate@v0
        with:
          issues: '11,12,14'
          output-mode: summary+pr-comment
Action inputs and outputs
| Input | Required | Default | Description |
|---|---|---|---|
| issues | yes | — | GitHub issue numbers (comma-separated) |
| repo | no | current repo | GitHub repo (owner/name) |
| format | no | markdown | Output format: markdown or json |
| output-mode | no | summary | summary, pr-comment, step-output, summary+pr-comment |
| config | no | — | Path to agent config YAML |
| title | no | Agent Estimate Report | Report title |
| review-mode | no | standard | Review tier: none, standard, complex, 3-round |
| spec-clarity | no | 1.0 | Spec clarity modifier (0.3–1.3) |
| warm-context | no | 1.0 | Warm context modifier (0.3–1.15) |
| agent-fit | no | 1.0 | Agent fit modifier (0.9–1.2) |
| task-type | no | — | Category: coding, brainstorm, research, config, documentation, frontend, app_dev |
| python-version | no | 3.12 | Python version to use |
| version | no | latest | agent-estimate version to install |
| token | no | ${{ github.token }} | GitHub token |
| Output | Description |
|---|---|
| report | Full estimation report content |
| expected-minutes | Expected minutes (when format: json) |
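The three modifier inputs each have a documented range and a default of 1.0. The sketch below validates values against those ranges and multiplies them into one factor. Multiplying them together is an assumption, since the source does not say how agent-estimate combines modifiers, and `combined_modifier` is a hypothetical helper.

```python
# Allowed ranges, copied from the Action inputs table.
MODIFIER_RANGES = {
    "spec-clarity": (0.3, 1.3),
    "warm-context": (0.3, 1.15),
    "agent-fit": (0.9, 1.2),
}


def combined_modifier(modifiers: dict) -> float:
    """Check each modifier against its range, then multiply them together.

    The multiplication is an illustrative assumption, not documented behavior.
    """
    product = 1.0
    for name, (low, high) in MODIFIER_RANGES.items():
        value = modifiers.get(name, 1.0)  # 1.0 is the documented default
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        product *= value
    return product


factor = combined_modifier({"spec-clarity": 1.2, "warm-context": 0.5})
print(f"combined factor: {factor:.2f}")
```

Rejecting out-of-range values early is cheaper than discovering a skewed estimate after the workflow has posted it to a pull request.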
Skill layout
Skills follow the oacp-skills convention:
skills/estimate/
  skill.yaml         # machine-readable metadata
  README.md          # human-readable docs
  shared/INTENT.md   # shared intent across runtimes
  claude/SKILL.md    # Claude Code skill definition
  codex/SKILL.md     # Codex skill definition
Both runtime slices cover the same CLI (estimate, validate, calibrate), phrased for their respective ecosystems.
Configuration
Agent fleet
Pass a config to model your own fleet:
agents:
  - name: Claude
    capabilities: [planning, implementation, review]
    parallelism: 2
    cost_per_turn: 0.12
    model_tier: frontier
  - name: Codex
    capabilities: [implementation, debugging, testing]
    parallelism: 3
    cost_per_turn: 0.08
    model_tier: production
settings:
  friction_multiplier: 1.15
  inter_wave_overhead: 0.25
  review_overhead: 0.2
  metr_fallback_threshold: 45.0
agent-estimate estimate "Ship packaging flow" --config ./my_agents.yaml
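To see how fleet parallelism turns into wall-clock time, here is a rough wave-planning sketch. It is a simple greedy illustration, not the tool's scheduler. It assumes `parallelism` means concurrent task slots per agent, and because the unit of `inter_wave_overhead` is not stated in the source, the overhead is passed in explicitly as a hypothetical number of minutes.

```python
def plan_waves(task_minutes, slots):
    """Greedy sketch: longest tasks first, at most `slots` tasks per wave."""
    ordered = sorted(task_minutes, reverse=True)
    return [ordered[i:i + slots] for i in range(0, len(ordered), slots)]


def wall_clock(waves, inter_wave_overhead_minutes=0.0):
    """A wave lasts as long as its slowest task; overhead is paid between waves."""
    busy = sum(max(wave) for wave in waves)
    return busy + inter_wave_overhead_minutes * max(len(waves) - 1, 0)


slots = 2 + 3  # sum of `parallelism` across the two agents in the config above
tasks = [40, 35, 30, 25, 20, 15, 10]  # hypothetical expected minutes per task

waves = plan_waves(tasks, slots)
parallel = wall_clock(waves, inter_wave_overhead_minutes=5)
print(waves)
print(f"parallel: {parallel}m, sequential: {sum(tasks)}m")
```

With these made-up tasks, two waves finish in 60 minutes against 175 minutes run one after another. A real planner would also account for task dependencies and agent capabilities, which this sketch ignores.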
Output formats
agent-estimate estimate "Refactor auth pipeline" --format json # machine-readable
agent-estimate estimate --repo myorg/myrepo --issues 11,12,14 # from GitHub issues
agent-estimate estimate --file tasks.txt # from file
agent-estimate estimate "Follow-up fix" --history-file data.json # auto warm-context
When --warm-context is omitted, the CLI can auto-infer it from --history-file;
if no history file is passed and ./data.json exists, that file is used as the
default dispatch history source.
Session estimates
Use agent-estimate session for coordinated workflows where multiple agents run
rounds of brainstorm, review, research, documentation, config, or coding work:
agent-estimate session --agents 3 --rounds 2 --type review
agent-estimate session --agents 4 --rounds 1 --per-round-minutes 25 --format json
The command reports wall-clock time, total agent-minutes, coordination overhead, and per-round breakdowns.
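The relationship between those reported quantities can be illustrated with a small calculation. This sketch assumes all agents work concurrently within each round, and it models coordination overhead as a caller-supplied fraction of round time. The `overhead_fraction` parameter is hypothetical; the source does not describe how the tool derives overhead.

```python
def session_totals(agents, rounds, per_round_minutes, overhead_fraction=0.0):
    """Rough session arithmetic under the assumptions stated above."""
    work = rounds * per_round_minutes
    overhead = work * overhead_fraction
    return {
        "wall_clock_minutes": work + overhead,
        "agent_minutes": agents * work,
        "coordination_overhead_minutes": overhead,
        "per_round_minutes": [per_round_minutes] * rounds,
    }


# Mirrors: agent-estimate session --agents 4 --rounds 1 --per-round-minutes 25
totals = session_totals(agents=4, rounds=1, per_round_minutes=25)
print(totals)
```

Wall-clock time stays at one round of 25 minutes while the fleet spends 100 agent-minutes, which is the distinction the session report draws.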
Calibration
Validate estimates against observed outcomes and build a calibration database:
agent-estimate validate observation.yaml --db ~/.agent-estimate/calibration.db
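The core quantity behind calibration is the ratio of estimated to observed time. The sketch below computes it for the figures quoted in the Examples section. The labels "agent A" and "agent B" are placeholders, since the source does not say which agent finished in which time, and this is not the format of observation.yaml.

```python
def estimate_ratio(estimated_minutes, actual_minutes):
    """Above 1 means the estimate was too high; below 1 means too low."""
    return estimated_minutes / actual_minutes


# (label, estimated minutes, actual minutes) from the Examples section.
observations = [
    ("docs refresh", 30, 28),
    ("UI mockup, agent A", 95, 12),
    ("UI mockup, agent B", 95, 25),
]

for name, estimated, actual in observations:
    print(f"{name}: {estimate_ratio(estimated, actual):.1f}x")
```

The two mockup ratios, 7.9x and 3.8x, are the "4–8x over-estimate" described earlier, and a ratio that far from 1 is the kind of signal a calibration database exists to correct.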
Project
- Website — landing page, live demo, and the estimate comparison view.
- OACP — coordinate the agents you just estimated. Open Agent Coordination Protocol for multi-agent async workflows.
- oacp-skills — the skill bundle agent-estimate's /estimate ships in.
- kiloloop — the rest of the ecosystem.
Contributing
See CONTRIBUTING.md for the full workflow.
pip install -e '.[dev]'
ruff check .
pytest -q
License
Apache License 2.0