Race coding agents against each other on real tasks

These details have not been verified by PyPI

coderace

Stop reading blog comparisons. Race coding agents against each other on real tasks in your repo with your code.

Every week there's a new "Claude Code vs Codex vs Cursor" post. They test on toy problems with cherry-picked examples. coderace gives you automated, reproducible, scored comparisons on the tasks you actually care about.

Define a task. Run it against Claude Code, Codex, Aider, Gemini CLI, and OpenCode. Get a scored comparison table.

Install

pip install coderace

Quick Start

# Create a task template
coderace init fix-auth-bug

# Edit the task file (describe the bug, set test command)
# Then race the agents:
coderace run fix-auth-bug.yaml

# Or race them in parallel (uses git worktrees):
coderace run fix-auth-bug.yaml --parallel

# View results from the last run
coderace results fix-auth-bug.yaml

Task Format

name: fix-auth-bug
description: |
  The login endpoint returns 500 when email contains a plus sign.
  Fix the email validation in auth/validators.py.
repo: .
test_command: pytest tests/test_auth.py -x
lint_command: ruff check .
timeout: 300
agents:
  - claude
  - codex
  - aider

What It Does

For each agent in the task:

1. Creates a fresh git branch (coderace/<agent>-<task>)
2. Invokes the agent CLI with the task description
3. Runs your test command
4. Runs your lint command (optional)
5. Computes a composite score
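
The per-agent steps above can be sketched roughly as follows. This is not coderace's own code: `StepResult`, `branch_name`, and `run_step` are hypothetical names, and the agent invocation is stood in for by an arbitrary command, since each agent CLI takes its own arguments.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    ok: bool         # True if the command exited 0 within the timeout
    seconds: float   # wall-clock duration of the step
    timed_out: bool  # True if the timeout expired first


def branch_name(agent: str, task: str) -> str:
    """Branch naming scheme described above: coderace/<agent>-<task>."""
    return f"coderace/{agent}-{task}"


def run_step(cmd: list[str], timeout: float, cwd: str = ".") -> StepResult:
    """Run one command (agent, tests, or lint) and record pass/fail and wall time."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, cwd=cwd, timeout=timeout, capture_output=True)
        return StepResult(proc.returncode == 0, time.monotonic() - start, False)
    except subprocess.TimeoutExpired:
        return StepResult(False, time.monotonic() - start, True)


# Stand-in for a test command; a real run would use the task's test_command.
result = run_step([sys.executable, "-c", "raise SystemExit(0)"], timeout=30)
print(branch_name("claude", "fix-auth-bug"), result.ok)
```

The same helper can cover the agent, test, and lint steps, which keeps the pass/fail and timing data in one shape for scoring.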

Scoring

Metric	Weight	Description
Tests pass	40%	Did the test command exit 0?
Exit clean	20%	Did the agent itself exit 0 without timeout?
Lint clean	15%	Did the lint command exit 0?
Wall time	15%	Faster is better (normalized across agents)
Lines changed	10%	Fewer is better (normalized across agents)

Output

Terminal table with Rich formatting:

┌──────┬────────┬───────┬───────┬──────┬──────┬──────────┬───────┐
│ Rank │ Agent  │ Score │ Tests │ Exit │ Lint │ Time (s) │ Lines │
├──────┼────────┼───────┼───────┼──────┼──────┼──────────┼───────┤
│  1   │ claude │  85.0 │ PASS  │ PASS │ PASS │     10.5 │    42 │
│  2   │ codex  │  70.0 │ PASS  │ PASS │ FAIL │     15.2 │    98 │
│  3   │ aider  │  55.0 │ FAIL  │ PASS │ PASS │      8.1 │    31 │
└──────┴────────┴───────┴───────┴──────┴──────┴──────────┴───────┘

Results are also saved as JSON in .coderace/<task>-results.json and as a self-contained HTML report in .coderace/<task>-results.html.

Try It Now

The examples/ directory has ready-to-use task templates:

# Race agents on adding type hints to your project
coderace run examples/add-type-hints.yaml

# Race agents on fixing an edge case bug
coderace run examples/fix-edge-case.yaml

# Race agents on writing new tests
coderace run examples/write-tests.yaml

Edit the repo and description fields to point at your actual project and describe your real task.

Statistical Mode

Run each agent multiple times and get mean ± stddev:

coderace run task.yaml --runs 5

Useful for tasks with variable outcomes (LLM nondeterminism is real).
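
To make the mean ± stddev summary concrete, here is a small sketch using Python's standard library. Whether coderace reports the sample or the population standard deviation is not stated; this sketch assumes the sample version, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev


def summarize(scores: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation); the spread is 0.0 for a single run."""
    if not scores:
        raise ValueError("need at least one score")
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Made-up scores for one agent across three runs.
m, s = summarize([80.0, 90.0, 100.0])
print(f"{m:.1f} ± {s:.1f}")  # prints "90.0 ± 10.0"
```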

HTML Reports

Export results as a shareable single-file HTML report:

# Auto-generated on every run at .coderace/<task>-results.html
# Or export manually:
coderace results task.yaml --html report.html

The HTML report has sortable columns and a dark theme. Drop it in a blog post or Slack.

Custom Scoring

Override the default weights in your task YAML:

scoring:
  tests: 60   # tests passing (default 40)
  exit: 20    # clean exit (default 20)
  lint: 10    # lint clean (default 15)
  time: 5     # wall time (default 15)
  lines: 5    # lines changed (default 10)

Weights are normalized automatically, so they don't need to sum to 100.
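
Normalization here presumably means dividing each weight by the total, so only the ratios between weights matter. A minimal sketch under that assumption (`normalize_weights` is a hypothetical name; how coderace treats keys left out of the scoring block is not covered here):

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale weights so they sum to 1.0, preserving their ratios."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: value / total for name, value in weights.items()}


# These weights sum to 5, not 100; the resulting fractions sum to 1.0.
print(normalize_weights({"tests": 3, "exit": 1, "lint": 1, "time": 0, "lines": 0}))
```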

Supported Agents

Agent	CLI	Notes
Claude Code	`claude`	Anthropic's coding agent
Codex	`codex`	OpenAI Codex CLI
Aider	`aider`	Git-integrated AI coding
Gemini CLI	`gemini`	Google's Gemini CLI
OpenCode	`opencode`	Open-source terminal agent

Each agent must be installed and authenticated separately.

Parallel Mode

Use --parallel (or -p) to run all agents simultaneously using git worktrees. Each agent gets its own isolated working directory, so they don't interfere with each other.

coderace run task.yaml --parallel

Sequential mode (default) runs agents one at a time on the same repo.
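
For readers unfamiliar with git worktrees, the isolation idea can be sketched as below. This is an illustration, not coderace's implementation: `worktree_add_cmd` and `run_parallel` are hypothetical helpers, and where the worktree directories live is left to the caller. The only git feature relied on is `git worktree add -b <branch> <path>`, which checks out a new branch in a separate directory.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def worktree_add_cmd(repo: str, path: str, branch: str) -> list[str]:
    """Build the git command that creates an isolated checkout on a new branch."""
    return ["git", "-C", repo, "worktree", "add", "-b", branch, path]


def create_worktree(repo: str, path: str, branch: str) -> None:
    """Create the worktree; raises CalledProcessError if git fails."""
    subprocess.run(worktree_add_cmd(repo, path, branch), check=True)


def run_parallel(agents: list[str], job) -> dict:
    """Run job(agent) for every agent concurrently and collect results by agent."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        return dict(zip(agents, pool.map(job, agents)))


# Dummy job for illustration; a real job would create a worktree and run the agent in it.
print(run_parallel(["claude", "codex"], lambda agent: f"coderace/{agent}-fix-auth-bug"))
```

Because each job works in its own directory, one agent's edits never show up in another agent's test run.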

Why coderace?

Blog posts compare models. coderace compares agents on your work.

Run on your actual codebase, not HumanEval
Automated scoring: tests, lint, time, lines changed
Parallel mode with git worktrees (no interference between agents)
JSON output for CI integration and tracking over time
Works with any agent that has a CLI

The goal isn't "which model is best." It's "which agent solves my specific problem best."

Requirements

Python 3.10+
Git
At least one coding agent CLI installed

License

MIT

Project details

Release history

Version	Date
2.0.0	Mar 12, 2026
1.9.0	Mar 12, 2026
1.8.0	Mar 12, 2026
1.7.0	Mar 10, 2026
1.6.0	Mar 10, 2026
1.5.0	Mar 10, 2026
1.4.1	Mar 6, 2026
1.4.0	Mar 5, 2026
1.3.0	Mar 5, 2026
1.2.0	Mar 3, 2026
0.9.0	Feb 28, 2026
0.8.1	Feb 27, 2026
0.8.0	Feb 27, 2026
0.7.1	Feb 27, 2026
0.7.0	Feb 27, 2026
0.6.0	Feb 25, 2026
0.5.0	Feb 24, 2026
0.4.0	Feb 24, 2026
0.3.0	Feb 24, 2026
0.2.0 (this version)	Feb 23, 2026
0.1.0	Feb 22, 2026

Download files

Source Distribution

coderace-0.2.0.tar.gz (42.4 kB)

Uploaded Feb 23, 2026 Source

Built Distribution

coderace-0.2.0-py3-none-any.whl (22.0 kB)

Uploaded Feb 23, 2026 Python 3

File details

Details for the file coderace-0.2.0.tar.gz.

File metadata

Download URL: coderace-0.2.0.tar.gz
Upload date: Feb 23, 2026
Size: 42.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for coderace-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`1cd59bf4df4fed4a70497f2f362c1cb6531dba08a522275670e57d4d4b7d6d71`
MD5	`018fb2b29370c00b0843504577af6108`
BLAKE2b-256	`e10c92123619a491bbce9859def581c18d64c99eb7c4f8df81005161b02fc102`

File details

Details for the file coderace-0.2.0-py3-none-any.whl.

File metadata

Download URL: coderace-0.2.0-py3-none-any.whl
Upload date: Feb 23, 2026
Size: 22.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for coderace-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4860e5e3165efe3d6752d3ea79f70cdf1cc3eff60126ef143f96b6ddd0499cdd`
MD5	`727719a269d664d81b23fdfbf6a82b3c`
BLAKE2b-256	`c1b43131a1c2bef6f0fb34227f79be13ad88c1dfa04720491c84d6abd6ae6f92`