Turn real GitHub issues into small, reproducible coding-agent benchmark tasks.
IssueBenchKit
Turn a real GitHub issue, pull request, or local bug into a small coding-agent benchmark task.
SWE-bench is great when you want a public leaderboard. Most teams need something smaller: a repeatable task built from the bugs they actually care about, with a clear test command and a report that says whether a candidate patch really fixed it.
IssueBenchKit is that local builder. It does not try to invent tests for you. It packages the issue context, base commit, reproduction command, and scoring result so you can evaluate coding agents on your own repositories.
Quick Start
pip install issuebenchkit
Create a benchmark task:
issuebench init tasks/qwen-copy \
  --repo ./qwen-code \
  --issue https://github.com/QwenLM/qwen-code/issues/4716 \
  --base 8b4f3b2 \
  --test "npm test -- copyCommand.test.ts"
Run the task against a candidate checkout:
issuebench run tasks/qwen-copy --repo ./candidate-qwen-code --out after.json
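Compare before and after (before.json is assumed here to be a result recorded with the same run command against the unpatched checkout):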
issuebench score tasks/qwen-copy --before before.json --after after.json
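The comparison can be pictured as a small classification over the two verdicts. The sketch below is not IssueBenchKit's implementation. It assumes each run result carries an `exit_code` field (a hypothetical key name) and that exit code 0 means the test command passed. The label strings are also illustrative.

```python
import json
from pathlib import Path


def classify(before: dict, after: dict) -> str:
    """Label a before/after pair of run results.

    Assumption: "exit_code" is a hypothetical key, and 0 means the
    test command passed.
    """
    before_ok = before["exit_code"] == 0
    after_ok = after["exit_code"] == 0
    if not before_ok and after_ok:
        return "fixed"
    if before_ok and not after_ok:
        return "regressed"
    return "still-passing" if after_ok else "still-failing"


def classify_files(before_path: str, after_path: str) -> str:
    """Load two result files and classify them."""
    before = json.loads(Path(before_path).read_text(encoding="utf-8"))
    after = json.loads(Path(after_path).read_text(encoding="utf-8"))
    return classify(before, after)
```

Only the "fixed" case shows that a candidate patch turned a failing reproduction into a passing one. A "still-passing" result usually means the reproduction command never failed on the base commit, so the task itself needs another look.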
Export a report:
issuebench export tasks/qwen-copy --format html --out report.html
What It Stores
Each task directory contains one issuebench.json manifest:
- source repo path and optional GitHub issue URL
- base commit or version marker
- reproduction / validation command
- expected signal, notes, and tags
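To make the list above concrete, here is a rough Python sketch that writes a manifest with those four groups of fields. The key names and the `expected_signal` and `tags` values are guesses for illustration, not the documented schema. The values for repo, issue, base, and test command are taken from the Quick Start example.

```python
import json
from pathlib import Path

# Every key name below is an illustrative assumption, not the tool's schema.
manifest = {
    "repo": "./qwen-code",
    "issue_url": "https://github.com/QwenLM/qwen-code/issues/4716",
    "base": "8b4f3b2",
    "test_command": "npm test -- copyCommand.test.ts",
    "expected_signal": "test command exits with code 0",  # hypothetical value
    "notes": "",
    "tags": ["example"],  # hypothetical value
}


def write_manifest(task_dir: str, data: dict) -> Path:
    """Write data as issuebench.json inside task_dir and return the path."""
    path = Path(task_dir) / "issuebench.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return path
```

In practice `issuebench init` creates this file for you, so inspect a generated manifest for the real field names.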
Run results are plain JSON files with exit code, duration, command, stdout tail, stderr tail, and the pass/fail verdict. They are easy to archive, diff, or attach to a PR.
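A result of that shape can be produced with the standard library alone. The following is a minimal sketch under assumptions, not the tool's actual runner: the key names, the 20-line tail length, and the rule "exit code 0 means pass" are all hypothetical choices.

```python
import subprocess
import time


def tail(text: str, max_lines: int = 20) -> str:
    """Keep only the last max_lines lines of captured output."""
    return "\n".join(text.splitlines()[-max_lines:])


def run_command(command: str, cwd: str = ".") -> dict:
    """Run a shell command and summarize it as a JSON-serializable dict."""
    start = time.monotonic()
    # shell=True hands the string to the shell; only run trusted commands.
    proc = subprocess.run(
        command, shell=True, cwd=cwd, capture_output=True, text=True
    )
    duration = time.monotonic() - start
    return {
        "command": command,
        "exit_code": proc.returncode,
        "duration_seconds": round(duration, 3),
        "stdout_tail": tail(proc.stdout),
        "stderr_tail": tail(proc.stderr),
        "passed": proc.returncode == 0,  # assumed verdict rule
    }
```

Storing only the tail of each stream keeps result files small enough to diff or attach to a PR while still showing the final test summary.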
Why Not Just Use SWE-bench?
Use SWE-bench for public comparison. Use IssueBenchKit when you need:
- a benchmark task for a private or small repo
- a tiny task that can run in CI
- a before/after report for one real bug
- a dataset of issues that reflects your own engineering workflow
Current Scope
The first version is intentionally small:
- generic shell test commands
- JSON manifest files
- before/after scoring
- JSONL and single-file HTML export
It does not generate tests automatically, mutate repositories, or claim that one command can evaluate every language ecosystem.
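For readers unfamiliar with the JSONL export mentioned in the scope list, the format is simply one JSON object per line. The helper names below are hypothetical and only sketch how several run results could be collected into such a file and read back.

```python
import json
from pathlib import Path


def export_jsonl(records: list[dict], out_path: str) -> int:
    """Write one JSON object per line and return the number of records."""
    lines = [json.dumps(record, sort_keys=True) for record in records]
    Path(out_path).write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return len(lines)


def read_jsonl(path: str) -> list[dict]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line parses on its own, a JSONL file can be appended to over time and processed line by line without loading the whole dataset.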
License
MIT