Turn real GitHub issues into small, reproducible coding-agent benchmark tasks.

Maintainers

heyufeng

Project description

Chinese documentation

IssueBenchKit

Turn a real GitHub issue, pull request, or local bug into a small coding-agent benchmark task.

SWE-bench is great when you want a public leaderboard. Most teams need something smaller: a repeatable task built from the bugs they actually care about, with a clear test command and a report that says whether a candidate patch really fixed it.

IssueBenchKit is that local builder. It does not try to invent tests for you. It packages the issue context, base commit, reproduction command, and scoring result so you can evaluate coding agents on your own repositories.

Quick Start

pip install issuebenchkit

Create a benchmark task:

issuebench init tasks/qwen-copy \
  --repo ./qwen-code \
  --issue https://github.com/QwenLM/qwen-code/issues/4716 \
  --base 8b4f3b2 \
  --test "npm test -- copyCommand.test.ts"

Run the task against a candidate checkout:

issuebench run tasks/qwen-copy --repo ./candidate-qwen-code --out after.json

Compare before and after:

issuebench score tasks/qwen-copy --before before.json --after after.json
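A before/after comparison can be pictured as a small classification over two run results. The sketch below is only an illustration of that idea, not IssueBenchKit's actual scoring code: the `passed` key and the four outcome labels are assumptions made for this example.

```python
def classify(before, after):
    """Compare two run results and name the outcome.

    Both arguments are dicts with a boolean "passed" key. That key name
    and the labels returned here are hypothetical, chosen for this sketch.
    """
    was_passing = bool(before["passed"])
    is_passing = bool(after["passed"])
    if not was_passing and is_passing:
        return "fixed"            # the candidate patch made the test pass
    if not was_passing and not is_passing:
        return "still_failing"    # the bug reproduces before and after
    if was_passing and not is_passing:
        return "regressed"        # the candidate broke a passing test
    return "already_passing"      # the test never reproduced the bug


if __name__ == "__main__":
    print(classify({"passed": False}, {"passed": True}))
```

Only the first outcome supports the claim that a patch fixed the issue. An `already_passing` result usually means the reproduction command does not exercise the bug at the base commit.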

Export a report:

issuebench export tasks/qwen-copy --format html --out report.html

What It Stores

Each task directory contains one issuebench.json manifest:

- source repo path and optional GitHub issue URL
- base commit or version marker
- reproduction / validation command
- expected signal, notes, and tags
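As a rough illustration, a manifest holding the fields listed above could be built and written like this. The key names (`repo`, `issue_url`, `base`, `test_command`, `expected_signal`, `notes`, `tags`) are assumptions made for this sketch and may not match the real issuebench.json schema; inspect a manifest generated by `issuebench init` for the actual keys.

```python
import json
import tempfile
from pathlib import Path


def build_manifest(repo, base, test_command, issue_url=None,
                   expected_signal="", notes="", tags=()):
    """Return a manifest-like dict. All key names are hypothetical."""
    return {
        "repo": repo,                    # source repo path
        "issue_url": issue_url,          # optional GitHub issue URL
        "base": base,                    # base commit or version marker
        "test_command": test_command,    # reproduction / validation command
        "expected_signal": expected_signal,
        "notes": notes,
        "tags": list(tags),
    }


def write_manifest(task_dir, manifest):
    """Write the manifest as issuebench.json inside task_dir and return its path."""
    task_dir = Path(task_dir)
    task_dir.mkdir(parents=True, exist_ok=True)
    path = task_dir / "issuebench.json"
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    manifest = build_manifest(
        repo="./qwen-code",
        base="8b4f3b2",
        test_command="npm test -- copyCommand.test.ts",
        issue_url="https://github.com/QwenLM/qwen-code/issues/4716",
    )
    with tempfile.TemporaryDirectory() as tmp:
        print(write_manifest(Path(tmp) / "tasks" / "qwen-copy", manifest))
```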

Run results are plain JSON files with exit code, duration, command, stdout tail, stderr tail, and the pass/fail verdict. They are easy to archive, diff, or attach to a PR.
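To make the shape of such a result concrete, here is a minimal sketch of running a shell command and recording those fields. It is not IssueBenchKit's implementation: the key names, the 20-line tail length, and the rule that exit code 0 means "pass" are all assumptions for illustration.

```python
import json
import subprocess
import time


def tail(text, max_lines=20):
    """Keep only the last max_lines lines of text."""
    return "\n".join(text.splitlines()[-max_lines:])


def run_command(command, cwd=None):
    """Run a shell command and return a JSON-serializable result dict.

    Key names are hypothetical. The verdict here is simply
    "exit code 0 means pass", which is an assumption of this sketch.
    """
    start = time.monotonic()
    proc = subprocess.run(
        command, shell=True, cwd=cwd, capture_output=True, text=True
    )
    duration = time.monotonic() - start
    return {
        "command": command,
        "exit_code": proc.returncode,
        "duration_seconds": round(duration, 3),
        "stdout_tail": tail(proc.stdout),
        "stderr_tail": tail(proc.stderr),
        "passed": proc.returncode == 0,
    }


if __name__ == "__main__":
    print(json.dumps(run_command("echo hello"), indent=2))
```

Storing only the tail of each stream keeps result files small enough to diff or attach to a PR, at the cost of losing early output from long test runs.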

Why Not Just Use SWE-bench?

Use SWE-bench for public comparison. Use IssueBenchKit when you need:

- a benchmark task for a private or small repo
- a tiny task that can run in CI
- a before/after report for one real bug
- a dataset of issues that reflects your own engineering workflow

Current Scope

The first version is intentionally small:

- generic shell test commands
- JSON manifest files
- before/after scoring
- JSONL and single-file HTML export

It does not generate tests automatically, mutate repositories, or claim that one command can evaluate every language ecosystem.
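For readers unfamiliar with JSONL, the format named in the scope list is one JSON object per line, which makes a set of results easy to append to and stream. The helpers below are a generic sketch of that format with hypothetical names and record fields; they do not describe the layout `issuebench export` actually produces.

```python
import io
import json


def write_jsonl(records, stream):
    """Write each record as one JSON object per line."""
    for record in records:
        stream.write(json.dumps(record, sort_keys=True) + "\n")


def read_jsonl(stream):
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


if __name__ == "__main__":
    buffer = io.StringIO()
    # Record fields are made up for this example.
    write_jsonl([{"task": "qwen-copy", "passed": True}], buffer)
    print(buffer.getvalue(), end="")
```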

License

MIT

Release history

This version

0.1.0

Jun 4, 2026

Download files

Source Distribution

issuebenchkit-0.1.0.tar.gz (9.7 kB view details)

Uploaded Jun 4, 2026 Source

Built Distribution

issuebenchkit-0.1.0-py3-none-any.whl (10.0 kB view details)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file issuebenchkit-0.1.0.tar.gz.

File metadata

Download URL: issuebenchkit-0.1.0.tar.gz
Upload date: Jun 4, 2026
Size: 9.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for issuebenchkit-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e16718584f69f1b8256ec75030d669b5f65626dc7b2413d15c21fb1ac5bb1de9`
MD5	`4d20f41d73ce587ee115c3ca8d12430a`
BLAKE2b-256	`a33e1c26e39d7ed611a5c1dc8b9fe00a33a835483e9c06ff00b2410702033493`

Provenance

The following attestation bundles were made for issuebenchkit-0.1.0.tar.gz:

Publisher: publish.yml on he-yufeng/IssueBenchKit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: issuebenchkit-0.1.0.tar.gz
- Subject digest: e16718584f69f1b8256ec75030d669b5f65626dc7b2413d15c21fb1ac5bb1de9
- Sigstore transparency entry: 1713462032
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: he-yufeng/IssueBenchKit@617b71ba84b3d27f94060ee1ec3159ecd5e48149
- Branch / Tag: refs/heads/main
- Owner: https://github.com/he-yufeng
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@617b71ba84b3d27f94060ee1ec3159ecd5e48149
- Trigger Event: workflow_dispatch

File details

Details for the file issuebenchkit-0.1.0-py3-none-any.whl.

File metadata

Download URL: issuebenchkit-0.1.0-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 10.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for issuebenchkit-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2f5eea95d19b9b7f0887f620b42f19f3ecad75cdde2bf11e17df905038fc128a`
MD5	`4edc525fe13a49fa42100ac823cdd640`
BLAKE2b-256	`876b5d7ef923ebd6efa38c6bbe7b54d823d7bcf01879011815d2e3ba66e047eb`

Provenance

The following attestation bundles were made for issuebenchkit-0.1.0-py3-none-any.whl:

Publisher: publish.yml on he-yufeng/IssueBenchKit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: issuebenchkit-0.1.0-py3-none-any.whl
- Subject digest: 2f5eea95d19b9b7f0887f620b42f19f3ecad75cdde2bf11e17df905038fc128a
- Sigstore transparency entry: 1713462092
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: he-yufeng/IssueBenchKit@617b71ba84b3d27f94060ee1ec3159ecd5e48149
- Branch / Tag: refs/heads/main
- Owner: https://github.com/he-yufeng
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@617b71ba84b3d27f94060ee1ec3159ecd5e48149
- Trigger Event: workflow_dispatch
