FeatureBench Pipeline - A test-driven data generation pipeline for building and evaluating feature-level coding benchmarks
FeatureBench is a test-driven data generation and evaluation pipeline for feature-level coding benchmarks. It provides a unified CLI to run inference, evaluation, and dataset generation.
📰 News
🎁 2026.02.06: We now support one-click inference for mainstream agent frameworks, including OpenHands, Claude Code, Codex, Gemini CLI, and mini-swe-agent. All supported agent frameworks can be found here. We have also open-sourced the FeatureBench data pipeline.
🚀 Quickstart
Install:
# pypi
pip install featurebench
# or uv add featurebench
# local
git clone https://github.com/LiberCoders/FeatureBench.git
cd FeatureBench
uv sync
Configure:
cp config_example.toml config.toml
See docs/config.md for a comprehensive reference (harness, infer, data pipeline) with examples.
Optional: pre-pull images to reduce network variance:
fb pull --mode lite # lite split image list (13 images)
fb pull --mode full # full split image list (24 images)
fb pull --mode /path/to/images.txt # one image name per line
# full list: featurebench/resources/constants/full_images.txt
# lite list: featurebench/resources/constants/lite_images.txt
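Before passing a custom list to fb pull --mode /path/to/images.txt, it can help to sanity-check the file. The sketch below reads a one-image-per-line file and reports duplicates. The helper names read_image_list and find_duplicates are hypothetical, and skipping blank lines and lines starting with "#" is an assumption of this sketch, not documented behavior of fb pull.

```python
from collections import Counter
from pathlib import Path


def read_image_list(path):
    """Return the image names in a file that holds one image name per line."""
    names = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are ignored here.
        # fb pull itself may treat such lines differently.
        if not line or line.startswith("#"):
            continue
        names.append(line)
    return names


def find_duplicates(names):
    """Return the sorted names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

Calling find_duplicates(read_image_list("/path/to/images.txt")) returns an empty list when every image is listed exactly once.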
Run inference:
fb infer \
--config-path config.toml \
--agent mini_swe_agent \
--model openai/qwen3-coder-480b-a35b-instruct \
--split lite
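To script the inference command above, for example to loop over several agents or models, one option is to build the argument list in Python. This is a minimal sketch that uses only the flags shown above. The function names build_infer_command and run_infer and the dry_run switch are hypothetical, and the sketch assumes fb is on PATH.

```python
import shlex
import subprocess


def build_infer_command(agent, model, split="lite", config_path="config.toml"):
    """Build the argv list for `fb infer` from the flags shown in the Quickstart."""
    return [
        "fb", "infer",
        "--config-path", config_path,
        "--agent", agent,
        "--model", model,
        "--split", split,
    ]


def run_infer(agent, model, split="lite", config_path="config.toml", dry_run=True):
    """Print the command when dry_run is True, otherwise run it and return its exit code."""
    cmd = build_infer_command(agent, model, split=split, config_path=config_path)
    if dry_run:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(cmd))
        return 0
    # Assumption: the `fb` executable is installed and on PATH.
    return subprocess.run(cmd, check=False).returncode
```

Keeping dry_run=True as the default means nothing is launched until the printed command has been reviewed.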
Run evaluation:
fb eval \
-p runs/<timestamp>/output.jsonl \
--split lite
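The <timestamp> placeholder above has to be filled in by hand. The sketch below picks the most recent run directory and builds the matching fb eval argument list. The helper names are hypothetical, and the sketch assumes that the timestamp directory names under runs/ sort chronologically when compared as strings, which this README does not confirm.

```python
from pathlib import Path


def latest_output(runs_dir="runs"):
    """Return the output.jsonl of the last run directory by name, or None if there is none."""
    candidates = [p for p in Path(runs_dir).glob("*/output.jsonl") if p.is_file()]
    if not candidates:
        return None
    # Assumption: directory names are timestamps whose string order is chronological.
    return max(candidates, key=lambda p: p.parent.name)


def build_eval_command(output_path, split="lite"):
    """Build the argv list for `fb eval` from the flags shown in the Quickstart."""
    return ["fb", "eval", "-p", str(output_path), "--split", split]
```

If the directory names do not sort chronologically, sorting by modification time (p.stat().st_mtime) is a possible alternative.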
🧭 CLI Overview
fb provides three core commands:
- fb infer runs featurebench.infer.run_infer (docs: docs/infer_cli_arg.md)
- fb eval runs featurebench.harness.run_evaluation (docs: docs/harness_cli_arg.md)
- fb data runs featurebench.pipeline (docs: docs/pipeline.md)
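The mapping from subcommand to entry point and documentation file can be kept as a small lookup table, for instance in a wrapper script that prints where to look for help. The table below only restates the names listed above. The names FB_COMMANDS and describe are hypothetical, and the sketch does not import featurebench or reflect how fb dispatches internally.

```python
# Subcommand -> (entry point, docs file), copied from the CLI overview.
FB_COMMANDS = {
    "infer": ("featurebench.infer.run_infer", "docs/infer_cli_arg.md"),
    "eval": ("featurebench.harness.run_evaluation", "docs/harness_cli_arg.md"),
    "data": ("featurebench.pipeline", "docs/pipeline.md"),
}


def describe(subcommand):
    """Return a one-line description of an fb subcommand, or raise KeyError if unknown."""
    entry_point, docs = FB_COMMANDS[subcommand]
    return f"fb {subcommand} runs {entry_point} (docs: {docs})"
```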
✍️ Citation
If you find FeatureBench useful, please cite it as:
@misc{zhou2026featurebenchbenchmarkingagenticcoding,
title={FeatureBench: Benchmarking Agentic Coding for Complex Feature Development},
author={Qixing Zhou and Jiacheng Zhang and Haiyang Wang and Rui Hao and Jiahe Wang and Minghao Han and Yuxue Yang and Shuzhe Wu and Feiyang Pan and Lue Fan and Dandan Tu and Zhaoxiang Zhang},
year={2026},
eprint={2602.10975},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2602.10975},
}
📧 Contact
If you have any questions, feel free to contact qixingzhou1125@gmail.com or zjcheng2022@gmail.com.
File details
Details for the file featurebench-0.1.0.tar.gz.
File metadata
- Download URL: featurebench-0.1.0.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff181abc503906ea4ff15aa1636db17ed7aab0f3034027516eeea8e2f3cc9e6c |
| MD5 | fbdc65cb17e0331d190113f2d445f50b |
| BLAKE2b-256 | 378eb3fc8c56a56f06c160c775fdadd99c5e3b8c97181051fd5034301e4791d0 |
File details
Details for the file featurebench-0.1.0-py3-none-any.whl.
File metadata
- Download URL: featurebench-0.1.0-py3-none-any.whl
- Upload date:
- Size: 317.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f466bde01d8dc515c779adb04ec5232026f3b7dbd3b49e6059097747e6f8fd02 |
| MD5 | b2c45dacfed32b91615aa046da68d4ad |
| BLAKE2b-256 | 0931997f002d99955016c6e3d8f9939001c83bed203731b800184ef9ade507b4 |