Dynamic Evaluation Set Generation with Large Language Models
Project description
YourBench: A Dynamic Benchmark Generation Framework
[GitHub] · [Dataset] · [Documentation] · [Paper]
YourBench is a structured data generation library for building better AI systems. Generate high-quality QA pairs, training data, and evaluation datasets from any source documents, with full control over the output format and complexity. The modular architecture lets you configure every aspect of the generation pipeline, from document parsing (with built-in converters from common formats to markdown) to chunking strategies to output schemas. Most eval frameworks force you into their structure; YourBench adapts to yours. Use it to create domain-specific benchmarks, fine-tuning datasets, or systematic model evaluations. Peer-reviewed and appearing at COLM 2025. 100% free and open source, forever.
Quick Start
You can use YourBench instantly, without installation, via uv! Simply run:
uvx yourbench --model gpt-4o-mini <YOUR_FILE_DIRECTORY_HERE>
You will see the dataset appear locally! If a valid HF_TOKEN is set, the dataset will also be pushed to your Hugging Face Hub!
Installation
YourBench is available on PyPI and requires Python 3.12+. You can install it as follows:
- Install via PyPI (stable release):

  # uv (recommended; get it here: https://docs.astral.sh/uv/getting-started/installation/)
  uv venv --python 3.12
  source .venv/bin/activate
  uv pip install yourbench

  # pip (standard support)
  pip install yourbench

  This will install the latest published version (e.g. 0.5.1).
- Install from source (development version):

  git clone https://github.com/huggingface/yourbench.git
  cd yourbench

  # uv, recommended
  uv venv --python 3.12
  source .venv/bin/activate
  uv pip install -e .

  # pip
  pip install -e .
Installing from source is recommended if you want the latest updates or to run the included example configuration.
Note: If you plan to use models that require API access (e.g. OpenAI GPT-4o or the Hugging Face Inference API), make sure you have the appropriate credentials. You'll also need a Hugging Face token if you want to upload results to the Hub. See below for how to configure these before running YourBench.
Usage
Once installed, YourBench can be run from the command line to generate a custom evaluation set. Here’s a quick example:
# 1. (Optional) If not done already, install YourBench
pip install yourbench
# 2. Prepare your API credentials (for model inference and Hub access)
# For example, create a .env file with required keys:
# echo "OPENROUTER_API_KEY=<your_openrouter_api_key>" >> .env # Example
echo "HF_TOKEN=<your_huggingface_api_token>" >> .env # Hugging Face token (for Hub datasets & inference)
# 3. Run the pipeline on the provided example config (uses sample docs and models), or, use your own config file!
yourbench example/configs/simple_example.yaml
The example configuration example/configs/simple_example.yaml (included in the repository) demonstrates a basic setup. It specifies sample documents and default models for each stage of the pipeline. In step 3 above, YourBench will automatically ingest the example documents, generate a set of Q&A pairs, and output a Hugging Face Dataset containing the evaluation questions and answers.
For your own data, you can create a YAML config pointing to your documents and preferred models. For instance, you might specify a folder of PDFs or text files under a documents field, and choose which LLM to use for question generation. YourBench is fully configurable: you can easily toggle stages on or off and swap in different models. For example, you could disable the summarization stage for very short texts, or use a powerful API model for question generation while using a faster local model for summarization. Simply adjust the YAML, and the pipeline will accommodate it. (See the usage example for all available options!)
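As a sketch, a custom config might look like the following. The field names below are illustrative assumptions, not the authoritative schema; consult example/configs/simple_example.yaml in the repository for the exact keys.

```yaml
# Hypothetical YourBench config sketch -- key names are illustrative;
# see example/configs/simple_example.yaml for the actual schema.
hf_configuration:
  hf_dataset_name: my-eval-set        # where the generated dataset is saved
model_list:
  - model_name: gpt-4o-mini           # model used for generation stages
pipeline:
  ingestion:
    source_documents_dir: docs/       # folder of PDFs / text files to ingest
  summarization:
    run: false                        # stages can be toggled on or off
  chunking:
    run: true
  question_generation:
    run: true
```

The point of the sketch is the shape: one block for Hub settings, one for models, and one per-stage block under pipeline with a toggle, so swapping models or disabling a stage is a one-line change.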
To learn more about the pipeline internals, see How YourBench Works.
Try it Online (Hugging Face Spaces)
You can try YourBench right away in your browser – no installation needed:
- YourBench Demo Space – Use our ready-to-go web demo to upload a document (or paste text) and generate a custom evaluation set with one click, complete with an instant model leaderboard. (This free demo will use a default set of models to answer the questions and show how different models perform.)
- YourBench Advanced Space – For power users, the advanced demo lets you provide a custom YAML config and plug in your own models or API endpoints. This gives you full control over the pipeline (choose specific models, adjust chunking parameters, etc.) via a convenient UI, right from the browser.
👉 Both hosted apps are available on Hugging Face Spaces under the yourbench organization. Give them a try to see how YourBench can generate benchmarks tailored to your use-case in minutes.
Contributing
Contributions are welcome!
We actively review PRs and welcome improvements or fixes from the community. For major changes, feel free to open an issue first to discuss the idea.
📜 License
This project is licensed under the Apache 2.0 License – see the LICENSE file for details. You are free to use, modify, and distribute YourBench in either commercial or academic projects under the terms of this license.
📚 Citation
If you use YourBench in your research or applications, please consider citing our paper:
@misc{shashidhar2025yourbencheasycustomevaluation,
title={YourBench: Easy Custom Evaluation Sets for Everyone},
author={Sumuk Shashidhar and Clémentine Fourrier and Alina Lozovskia and Thomas Wolf and Gokhan Tur and Dilek Hakkani-Tür},
year={2025},
eprint={2504.01833},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.01833}
}
File details
Details for the file yourbench-0.5.1.tar.gz.
File metadata
- Download URL: yourbench-0.5.1.tar.gz
- Size: 92.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fa3af5171aa25a2855bf3fda53723d3f71711dc084061ea57925337769f9977 |
| MD5 | c567853bf788c3cbc61e2d974cecc474 |
| BLAKE2b-256 | f65bd794dbbc75aa8dce9eebcb05ea6516af87c277e219b58f4d6fb76ccbaa93 |
Provenance
The following attestation bundles were made for yourbench-0.5.1.tar.gz:
Publisher: python-publish.yml on huggingface/yourbench
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: yourbench-0.5.1.tar.gz
- Subject digest: 1fa3af5171aa25a2855bf3fda53723d3f71711dc084061ea57925337769f9977
- Sigstore transparency entry: 352778410
- Permalink: huggingface/yourbench@5781dd1cafbdef42dd05c072985fcaf02d219152
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/huggingface
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@5781dd1cafbdef42dd05c072985fcaf02d219152
- Trigger Event: release
File details
Details for the file yourbench-0.5.1-py3-none-any.whl.
File metadata
- Download URL: yourbench-0.5.1-py3-none-any.whl
- Size: 105.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | e46233b509f65edd042f4c0348aceba2a895ee66c7e721286ded82e20d60a782 |
| MD5 | 84c2d05beaf80c70c159232b0596f6ab |
| BLAKE2b-256 | 73af12290215bbfeef616c7461e983d3741babb0289737a1cb56737b0611afe8 |
Provenance
The following attestation bundles were made for yourbench-0.5.1-py3-none-any.whl:
Publisher: python-publish.yml on huggingface/yourbench
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: yourbench-0.5.1-py3-none-any.whl
- Subject digest: e46233b509f65edd042f4c0348aceba2a895ee66c7e721286ded82e20d60a782
- Sigstore transparency entry: 352778411
- Permalink: huggingface/yourbench@5781dd1cafbdef42dd05c072985fcaf02d219152
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/huggingface
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@5781dd1cafbdef42dd05c072985fcaf02d219152
- Trigger Event: release