Dynamic Evaluation Set Generation with Large Language Models

YourBench: A Dynamic Benchmark Generation Framework

[GitHub] · [Dataset] · [Documentation] · [Paper]


Generate high-quality QA pairs and evaluation datasets from any source documents. YourBench transforms your PDFs, Word docs, and text files into structured benchmark datasets with configurable output formats. Appearing at COLM 2025. 100% free and open source.

Features

  • Document Ingestion – Parse PDFs, Word docs, HTML, and text files into standardized Markdown
  • Question Generation – Create single-hop and multi-hop questions with customizable schemas
  • Custom Output Schemas – Define your own Pydantic models for question/answer format
  • Multi-Model Support – Use different LLMs for different pipeline stages
  • HuggingFace Integration – Push datasets directly to the Hub or save locally
  • Quality Filtering – Citation scoring and deduplication built-in
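To illustrate the deduplication idea from the feature list (a naive sketch for intuition only, not YourBench's actual filter), near-duplicate questions can be dropped by comparing normalized text:

```python
def dedup_questions(questions: list[str]) -> list[str]:
    """Keep the first occurrence of each question, ignoring case and punctuation."""
    seen, kept = set(), []
    for q in questions:
        # Normalize: lowercase, strip punctuation, collapse surrounding whitespace.
        key = "".join(ch for ch in q.lower() if ch.isalnum() or ch.isspace()).strip()
        if key not in seen:
            seen.add(key)
            kept.append(q)
    return kept

qs = ["What is chunking?", "what is chunking", "How does ingestion work?"]
print(dedup_questions(qs))  # ['What is chunking?', 'How does ingestion work?']
```

The real pipeline additionally scores citations against the source chunks; this sketch only shows the exact-match-after-normalization half of the idea.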

Quick Start

Use uv to run the packaged CLI directly:

uvx --from yourbench yourbench run example/default_example/config.yaml --debug

The example config works out-of-the-box with env vars from .env (see .env.template).

Install locally if you prefer:

uv pip install yourbench
yourbench run example/default_example/config.yaml

Installation

Requires Python 3.12+.

# With uv (recommended)
uv pip install yourbench

# With pip
pip install yourbench

From source:

git clone https://github.com/huggingface/yourbench.git
cd yourbench
pip install -e .

Usage

Minimal config:

hf_configuration:
  hf_dataset_name: my-benchmark

model_list:
  - model_name: openai/gpt-4o-mini
    api_key: $OPENAI_API_KEY

pipeline:
  ingestion:
    source_documents_dir: ./my-documents
  summarization:
  chunking:
  single_hop_question_generation:
  prepare_lighteval:

Then run:

yourbench run config.yaml
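The stages listed under pipeline: run top to bottom; conceptually (a simplified sketch of the dispatch loop, not the real implementation), the runner just iterates over the configured stages in order:

```python
# Simplified sketch: run each configured pipeline stage in order.
def run_pipeline(config: dict) -> list[str]:
    executed = []
    for stage, options in config.get("pipeline", {}).items():
        # A stage listed with no options (an empty mapping in YAML)
        # still runs, using its defaults.
        executed.append(stage)
    return executed

config = {
    "pipeline": {
        "ingestion": {"source_documents_dir": "./my-documents"},
        "summarization": None,
        "chunking": None,
        "single_hop_question_generation": None,
        "prepare_lighteval": None,
    }
}
print(run_pipeline(config))
# ['ingestion', 'summarization', 'chunking', 'single_hop_question_generation', 'prepare_lighteval']
```

Omitting a stage from the config simply skips it, which is why the minimal example above lists every stage it needs, even the ones with no options.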

With custom output schema:

pipeline:
  single_hop_question_generation:
    question_schema: ./my_schema.py  # Must export a DataFormat class

my_schema.py:

from pydantic import BaseModel, Field

class DataFormat(BaseModel):
    question: str = Field(description="The question")
    answer: str = Field(description="The answer")
    difficulty: str = Field(description="easy, medium, or hard")
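A quick way to sanity-check a schema file like my_schema.py before wiring it into the pipeline is to validate a sample record against it (illustrative; assumes pydantic v2):

```python
from pydantic import BaseModel, Field

class DataFormat(BaseModel):
    question: str = Field(description="The question")
    answer: str = Field(description="The answer")
    difficulty: str = Field(description="easy, medium, or hard")

# Validate a dict shaped like one generated record against the schema.
record = {
    "question": "What is chunking?",
    "answer": "Splitting documents into passages.",
    "difficulty": "easy",
}
item = DataFormat.model_validate(record)
print(item.difficulty)  # easy
```

A record missing a field, or with an extra type mismatch, raises pydantic.ValidationError, so malformed generations are caught rather than silently written to the dataset.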

CLI Commands

YourBench provides several CLI commands:

Command                        Description
yourbench run <config>         Run the full pipeline
yourbench validate <config>    Check config without running
yourbench estimate <config>    Estimate token usage
yourbench init                 Generate starter config interactively
yourbench stages               List available pipeline stages
yourbench version              Show version

See CLI Reference for full documentation.

Documentation

Guide                      Description
Configuration              Full config reference with all options
Custom Schemas             Define your own output formats
How It Works               Pipeline architecture and stages
CLI Reference              All CLI commands and options
FAQ                        Common questions and troubleshooting
OpenAI-Compatible Models   Use vLLM, Ollama, etc.
Dataset Columns            Output field descriptions
Academic Paper             COLM 2025 submission

Try Online

No installation needed.

Example Configs

The example/ folder contains ready-to-use configurations:

  • default_example/ – Basic setup with sample documents
  • harry_potter_quizz/ – Generate quiz questions from books
  • custom_prompts_demo/ – Custom prompts for domain-specific questions
  • local_vllm_private_data/ – Use local models for private data
  • rich_pdf_extraction_with_gemini/ – LLM-based PDF extraction for charts/figures

Run any example:

yourbench run example/default_example/config.yaml

API Keys

Set in environment or .env file:

HF_TOKEN=hf_xxx              # For Hub upload
OPENAI_API_KEY=sk-xxx        # For OpenAI models

Use $VAR_NAME in config to reference environment variables.
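The substitution behaves like ordinary environment-variable expansion; a minimal sketch of the idea (not YourBench's actual resolver, and the helper name is made up here):

```python
import os
import re

def resolve_env(value: str) -> str:
    """Replace $VAR_NAME tokens with values from the environment.

    Unknown variables are left untouched rather than replaced with
    an empty string, which makes missing keys easier to spot.
    """
    return re.sub(
        r"\$([A-Z_][A-Z0-9_]*)",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        value,
    )

os.environ["OPENAI_API_KEY"] = "sk-xxx"
print(resolve_env("$OPENAI_API_KEY"))  # sk-xxx
```

So a config value like api_key: $OPENAI_API_KEY never stores the secret in the YAML file itself; the key is read from the environment at run time.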

Contributing

PRs welcome! Open an issue first for major changes.

📜 License

Apache 2.0 – see LICENSE.

📚 Citation

@misc{shashidhar2025yourbencheasycustomevaluation,
      title={YourBench: Easy Custom Evaluation Sets for Everyone},
      author={Sumuk Shashidhar and Clémentine Fourrier and Alina Lozovskia and Thomas Wolf and Gokhan Tur and Dilek Hakkani-Tür},
      year={2025},
      eprint={2504.01833},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.01833}
}
