Dynamic Evaluation Set Generation with LLMs
🤗 Yourbench
Dynamic Evaluation Set Generation for LLM Benchmarking [NAACL '25]
🌟 Overview
Yourbench is a framework for dynamically generating evaluation sets from source documents. It addresses the limitations of static benchmarks, including benchmark saturation, by creating diverse, context-rich questions tailored to specific educational levels.
🔄 Process Flow
✨ Features
- 🔄 Dynamic Generation: Create evaluation sets on-the-fly from any source documents
- 📚 Semantic Chunking: Smart document splitting that maintains context and meaning
- 🤔 Multi-hop Questions: Generate questions that require synthesizing information across document sections
- 📊 Configurable Difficulty: Tailor questions to specific educational levels
- 🔍 Diverse Question Types: Support for 10 different question types
- 🤖 Model Flexibility: Works with OpenAI and Azure OpenAI models via LiteLLM
- 📦 Hugging Face Integration: Direct dataset publishing to Hugging Face Hub
🛠️ Requirements
- Python 3.12+
- LiteLLM for model inference
- Sentence Transformers for semantic chunking
- Hugging Face Datasets for dataset management
- An OpenAI-compatible API or Azure AI endpoint (more model types coming soon!)
📦 Installation
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
.\venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
🚀 Quick Start
- Set up your environment:
# For OpenAI / OpenAI compatible APIs
export MODEL_BASE_URL=your_openai_url
export MODEL_API_KEY=your_openai_key
# For Azure OpenAI
export AZURE_BASE_URL=your_azure_url
export AZURE_API_KEY=your_azure_key
- Create a task configuration (config.yaml). More information is available in the documentation, and an example task configuration is also provided.
- Run the example task (after setting your 🤗 username or organization in the config):
python yourbench/main.py --task-name yourbench_y1
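Before running the task, it can help to confirm that the credentials from the first step are actually set. The following is a minimal sketch, not part of Yourbench itself: the variable names come from the Quick Start above, while the helper `missing_env_vars` and the `"azure"` key are hypothetical names introduced for illustration.

```python
import os

# Variable names are taken from the Quick Start above. Grouping them under
# "openai" and "azure" is an assumption made for this sketch.
REQUIRED_ENV = {
    "openai": ("MODEL_BASE_URL", "MODEL_API_KEY"),
    "azure": ("AZURE_BASE_URL", "AZURE_API_KEY"),
}


def missing_env_vars(model_type, environ=None):
    """Return the required variables that are unset or empty for model_type."""
    environ = os.environ if environ is None else environ
    if model_type not in REQUIRED_ENV:
        raise ValueError(f"unknown model type: {model_type!r}")
    return [name for name in REQUIRED_ENV[model_type] if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars("openai")
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks ready for OpenAI-compatible models.")
```

Passing a plain dict as `environ` makes the check easy to exercise without touching the real environment.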
📚 Documentation
Detailed documentation is available in the docs directory:
- Configuration Guide: Comprehensive guide to YAML configuration
- Question Generation: Details about the question generation process
- Chunking System: Information about the semantic chunking system
🏗️ Pipeline Components
1. Dataset Generation
- Processes source documents
- Creates structured datasets
- Supports local files and Hugging Face datasets
2. Document Summarization
- Generates document summaries
- Provides context for question generation
- Uses configured language model
3. Semantic Chunking
- Splits documents intelligently
- Maintains semantic coherence
- Configurable chunk sizes and overlap
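To illustrate the general idea of semantic chunking, here is a rough sketch that groups consecutive sentences until their similarity drops or a size limit is reached. It is not Yourbench's implementation: a bag-of-words cosine similarity stands in for Sentence Transformers embeddings so the example runs with the standard library only, overlap is left out for brevity, and the names `semantic_chunks`, `threshold`, and `max_sentences` are assumptions.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Stand-in embedding: a bag-of-words count vector, not a real sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(text, threshold=0.2, max_sentences=5):
    """Group consecutive sentences; start a new chunk when similarity drops or the chunk is full."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        if current:
            similar = cosine(embed(current[-1]), embed(sentence)) >= threshold
            if not similar or len(current) >= max_sentences:
                chunks.append(" ".join(current))
                current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    text = "Cats chase mice. Cats chase birds too. Stock prices fell sharply today."
    for chunk in semantic_chunks(text):
        print(chunk)
```

Swapping `embed` for a real sentence-embedding model would change the similarity scores but not the grouping logic.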
4. Multi-hop Chunk Creation
- Pairs related document chunks
- Enables complex reasoning questions
- Smart chunk selection
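One plausible way to pair related chunks is to score every pair and keep the most similar ones. The sketch below is an assumption about how such a selection could work, not a description of Yourbench's selection logic: Jaccard token overlap stands in for embedding similarity, and `pair_chunks`, `min_similarity`, and `max_pairs` are hypothetical names.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased set of alphanumeric tokens in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of the token sets of two texts (stand-in similarity)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pair_chunks(chunks, min_similarity=0.1, max_pairs=3):
    """Return up to max_pairs (i, j, score) index pairs, most similar first."""
    scored = [
        (i, j, jaccard(chunks[i], chunks[j]))
        for i, j in combinations(range(len(chunks)), 2)
    ]
    related = [pair for pair in scored if pair[2] >= min_similarity]
    related.sort(key=lambda pair: pair[2], reverse=True)
    return related[:max_pairs]


if __name__ == "__main__":
    chunks = [
        "the solar panel converts light",
        "a solar panel needs light",
        "bread is baked in ovens",
    ]
    for i, j, score in pair_chunks(chunks):
        print(f"chunks {i} and {j} are related (score {score:.2f})")
```

Each returned pair could then be handed to the question generator as the context for one multi-hop question.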
5. Question Generation
- Single-shot questions from individual chunks
- Multi-hop questions from chunk pairs
- 10 different question types
- Difficulty calibration
- Educational level targeting
6. Dataset Management
- Hugging Face integration
- Local storage options
- Dataset versioning
🎯 Question Types
- Analytical: Break down complex ideas
- Application-based: Apply concepts to scenarios
- Clarification: Deep dive into specifics
- Counterfactual: Explore alternatives
- Conceptual: Examine theories
- True-false: Verify understanding
- Factual: Test recall
- Open-ended: Encourage discussion
- False-premise: Correct misconceptions
- Edge-case: Test boundaries
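When working with generated questions in your own code, it can be convenient to model the ten types above as an enumeration. This is an illustrative sketch only: the string values and the `parse_question_type` helper are hypothetical, and the identifiers Yourbench actually uses in its configuration or output may differ.

```python
from enum import Enum


class QuestionType(Enum):
    """The ten question types listed above; the string values are assumed slugs."""

    ANALYTICAL = "analytical"
    APPLICATION_BASED = "application-based"
    CLARIFICATION = "clarification"
    COUNTERFACTUAL = "counterfactual"
    CONCEPTUAL = "conceptual"
    TRUE_FALSE = "true-false"
    FACTUAL = "factual"
    OPEN_ENDED = "open-ended"
    FALSE_PREMISE = "false-premise"
    EDGE_CASE = "edge-case"


def parse_question_type(label):
    """Map a label such as 'True-false' or 'edge case' to a QuestionType.

    Raises ValueError for labels that do not match any known type.
    """
    slug = label.strip().lower().replace(" ", "-").replace("_", "-")
    return QuestionType(slug)
```

For example, `parse_question_type("Open-ended")` returns `QuestionType.OPEN_ENDED`, while an unknown label raises `ValueError`.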
⚙️ Configuration
Example configuration:
task_name: yourbench_y1
configurations:
  push_to_huggingface: true
  set_hf_repo_visibility: public
  hf_organization: your-org
  model:
    model_name: gpt-4
    model_type: openai
    max_concurrent_requests: 512
selected_choices:
  generate_dataset:
    execute: true
    files_directory: examples/data
    dataset_name: my_dataset
See Configuration Guide for detailed options.
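A few sanity checks on the configuration can catch mistakes before any model calls are made. The sketch below assumes the YAML has already been parsed into a dictionary (for instance with a YAML library) and that `model` is nested under `configurations` and each stage under `selected_choices`, which is an inferred layout. The helpers `check_config` and `enabled_stages` are hypothetical and not part of Yourbench.

```python
# The example configuration as a dictionary. The nesting shown here is an
# assumption about how the keys in the example are grouped.
EXAMPLE_CONFIG = {
    "task_name": "yourbench_y1",
    "configurations": {
        "push_to_huggingface": True,
        "set_hf_repo_visibility": "public",
        "hf_organization": "your-org",
        "model": {
            "model_name": "gpt-4",
            "model_type": "openai",
            "max_concurrent_requests": 512,
        },
    },
    "selected_choices": {
        "generate_dataset": {
            "execute": True,
            "files_directory": "examples/data",
            "dataset_name": "my_dataset",
        },
    },
}


def enabled_stages(config):
    """Names of the stages under selected_choices whose 'execute' flag is true."""
    stages = config.get("selected_choices", {})
    return [name for name, options in stages.items() if options.get("execute")]


def check_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if not config.get("task_name"):
        problems.append("task_name is missing")
    settings = config.get("configurations", {})
    if settings.get("push_to_huggingface") and not settings.get("hf_organization"):
        problems.append("hf_organization must be set when push_to_huggingface is true")
    return problems


if __name__ == "__main__":
    print("Problems:", check_config(EXAMPLE_CONFIG))
    print("Enabled stages:", enabled_stages(EXAMPLE_CONFIG))
```

The organization check mirrors the Quick Start note about setting your 🤗 username or organization before running the example task.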
🧰 Development
We use:
🤝 Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Install development dependencies
- Make your changes
- Run tests and ensure code style compliance
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- LiteLLM for model inference
- Sentence Transformers for semantic embeddings
- Hugging Face for dataset infrastructure