# vLLM Semantic Router Benchmark Suite
A comprehensive benchmark suite for evaluating semantic router performance against direct vLLM across multiple reasoning datasets. Perfect for researchers and developers working on LLM routing, evaluation, and performance optimization.
## Key Features
- 6 Major Reasoning Datasets: MMLU-Pro, ARC, GPQA, TruthfulQA, CommonsenseQA, HellaSwag
- Router vs vLLM Comparison: Side-by-side performance evaluation
- Multiple Evaluation Modes: NR (neutral), XC (explicit CoT), NR_REASONING (auto-reasoning)
- Research-Ready Output: CSV files and publication-quality plots
- Dataset-Agnostic Architecture: Easy to extend with new datasets
- CLI Tools: Simple command-line interface for common operations
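To make the three evaluation modes concrete, here is a hypothetical sketch of how each mode could shape the prompt sent to a model. The `build_prompt` function and its templates are illustrative assumptions, not the package's actual implementation; in particular, NR_REASONING toggles reasoning on the model side rather than in the prompt text.

```python
# Hypothetical illustration of the three evaluation modes.
# These templates are NOT the package's real prompt templates.
def build_prompt(question: str, mode: str = "NR") -> str:
    if mode == "NR":  # neutral: the question is sent as-is
        return question
    if mode == "XC":  # explicit chain-of-thought instruction appended
        return question + "\nLet's think step by step."
    if mode == "NR_REASONING":  # neutral prompt; reasoning enabled via a model flag
        return question
    raise ValueError(f"unknown mode: {mode}")

print(build_prompt("What is 2 + 2?", "XC"))
```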
## Quick Start

### Installation

```shell
pip install vllm-semantic-router-bench
```
### Basic Usage

```shell
# Quick test on the MMLU dataset
vllm-semantic-router-bench test --dataset mmlu --samples 5

# Full comparison between router and vLLM
vllm-semantic-router-bench compare --dataset arc --samples 10

# List available datasets
vllm-semantic-router-bench list-datasets

# Run the comprehensive multi-dataset benchmark
vllm-semantic-router-bench comprehensive
```
### Python API

```python
from vllm_semantic_router_bench import DatasetFactory, list_available_datasets

# Load a dataset
factory = DatasetFactory()
dataset = factory.create_dataset("mmlu")
questions, info = dataset.load_dataset(samples_per_category=10)

print(f"Loaded {len(questions)} questions from {info.name}")
print(f"Categories: {info.categories}")
```
## Supported Datasets
| Dataset | Domain | Categories | Difficulty | CoT Support |
|---|---|---|---|---|
| MMLU-Pro | Academic Knowledge | 57 subjects | Undergraduate | ✅ |
| ARC | Scientific Reasoning | Science | Grade School | ✅ |
| GPQA | Graduate Q&A | Graduate-level | Graduate | ✅ |
| TruthfulQA | Truthfulness | Truthfulness | Hard | ✅ |
| CommonsenseQA | Common Sense | Common Sense | Hard | ✅ |
| HellaSwag | Commonsense NLI | ~50 activities | Moderate | ✅ |
## Advanced Usage

### Custom Evaluation Script
```python
import subprocess

# Run a detailed benchmark with custom parameters
cmd = [
    "router-bench",  # main benchmark script
    "--dataset", "mmlu",
    "--samples-per-category", "20",
    "--run-router", "--router-models", "auto",
    "--run-vllm", "--vllm-models", "openai/gpt-oss-20b",
    "--vllm-exec-modes", "NR", "NR_REASONING",
    "--output-dir", "results/custom_test",
]
subprocess.run(cmd, check=True)  # raise if the benchmark exits non-zero
```
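The same pattern extends to sweeping several datasets: build one `router-bench` invocation per dataset and launch them in sequence. This is a sketch under the assumption that the flags above are accepted unchanged for each dataset; verify against your installed version first.

```python
import subprocess  # used when you actually launch the runs

def sweep_commands(datasets, samples="20"):
    """Build one router-bench invocation per dataset (flags assumed as above)."""
    for name in datasets:
        yield [
            "router-bench",
            "--dataset", name,
            "--samples-per-category", samples,
            "--run-router", "--router-models", "auto",
            "--output-dir", f"results/sweep_{name}",
        ]

# To launch the sweep (stops on the first failing run):
#     for cmd in sweep_commands(["mmlu", "arc", "gpqa"]):
#         subprocess.run(cmd, check=True)
```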
### Plotting Results

```shell
# Generate plots from benchmark results
bench-plot --router-dir results/router_mmlu \
    --vllm-dir results/vllm_mmlu \
    --output-dir results/plots \
    --dataset-name "MMLU-Pro"
```
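For a quick custom figure beyond what `bench-plot` produces, a minimal matplotlib sketch follows. The accuracy numbers are placeholders for illustration only; substitute values read from your own `detailed_results.csv` files.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Placeholder accuracies -- NOT real benchmark results
modes = ["router", "vllm NR", "vllm NR_REASONING"]
accuracy = [0.71, 0.64, 0.69]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(modes, accuracy)
ax.set_ylabel("Accuracy")
ax.set_ylim(0, 1)
ax.set_title("Router vs direct vLLM (placeholder data)")
fig.tight_layout()
fig.savefig("accuracy_comparison.png")
```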
## Research Output
The benchmark generates research-ready outputs:
- CSV Files: Detailed per-question results and aggregated metrics
- Master CSV: Combined results across all test runs
- Plots: Accuracy and token usage comparisons
- Summary Reports: Markdown reports with key findings
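A master CSV of this shape can also be rebuilt from the per-run files with nothing but the standard library. The sketch below assumes each run directory contains a `detailed_results.csv` with a consistent header; the `run` column it adds is our own convention, not necessarily the package's.

```python
import csv
from pathlib import Path

def build_master_csv(results_root: str, out_path: str) -> int:
    """Concatenate every detailed_results.csv under results_root,
    tagging each row with the run directory it came from."""
    rows = []
    for f in sorted(Path(results_root).rglob("detailed_results.csv")):
        with f.open(newline="") as fh:
            for row in csv.DictReader(fh):
                row["run"] = f.parent.name  # e.g. router_mmlu / vllm_mmlu
                rows.append(row)
    if rows:
        with open(out_path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)
```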
### Example Output Structure

```
results/
├── research_results_master.csv        # Main research data
└── comparison_20250115_143022/
    ├── router_mmlu/
    │   └── detailed_results.csv
    ├── vllm_mmlu/
    │   └── detailed_results.csv
    ├── plots/
    │   ├── accuracy_comparison.png
    │   └── token_usage_comparison.png
    └── RESEARCH_SUMMARY.md
```
## Development

### Local Installation

```shell
git clone https://github.com/vllm-project/semantic-router
cd semantic-router/bench
pip install -e ".[dev]"
```
Adding New Datasets
- Create a new dataset implementation in
dataset_implementations/ - Inherit from
DatasetInterface - Register in
dataset_factory.py - Add tests and documentation
```python
from vllm_semantic_router_bench import DatasetInterface, Question, DatasetInfo

class MyDataset(DatasetInterface):
    def load_dataset(self, **kwargs):
        # Implementation here
        pass

    def format_prompt(self, question, style="plain"):
        # Implementation here
        pass
```
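To show the `load_dataset` / `format_prompt` contract end to end without installing the package, here is a self-contained toy version. The `Question` and `DatasetInfo` classes below are simplified local stand-ins, not the package's real types, and the field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Simplified stand-ins for the package's Question / DatasetInfo types.
@dataclass
class Question:
    prompt: str
    options: List[str]
    answer: str
    category: str

@dataclass
class DatasetInfo:
    name: str
    categories: List[str] = field(default_factory=list)

class MyToyDataset:
    """Toy dataset illustrating the load_dataset / format_prompt contract."""

    def load_dataset(self, samples_per_category: int = 1, **kwargs):
        questions = [
            Question("What is 2 + 2?", ["3", "4", "5"], "B", "arithmetic"),
        ][:samples_per_category]
        return questions, DatasetInfo(name="toy", categories=["arithmetic"])

    def format_prompt(self, question: Question, style: str = "plain") -> str:
        letters = "ABCDE"
        opts = "\n".join(f"{letters[i]}. {o}" for i, o in enumerate(question.options))
        prompt = f"{question.prompt}\n{opts}\nAnswer:"
        if style == "cot":  # explicit chain-of-thought variant
            prompt = prompt.replace("Answer:", "Let's think step by step.\nAnswer:")
        return prompt
```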
## Requirements
- Python 3.8+
- OpenAI API access (for model evaluation)
- Hugging Face account (for dataset access)
- 4GB+ RAM (for larger datasets)
### Dependencies

- `openai>=1.0.0` - OpenAI API client
- `datasets>=2.14.0` - Hugging Face datasets
- `pandas>=1.5.0` - Data manipulation
- `matplotlib>=3.5.0` - Plotting
- `seaborn>=0.11.0` - Advanced plotting
- `tqdm>=4.64.0` - Progress bars
## Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
### Common Contributions
- Adding new datasets
- Improving evaluation metrics
- Enhancing visualization
- Performance optimizations
- Documentation improvements
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## Links
- Documentation: https://vllm-semantic-router.com
- GitHub: https://github.com/vllm-project/semantic-router
- Issues: https://github.com/vllm-project/semantic-router/issues
- PyPI: https://pypi.org/project/vllm-semantic-router-bench/
## Support
- GitHub Issues: Bug reports and feature requests
- Documentation: Comprehensive guides and API reference
- Community: Join our discussions and get help from other users
Made with ❤️ by the vLLM Semantic Router Team
## File details

### vllm_semantic_router_bench-1.0.0.tar.gz

- Size: 53.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3e11ce635573814d1aebbc6a9f087edfdd044754dd8bc44b33ed17ad5d845f29` |
| MD5 | `04c17bc03a0a3c4cfd2dd15e730ebbc5` |
| BLAKE2b-256 | `7530268500ec3ea185d761f17e20504e8e2c6fcaa7d556e018a14a0c85611899` |
### vllm_semantic_router_bench-1.0.0-py3-none-any.whl

- Size: 45.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10

| Algorithm | Hash digest |
|---|---|
| SHA256 | `7ab7d6ac106ab1169fa880ca86a19d9f680cdadcb546db437c1c5a33c4467b7d` |
| MD5 | `24705d06f5ffb7aaa703188629e28902` |
| BLAKE2b-256 | `114a28f3aa49c315f39919c9d7ae4be0ea07f8f8bf72e1972c070717a67a7e4a` |