A framework for evaluating overthinking and basic reasoning capabilities of Large Language Models
llmthinkbench: LLM Reasoning Evaluation Framework
Features
- Modular architecture for easy addition of new evaluation tasks
- Built-in tasks: sorting, number comparison
- Detailed reporting and metrics
- Efficient batched inference using vLLM
Installation
pip install llmthinkbench
Quick Start
# Run evaluation with default parameters
llmthinkbench --model_id "Qwen/Qwen2.5-1.5B-Instruct" --tasks sorting comparison
# Run with custom parameters
llmthinkbench --model_id "meta-llama/Llama-2-7b-chat-hf" \
  --tensor_parallel_size 2 \
  --gpu_memory_utilization 0.9 \
  --temperature 0.7 \
  --top_p 0.9 \
  --max_tokens 512 \
  --tasks sorting comparison \
  --datapoints 100 \
  --list_sizes 8 16 32 \
  --folds 3 \
  --range -100 100 \
  --store_details
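To get a feel for how much work the sampling flags above request, here is a minimal sketch in Python. It assumes that each fold evaluates every list size with `--datapoints` random lists drawn from `--range`; that reading of the flags is an assumption, and `build_test_plan` is a hypothetical helper, not part of llmthinkbench.

```python
import random


def build_test_plan(datapoints, list_sizes, folds, value_range, seed=0):
    """Hypothetical helper (not part of llmthinkbench).

    Assumes one random integer list per datapoint, per list size, per fold.
    """
    low, high = value_range
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    plan = []
    for fold in range(folds):
        for size in list_sizes:
            for _ in range(datapoints):
                values = [rng.randint(low, high) for _ in range(size)]
                plan.append({"fold": fold, "list_size": size, "values": values})
    return plan


# Mirrors: --datapoints 100 --list_sizes 8 16 32 --folds 3 --range -100 100
plan = build_test_plan(
    datapoints=100, list_sizes=[8, 16, 32], folds=3, value_range=(-100, 100)
)
print(len(plan))  # 900 test cases under the assumption above
```

Under this assumption the number of prompts grows as datapoints × list sizes × folds, which is worth keeping in mind when choosing `--max_tokens` for larger models.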
Adding New Tasks
- Create a new task module in llmthinkbench/tasks/your_task.py
- Implement a class that inherits from BaseTask and implements the required methods
- Register your task in llmthinkbench/tasks/__init__.py
- Run with --tasks your_task
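The steps above can be sketched roughly as follows. This README does not document the actual `BaseTask` interface, so the example defines its own stand-in base class. The method names `generate_prompt` and `evaluate_response`, the `MaxTask` class, and the `TASK_REGISTRY` dictionary are all illustrative assumptions; check the package source for the real required methods and registration mechanism.

```python
import random
import re
from abc import ABC, abstractmethod


class BaseTask(ABC):
    """Stand-in for llmthinkbench's BaseTask; the real interface may differ."""

    @abstractmethod
    def generate_prompt(self, rng):
        """Return a (prompt, expected_answer) pair."""

    @abstractmethod
    def evaluate_response(self, response, expected):
        """Return True if the model response contains the expected answer."""


class MaxTask(BaseTask):
    """Hypothetical task: find the largest number in a random list."""

    def __init__(self, list_size=8, min_val=-100, max_val=100):
        self.list_size = list_size
        self.min_val = min_val
        self.max_val = max_val

    def generate_prompt(self, rng):
        values = [rng.randint(self.min_val, self.max_val) for _ in range(self.list_size)]
        prompt = (
            f"What is the largest number in {values}? "
            "Finish with 'Answer: <number>'."
        )
        return prompt, max(values)

    def evaluate_response(self, response, expected):
        # Accept the first 'Answer: <integer>' found in the response.
        match = re.search(r"Answer:\s*(-?\d+)", response)
        return match is not None and int(match.group(1)) == expected


# Hypothetical registry, standing in for whatever tasks/__init__.py exposes.
TASK_REGISTRY = {"max": MaxTask}

task = TASK_REGISTRY["max"]()
prompt, expected = task.generate_prompt(random.Random(0))
print(task.evaluate_response(f"Some reasoning... Answer: {expected}", expected))  # True
```

Keeping prompt generation and answer checking in one class means a new task only needs to define how to build an instance and how to score a response; batching and reporting can stay in the framework.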
License
MIT License