A framework for evaluating overthinking and basic reasoning capabilities of Large Language Models

llmthinkbench: LLM Reasoning Evaluation Framework

Features

  • Modular architecture for easy addition of new evaluation tasks
  • Built-in tasks: sorting, number comparison
  • Detailed reporting and metrics
  • Efficient batched inference using vLLM

Installation

pip install llmthinkbench

Quick Start

# Run evaluation with default parameters
llmthinkbench --model_id "Qwen/Qwen2.5-1.5B-Instruct" --tasks sorting comparison

# Run with custom parameters
llmthinkbench --model_id "meta-llama/Llama-2-7b-chat-hf" \
  --tensor_parallel_size 2 \
  --gpu_memory_utilization 0.9 \
  --temperature 0.7 \
  --top_p 0.9 \
  --max_tokens 512 \
  --tasks sorting comparison \
  --datapoints 100 \
  --list_sizes 8 16 32 \
  --folds 3 \
  --range -100 100 \
  --store_details
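
To make the dataset-related flags in the custom run more concrete, here is a minimal sketch of how test cases for a sorting task could be generated from them. It is not llmthinkbench's own code: the helper name make_sorting_cases, the record layout, and the reading of the flags (--datapoints lists per list size per fold, integers drawn from the inclusive --range) are all assumptions made for illustration.

```python
import random


def make_sorting_cases(datapoints, list_sizes, folds, value_range, seed=0):
    """Hypothetical helper: build sorting test cases from CLI-style settings.

    Assumes `datapoints` lists are drawn for every (fold, list size) pair and
    that `value_range` is an inclusive (low, high) pair of integers.
    """
    low, high = value_range
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cases = []
    for fold in range(folds):
        for size in list_sizes:
            for _ in range(datapoints):
                numbers = [rng.randint(low, high) for _ in range(size)]
                cases.append(
                    {
                        "fold": fold,
                        "size": size,
                        "numbers": numbers,
                        "expected": sorted(numbers),  # ground-truth answer
                    }
                )
    return cases


# Mirrors the values used in the custom-parameter command.
cases = make_sorting_cases(
    datapoints=100, list_sizes=[8, 16, 32], folds=3, value_range=(-100, 100)
)
print(len(cases))  # 3 folds * 3 list sizes * 100 datapoints = 900
```

Under these assumptions, the cost of a run grows with the product of folds, list sizes, and datapoints, which is worth keeping in mind when choosing --max_tokens and the number of GPUs.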

Adding New Tasks

  1. Create a new task module in llmthinkbench/tasks/your_task.py
  2. Implement a class that inherits from BaseTask and implements required methods
  3. Register your task in llmthinkbench/tasks/__init__.py
  4. Run with --tasks your_task
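
The steps above can be sketched as follows. The README does not document the interface of BaseTask, so the block defines its own stand-in base class; the method names generate_data, create_prompt, and evaluate_response, the MaxTask example, and the TASK_REGISTRY dictionary are hypothetical and may not match the real package. Check llmthinkbench/tasks for the actual required methods and registration mechanism before adapting it.

```python
import random
import re
from abc import ABC, abstractmethod


class BaseTask(ABC):
    """Stand-in for llmthinkbench's BaseTask; the real interface may differ."""

    @abstractmethod
    def generate_data(self, n):
        """Return n task instances."""

    @abstractmethod
    def create_prompt(self, item):
        """Turn one task instance into a prompt string."""

    @abstractmethod
    def evaluate_response(self, response, item):
        """Return True if the model response is correct for this instance."""


class MaxTask(BaseTask):
    """Hypothetical task: find the largest number in a list."""

    def __init__(self, list_size=8, low=-100, high=100, seed=0):
        self.list_size = list_size
        self.low = low
        self.high = high
        self.rng = random.Random(seed)

    def generate_data(self, n):
        return [
            [self.rng.randint(self.low, self.high) for _ in range(self.list_size)]
            for _ in range(n)
        ]

    def create_prompt(self, item):
        return f"Find the largest number in {item}. End with 'Answer: <number>'."

    def evaluate_response(self, response, item):
        # Treat the last integer in the response as the model's final answer.
        matches = re.findall(r"-?\d+", response)
        return bool(matches) and int(matches[-1]) == max(item)


# Hypothetical registration step, standing in for the edit to tasks/__init__.py.
TASK_REGISTRY = {"your_task": MaxTask}

task = TASK_REGISTRY["your_task"]()
item = task.generate_data(1)[0]
print(task.create_prompt(item))
print(task.evaluate_response(f"Answer: {max(item)}", item))  # True
```

Parsing only the last integer keeps the checker tolerant of long reasoning traces, which matters for a benchmark that measures overthinking, but it is a design choice of this sketch rather than something the source specifies.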

License

MIT License
