dunebench – a lightweight evaluation tool for llama.cpp models

Project description

DuneBench

dunebench is a lightweight, local benchmarking tool for GGUF models. It evaluates Large Language Models (LLMs) across several domains, including logic, coding, math, and common sense, using llama-cpp-python.

Installation

Install dunebench with pip

    pip install dunebench

Alternatively, install the EXE from this link.

Features

  • Local Evaluation: Runs entirely on your machine using GGUF models.
  • GPU Accelerated: Offload layers to your GPU for faster testing.
  • Multi-Domain Support: Includes 8 distinct benchmarks (Math, Coding, Science, etc.).

Usage/Examples

    python -m dunebench --model "path/to/model.gguf" --task science --limit 20

Arguments

Argument   Description                      Default
--model    Path to your .gguf model file    Required
--task     The benchmark task to run        Required
--limit    Number of samples to test        10
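
To run several tasks against the same model, the command above can be wrapped in a small script. The following is a minimal sketch, not part of dunebench itself: the helper names `build_command` and `run_all` are hypothetical, and only the three documented flags (`--model`, `--task`, `--limit`) are used. The task names are copied from the Tasks table.

```python
import subprocess
import sys

# Task names as listed in the Tasks table of this README.
TASKS = [
    "science", "math", "programming", "physical_logic",
    "common_sense", "logic", "grammar", "nlp",
]


def build_command(model_path, task, limit=10):
    """Return the argv list for one dunebench run (hypothetical helper).

    Uses only the documented flags: --model, --task, --limit.
    """
    return [
        sys.executable, "-m", "dunebench",
        "--model", model_path,
        "--task", task,
        "--limit", str(limit),
    ]


def run_all(model_path, limit=10, tasks=TASKS):
    """Run each task in sequence and return {task name: process exit code}."""
    exit_codes = {}
    for task in tasks:
        completed = subprocess.run(build_command(model_path, task, limit))
        exit_codes[task] = completed.returncode
    return exit_codes
```

Calling `run_all("path/to/model.gguf", limit=20)` would launch one dunebench process per task. The sketch records only exit codes, since the format of dunebench's output is not described here.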

Tasks

Task Name        Dataset Used          Domain                     Type
science          ai2_arc (Challenge)   Scientific Reasoning       Multiple Choice
math             gsm8k                 Math Word Problems         Generation
programming      mbpp (Sanitized)      Python Coding              Code Generation
physical_logic   piqa                  Physical Commonsense       Multiple Choice
common_sense     openbookqa            General Knowledge          Multiple Choice
logic            winogrande            Ambiguity Resolution       Multiple Choice
grammar          glue (CoLA)           Linguistic Acceptability   Multiple Choice
nlp              hellaswag             Sentence Completion        Multiple Choice
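
The Type column matters because multiple-choice and generation tasks are typically scored differently. How dunebench scores answers is not documented here, so the following is only a sketch of one common approach, with hypothetical function names. It relies on one known fact: gsm8k reference answers end with `#### <number>`.

```python
import re


def gsm8k_reference(answer_field):
    """Extract the final answer from a gsm8k 'answer' field ('... #### 18')."""
    return answer_field.split("####")[-1].strip().replace(",", "")


def last_number(text):
    """Return the last number in generated text (commas stripped), or None."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return matches[-1].replace(",", "") if matches else None


def score_generation(output, answer_field):
    """Generation tasks: compare the last number in the output to the reference."""
    predicted = last_number(output)
    if predicted is None:
        return False
    return float(predicted) == float(gsm8k_reference(answer_field))


def score_multiple_choice(output, correct_label):
    """Multiple-choice tasks: take the first standalone letter A-E as the choice.

    Assumption: choices are labelled A-E. An output that begins with the
    article "A" would be misread, so real scoring needs stricter parsing.
    """
    match = re.search(r"\b([A-E])\b", output.strip().upper())
    return match is not None and match.group(1) == correct_label.upper()
```

Code-generation tasks such as mbpp are usually scored by executing the generated function against test cases, which is outside the scope of this sketch.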

License

MIT
