Benchmark your local LLMs.

🧮 Benchllama

An open-source tool to benchmark your local LLMs.

🔑 Key points

Benchllama helps you benchmark your local LLMs. Currently, it only supports models served via Ollama. By default, it pulls bigcode/humanevalpack from Hugging Face. There is out-of-the-box support for evaluating code-autocompletion models (use the --eval flag to trigger this); the supported languages are Python, JavaScript, Java, Go, and C++. You can also bring your own dataset (see this example for help with creating one) by passing its path via the --dataset flag.
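As a quick illustration, assuming Ollama is already running locally and a model has been pulled (the model name below is only an example), a minimal run might look like:

```shell
# Benchmark a locally served model with the default settings
# (bigcode/humanevalpack, Python, 3 completions per task).
benchllama evaluate --models codellama

# Additionally run the pass@k evaluation on the generated completions.
benchllama evaluate --models codellama --eval
```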

📜 Background

With the explosion of open-source LLMs, and of tooling to further customize these models (Modelfiles, Mergekit, LoRA, etc.), it can be daunting for end users to choose the right LLM. From our experience running local LLMs, the two key metrics that matter are performance and quality of responses. We created a simple CLI tool that enables users to pick the right LLM by evaluating candidates across these two parameters.

Given our experience with coding LLMs, we felt it would be useful to add out-of-the-box support for calculating pass@k for autocompletion models. If you are interested in coding LLMs, please check out our related project, Privy (github repo, vscode link, openvsx link).
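For context on what --eval reports: pass@k is conventionally computed with the unbiased estimator from the HumanEval paper, 1 - C(n-c, k)/C(n, k), where n completions are sampled per task and c of them pass the tests. A minimal sketch of that estimator (a standalone helper, not part of Benchllama's public API):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: completions generated per task (cf. --num-completions)
    c: completions that passed the tests
    k: the k in pass@k (cf. --k)
    """
    if n - c < k:
        # Fewer failures than k: every size-k sample contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 completions per task and 1 passing:
print(pass_at_k(3, 1, 1))  # 0.333... -> roughly a 1-in-3 chance one sample passes
print(pass_at_k(3, 1, 2))  # 0.666... -> two draws, better odds
```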

✨ Features

  • Evaluate: Evaluate the performance of your models on various tasks, such as code generation.
  • Clean: Clean up temporary files generated by Benchllama.

🚀 Installation

$ pip install benchllama

⚙️ Usage

$ benchllama [OPTIONS] COMMAND [ARGS]...

Options:

  • --install-completion: Install completion for the current shell.
  • --show-completion: Show completion for the current shell, to copy it or customize the installation.
  • --help: Show this message and exit.

Commands:

  • evaluate
  • clean

benchllama evaluate

Usage:

$ benchllama evaluate [OPTIONS]

Options:

  • --models TEXT: Names of models that need to be evaluated. [required]
  • --provider-url TEXT: The endpoint of the model provider. [default: http://localhost:11434]
  • --dataset FILE: By default, bigcode/humanevalpack from Hugging Face will be used. If you want to use your own dataset, specify the path here.
  • --languages [python|js|java|go|cpp]: List of languages to evaluate from bigcode/humanevalpack. Ignore this if you are bringing your own data. [default: python]
  • --num-completions INTEGER: Number of completions to be generated for each task. [default: 3]
  • --no-eval / --eval: Pass --eval to run evaluation on the generated completions. [default: no-eval]
  • --k INTEGER: The k for calculating pass@k. The values shouldn't exceed --num-completions. [default: 1, 2]
  • --samples INTEGER: Number of dataset samples to evaluate. By default, all the samples get processed. [default: -1]
  • --output PATH: Output directory [default: /tmp]
  • --help: Show this message and exit.
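Putting the options above together, a fuller invocation might look like the following (the model name, dataset path, and output directory are illustrative):

```shell
# Evaluate a model on 50 Go samples, generating 5 completions per task,
# computing pass@1 and pass@5, and writing results under ./results.
benchllama evaluate \
  --models codellama \
  --provider-url http://localhost:11434 \
  --languages go \
  --num-completions 5 \
  --k 1 --k 5 \
  --samples 50 \
  --eval \
  --output ./results
```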

benchllama clean

Usage:

$ benchllama clean [OPTIONS]

Options:

  • --run-id TEXT: Run id
  • --output PATH: Output directory [default: /tmp]
  • --help: Show this message and exit.
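For example, to clean up the temporary files of a particular run (the run ID below is a placeholder; use the one reported by your evaluate run):

```shell
# Remove the temporary files a specific run left in the output directory.
benchllama clean --run-id <run-id> --output /tmp
```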
