
localbench

localbench Banner

🚧 EARLY DEVELOPMENT WARNING 🚧

This tool is currently about as stable as a house of cards in a wind tunnel.
Very early alpha. Bugs aren't just expected - they've signed a lease.

Status: Proceed with optimism ☕

A benchmarking tool for Local LLMs. Currently keeping an eye on Cortex.cpp but with plans to judge other frameworks equally in the future.

What is this?

localbench measures performance metrics, resource utilization, and stability characteristics of your LLM deployments. Rather comprehensive, really.

Features

  • Model initialization metrics
  • Runtime performance
  • Resource utilization
  • Advanced processing scenarios
  • Workload-specific benchmarks
  • System integration metrics
  • Stability analysis

Installation

Using uv:

uv tool install localbench

Using pip:

pip install localbench

Usage

Basic Benchmarking

# Standard benchmark
localbench "llama3.2:3b-gguf-q2-k"

# With detailed metrics
localbench "llama3.2:3b-gguf-q2-k" --verbose

Specific Benchmarks

# Initialization only
localbench "llama3.2:3b-gguf-q2-k" --type init

# Runtime metrics
localbench "llama3.2:3b-gguf-q2-k" --type runtime

# Long-running stability test
localbench "llama3.2:3b-gguf-q2-k" --type stability --stability-duration 24

Advanced Usage

# Custom benchmark prompts
localbench "llama3.2:3b-gguf-q2-k" --type workload --prompts my_prompts.json
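
The schema of the prompts file isn't documented here, so the sketch below assumes the simplest plausible shape, a plain JSON list of prompt strings; adjust it once you've checked the format localbench actually expects.

```python
# Guessed schema for --prompts: a plain JSON list of prompt strings.
import json

prompts = [
    "Summarize the plot of Hamlet in two sentences.",
    "Write a Python function that reverses a string.",
    "Explain the trade-off between quantization and output quality.",
]

with open("my_prompts.json", "w") as f:
    json.dump(prompts, f, indent=2)

print(f"wrote {len(prompts)} prompts to my_prompts.json")
```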

# Multi-model benchmarking
localbench "llama3.2:3b-gguf-q2-k" --type advanced \
    --secondary-models "tinyllama:1b-gguf-q4" "phi2:3b-gguf-q4"

# Export results
localbench "llama3.2:3b-gguf-q2-k" --json results.json
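
The structure of results.json is likewise undocumented; the sketch below writes a stand-in file with made-up top-level keys, then shows how to list whatever sections a real run actually records.

```python
# Stand-in results file; the real keys depend on which benchmark type you ran.
import json

sample = {"model": "llama3.2:3b-gguf-q2-k", "init": {}, "runtime": {}}
with open("results.json", "w") as f:
    json.dump(sample, f)

# Inspect the top-level sections the run recorded.
with open("results.json") as f:
    results = json.load(f)
print(sorted(results))  # → ['init', 'model', 'runtime']
```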

Status

Under active development. Support for additional frameworks is planned.

Roadmap

  • Framework-agnostic benchmarking
  • Additional performance metrics
  • Enhanced visualizations
  • Extended stability testing
  • Local server

Development

Setup

  1. Clone the repository:

git clone https://github.com/username/localbench.git
cd localbench

  2. Create and activate a virtual environment:

# Using uv (recommended)
uv venv .venv --python 3.12
source .venv/bin/activate

  3. Install development dependencies:

# Install project in editable mode with test dependencies
uv pip install -e ".[test]"

# Install development tools
uv add --dev ruff pytest pytest-cov pytest-asyncio hypothesis

Code Quality

Linting and Formatting

Run Ruff linter:

# Check code
ruff check .

# Auto-fix issues
ruff check --fix .

# Format code
ruff format .

# Check formatting without changes
ruff format --check .

Testing

Run tests:

# All tests
pytest

# With coverage
pytest --cov=localbench --cov-report=html

# Specific test file
pytest src/tests/test_utils.py

# With hypothesis verbose output
pytest -v src/tests/test_utils.py

Pre-commit Checks

Before submitting a PR:

# Format code
ruff format .

# Run linter
ruff check .

# Run tests with coverage
pytest --cov=localbench --cov-report=term-missing

# Show coverage report in browser (optional)
python -m http.server -d htmlcov

Code Style

The project uses:

  • Type hints
  • Some docstrings for public functions and classes
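
As an illustration of that style (this function is hypothetical, not taken from the localbench source):

```python
def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Return the average token throughput for a benchmark run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


print(tokens_per_second(512, 4.0))  # → 128.0
```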

Project Structure

src/
├── localbench/
│   ├── core/
│   │   ├── initialization.py  # Model initialization metrics
│   │   ├── runtime.py         # Runtime performance metrics
│   │   ├── resources.py       # Resource utilization metrics
│   │   ├── integration.py     # System integration metrics
│   │   ├── workloads.py       # Workload-specific metrics
│   │   ├── stability.py       # Stability metrics
│   │   └── utils.py           # Shared utilities
│   ├── cli.py                 # Command-line interface
│   └── __init__.py
└── tests/
    ├── conftest.py            # Shared test fixtures
    ├── test_initialization.py
    ├── test_runtime.py
    ├── test_resources.py
    ├── test_integration.py
    └── test_utils.py

PR Checklist

Before submitting a PR:

  1. Run all tests
  2. Check test coverage
  3. Verify type hints with mypy (coming soon)
  4. Ensure docstrings are up to date

Contributing

Issues and pull requests welcome. Do have a look at the existing ones first, though.
