
A benchmarking tool for AI models and hardware.


robobench


🚧 EARLY DEVELOPMENT WARNING 🚧

This tool is currently about as stable as a house of cards in a wind tunnel.
Very early alpha. Bugs aren't just expected - they've signed a lease.

Status: Proceed with optimism ☕

A benchmarking tool for local LLMs. Currently keeping an eye on Cortex.cpp, with plans to judge other frameworks equally in the future.

What is this?

robobench measures performance metrics, resource utilization, and stability characteristics of your LLM deployments. Rather comprehensive, really.

Features

  • Model initialization metrics
  • Runtime performance
  • Resource utilization
  • Advanced processing scenarios
  • Workload-specific benchmarks
  • System integration metrics
  • Stability analysis

Installation

Using uvx (runs the tool without a permanent install):

uvx robobench

Using pip:

pip install robobench

Usage

Basic Benchmarking

# Standard benchmark
robobench "llama3.2:3b-gguf-q2-k"

# With detailed metrics
robobench "llama3.2:3b-gguf-q2-k" --verbose

Specific Benchmarks

# Initialization only
robobench "llama3.2:3b-gguf-q2-k" --type init

# Runtime metrics
robobench "llama3.2:3b-gguf-q2-k" --type runtime

# Long-running stability test
robobench "llama3.2:3b-gguf-q2-k" --type stability --stability-duration 24

Advanced Usage

# Custom benchmark prompts
robobench "llama3.2:3b-gguf-q2-k" --type workload --prompts my_prompts.json

# Multi-model benchmarking
robobench "llama3.2:3b-gguf-q2-k" --type advanced \
    --secondary-models "tinyllama:1b-gguf-q4" "phi2:3b-gguf-q4"

# Export results
robobench "llama3.2:3b-gguf-q2-k" --json results.json
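The `--prompts` flag takes a JSON file of custom prompts. The exact schema robobench expects is not documented here, so the following is a sketch only, assuming a simple object holding a list of prompt strings:

```python
import json

# Hypothetical prompts file for --prompts; the schema robobench actually
# expects may differ. Treat this as a placeholder shape.
prompts = {
    "prompts": [
        "Summarize the plot of Hamlet in two sentences.",
        "Write a Python function that reverses a string.",
    ]
}

with open("my_prompts.json", "w") as f:
    json.dump(prompts, f, indent=2)
```

Validating the file up front (for example with `python -m json.tool my_prompts.json`) catches syntax errors before a long benchmark run.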

Status

Under active development. Support for additional frameworks is planned.

Roadmap

  • Framework-agnostic benchmarking
  • Additional performance metrics
  • Enhanced visualizations
  • Extended stability testing
  • Local server and UI
  • CI/CD management

Development

Setup

  1. Clone the repository:

git clone https://github.com/jan.ai/robobench.git
cd robobench

  2. Create and activate a virtual environment:

# Using uv (recommended)
uv venv .venv --python 3.12
source .venv/bin/activate

  3. Install development dependencies:

# Install the project in editable mode with test dependencies
uv pip install -e ".[test]"

# Install development tools
uv add --dev ruff pytest pytest-cov pytest-asyncio hypothesis

Code Quality

Linting and Formatting

Run Ruff linter:

# Check code
ruff check .

# Auto-fix issues
ruff check --fix .

# Format code
ruff format .

# Check formatting without changes
ruff format --check .

Testing

Run tests:

# All tests
pytest

# With coverage
pytest --cov=robobench --cov-report=html

# Specific test file
pytest src/tests/test_utils.py

# With hypothesis verbose output
pytest -v src/tests/test_utils.py
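For contributors adding tests, a minimal pytest-style sketch may help; note that the `tokens_per_second` helper below is invented for illustration and is not part of robobench:

```python
# Illustrative test module; robobench's real utilities may differ.

def tokens_per_second(tokens: int, seconds: float) -> float:
    """Hypothetical throughput helper, for illustration only."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return tokens / seconds

def test_throughput_basic():
    assert tokens_per_second(100, 2.0) == 50.0

def test_throughput_rejects_zero_duration():
    try:
        tokens_per_second(10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for zero duration")
```

pytest discovers `test_*` functions automatically, so a file like this dropped into `src/tests/` would be picked up by the commands above.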

Pre-commit Checks

Before submitting a PR:

# Format code
ruff format .

# Run linter
ruff check .

# Run tests with coverage
pytest --cov=robobench --cov-report=term-missing

# Show coverage report in browser (optional)
python -m http.server -d htmlcov

Code Style

The project uses:

  • Type hints
  • Docstrings for public functions and classes

Project Structure

src/
├── robobench/
│   ├── core/
│   │   ├── initialization.py  # Model initialization metrics
│   │   ├── runtime.py         # Runtime performance metrics
│   │   ├── resources.py       # Resource utilization metrics
│   │   ├── integration.py     # System integration metrics
│   │   ├── workloads.py       # Workload-specific metrics
│   │   ├── stability.py       # Stability metrics
│   │   └── utils.py           # Shared utilities
│   ├── cli.py                 # Command-line interface
│   └── __init__.py
└── tests/
    ├── conftest.py            # Shared test fixtures
    ├── test_initialization.py
    ├── test_runtime.py
    ├── test_resources.py
    ├── test_integration.py
    └── test_utils.py

PR Checklist

Before submitting a PR:

  1. Run all tests
  2. Check test coverage
  3. Verify type hints with mypy (coming soon)
  4. Ensure docstrings are up to date

Contributing

Issues and pull requests welcome. Do have a look at the existing ones first, though.



Download files

Download the file for your platform.

Source Distribution

robobench-0.0.2.tar.gz (223.6 kB)


Built Distribution


robobench-0.0.2-py3-none-any.whl (30.6 kB)


File details

Details for the file robobench-0.0.2.tar.gz.

File metadata

  • Download URL: robobench-0.0.2.tar.gz
  • Upload date:
  • Size: 223.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.28

File hashes

Hashes for robobench-0.0.2.tar.gz:

  • SHA256: fc2f472dddf2f7ccfa75cce64fd12d7576d192ced9fde9b5f0aafa297695a80e
  • MD5: dfba4875deeffc50ebf7c51db075cf55
  • BLAKE2b-256: bd13a12a397439222c13d1837200f14e6d1ddec2274173d006426dadbc4a66c6


File details

Details for the file robobench-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: robobench-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 30.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.28

File hashes

Hashes for robobench-0.0.2-py3-none-any.whl:

  • SHA256: 5adb4cd5ecde38ad4c42fb5b5c1de2cef00384dc750148122e83a44db9931c1a
  • MD5: a5902080f21189587d52ccb74f3350ee
  • BLAKE2b-256: 8666dbe4d68577bea0d66cf19707872787007f19564c1d19a7d22bbb05ef182b

