CLI tool for measuring and comparing LLM inference speeds

⚡ tacho - LLM Speed Test

A fast CLI tool for benchmarking LLM inference speed across multiple models and providers. Get tokens/second metrics to compare model performance.

Quick Start

Set up your API keys:

export OPENAI_API_KEY=<your-key-here>
export GEMINI_API_KEY=<your-key-here>

Run a benchmark (requires uv):

uvx tacho gpt-4.1 gemini/gemini-2.5-pro vertex_ai/claude-sonnet-4@20250514
✓ gpt-4.1
✓ vertex_ai/claude-sonnet-4@20250514
✓ gemini/gemini-2.5-pro
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ Model                              ┃ Avg t/s ┃ Min t/s ┃ Max t/s ┃  Time ┃ Tokens ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ gemini/gemini-2.5-pro              │    80.0 │    56.7 │   128.4 │ 13.5s │    998 │
│ vertex_ai/claude-sonnet-4@20250514 │    48.9 │    44.9 │    51.6 │ 10.2s │    500 │
│ gpt-4.1                            │    41.5 │    35.1 │    49.9 │ 12.3s │    500 │
└────────────────────────────────────┴─────────┴─────────┴─────────┴───────┴────────┘

With its default settings, tacho performs 5 runs per model with a 500-token limit each, so every benchmark incurs some inference cost.
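To get a rough sense of the cost before running a benchmark, the nominal token budget can be estimated from the defaults above. This is a minimal sketch: the function name `nominal_output_tokens` is hypothetical and not part of tacho. It ignores prompt tokens, and actual usage can differ from the limit (the sample table above reports 998 tokens for one model).

```python
def nominal_output_tokens(num_models: int, runs: int = 5, tokens: int = 500) -> int:
    """Nominal output-token budget for one benchmark: models x runs x token limit.

    Hypothetical helper for illustration; not part of tacho.
    """
    return num_models * runs * tokens


# Three models with the default settings: 3 * 5 * 500
print(nominal_output_tokens(3))  # 7500
```

Multiplying the result by each provider's per-token output price gives an approximate cost for the run.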

Features

  • Parallel benchmarking - All models and runs execute concurrently for faster results
  • Token-based metrics - Measures actual tokens/second, not just response time
  • Multi-provider support - Works with any provider supported by LiteLLM (OpenAI, Anthropic, Google, Cohere, etc.)
  • Configurable token limits - Control response length for consistent comparisons
  • Pre-flight validation - Checks model availability and authentication before benchmarking
  • Graceful error handling - Clear error messages for authentication, rate limits, and connection issues
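The first two features can be illustrated with a small sketch of concurrent, token-based measurement. This is not tacho's actual implementation: `fake_generate`, `measure`, and `benchmark` are hypothetical names, and `fake_generate` is a stand-in that sleeps instead of calling a real provider.

```python
import asyncio
import time


async def fake_generate(model: str, max_tokens: int) -> int:
    """Stand-in for a real LLM call; returns the number of tokens 'generated'."""
    await asyncio.sleep(0.01)  # simulate network and inference latency
    return max_tokens


async def measure(model: str, max_tokens: int) -> tuple[str, float]:
    """Time one run and convert it to tokens per second."""
    start = time.perf_counter()
    n_tokens = await fake_generate(model, max_tokens)
    elapsed = time.perf_counter() - start
    return model, n_tokens / elapsed


async def benchmark(
    models: list[str], runs: int = 5, max_tokens: int = 500
) -> dict[str, list[float]]:
    """Launch every (model, run) pair at once and group the rates by model."""
    tasks = [measure(m, max_tokens) for m in models for _ in range(runs)]
    results: dict[str, list[float]] = {m: [] for m in models}
    for model, rate in await asyncio.gather(*tasks):
        results[model].append(rate)
    return results


if __name__ == "__main__":
    print(asyncio.run(benchmark(["model-a", "model-b"], runs=3)))
```

Because all runs are awaited together with `asyncio.gather`, total wall-clock time is close to that of the slowest single run rather than the sum of all runs.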

Installation

For regular use, install with uv:

uv tool install tacho

Or with pip:

pip install tacho

Usage

Basic benchmark

# Compare models with default settings (5 runs, 500 token limit)
tacho gpt-4.1-nano gemini/gemini-2.0-flash

# Custom settings (options must come before model names)
tacho --runs 3 --tokens 1000 gpt-4.1-nano gemini/gemini-2.0-flash
tacho -r 3 -t 1000 gpt-4.1-nano gemini/gemini-2.0-flash

Command options

  • --runs, -r: Number of inference runs per model (default: 5)
  • --tokens, -t: Maximum tokens to generate per response (default: 500)
  • --prompt, -p: Custom prompt for benchmarking

Note: When using the shorthand syntax (without the bench subcommand), options must be placed before model names. For example:

  • Correct: tacho -t 2000 gpt-4.1-mini
  • Incorrect: tacho gpt-4.1-mini -t 2000
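When invoking tacho from a script, the ordering rule is easy to enforce by building the argument list in one place. The sketch below is an assumption about how one might do that: `build_tacho_command` is a hypothetical helper and uses only the options listed above.

```python
import shlex
from typing import Optional, Sequence


def build_tacho_command(
    models: Sequence[str],
    runs: Optional[int] = None,
    tokens: Optional[int] = None,
    prompt: Optional[str] = None,
) -> list[str]:
    """Build a tacho argv with every option placed before the model names."""
    cmd = ["tacho"]
    if runs is not None:
        cmd += ["--runs", str(runs)]
    if tokens is not None:
        cmd += ["--tokens", str(tokens)]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    cmd += list(models)  # model names always come last
    return cmd


print(shlex.join(build_tacho_command(["gpt-4.1-mini"], tokens=2000)))
# tacho --tokens 2000 gpt-4.1-mini
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting problems with custom prompts.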

Output

Tacho displays a clean comparison table showing:

  • Avg/Min/Max tokens per second - Primary performance metrics
  • Average time - Average time per inference run

Models are sorted by performance (highest tokens/second first).
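The aggregation behind such a table can be sketched as follows. This is an illustrative sketch rather than tacho's code: `summarize` is a hypothetical name, the measurements are made-up values, and sorting by the average rate is an assumption about what "performance" means here.

```python
from statistics import mean


def summarize(
    rates_by_model: dict[str, list[float]]
) -> list[tuple[str, float, float, float]]:
    """Return (model, avg, min, max) rows, highest average tokens/second first."""
    rows = [(m, mean(r), min(r), max(r)) for m, r in rates_by_model.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical per-run measurements in tokens/second
sample = {"model-a": [40.0, 50.0], "model-b": [70.0, 90.0]}
for name, avg, lo, hi in summarize(sample):
    print(f"{name:<10} {avg:6.1f} {lo:6.1f} {hi:6.1f}")
```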

Supported Providers

Tacho works with any provider supported by LiteLLM.

Development

To contribute to Tacho, clone the repository and install development dependencies:

git clone https://github.com/pietz/tacho.git
cd tacho
uv sync

Running Tests

Tacho includes a comprehensive test suite with full mocking of external API calls:

# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest --cov=tacho

The test suite includes:

  • Unit tests for all core modules
  • Mocked LiteLLM API calls (no API keys required)
  • CLI command testing
  • Async function testing
  • Edge case coverage
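As an illustration of how an async call can be tested without API keys, here is a minimal sketch using the standard library's `AsyncMock`. It is not taken from the tacho repository: `count_tokens` is a hypothetical helper, and the response dictionary shape is an assumption chosen for the example.

```python
import asyncio
from unittest.mock import AsyncMock


async def count_tokens(complete, model: str, max_tokens: int) -> int:
    """Hypothetical helper: await a completion function and read its token count."""
    response = await complete(model=model, max_tokens=max_tokens)
    return response["usage"]["completion_tokens"]


def test_count_tokens_uses_mock() -> None:
    # The mock replaces the network call, so no API key is needed.
    fake_complete = AsyncMock(return_value={"usage": {"completion_tokens": 42}})
    result = asyncio.run(count_tokens(fake_complete, "model-a", 500))
    assert result == 42
    fake_complete.assert_awaited_once_with(model="model-a", max_tokens=500)
```

A function named `test_*` like this one is collected automatically by `uv run pytest`.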
