Benchmark any LLM on any hardware. CLI for the llm-speed.com flywheel.

llm-speed-web

Source for llm-speed.com — the crowdsourced, canonical record of how fast LLMs actually run, across hosted APIs, consumer GPUs, and prosumer rigs.

Install

One-liners

  • pipx (recommended for local-LLM users): pipx install llm-speed
  • uv: uv tool install llm-speed
  • Homebrew: brew install llm-speed/tap/llm-speed (coming soon)
  • Docker: docker run --rm -it llmspeed/llm-speed bench (coming soon)
  • npm: npm install -g llm-speed (coming soon)
  • Standalone binary: download from Releases (coming soon)

Optional backends

  • MLX (Apple Silicon): pip install 'llm-speed[mlx]'
  • vLLM (NVIDIA): pip install 'llm-speed[vllm]'
  • ExLlamaV2 (NVIDIA): pip install 'llm-speed[exllamav2]'
  • llama.cpp + ollama are detected as binaries on PATH; no extras needed.

See docs/RELEASE.md for the publishing runbook (release steps + rollback).

Status

Phase 0 (seeding). The CLI and website come next; see docs/ for the brief and plan.

Layout

docs/                Strategic & design documents
  BRIEF.md           Project brief — why, who, monetization, phases
  CLI.md             llm-speed CLI requirements + portability strategy
  DATA_SOURCES.md    Folklore source inventory + scrape policy
  MARKETING.md       CLI launch & flywheel marketing strategy

db/
  schema.sql         SQLite seed schema (mirrors the eventual Postgres prod schema)
  seed.sqlite        Created on first run

seed/                The seeding pipeline (Python)
  models.py            Plain dataclasses (RawDocument, Claim)
  db.py                Connection + insert helpers
  extractors.py        Regex-based (model × hardware × backend × tok/s) extractor
  reddit/client.py     Reddit (PRAW) — r/LocalLLaMA + neighbors, full sweep
  scrapers/hn.py             Hacker News (Algolia API)
  scrapers/openrouter.py     OpenRouter API
  scrapers/localscore.py     LocalScore HTML + Next.js data
  scrapers/artificial_analysis.py   AA public pages (cross-reference)
  scrapers/github.py         GitHub issues/PRs in inference backends
  scrapers/blogs.py          Curated blog list
  run.py               Top-level orchestrator

Quick start

# 1. Install
python -m venv .venv && source .venv/bin/activate
pip install -e .                       # uses pyproject.toml

# 2. Set credentials (only what you have available; missing ones get skipped)
export REDDIT_CLIENT_ID=...
export REDDIT_CLIENT_SECRET=...
export REDDIT_USER_AGENT="llm-speed-seeder/0.1 (+https://llm-speed.com)"
# Reddit's API rules require a descriptive user agent with contact info. The
# project URL is the right contact for a project-operated bot — do NOT put
# `by u/<handle>` here: Reddit logs the user agent on every request, which
# would tie every scrape back to a personal handle.
export GITHUB_TOKEN=...                # optional but strongly recommended

# 3. Smoke test
python -m seed.run --quick --only hn artificial_analysis blogs

# 4. Full sweep
python -m seed.run

# 5. Inspect what landed
python -m seed.run --stats
sqlite3 db/seed.sqlite \
  'SELECT model_family, hardware_name, backend, AVG(decode_tps), COUNT(*)
   FROM claims
   WHERE confidence > 0.5
   GROUP BY 1,2,3 ORDER BY 5 DESC LIMIT 30;'
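The same aggregate can be run from Python via the stdlib `sqlite3` module. The column and table names come from the query above; the function name and arguments are illustrative:

```python
import sqlite3

def top_claims(db_path: str, min_conf: float = 0.5, limit: int = 30):
    """Average decode speed per (model, hardware, backend), most-attested first."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            """
            SELECT model_family, hardware_name, backend,
                   AVG(decode_tps), COUNT(*) AS n
            FROM claims
            WHERE confidence > ?
            GROUP BY 1, 2, 3
            ORDER BY n DESC
            LIMIT ?
            """,
            (min_conf, limit),
        ).fetchall()
```

Sorting by claim count rather than speed surfaces the best-attested combinations first, which matters when most rows are folklore.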

Per-seeder usage

Each module is also runnable on its own:

python -m seed.reddit.client --quick                      # auth smoke test
python -m seed.scrapers.hn --query "Qwen3 Coder tok/s"
python -m seed.scrapers.openrouter --no-endpoints
python -m seed.scrapers.localscore --max-tests 50
python -m seed.scrapers.github --repo ggerganov/llama.cpp
python -m seed.scrapers.blogs --urls-file my_urls.txt
python -m seed.scrapers.artificial_analysis

Local CI (no GitHub Actions)

This project runs its CI on the developer's machine: the lint / test / smoke / API-roundtrip / build pipeline that used to run on every push via GitHub Actions now lives in scripts/check.sh.

# Full check (~30s — lint, test, smoke, API roundtrip, build)
./scripts/check.sh

# Inner-loop fast pass (~10s — lint + test only)
./scripts/check.sh --quick

# Skip individual phases:
SKIP_BUILD=1 ./scripts/check.sh
SKIP_LINT=1 SKIP_BUILD=1 ./scripts/check.sh

To run it automatically on every git push (skippable per-push with --no-verify), opt in to the bundled hook once per clone:

git config core.hooksPath .githooks

The pre-push hook runs --quick by default; CHECK_FULL=1 git push runs the full check.

The previous .github/workflows/ci.yml was deleted; only manual / release-event workflows remain (release.yml, daily-metrics.yml, scheduled-seed.yml, reddit-poster.yml). None of them fire on push.

Design notes

  • Idempotent. Re-running a seeder is safe; documents are uniqued by (source, source_id).
  • Provenance preserved. Every claim links back to its source URL + author + scrape time. Folklore stays distinguishable from CLI-verified canonical results.
  • Confidence-scored. Regex extraction caps at ~0.85; structured-API extraction reaches ~0.9; nothing in the seed phase counts as canonical (that's the CLI's job).
  • Polite. All scrapers honor source-appropriate rate limits and identify themselves in User-Agent.
  • No LLM calls in this phase. Heuristic extraction only. An LLM second-pass for ambiguous cases is a Phase 1 add-on (see docs/CLI.md).
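The idempotency guarantee can be sketched as a `UNIQUE (source, source_id)` constraint plus `INSERT OR IGNORE` — re-running a seeder then re-inserts nothing. The table and column names beyond those two keys are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE documents (
           source    TEXT NOT NULL,
           source_id TEXT NOT NULL,
           body      TEXT,
           UNIQUE (source, source_id)
       )"""
)

def insert_document(source: str, source_id: str, body: str) -> None:
    """Insert a scraped document; a duplicate (source, source_id) is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO documents (source, source_id, body) VALUES (?, ?, ?)",
        (source, source_id, body),
    )

insert_document("hn", "item-123", "38 tok/s on a 4090")
insert_document("hn", "item-123", "38 tok/s on a 4090")  # re-run: ignored
```

Pushing the dedup into the schema rather than application code means every seeder gets the guarantee for free.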

Next milestone

docs/CLI.md — design for the llm-speed benchmark CLI. The seeded folklore is inventory; the CLI is the actual data-quality moat. Ship CLI in 2–4 weeks; if adoption fails the kill criterion (see docs/MARKETING.md), stop before building the website.
