llm-speed-web

Benchmark any LLM on any hardware. CLI for the llm-speed.com flywheel.

Source for llm-speed.com — the canonical, crowdsourced source of truth for how fast LLMs actually run, across hosted APIs, consumer GPUs, and prosumer rigs.

Install

One-liners

  • pipx (recommended for local-LLM users): pipx install llm-speed
  • uv: uv tool install llm-speed
  • Homebrew: brew install llm-speed/tap/llm-speed (coming soon)
  • Docker: docker run --rm -it llmspeed/llm-speed bench (coming soon)
  • npm: npm install -g llm-speed (coming soon)
  • Standalone binary: download from Releases (coming soon)

Optional backends

  • MLX (Apple Silicon): pip install 'llm-speed[mlx]'
  • vLLM (NVIDIA): pip install 'llm-speed[vllm]'
  • ExLlamaV2 (NVIDIA): pip install 'llm-speed[exllamav2]'
  • llama.cpp and Ollama are detected as binaries on PATH; no extras needed (a detection sketch follows this list).
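
Detection here can be as simple as a standard-library PATH probe. A minimal sketch, assuming shutil.which and illustrative binary names (llama-server, llama-cli, ollama); the project's real detection logic may differ:

# Hypothetical sketch of binary-backend detection; names are assumptions.
import shutil

def detect_binary_backends() -> dict[str, str | None]:
    """Map each candidate binary to its resolved PATH entry (None if absent)."""
    return {name: shutil.which(name) for name in ("llama-server", "llama-cli", "ollama")}

if __name__ == "__main__":
    for name, path in detect_binary_backends().items():
        print(f"{name}: {path or 'not found'}")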

See docs/RELEASE.md for the publishing runbook (release steps + rollback).

Status

Phase 0 (seeding). The CLI and website come next; see docs/ for the brief and plan.

Layout

docs/                Strategic & design documents
  BRIEF.md           Project brief — why, who, monetization, phases
  CLI.md             llm-speed CLI requirements + portability strategy
  DATA_SOURCES.md    Folklore source inventory + scrape policy
  MARKETING.md       CLI launch & flywheel marketing strategy

db/
  schema.sql         SQLite seed schema (mirrors the eventual Postgres prod schema)
  seed.sqlite        Created on first run

seed/                The seeding pipeline (Python)
  models.py                        Plain dataclasses (RawDocument, Claim)
  db.py                            Connection + insert helpers
  extractors.py                    Regex-based (model × hardware × backend × tok/s) extractor
  reddit/client.py                 Reddit (PRAW): r/LocalLLaMA + neighbors, full sweep
  scrapers/hn.py                   Hacker News (Algolia API)
  scrapers/openrouter.py           OpenRouter API
  scrapers/localscore.py           LocalScore HTML + Next.js data
  scrapers/artificial_analysis.py  AA public pages (cross-reference)
  scrapers/github.py               GitHub issues/PRs in inference backends
  scrapers/blogs.py                Curated blog list
  run.py                           Top-level orchestrator
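
For orientation, a minimal sketch of the two dataclasses in seed/models.py and the flavor of regex in seed/extractors.py. Every field name and the pattern itself are illustrative assumptions; the real definitions live in the modules listed above.

# Hypothetical sketch; real fields and patterns live in seed/models.py
# and seed/extractors.py.
import re
from dataclasses import dataclass

@dataclass
class RawDocument:
    source: str        # e.g. "hn", "reddit"
    source_id: str     # unique ID within that source
    url: str
    author: str
    text: str

@dataclass
class Claim:
    model_family: str
    hardware_name: str
    backend: str
    decode_tps: float
    confidence: float  # capped below 1.0; seed data is never canonical

# A toy tok/s pattern: matches "38 tok/s", "38.5 tokens/s", "38 t/s".
TPS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:tok(?:ens)?|t)/s", re.IGNORECASE)

def extract_tps(text: str) -> list[float]:
    return [float(m.group(1)) for m in TPS_RE.finditer(text)]

print(extract_tps("Qwen3 32B on a 4090 with llama.cpp: ~38 tok/s decode"))  # [38.0]

A real extractor also has to recover the model, hardware, and backend around each hit; the toy pattern above only finds the tok/s numbers.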

Quick start

# 1. Install
python -m venv .venv && source .venv/bin/activate
pip install -e .                       # uses pyproject.toml

# 2. Set credentials (set only what you have; seeders whose credentials are missing are skipped)
export REDDIT_CLIENT_ID=...
export REDDIT_CLIENT_SECRET=...
export REDDIT_USER_AGENT="llm-speed-seeder/0.1 (+https://llm-speed.com)"
# Reddit's API rules want a description + contact info. The project URL is
# the right contact for a project-operated bot — DO NOT put `by u/<handle>`
# here, since Reddit logs the user-agent on every request and that ties
# every scrape back to a personal handle.
export GITHUB_TOKEN=...                # optional but strongly recommended

# 3. Smoke test
python -m seed.run --quick --only hn artificial_analysis blogs

# 4. Full sweep
python -m seed.run

# 5. Inspect what landed
python -m seed.run --stats
sqlite3 db/seed.sqlite \
  'SELECT model_family, hardware_name, backend, AVG(decode_tps), COUNT(*)
   FROM claims
   WHERE confidence > 0.5
   GROUP BY 1,2,3 ORDER BY 5 DESC LIMIT 30;'

Per-seeder usage

Each module is also runnable on its own:

python -m seed.reddit.client --quick                      # auth smoke test
python -m seed.scrapers.hn --query "Qwen3 Coder tok/s"
python -m seed.scrapers.openrouter --no-endpoints
python -m seed.scrapers.localscore --max-tests 50
python -m seed.scrapers.github --repo ggerganov/llama.cpp
python -m seed.scrapers.blogs --urls-file my_urls.txt
python -m seed.scrapers.artificial_analysis
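
Each of these follows the standard python -m pattern: the module guards an argparse CLI behind if __name__ == "__main__". A generic sketch (the --query and --quick flags mirror the examples above; everything else is an illustrative assumption, not the modules' actual code):

# Hypothetical per-seeder entry point, e.g. seed/scrapers/hn.py;
# real flags and logic live in the modules themselves.
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="Seed from Hacker News (Algolia API)")
    parser.add_argument("--query", default="tok/s", help="Algolia search query")
    parser.add_argument("--quick", action="store_true", help="small smoke-test sweep")
    args = parser.parse_args()
    print(f"would search Algolia for {args.query!r} (quick={args.quick})")

if __name__ == "__main__":
    main()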

Local CI (no GitHub Actions)

This project runs CI on the developer's machine: the lint / test / smoke / API-roundtrip / build pipeline that used to run on every push via GitHub Actions now lives in scripts/check.sh.

# Full check (~30s — lint, test, smoke, API roundtrip, build)
./scripts/check.sh

# Inner-loop fast pass (~10s — lint + test only)
./scripts/check.sh --quick

# Skip individual phases:
SKIP_BUILD=1 ./scripts/check.sh
SKIP_LINT=1 SKIP_BUILD=1 ./scripts/check.sh

To run it automatically on every git push (skippable per-push with --no-verify), opt in to the bundled hook once per clone:

git config core.hooksPath .githooks

The pre-push hook runs --quick by default; CHECK_FULL=1 git push runs the full check.

The previous .github/workflows/ci.yml was deleted; only manual / release-event workflows remain (release.yml, daily-metrics.yml, scheduled-seed.yml, reddit-poster.yml). None of them fire on push.

Design notes

  • Idempotent. Re-running a seeder is safe; documents are uniqued by (source, source_id). See the sketch after this list.
  • Provenance preserved. Every claim links back to its source URL + author + scrape time. Folklore stays distinguishable from CLI-verified canonical results.
  • Confidence-scored. Regex extraction caps at ~0.85; structured-API extraction reaches ~0.9; nothing in the seed phase counts as canonical (that's the CLI's job).
  • Polite. All scrapers honor source-appropriate rate limits and identify themselves in User-Agent.
  • No LLM calls in this phase. Heuristic extraction only. An LLM second-pass for ambiguous cases is a Phase 1 add-on (see docs/CLI.md).
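
The idempotency note above boils down to a UNIQUE constraint plus INSERT OR IGNORE. A minimal sketch with assumed table and column names; the real schema lives in db/schema.sql and the helpers in seed/db.py:

# Hypothetical sketch of the (source, source_id) dedup.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id        INTEGER PRIMARY KEY,
        source    TEXT NOT NULL,
        source_id TEXT NOT NULL,
        url       TEXT,
        UNIQUE (source, source_id)
    )
""")

def insert_document(source: str, source_id: str, url: str) -> None:
    # INSERT OR IGNORE makes re-runs a silent no-op for already-seen documents.
    conn.execute(
        "INSERT OR IGNORE INTO documents (source, source_id, url) VALUES (?, ?, ?)",
        (source, source_id, url),
    )

insert_document("hn", "12345", "https://news.ycombinator.com/item?id=12345")
insert_document("hn", "12345", "https://news.ycombinator.com/item?id=12345")  # ignored
print(conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0])  # 1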

Next milestone

docs/CLI.md holds the design for the llm-speed benchmark CLI. The seeded folklore is inventory; the CLI is the real data-quality moat. Ship the CLI in 2–4 weeks; if adoption falls below the kill criterion (see docs/MARKETING.md), stop before building the website.
