

Project description

whichllm


Find the best local LLM that actually runs on your hardware.

Auto-detects your GPU/CPU/RAM and ranks the top models from HuggingFace that fit your system.

A Japanese version of this README is available here.

[Demo GIF]

Why whichllm?

One command. Real answers. No TUI to learn, no keybindings to memorize.

|                 | whichllm                         | Others (TUI-based)                      |
|-----------------|----------------------------------|-----------------------------------------|
| Getting results | whichllm — done                  | Launch TUI → navigate → search → filter |
| Model data      | Live from HuggingFace API        | Static built-in database                |
| Benchmarks      | Real eval scores with confidence | Fixed quality scores                    |
| Scriptable      | whichllm --json piped to jq      | Requires special flags                  |
| Learning curve  | Zero                             | Vim keybindings required                |

Features

  • Auto-detect hardware — NVIDIA, AMD, Apple Silicon, CPU-only
  • Smart ranking — Scores models by VRAM fit, speed, and benchmark quality
  • Live data — Fetches models directly from HuggingFace (cached for performance)
  • Benchmark-aware — Integrates real eval scores with confidence-based dampening
  • Task profiles — Filter by general, coding, vision, or math use cases
  • GPU simulation — Test with any GPU: whichllm --gpu "RTX 4090"
  • Hardware planning — Reverse lookup: whichllm plan "llama 3 70b"
  • JSON output — Pipe-friendly: whichllm --json

Install

pipx (recommended)

pipx install whichllm

Homebrew

brew tap Andyyyy64/whichllm
brew install whichllm

pip

pip install whichllm

Development

git clone https://github.com/Andyyyy64/whichllm.git
cd whichllm
uv sync --dev
uv run whichllm

Usage

# Auto-detect hardware and show best models
whichllm

# Simulate a GPU (e.g. planning a purchase)
whichllm --gpu "RTX 4090"
whichllm --gpu "RTX 5090"

# CPU-only mode
whichllm --cpu-only

# More results / filters
whichllm --top 20
whichllm --quant Q4_K_M
whichllm --min-speed 30
whichllm --evidence base   # allow id/base-model matches
whichllm --evidence strict # id-exact only (same as --direct)
whichllm --direct

# JSON output
whichllm --json

# Force refresh (ignore cache)
whichllm --refresh

# Show hardware info only
whichllm hardware

# Plan: what GPU do I need for a specific model?
whichllm plan "llama 3 70b"
whichllm plan "Qwen2.5-72B" --quant Q8_0
whichllm plan "mistral 7b" --context-length 32768

Integrations

Ollama

Find the best model and run it directly:

# Pick the top model and run it with Ollama
whichllm --top 1 --json | jq -r '.models[0].model_id' | xargs ollama run

# Find the best coding model
whichllm --profile coding --top 1 --json | jq -r '.models[0].model_id' | xargs ollama run

Shell alias

Add to your .bashrc / .zshrc:

alias bestllm='whichllm --top 1 --json | jq -r ".models[0].model_id"'
# Usage: ollama run $(bestllm)
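If you'd rather skip jq, the same pipeline works from Python. A minimal sketch, assuming only the JSON shape documented above (.models[0].model_id):

import json
import subprocess

# Ask whichllm for its top pick as JSON, then hand the model id to Ollama.
result = subprocess.run(
    ["whichllm", "--top", "1", "--json"],
    capture_output=True, text=True, check=True,
)
best = json.loads(result.stdout)["models"][0]["model_id"]
subprocess.run(["ollama", "run", best], check=True)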

Scoring

Each model gets a score from 0 to 100, built from the factors below; a sketch of how they might combine follows the table.

| Factor       | Points   | Description                                           |
|--------------|----------|-------------------------------------------------------|
| Model size   | 0-40     | Larger models generally produce better output         |
| Benchmark    | 0-10     | Arena ELO / Open LLM Leaderboard scores               |
| Speed        | 0-20     | Higher tok/s = more practical to use                  |
| Source trust | -5 to +5 | Official repos get a bonus, repackagers get a penalty |
| Popularity   | 0-3      | Downloads and likes as tiebreaker                     |
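As a rough illustration of how these factors might combine, here is a hypothetical sketch. The caps and divisors are invented for the example; the actual scoring lives in engine/ranker.py:

def score_model(size_gb, bench, confidence, tok_s, trust, popularity):
    """Hypothetical composite score; point ranges follow the table above."""
    size_pts = 40 * min(size_gb / 70.0, 1.0)       # 0-40: bigger is better, capped
    bench_pts = 10 * (bench / 100.0) * confidence  # 0-10: dampened by benchmark confidence
    speed_pts = 20 * min(tok_s / 60.0, 1.0)        # 0-20: capped at 60 tok/s
    # Source trust contributes -5..+5 and popularity 0-3, as in the table.
    return size_pts + bench_pts + speed_pts + trust + popularity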

Score markers:

  • ~ (yellow) — No direct benchmark yet. Score estimated from the model family
  • ? (yellow) — No benchmark data available

How it works

Data pipeline

  1. Fetches ~900 popular models from the HuggingFace API (text-generation, GGUF, multimodal)
  2. Fetches benchmark scores from Chatbot Arena ELO and the Open LLM Leaderboard, normalized to 0-100
  3. Caches all data for 24 hours at ~/.cache/whichllm/ (see the sketch below)
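The fetch-and-cache step might look roughly like this. A sketch only: the query parameters and cache layout here are assumptions, and the real logic lives in models/fetcher.py and models/cache.py:

import json
import time
import urllib.request
from pathlib import Path

CACHE = Path.home() / ".cache" / "whichllm" / "models.json"
TTL = 24 * 60 * 60  # cached data expires after 24 hours

def fetch_popular_models(limit: int = 900) -> list[dict]:
    # Serve from cache while it is still fresh.
    if CACHE.exists() and time.time() - CACHE.stat().st_mtime < TTL:
        return json.loads(CACHE.read_text())
    # Otherwise hit the HuggingFace Hub API and refresh the cache.
    url = ("https://huggingface.co/api/models"
           f"?pipeline_tag=text-generation&sort=downloads&limit={limit}")
    with urllib.request.urlopen(url) as resp:
        models = json.load(resp)
    CACHE.parent.mkdir(parents=True, exist_ok=True)
    CACHE.write_text(json.dumps(models))
    return models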

Ranking engine

  1. Hardware detection — GPU (NVIDIA/AMD/Apple Silicon), CPU, RAM, disk
  2. VRAM estimation — model size + quantization + KV cache overhead (see the sketch after this list)
  3. Compatibility check — Full GPU / Partial Offload / CPU-only classification
  4. Speed estimation — tok/s based on GPU memory bandwidth
  5. Scoring — combines size, benchmark, speed, source trust, and popularity
  6. Deduplication — merges GGUF variants and version differences into model families
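Steps 2 and 4 follow standard back-of-the-envelope formulas. Here is a sketch under assumed defaults (layer count, hidden size, fp16 KV cache), not the exact constants whichllm uses:

def estimate_vram_gb(params_b, bits_per_weight, context_len=8192,
                     n_layers=32, hidden_dim=4096):
    weights_gb = params_b * bits_per_weight / 8  # quantized weight storage
    # KV cache: 2 tensors (K and V) x layers x hidden dim x context x 2 bytes (fp16)
    kv_gb = 2 * n_layers * hidden_dim * context_len * 2 / 1e9
    return 1.1 * weights_gb + kv_gb              # ~10% runtime overhead

def estimate_tok_s(bandwidth_gb_s, model_size_gb):
    # Decoding is memory-bound: every generated token streams all weights once,
    # so memory bandwidth divided by model size bounds tokens per second.
    return bandwidth_gb_s / model_size_gb

For example, estimate_vram_gb(7, 4.5) comes out around 8.6 GB for a 7B model at ~4.5 bits per weight with the default 8K context.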

Project structure

src/whichllm/
├── cli.py              # Typer CLI entry point
├── constants.py        # GPU bandwidth tables, quantization constants
├── hardware/           # Hardware detection (NVIDIA, AMD, Apple, CPU, RAM)
│   └── gpu_simulator.py  # GPU simulation for --gpu flag
├── models/
│   ├── fetcher.py      # HuggingFace API model fetcher
│   ├── benchmark.py    # Benchmark scores (Arena + Leaderboard)
│   ├── grouper.py      # Model family grouping and dedup
│   └── cache.py        # JSON cache
├── engine/
│   ├── vram.py         # VRAM requirement estimation
│   ├── compatibility.py # Hardware compatibility check
│   ├── performance.py  # Inference speed estimation
│   └── ranker.py       # Scoring and ranking
└── output/
    └── display.py      # Rich table output

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Requirements

  • Python 3.11+
  • NVIDIA GPU detection via nvidia-ml-py (included by default)
  • AMD / Apple Silicon detected automatically

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whichllm-0.4.0.tar.gz (206.2 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

whichllm-0.4.0-py3-none-any.whl (51.0 kB)


File details

Details for the file whichllm-0.4.0.tar.gz.

File metadata

  • Download URL: whichllm-0.4.0.tar.gz
  • Upload date:
  • Size: 206.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for whichllm-0.4.0.tar.gz
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | fe1e44ae9d78d52adc305376e402c4ebfc60d635e2dd24cb48d154dae5c5f108 |
| MD5         | e681f4569f37c54968bc39cdf72dfa00                                 |
| BLAKE2b-256 | 790fb82e66826164fb55a270125bdf588ea1de5d9269194bc6d5d634d9e085b8 |

See more details on using hashes here.

File details

Details for the file whichllm-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: whichllm-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 51.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for whichllm-0.4.0-py3-none-any.whl
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | f8a6988203bf8c019b8cd60a8003556a8f5036eddf62b587864f234e1b0583dc |
| MD5         | 36648f20257b5590a9fba65c1f4d82c6                                 |
| BLAKE2b-256 | c1c30f3556fada6f47a70a02f99db9fa3a821eccda381c76dde33d28710a2416 |

See more details on using hashes here.
