whichllm
Find the best local LLM that actually runs on your hardware.
Auto-detects your GPU/CPU/RAM and ranks the top models from HuggingFace that fit your system.
Why whichllm?
One command. Real answers. No TUI to learn, no keybindings to memorize.
| | whichllm | Others (TUI-based) |
|---|---|---|
| Getting results | whichllm — done | Launch TUI → navigate → search → filter |
| Model data | Live from HuggingFace API | Static built-in database |
| Benchmarks | Real eval scores with confidence | Fixed quality scores |
| Scriptable | whichllm --json \| jq | Requires special flags |
| Learning curve | Zero | Vim keybindings required |
Features
- Auto-detect hardware — NVIDIA, AMD, Apple Silicon, CPU-only
- Smart ranking — Scores models by VRAM fit, speed, and benchmark quality
- Live data — Fetches models directly from HuggingFace (cached for performance)
- Benchmark-aware — Integrates real eval scores with confidence-based dampening
- Task profiles — Filter by general, coding, vision, or math use cases
- GPU simulation — Test with any GPU: whichllm --gpu "RTX 4090"
- Hardware planning — Reverse lookup: whichllm plan "llama 3 70b"
- JSON output — Pipe-friendly: whichllm --json
Install
pipx (recommended)
pipx install whichllm
Homebrew
brew tap Andyyyy64/whichllm
brew install whichllm
pip
pip install whichllm
Development
git clone https://github.com/Andyyyy64/whichllm.git
cd whichllm
uv sync --dev
uv run whichllm
Usage
# Auto-detect hardware and show best models
whichllm
# Simulate a GPU (e.g. planning a purchase)
whichllm --gpu "RTX 4090"
whichllm --gpu "RTX 5090"
# CPU-only mode
whichllm --cpu-only
# More results / filters
whichllm --top 20
whichllm --quant Q4_K_M
whichllm --min-speed 30
whichllm --evidence base # allow id/base-model matches
whichllm --evidence strict # id-exact only (same as --direct)
whichllm --direct
# JSON output
whichllm --json
# Force refresh (ignore cache)
whichllm --refresh
# Show hardware info only
whichllm hardware
# Plan: what GPU do I need for a specific model?
whichllm plan "llama 3 70b"
whichllm plan "Qwen2.5-72B" --quant Q8_0
whichllm plan "mistral 7b" --context-length 32768
Integrations
Ollama
Find the best model and run it directly:
# Pick the top model and run it with Ollama
whichllm --top 1 --json | jq -r '.models[0].model_id' | xargs ollama run
# Find the best coding model
whichllm --profile coding --top 1 --json | jq -r '.models[0].model_id' | xargs ollama run
Shell alias
Add to your .bashrc / .zshrc:
alias bestllm='whichllm --top 1 --json | jq -r ".models[0].model_id"'
# Usage: ollama run $(bestllm)
Scoring
Each model gets a score from 0 to 100.
| Factor | Points | Description |
|---|---|---|
| Model size | 0-40 | Larger models generally produce better output |
| Benchmark | 0-10 | Arena ELO / Open LLM Leaderboard scores |
| Speed | 0-20 | Higher tok/s = more practical to use |
| Source trust | -5 to +5 | Official repos get a bonus, repackagers get a penalty |
| Popularity | 0-3 | Downloads and likes as tiebreaker |
Score markers:
- ~ (yellow) — No direct benchmark yet; score estimated from the model family
- ? (yellow) — No benchmark data available
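To make the table concrete, here is a minimal sketch of how the factor points could combine into a final score. The function name and the example point values are illustrative assumptions, not whichllm's actual internals:

# Minimal sketch: sum the factor points and clamp to the 0-100 range.
# Hypothetical names and values, not whichllm's real scoring code.
def score_model(size_pts, bench_pts, speed_pts, trust_pts, pop_pts):
    total = size_pts + bench_pts + speed_pts + trust_pts + pop_pts
    return max(0, min(100, total))

# Example: a well-benchmarked mid-size model from an official repo
print(score_model(size_pts=35, bench_pts=9, speed_pts=14, trust_pts=5, pop_pts=2))  # 65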
How it works
Data pipeline
- Fetches ~900 popular models from HuggingFace API (text-generation, GGUF, multimodal)
- Fetches benchmark scores from Chatbot Arena ELO and Open LLM Leaderboard, normalized to 0-100
- All data cached for 24 hours at ~/.cache/whichllm/
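A minimal sketch of the fetch-and-cache pattern described above, assuming the public HuggingFace model-listing API. The file path, function names, and query parameters here are illustrative, not whichllm's actual code:

import json, time, urllib.request
from pathlib import Path

CACHE = Path.home() / ".cache" / "whichllm" / "models.json"
TTL = 24 * 60 * 60  # 24-hour cache window, matching the pipeline above

def fetch_models(limit=900):
    # Query the HuggingFace Hub listing API for popular text-generation models
    url = ("https://huggingface.co/api/models"
           f"?pipeline_tag=text-generation&sort=downloads&limit={limit}")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def get_models():
    # Serve from the local cache if it is fresher than the TTL; otherwise refresh
    if CACHE.exists() and time.time() - CACHE.stat().st_mtime < TTL:
        return json.loads(CACHE.read_text())
    models = fetch_models()
    CACHE.parent.mkdir(parents=True, exist_ok=True)
    CACHE.write_text(json.dumps(models))
    return models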
Ranking engine
- Hardware detection — GPU (NVIDIA/AMD/Apple Silicon), CPU, RAM, disk
- VRAM estimation — model size + quantization + KV cache overhead
- Compatibility check — Full GPU / Partial Offload / CPU-only classification
- Speed estimation — tok/s based on GPU memory bandwidth
- Scoring — combines size, benchmark, speed, source trust, and popularity
- Deduplication — merges GGUF variants and version differences into model families
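The estimates in steps 2 and 4 reduce to simple arithmetic: weight bytes plus KV cache for VRAM, and memory bandwidth divided by bytes moved per token for decoding speed. A hedged sketch follows; the constants and function names are illustrative assumptions, not whichllm's exact formulas:

def estimate_vram_gb(params_b, bits_per_weight=4.5, kv_cache_gb=1.0, overhead=1.1):
    # Weights: params (billions) * bits per weight / 8 -> GB,
    # plus KV cache and a fixed factor for buffers/activations
    weights_gb = params_b * bits_per_weight / 8
    return (weights_gb + kv_cache_gb) * overhead

def estimate_tok_per_s(model_gb, bandwidth_gb_s):
    # Decoding is memory-bound: each token reads every weight once,
    # so tok/s is roughly bandwidth / bytes moved per token
    return bandwidth_gb_s / model_gb

# Example: a 70B model at ~4.5 bits/weight on a GPU with ~1008 GB/s bandwidth
need = estimate_vram_gb(70)  # ~44 GB -> partial offload on a 24 GB card
print(round(need, 1), round(estimate_tok_per_s(need, 1008), 1))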
Project structure
src/whichllm/
├── cli.py # Typer CLI entry point
├── constants.py # GPU bandwidth tables, quantization constants
├── hardware/ # Hardware detection (NVIDIA, AMD, Apple, CPU, RAM)
│ └── gpu_simulator.py # GPU simulation for --gpu flag
├── models/
│ ├── fetcher.py # HuggingFace API model fetcher
│ ├── benchmark.py # Benchmark scores (Arena + Leaderboard)
│ ├── grouper.py # Model family grouping and dedup
│ └── cache.py # JSON cache
├── engine/
│ ├── vram.py # VRAM requirement estimation
│ ├── compatibility.py # Hardware compatibility check
│ ├── performance.py # Inference speed estimation
│ └── ranker.py # Scoring and ranking
└── output/
└── display.py # Rich table output
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
Requirements
- Python 3.11+
- NVIDIA GPU detection via nvidia-ml-py (included by default)
- AMD / Apple Silicon detected automatically
License
MIT