whichllm


Find the best local LLM that actually runs on your hardware.

Auto-detects your GPU/CPU/RAM and ranks the top models from HuggingFace that fit your system.

Japanese version available here.

(demo GIF)

Why whichllm?

One command. Real answers. No TUI to learn, no keybindings to memorize.

                    whichllm                            Others (TUI-based)
Getting results     whichllm — done                     Launch TUI → navigate → search → filter
Model data          Live from HuggingFace API           Static built-in database
Benchmarks          Real eval scores with confidence    Fixed quality scores
Scriptable          whichllm --json | jq                Requires special flags
Learning curve      Zero                                Vim keybindings required

Features

  • Auto-detect hardware — NVIDIA, AMD, Apple Silicon, CPU-only
  • Smart ranking — Scores models by VRAM fit, speed, and benchmark quality
  • One-command chat — whichllm run downloads and starts a chat session instantly
  • Code snippets — whichllm snippet prints ready-to-run Python for any model
  • Live data — Fetches models directly from HuggingFace (cached for performance)
  • Benchmark-aware — Integrates real eval scores with confidence-based dampening
  • Task profiles — Filter by general, coding, vision, or math use cases
  • GPU simulation — Test with any GPU: whichllm --gpu "RTX 4090"
  • Hardware planning — Reverse lookup: whichllm plan "llama 3 70b"
  • JSON output — Pipe-friendly: whichllm --json

Run & Snippet

Try any model with a single command. No manual installs needed — whichllm creates an isolated environment via uv, installs dependencies, downloads the model, and starts an interactive chat.

(demo GIF: whichllm run)

# Chat with a model (auto-picks the best GGUF variant)
whichllm run "qwen 2.5 1.5b gguf"

# Auto-pick the best model for your hardware and chat
whichllm run

# CPU-only mode
whichllm run "phi 3 mini gguf" --cpu-only

Works with all model formats:

  • GGUF — via llama-cpp-python (lightweight, fast)
  • AWQ / GPTQ — via transformers + autoawq / auto-gptq
  • FP16 / BF16 — via transformers

Get a copy-paste Python snippet instead:

whichllm snippet "qwen 7b"
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2.5-7B-Instruct-GGUF",
    filename="qwen2.5-7b-instruct-q4_k_m.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
    verbose=False,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
)
print(output["choices"][0]["message"]["content"])
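
For the transformers-backed formats (FP16/BF16, and AWQ/GPTQ with their extra packages installed), a comparable hand-written load might look like this; the model id and settings are illustrative, not necessarily what whichllm snippet emits:

# Illustrative transformers-based load for an FP16/BF16 checkpoint.
# The model id is an example; whichllm snippet may produce different settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))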

Install

pipx (recommended)

pipx install whichllm

Homebrew

brew tap Andyyyy64/whichllm
brew install whichllm

pip

pip install whichllm

Development

git clone https://github.com/Andyyyy64/whichllm.git
cd whichllm
uv sync --dev
uv run whichllm

Usage

# Auto-detect hardware and show best models
whichllm

# Simulate a GPU (e.g. planning a purchase)
whichllm --gpu "RTX 4090"
whichllm --gpu "RTX 5090"

# CPU-only mode
whichllm --cpu-only

# More results / filters
whichllm --top 20
whichllm --quant Q4_K_M
whichllm --min-speed 30
whichllm --evidence base   # allow id/base-model matches
whichllm --evidence strict # id-exact only (same as --direct)
whichllm --direct

# JSON output
whichllm --json

# Force refresh (ignore cache)
whichllm --refresh

# Show hardware info only
whichllm hardware

# Plan: what GPU do I need for a specific model?
whichllm plan "llama 3 70b"
whichllm plan "Qwen2.5-72B" --quant Q8_0
whichllm plan "mistral 7b" --context-length 32768

# Run: download and chat with a model instantly
whichllm run "qwen 2.5 1.5b gguf"
whichllm run                       # auto-pick best for your hardware

# Snippet: print ready-to-run Python code
whichllm snippet "qwen 7b"
whichllm snippet "llama 3 8b gguf" --quant Q5_K_M

Integrations

Ollama

Find the best model and run it directly:

# Pick the top model and run it with Ollama
whichllm --top 1 --json | jq -r '.models[0].model_id' | xargs ollama run

# Find the best coding model
whichllm --profile coding --top 1 --json | jq -r '.models[0].model_id' | xargs ollama run
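
The same pick-and-run flow from Python, assuming the JSON shape implied by the jq filter above (a models array whose entries carry a model_id):

# Pick the top-ranked model from whichllm's JSON output and hand it to Ollama.
# Assumes the shape implied by the jq filter above: {"models": [{"model_id": ...}, ...]}.
import json
import subprocess

result = subprocess.run(
    ["whichllm", "--top", "1", "--json"],
    capture_output=True, text=True, check=True,
)
model_id = json.loads(result.stdout)["models"][0]["model_id"]
subprocess.run(["ollama", "run", model_id])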

Shell alias

Add to your .bashrc / .zshrc:

alias bestllm='whichllm --top 1 --json | jq -r ".models[0].model_id"'
# Usage: ollama run $(bestllm)

Scoring

Each model gets a score from 0 to 100.

Factor          Points      Description
Model size      0-40        Larger models generally produce better output
Benchmark       0-10        Arena ELO / Open LLM Leaderboard scores
Speed           0-20        Higher tok/s = more practical to use
Source trust    -5 to +5    Official repos get a bonus, repackagers get a penalty
Popularity      0-3         Downloads and likes as tiebreaker

Score markers:

  • ~ (yellow) — No direct benchmark yet. Score estimated from the model family
  • ? (yellow) — No benchmark data available
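
Put together, the ranking boils down to summing those bands. A sketch, with assumed normalization constants (the real curves live in engine/ranker.py):

# Illustrative combination of the point bands from the table above.
# The normalization constants here are assumptions; whichllm's actual curves differ.
def score(size_gb, bench_0_100, tok_s, official_repo, popularity_0_1):
    s = min(size_gb / 70.0, 1.0) * 40       # Model size: 0-40
    s += bench_0_100 / 100.0 * 10           # Benchmark: 0-10
    s += min(tok_s / 60.0, 1.0) * 20        # Speed: 0-20
    s += 5 if official_repo else -5         # Source trust: -5 to +5
    s += popularity_0_1 * 3                 # Popularity: 0-3
    return round(max(0.0, min(100.0, s)), 1)

print(score(size_gb=8, bench_0_100=75, tok_s=45, official_repo=True, popularity_0_1=0.8))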

How it works

Data pipeline

  1. Fetches ~900 popular models from HuggingFace API (text-generation, GGUF, multimodal)
  2. Fetches benchmark scores from Chatbot Arena ELO and Open LLM Leaderboard, normalized to 0-100
  3. All data cached for 24 hours at ~/.cache/whichllm/
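
Step 1 can be approximated directly with huggingface_hub; this is a sketch of that kind of query, not whichllm's exact filters:

# Sketch of the HuggingFace API query behind step 1; whichllm's exact filters differ.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(filter="text-generation", sort="downloads", direction=-1, limit=50)
for m in models:
    print(m.id, m.downloads)
# whichllm caches results like these as JSON under ~/.cache/whichllm/ for 24 hours.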

Ranking engine

  1. Hardware detection — GPU (NVIDIA/AMD/Apple Silicon), CPU, RAM, disk
  2. VRAM estimation — model size + quantization + KV cache overhead
  3. Compatibility check — Full GPU / Partial Offload / CPU-only classification
  4. Speed estimation — tok/s based on GPU memory bandwidth
  5. Scoring — combines size, benchmark, speed, source trust, and popularity
  6. Deduplication — merges GGUF variants and version differences into model families
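
Steps 2 and 4 come down to simple arithmetic on model size and the GPU's memory bandwidth. A back-of-the-envelope version with assumed constants (the real tables live in constants.py and the engine/ modules):

# Back-of-the-envelope versions of steps 2 (VRAM) and 4 (speed).
# The constants are assumptions; whichllm's real tables live in constants.py and engine/.
def estimate(params_b, bits_per_weight=4.5, kv_cache_gb=1.0, bandwidth_gb_s=1008):
    weights_gb = params_b * bits_per_weight / 8       # e.g. Q4_K_M is roughly 4.5 bits/weight
    vram_gb = (weights_gb + kv_cache_gb) * 1.1        # KV cache plus ~10% runtime overhead
    tok_s = bandwidth_gb_s / weights_gb               # decode streams every weight once per token
    return vram_gb, tok_s

vram, speed = estimate(params_b=8)                    # ~8B model at ~Q4 on an RTX 4090-class GPU
print(f"~{vram:.1f} GB VRAM, up to ~{speed:.0f} tok/s (memory-bandwidth bound)")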

Project structure

src/whichllm/
├── cli.py              # Typer CLI entry point
├── constants.py        # GPU bandwidth tables, quantization constants
├── hardware/           # Hardware detection (NVIDIA, AMD, Apple, CPU, RAM)
│   └── gpu_simulator.py  # GPU simulation for --gpu flag
├── models/
│   ├── fetcher.py      # HuggingFace API model fetcher
│   ├── benchmark.py    # Benchmark scores (Arena + Leaderboard)
│   ├── grouper.py      # Model family grouping and dedup
│   └── cache.py        # JSON cache
├── engine/
│   ├── vram.py         # VRAM requirement estimation
│   ├── compatibility.py # Hardware compatibility check
│   ├── performance.py  # Inference speed estimation
│   └── ranker.py       # Scoring and ranking
└── output/
    └── display.py      # Rich table output

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Requirements

  • Python 3.11+
  • NVIDIA GPU detection via nvidia-ml-py (included by default)
  • AMD / Apple Silicon detected automatically
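
For reference, the NVIDIA side of that detection is a handful of nvidia-ml-py calls, roughly as below (illustrative, not whichllm's exact code):

# Minimal NVIDIA detection with nvidia-ml-py (imported as pynvml); illustrative only.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
name = pynvml.nvmlDeviceGetName(handle)
total_vram_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
print(name, f"{total_vram_gb:.0f} GB VRAM")
pynvml.nvmlShutdown()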

License

MIT

