A fast, modern CLI that shows which local LLMs fit on your machine
█ █ █▄ ▄█ ▄▀▀ ▄▀▀ ▄▀▄ █▄ █
█▄▄ █▄▄ █ ▀ █ ▄██ ▀▄▄ █▀█ █ ▀█
A fast CLI that scans your hardware and tells you which local LLMs will actually run on your machine.
Features
- Auto-detects your hardware: OS, CPU, RAM, and GPU memory in seconds
- Multi-vendor GPU support: NVIDIA (`nvidia-smi`), AMD ROCm (`rocm-smi`), Intel Arc (`xpu-smi`/`clinfo`), Apple Silicon (unified memory), Windows (`Get-CimInstance`/`wmic`)
- Smart fitness scoring: Rates every model as `great`, `ok`, `tight`, or `no` with a short reason code (e.g. `ok (cpu-only)`, `ok (vram -25%)`, `tight (multi-gpu)`)
- Backend-aware scoring: `--backend ollama|llama-cpp|mlx` adjusts memory overhead estimates for the actual inference backend you're using
- 45+ bundled models: Llama, Qwen, Mistral, Gemma, Phi, DeepSeek, CodeLlama, StarCoder, and more
- Hugging Face search: Find GGUF models on Hugging Face and add them to your local catalog
- Auto-computed VRAM: Adds models with formula-derived memory requirements from parameter count and quantization
- IQ quant support: Full iMatrix quantization family (`IQ1_S` through `IQ4_NL`) with calibrated VRAM estimates
- Multi-GPU aware: Accounts for tensor parallelism across multiple GPUs with per-card breakdown
- CPU-only inference: Recognizes when you have enough RAM to run models without a GPU
- Ollama integration: `--running` cross-references your catalog against what Ollama has loaded
- Catalog updates: Fetch the latest model list from a remote URL without reinstalling; shows a diff of new/updated/removed models
- Diagnostics: `llmscan doctor` checks which detection tools are available and flags anomalies
- Filter and sort: `--family Llama`, `--sort vram`, `--min-rating ok` for focused output
- Beautiful terminal UI: Rich tables, color-coded ratings, ASCII banner
- CI-friendly: `--no-color`/`--plain` strips all ANSI formatting for log capture and pipelines
- Custom catalogs: Bring your own model list as a JSON file
- JSON and CSV output: Pipe results into scripts, dashboards, or spreadsheets with `--json` or `--csv`
- Shell completions: `--install-completion` for bash, zsh, and fish
- Version update check: `llmscan version --check` pings PyPI to notify you of new releases
Install
# With pip
pip install llmscan
# With pipx (isolated install)
pipx install llmscan
# With uv
uv pip install llmscan
# From source
git clone https://github.com/adityaarakeri/llmscan.git
cd llmscan
pip install -e .
The installed command remains llmscan.
Quick Start
# Show banner + hardware summary + compatible models
llmscan
# Check version
llmscan --version
# Strip color for CI / log capture (applies to all subcommands)
llmscan --no-color list
llmscan --plain scan --json
Usage
scan — Inspect your hardware
llmscan scan # Rich formatted hardware details
llmscan scan --json # Machine-readable JSON output
list — List compatible models
llmscan list # Show models rated "tight" and above
llmscan list --min-rating great # Only show "great" fits
llmscan list --min-rating no # Show all models including non-fits
llmscan list --json # JSON output with machine profile + models
llmscan list --catalog my_models.json # Use a custom catalog file
llmscan list --family Llama # Filter by model family (case-insensitive)
llmscan list --sort vram # Sort by: rating (default), params, vram, name
llmscan list --running # Mark which models Ollama currently has loaded
llmscan list --backend ollama # Adjust scoring for Ollama's memory overhead
llmscan list --backend mlx # Adjust scoring for MLX framework overhead
llmscan list --csv # Output as CSV (pipe to a spreadsheet or script)
Each row in the table shows a short reason code next to the rating explaining why a model received that score — e.g. ok (cpu-only), ok (vram -25%), tight (multi-gpu), tight (partial offload). When models are hidden by the active filter, a summary line at the bottom tells you how many are hidden and how to reveal them.
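To consume the output from a script rather than a terminal, one option is to shell out and parse the JSON. The sketch below assumes nothing about the JSON schema beyond it being valid JSON; `run_json` is a hypothetical helper name, not part of llmscan.

```python
import json
import subprocess


def run_json(cmd):
    """Run a command and parse its stdout as JSON.

    Hypothetical helper, not part of llmscan. Raises CalledProcessError on a
    non-zero exit and json.JSONDecodeError if stdout is not valid JSON.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

With llmscan installed and on `PATH`, `run_json(["llmscan", "list", "--json"])` would return the parsed document. Inspect its structure before relying on specific field names.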
explain — Deep-dive on a specific model
llmscan explain llama-3.1-8b-instruct # Why does this model fit?
llmscan explain qwen2.5-72b-instruct # Why doesn't it?
llmscan explain my-model --catalog my_models.json # Explain from a custom catalog
search — Find GGUF models on Hugging Face
llmscan search llama # Search for Llama GGUF models
llmscan search "codellama 13b" # More specific search
llmscan search mistral --limit 5 # Limit results
llmscan search qwen --json # JSON output
llmscan search llama --min-params 7 # Only models with 7B+ parameters
llmscan search llama --max-params 13 # Only models up to 13B parameters
llmscan search llama --min-params 7 --max-params 70 # Parameter range filter
Models without an inferable parameter count in their name are excluded when either --min-params or --max-params is active.
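As an illustration of how such a filter might behave, here is a minimal sketch that infers the size from a model name with a regular expression and drops names with no recognizable size while a bound is active. The pattern and the function names are assumptions for this example; llmscan's own parser may differ.

```python
import re

# Matches sizes such as "7b", "13B", or "1.5b" that stand alone in a name.
# Hypothetical pattern; llmscan's own parser may differ.
_PARAMS_RE = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)[bB](?![A-Za-z0-9])")


def infer_params_b(name):
    """Return the parameter count in billions, or None if none is found."""
    match = _PARAMS_RE.search(name)
    return float(match.group(1)) if match else None


def filter_by_params(names, min_params=None, max_params=None):
    """Keep names whose inferred size lies within the given bounds."""
    if min_params is None and max_params is None:
        return list(names)
    kept = []
    for name in names:
        size = infer_params_b(name)
        if size is None:
            continue  # size unknown: excluded while a bound is active
        if min_params is not None and size < min_params:
            continue
        if max_params is not None and size > max_params:
            continue
        kept.append(name)
    return kept
```

Under this pattern, `TheBloke/Llama-2-7B-GGUF` is read as 7B, while a name such as `phi-4-mini` has no inferable size and is dropped once either bound is set.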
add — Add a model to your local catalog
VRAM/RAM requirements are auto-computed from the parameter count and quantization type.
# Add by specifying params and quant manually
llmscan add my-model --params-b 7 --quant Q4_K_M --family Llama
llmscan add my-model --params-b 7 --quant Q4_K_M --family Llama --notes "My fine-tune"
# Add from a Hugging Face repo (auto-detects params and quant from repo name / filenames)
# llmscan will show what it detected and prompt you to override with --params-b / --quant if wrong
llmscan add TheBloke/Llama-2-7B-GGUF
# Preview computed VRAM/RAM values without writing to catalog
llmscan add my-model --params-b 7 --quant Q4_K_M --dry-run
llmscan add my-model --params-b 7 --quant Q4_K_M --dry-run --json
# Overwrite an existing entry
llmscan add my-model --params-b 7 --quant Q8_0 --family Llama --force
# JSON output
llmscan add my-model --params-b 7 --quant Q4_K_M --json
Supported quantization types: Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0, F16, IQ2_XS, IQ3_XS
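For intuition about how a requirement can be derived from parameter count and quantization, here is a rough sketch. The bits-per-weight values are approximations for a few of the types listed above, and the flat 1.2 overhead factor is an assumption made for this example; neither is llmscan's actual formula or calibration table.

```python
# Approximate bits per weight for a few GGUF quantization types.
# Assumed values for illustration; llmscan's calibrated table may differ.
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "F16": 16.0,
}


def estimate_weights_gb(params_b, quant):
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # params_b is in billions, so billions of weights * bytes per weight = GB.
    return params_b * bits / 8


def estimate_min_vram_gb(params_b, quant, overhead=1.2):
    """Weights plus a flat overhead factor for KV cache and buffers.

    The 1.2 factor is an assumption, not llmscan's actual constant.
    """
    return round(estimate_weights_gb(params_b, quant) * overhead, 1)
```

Under these assumptions a 7B model at `Q4_K_M` comes to about 4.2 GB of weights and roughly 5.1 GB with overhead. Use `--dry-run` to see the values llmscan itself computes.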
Models are saved to ~/.llmscan/catalog.json and automatically merged with the bundled catalog. If that file becomes malformed, llmscan will print a warning and fall back to the bundled catalog rather than crashing.
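The merge-with-fallback behavior described above could look roughly like the following. This is a sketch, not llmscan's code: the function names are hypothetical, and letting a user entry override a bundled entry with the same `id` is an assumption.

```python
import json
import sys
from pathlib import Path


def load_user_catalog(path):
    """Return the user's entries, or [] if the file is missing or malformed."""
    path = Path(path)
    if not path.exists():
        return []
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(entries, list):
            raise ValueError("catalog must be a JSON array")
        return entries
    except (ValueError, OSError) as exc:
        # json.JSONDecodeError is a ValueError, so malformed JSON lands here.
        print(f"warning: ignoring {path}: {exc}", file=sys.stderr)
        return []


def merge_catalogs(bundled, user):
    """Merge by "id"; here a user entry overrides a bundled one (an assumption)."""
    merged = {entry["id"]: entry for entry in bundled}
    merged.update({entry["id"]: entry for entry in user})
    return list(merged.values())
```

Returning an empty list on failure means the merge step simply yields the bundled catalog, which matches the fallback described above.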
remove — Remove a model from your local catalog
llmscan remove my-model # Only removes user-added models, not bundled ones
doctor — Diagnose hardware detection
Checks which detection tools are present on your system (nvidia-smi, rocm-smi, xpu-smi, sysctl, wmic, clinfo), reports their paths, and flags anomalies such as a GPU being detected with 0 VRAM reported.
llmscan doctor # Rich table showing tool availability + any anomalies
llmscan doctor --json # Machine-readable JSON
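The kind of check `doctor` performs can be approximated with the standard library. The sketch below looks up each tool on `PATH` and flags GPUs that report 0 VRAM; the function names and the `(name, vram_mb)` input shape are hypothetical.

```python
import shutil

# Tool names taken from the description of the doctor command.
DETECTION_TOOLS = ["nvidia-smi", "rocm-smi", "xpu-smi", "sysctl", "wmic", "clinfo"]


def find_tools(names=DETECTION_TOOLS):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def find_anomalies(gpus):
    """Flag GPUs that report 0 VRAM. `gpus` is a list of (name, vram_mb) pairs."""
    return [
        f"{name}: detected with 0 VRAM reported"
        for name, vram_mb in gpus
        if vram_mb == 0
    ]
```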
catalog update — Refresh the model list
Fetches the latest models.json from a remote URL and merges new or updated entries into your local catalog. Your own custom-added models are always preserved.
llmscan catalog update # Fetch and apply updates
llmscan catalog update --dry-run # Preview what would change, don't save
llmscan catalog update --json # JSON diff output
llmscan catalog update --url https://example.com/catalog.json # Use a custom URL
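A new/updated/removed diff of this kind can be computed by keying both catalogs on `id`. The following is a minimal sketch with a hypothetical function name. It compares the previously fetched list against the remote one and does not model how llmscan preserves custom-added entries.

```python
def diff_catalogs(local, remote):
    """Compare two catalogs (lists of dicts keyed by "id").

    Returns sorted id lists under "new", "updated", and "removed".
    """
    local_by_id = {m["id"]: m for m in local}
    remote_by_id = {m["id"]: m for m in remote}
    return {
        "new": sorted(remote_by_id.keys() - local_by_id.keys()),
        "removed": sorted(local_by_id.keys() - remote_by_id.keys()),
        "updated": sorted(
            mid
            for mid in local_by_id.keys() & remote_by_id.keys()
            if local_by_id[mid] != remote_by_id[mid]
        ),
    }
```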
version — Show version and check for updates
llmscan version # Print installed version
llmscan version --check # Also check PyPI for a newer release (2s timeout)
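An update check along these lines can be sketched against PyPI's JSON API, which exposes the latest release under `info.version`. The function names are hypothetical, and the version comparison handles plain dotted integers only; a real implementation would likely use a PEP 440 aware parser.

```python
import json
import urllib.request

PYPI_URL = "https://pypi.org/pypi/llmscan/json"


def parse_version(text):
    """Turn "0.2.0" into (0, 2, 0). Handles plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def latest_version(url=PYPI_URL, timeout=2.0):
    """Return the latest release string from PyPI, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)["info"]["version"]
    except (OSError, ValueError, KeyError):
        # Network errors, timeouts, bad JSON, or an unexpected payload shape.
        return None


def is_newer(latest, installed):
    """True if `latest` is a strictly higher version than `installed`."""
    return parse_version(latest) > parse_version(installed)
```

Comparing tuples rather than strings avoids the classic mistake of ranking `0.10.0` below `0.2.0`.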
Global flags
These flags apply to every subcommand and must come before the subcommand name:
| Flag | Description |
|---|---|
| `--no-color` | Strip all ANSI color and Rich formatting |
| `--plain` | Alias for `--no-color` |
| `--version`, `-V` | Print version and exit |
| `--install-completion` | Install shell completion for bash/zsh/fish |
| `--show-completion` | Print the completion script for the current shell |
llmscan --no-color list
llmscan --plain scan --json
llmscan --install-completion zsh
llmscan --show-completion bash
Example Output
llmscan list --min-rating ok
┃ Model ┃ Family ┃ Params ┃ Fit ┃ Min VRAM ┃ Rec VRAM ┃ Rec RAM ┃
┃ llama-3.1-8b-instruct ┃ Llama ┃ 8B ┃ great ┃ 5.0 GB ┃ 6.2 GB ┃ 10 GB ┃
┃ mistral-7b-instruct-v0.3 ┃ Mistral ┃ 7B ┃ great ┃ 4.4 GB ┃ 5.5 GB ┃ 8 GB ┃
┃ phi-4-mini ┃ Phi ┃ 4B ┃ ok (vram -22%) ┃ 2.5 GB ┃ 3.1 GB ┃ 6 GB ┃
┃ qwen2.5-7b-instruct ┃ Qwen ┃ 7B ┃ ok (cpu-only) ┃ 4.4 GB ┃ 5.5 GB ┃ 8 GB ┃
...
+8 models hidden (rated "no" — insufficient hardware). Run with --min-rating no to show all.
Rating System
| Rating | Meaning |
|---|---|
| `great` | GPU VRAM and RAM both meet recommended targets; best experience |
| `ok` | Meets minimum requirements; may need moderate context limits or uses CPU-only inference |
| `tight` | Runs with CPU offload, reduced context, or multi-GPU splitting; expect slower performance |
| `no` | Hardware is below practical minimums |
Each rating is annotated with a short reason code shown in parentheses:
| Reason code | When it appears |
|---|---|
| (none) | `great` or `no`; the rating is self-explanatory |
| `vram -N%` | GPU clears the minimum but is N% below recommended VRAM |
| `ram low` | GPU meets recommended VRAM but system RAM is the bottleneck |
| `cpu-only` | No usable GPU; inference runs entirely in system RAM |
| `multi-gpu` | Score depends on combining VRAM across multiple GPUs |
| `partial offload` | GPU handles most layers; remainder offloaded to CPU |
| `cpu offload` | GPU too small to help much; CPU carries most of the load |
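To make the rating and reason-code tables concrete, here is a simplified sketch of a scorer for a single GPU. The thresholds, the way N is computed (shortfall relative to recommended VRAM), and the omission of the `multi-gpu` and `cpu offload` cases are all assumptions for illustration; llmscan's actual rules may differ.

```python
def rate(model, vram_gb, ram_gb):
    """Return (rating, reason) for one model on one machine.

    `model` needs min_vram_gb, recommended_vram_gb, and recommended_ram_gb.
    Thresholds are illustrative assumptions, not llmscan's actual rules.
    """
    min_v = model["min_vram_gb"]
    rec_v = model["recommended_vram_gb"]
    rec_r = model["recommended_ram_gb"]

    if vram_gb <= 0:
        # No usable GPU: the whole model has to fit in system RAM.
        return ("ok", "cpu-only") if ram_gb >= rec_r else ("no", "")
    if vram_gb >= rec_v:
        return ("great", "") if ram_gb >= rec_r else ("ok", "ram low")
    if vram_gb >= min_v:
        shortfall = round((rec_v - vram_gb) / rec_v * 100)
        return ("ok", f"vram -{shortfall}%")
    if ram_gb >= rec_r:
        # GPU is below the minimum; spill the remaining layers to the CPU.
        return ("tight", "partial offload")
    return ("no", "")
```

For a model with 5.0 GB minimum and 6.2 GB recommended VRAM, a 5.5 GB card lands between the two thresholds and is rated `ok (vram -11%)` under this sketch.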
Backend-Aware Scoring
Different inference backends have different memory behaviors. The --backend flag adjusts the effective VRAM and RAM thresholds used for scoring so that estimates better match your actual runtime:
| Backend | VRAM overhead | RAM overhead | When to use |
|---|---|---|---|
| `llama-cpp` | none (baseline) | none | Default; estimates calibrated for llama.cpp |
| `ollama` | +15% | +10% | Ollama's large default context window and serving overhead |
| `mlx` | +10% | none | MLX framework overhead on Apple Silicon |
llmscan list --backend ollama # Models scored as if running inside Ollama
llmscan list --backend mlx # Models scored for Apple MLX runtime
The JSON output always includes a "backend" field so you know which overhead was applied.
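As a worked example of the overheads in the table above, the sketch below scales a model's thresholds by the listed percentages. Treating the overhead as a plain multiplier on each threshold is an assumption, and `adjust_for_backend` is a hypothetical name.

```python
# Overhead multipliers taken from the backend table in this section.
BACKEND_OVERHEAD = {
    "llama-cpp": {"vram": 1.00, "ram": 1.00},
    "ollama": {"vram": 1.15, "ram": 1.10},
    "mlx": {"vram": 1.10, "ram": 1.00},
}


def adjust_for_backend(model, backend="llama-cpp"):
    """Return a copy of `model` with thresholds scaled for `backend`."""
    factors = BACKEND_OVERHEAD[backend]
    adjusted = dict(model)
    for key in ("min_vram_gb", "recommended_vram_gb"):
        adjusted[key] = round(model[key] * factors["vram"], 2)
    adjusted["recommended_ram_gb"] = round(
        model["recommended_ram_gb"] * factors["ram"], 2
    )
    adjusted["backend"] = backend
    return adjusted
```

Under this reading, a model that needs 5.0 GB of VRAM at minimum on llama.cpp needs about 5.75 GB under Ollama and 5.5 GB under MLX.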
Ollama Integration
If Ollama is running locally, --running cross-references your catalog against http://localhost:11434/api/tags and marks which models are currently loaded:
llmscan list --running # Adds a "Running" column to the table
llmscan list --running --json # JSON includes "running": true/false per model
If Ollama is not reachable, a warning is shown and all models are marked as not running.
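Querying that endpoint yourself takes only the standard library. The sketch below reads the `models` array from the `/api/tags` response and falls back to an empty set when Ollama is unreachable. The function names are hypothetical, and how llmscan maps Ollama names such as `llama3.1:8b` onto catalog ids is not covered here.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def parse_tags(payload):
    """Extract model names from a decoded /api/tags response body."""
    return {m["name"] for m in payload.get("models", []) if "name" in m}


def ollama_models(url=OLLAMA_TAGS_URL, timeout=2.0):
    """Return model names reported by Ollama, or an empty set if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_tags(json.load(resp))
    except (OSError, ValueError, AttributeError):
        # Connection refused, timeout, bad JSON, or an unexpected payload shape.
        print("warning: Ollama is not reachable; marking all models as not running")
        return set()
```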
Supported Hardware
| Vendor | Detection Method | Notes |
|---|---|---|
| NVIDIA | `nvidia-smi` | Discrete GPUs with CUDA |
| AMD | `rocm-smi` | GPUs with ROCm drivers |
| Intel | `xpu-smi`, `clinfo` | Arc and Data Center GPUs |
| Apple | `sysctl` | Apple Silicon unified memory (65% GPU estimate) |
| Windows | `Get-CimInstance`, `wmic` | Fallback for any GPU on Windows |
WSL2 note: When running inside WSL2, `llmscan` automatically detects the environment and prints a warning that `nvidia-smi` VRAM readings may be inaccurate. Verify GPU memory on the Windows host before trusting the results.
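For a sense of what detection involves, here is a sketch of the NVIDIA path plus the Apple Silicon estimate. It uses `nvidia-smi`'s standard `--query-gpu` and `--format` options; the function names, the parsing, and the rounding are assumptions for this example rather than llmscan's implementation.

```python
import shutil
import subprocess


def parse_nvidia_smi(csv_text):
    """Parse `name, memory.total` rows (MiB, no header, no units) into GB."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma inside the GPU name is harmless.
        name, _, mem_mib = line.rpartition(",")
        gpus.append({"name": name.strip(), "vram_gb": round(int(mem_mib) / 1024, 1)})
    return gpus


def detect_nvidia():
    """Return detected NVIDIA GPUs, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=5)
    except (subprocess.SubprocessError, OSError):
        return []
    return parse_nvidia_smi(out.stdout)


def apple_gpu_budget_gb(total_ram_gb, fraction=0.65):
    """Share of unified memory treated as GPU-usable (65% per the table above)."""
    return round(total_ram_gb * fraction, 1)
```

On a 16 GB Apple Silicon machine, the 65% estimate gives a GPU budget of about 10.4 GB.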
Custom Catalogs
Create a JSON file with your own model entries:
[
{
"id": "my-custom-model-7b",
"family": "Custom",
"params_b": 7,
"quant": "Q4_K_M",
"min_vram_gb": 5,
"recommended_vram_gb": 8,
"recommended_ram_gb": 16,
"notes": "My fine-tuned model"
}
]
Then pass it with --catalog:
llmscan list --catalog my_models.json
llmscan explain my-custom-model-7b --catalog my_models.json
Required Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique model identifier |
| `family` | string | Model family name (e.g., "LLaMA", "Mistral") |
| `params_b` | number | Parameter count in billions |
| `quant` | string | Quantization format (e.g., "Q4_K_M", "Q5_K_M") |
| `min_vram_gb` | number | Minimum GPU VRAM in GB |
| `recommended_vram_gb` | number | Recommended GPU VRAM in GB |
| `recommended_ram_gb` | number | Recommended system RAM in GB |
| `notes` | string | Additional notes about the model |
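Before handing a custom catalog to `--catalog`, it can help to check it against the fields above. The validator below is a hypothetical helper, not something llmscan ships; the duplicate-id check is an extra assumption based on `id` being described as unique.

```python
# Field names and types from the Required Fields table.
REQUIRED_FIELDS = {
    "id": str,
    "family": str,
    "params_b": (int, float),
    "quant": str,
    "min_vram_gb": (int, float),
    "recommended_vram_gb": (int, float),
    "recommended_ram_gb": (int, float),
    "notes": str,
}


def validate_catalog(entries):
    """Return a list of human-readable problems; empty means the catalog passed."""
    if not isinstance(entries, list):
        return ["top level must be a JSON array"]
    problems, seen = [], set()
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            value = entry.get(field)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                problems.append(f"entry {index}: missing or invalid {field!r}")
        model_id = entry.get("id")
        if isinstance(model_id, str):
            if model_id in seen:
                problems.append(f"entry {index}: duplicate id {model_id!r}")
            seen.add(model_id)
    return problems
```

For a file on disk, load it with `json.load` and pass the result to `validate_catalog`; an empty list means every entry has all eight fields with the expected types.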
Development
git clone https://github.com/adityaarakeri/llmscan.git
cd llmscan
# Install with dev dependencies
uv pip install -e ".[test,dev]"
Run all CI checks locally with make:
make # lint + typecheck + test (same as make ci)
make lint # ruff check + format check
make format # auto-format with ruff
make typecheck # mypy
make test # pytest
Override the Python version with make PYTHON=3.10 ci.
Or run the individual commands directly:
# Run tests
uv run pytest
# Lint and format
uv run ruff check .
uv run ruff format .
# Type check
uv run mypy llmscan/
Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-feature`)
- Make your changes and add tests
- Ensure all checks pass: `uv run pytest && uv run ruff check . && uv run mypy llmscan/`
- Open a pull request
License
MIT — see LICENSE for details.