llmscan

A fast, modern CLI that shows which local LLMs fit on your machine

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

 █   █   █▄ ▄█ ▄▀▀ ▄▀▀ ▄▀▄ █▄ █
 █▄▄ █▄▄ █ ▀ █ ▄██ ▀▄▄ █▀█ █ ▀█

A fast CLI that scans your hardware and tells you which local LLMs will actually run on your machine.

Features

Auto-detects your hardware — OS, CPU, RAM, and GPU memory in seconds
Multi-vendor GPU support — NVIDIA (nvidia-smi), AMD ROCm (rocm-smi), Intel Arc (xpu-smi/clinfo), Apple Silicon (unified memory), Windows (Get-CimInstance/wmic)
Smart fitness scoring — Rates every model as great, ok, tight, or no with a short reason code (e.g. ok (cpu-only), ok (vram -25%), tight (multi-gpu))
Backend-aware scoring — --backend ollama|llama-cpp|mlx adjusts memory overhead estimates for the actual inference backend you're using
45+ bundled models — Llama, Qwen, Mistral, Gemma, Phi, DeepSeek, CodeLlama, StarCoder, and more
Hugging Face search — Find GGUF models on Hugging Face and add them to your local catalog
Auto-computed VRAM — Adds models with formula-derived memory requirements from parameter count and quantization
IQ quant support — Full iMatrix quantization family (IQ1_S through IQ4_NL) with calibrated VRAM estimates
Multi-GPU aware — Accounts for tensor parallelism across multiple GPUs with per-card breakdown
CPU-only inference — Recognizes when you have enough RAM to run models without a GPU
Ollama integration — --running cross-references your catalog against what Ollama has loaded
Catalog updates — Fetch the latest model list from a remote URL without reinstalling; shows a diff of new/updated/removed models
Diagnostics — llmscan doctor checks which detection tools are available and flags anomalies
Filter and sort — --family Llama, --sort vram, --min-rating ok for focused output
Beautiful terminal UI — Rich tables, color-coded ratings, ASCII banner
CI-friendly — --no-color / --plain strips all ANSI formatting for log capture and pipelines
Custom catalogs — Bring your own model list as a JSON file
JSON and CSV output — Pipe results into scripts, dashboards, or spreadsheets with --json or --csv
Shell completions — --install-completion for bash, zsh, and fish
Version update check — llmscan version --check pings PyPI to notify you of new releases

Install

# With pip
pip install llmscan

# With pipx (isolated install)
pipx install llmscan

# With uv
uv pip install llmscan

# From source
git clone https://github.com/adityaarakeri/llmscan.git
cd llmscan
pip install -e .

The installed command remains llmscan.

Demo

llmscan demo

Quick Start

# Show banner + hardware summary + compatible models
llmscan

# Check version
llmscan --version

# Strip color for CI / log capture (applies to all subcommands)
llmscan --no-color list
llmscan --plain scan --json

Usage

`scan` — Inspect your hardware

llmscan scan              # Rich formatted hardware details
llmscan scan --json       # Machine-readable JSON output

`list` — List compatible models

llmscan list                          # Show models rated "tight" and above
llmscan list --min-rating great       # Only show "great" fits
llmscan list --min-rating no          # Show all models including non-fits
llmscan list --json                   # JSON output with machine profile + models
llmscan list --catalog my_models.json # Use a custom catalog file
llmscan list --family Llama           # Filter by model family (case-insensitive)
llmscan list --sort vram              # Sort by: rating (default), params, vram, name
llmscan list --running                # Mark which models Ollama currently has loaded
llmscan list --backend ollama         # Adjust scoring for Ollama's memory overhead
llmscan list --backend mlx            # Adjust scoring for MLX framework overhead
llmscan list --csv                    # Output as CSV (pipe to a spreadsheet or script)

Each row in the table shows a short reason code next to the rating explaining why a model received that score — e.g. ok (cpu-only), ok (vram -25%), tight (multi-gpu), tight (partial offload). When models are hidden by the active filter, a summary line at the bottom tells you how many are hidden and how to reveal them.

`explain` — Deep-dive on a specific model

llmscan explain llama-3.1-8b-instruct              # Why does this model fit?
llmscan explain qwen2.5-72b-instruct               # Why doesn't it?
llmscan explain my-model --catalog my_models.json   # Explain from a custom catalog

`search` — Find GGUF models on Hugging Face

llmscan search llama                          # Search for Llama GGUF models
llmscan search "codellama 13b"                # More specific search
llmscan search mistral --limit 5              # Limit results
llmscan search qwen --json                    # JSON output
llmscan search llama --min-params 7           # Only models with 7B+ parameters
llmscan search llama --max-params 13          # Only models up to 13B parameters
llmscan search llama --min-params 7 --max-params 70  # Parameter range filter

Models without an inferable parameter count in their name are excluded when either --min-params or --max-params is active.

`add` — Add a model to your local catalog

VRAM/RAM requirements are auto-computed from the parameter count and quantization type.

# Add by specifying params and quant manually
llmscan add my-model --params-b 7 --quant Q4_K_M --family Llama
llmscan add my-model --params-b 7 --quant Q4_K_M --family Llama --notes "My fine-tune"

# Add from a Hugging Face repo (auto-detects params and quant from repo name / filenames)
# llmscan will show what it detected and prompt you to override with --params-b / --quant if wrong
llmscan add TheBloke/Llama-2-7B-GGUF

# Preview computed VRAM/RAM values without writing to catalog
llmscan add my-model --params-b 7 --quant Q4_K_M --dry-run
llmscan add my-model --params-b 7 --quant Q4_K_M --dry-run --json

# Overwrite an existing entry
llmscan add my-model --params-b 7 --quant Q8_0 --family Llama --force

# JSON output
llmscan add my-model --params-b 7 --quant Q4_K_M --json

Supported quantization types: Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0, F16, IQ2_XS, IQ3_XS

Models are saved to ~/.llmscan/catalog.json and automatically merged with the bundled catalog. If that file becomes malformed, llmscan will print a warning and fall back to the bundled catalog rather than crashing.

`remove` — Remove a model from your local catalog

llmscan remove my-model   # Only removes user-added models, not bundled ones

`doctor` — Diagnose hardware detection

Checks which detection tools are present on your system (nvidia-smi, rocm-smi, xpu-smi, sysctl, wmic, clinfo), reports their paths, and flags anomalies such as a GPU being detected with 0 VRAM reported.

llmscan doctor          # Rich table showing tool availability + any anomalies
llmscan doctor --json   # Machine-readable JSON

`catalog update` — Refresh the model list

Fetches the latest models.json from a remote URL and merges new or updated entries into your local catalog. Your own custom-added models are always preserved.

llmscan catalog update                          # Fetch and apply updates
llmscan catalog update --dry-run                # Preview what would change, don't save
llmscan catalog update --json                   # JSON diff output
llmscan catalog update --url https://example.com/catalog.json  # Use a custom URL

`version` — Show version and check for updates

llmscan version           # Print installed version
llmscan version --check   # Also check PyPI for a newer release (2s timeout)

Global flags

These flags apply to every subcommand and must come before the subcommand name:

Flag	Description
`--no-color`	Strip all ANSI color and Rich formatting
`--plain`	Alias for `--no-color`
`--version`, `-V`	Print version and exit
`--install-completion`	Install shell completion for bash/zsh/fish
`--show-completion`	Print the completion script for the current shell

llmscan --no-color list
llmscan --plain scan --json
llmscan --install-completion zsh
llmscan --show-completion bash

Example Output

llmscan list --min-rating ok

┃ Model                        ┃ Family  ┃ Params ┃ Fit              ┃ Min VRAM ┃ Rec VRAM ┃ Rec RAM ┃
┃ llama-3.1-8b-instruct        ┃ Llama   ┃ 8B     ┃ great            ┃ 5.0 GB   ┃ 6.2 GB   ┃ 10 GB   ┃
┃ mistral-7b-instruct-v0.3     ┃ Mistral ┃ 7B     ┃ great            ┃ 4.4 GB   ┃ 5.5 GB   ┃ 8 GB    ┃
┃ phi-4-mini                   ┃ Phi     ┃ 4B     ┃ ok (vram -22%)   ┃ 2.5 GB   ┃ 3.1 GB   ┃ 6 GB    ┃
┃ qwen2.5-7b-instruct          ┃ Qwen    ┃ 7B     ┃ ok (cpu-only)    ┃ 4.4 GB   ┃ 5.5 GB   ┃ 8 GB    ┃
...
+8 models hidden (rated "no" — insufficient hardware). Run with --min-rating no to show all.

Rating System

Rating	Meaning
`great`	GPU VRAM and RAM both meet recommended targets — best experience
`ok`	Meets minimum requirements; may need moderate context limits or uses CPU-only inference
`tight`	Runs with CPU offload, reduced context, or multi-GPU splitting — expect slower performance
`no`	Hardware is below practical minimums

Each rating is annotated with a short reason code shown in parentheses:

Reason code	When it appears
(none)	`great` or `no` — the rating is self-explanatory
`vram -N%`	GPU clears the minimum but is N% below recommended VRAM
`ram low`	GPU meets recommended VRAM but system RAM is the bottleneck
`cpu-only`	No usable GPU; inference runs entirely in system RAM
`multi-gpu`	Score depends on combining VRAM across multiple GPUs
`partial offload`	GPU handles most layers; remainder offloaded to CPU
`cpu offload`	GPU too small to help much; CPU carries most of the load

Backend-Aware Scoring

Different inference backends have different memory behaviors. The --backend flag adjusts the effective VRAM and RAM thresholds used for scoring so that estimates are accurate for your actual runtime:

Backend	VRAM overhead	RAM overhead	When to use
`llama-cpp`	none (baseline)	none	Default; estimates calibrated for llama.cpp
`ollama`	+15%	+10%	Ollama's large default context window and serving overhead
`mlx`	+10%	none	MLX framework overhead on Apple Silicon

llmscan list --backend ollama   # Models scored as if running inside Ollama
llmscan list --backend mlx      # Models scored for Apple MLX runtime

The JSON output always includes a "backend" field so you know which overhead was applied.

Ollama Integration

If Ollama is running locally, --running cross-references your catalog against http://localhost:11434/api/tags and marks which models are currently loaded:

llmscan list --running           # Adds a "Running" column to the table
llmscan list --running --json    # JSON includes "running": true/false per model

If Ollama is not reachable, a warning is shown and all models are marked as not running.

Supported Hardware

Vendor	Detection Method	Notes
NVIDIA	`nvidia-smi`	Discrete GPUs with CUDA
AMD	`rocm-smi`	GPUs with ROCm drivers
Intel	`xpu-smi`, `clinfo`	Arc and Data Center GPUs
Apple	`sysctl`	Apple Silicon unified memory (65% GPU estimate)
Windows	`Get-CimInstance`, `wmic`	Fallback for any GPU on Windows

WSL2 note: When running inside WSL2, llmscan automatically detects the environment and prints a warning that nvidia-smi VRAM readings may be inaccurate. Verify GPU memory on the Windows host before trusting the results.

Custom Catalogs

Create a JSON file with your own model entries:

[
  {
    "id": "my-custom-model-7b",
    "family": "Custom",
    "params_b": 7,
    "quant": "Q4_K_M",
    "min_vram_gb": 5,
    "recommended_vram_gb": 8,
    "recommended_ram_gb": 16,
    "notes": "My fine-tuned model"
  }
]

Then pass it with --catalog:

llmscan list --catalog my_models.json
llmscan explain my-custom-model-7b --catalog my_models.json

Required Fields

Field	Type	Description
`id`	string	Unique model identifier
`family`	string	Model family name (e.g., "LLaMA", "Mistral")
`params_b`	number	Parameter count in billions
`quant`	string	Quantization format (e.g., "Q4_K_M", "Q5_K_M")
`min_vram_gb`	number	Minimum GPU VRAM in GB
`recommended_vram_gb`	number	Recommended GPU VRAM in GB
`recommended_ram_gb`	number	Recommended system RAM in GB
`notes`	string	Additional notes about the model

Development

git clone https://github.com/adityaarakeri/llmscan.git
cd llmscan

# Install with dev dependencies
uv pip install -e ".[test,dev]"

Run all CI checks locally with make:

make          # lint + typecheck + test (same as make ci)
make lint     # ruff check + format check
make format   # auto-format with ruff
make typecheck  # mypy
make test     # pytest

Override the Python version with make PYTHON=3.10 ci.

Or run the individual commands directly:

# Run tests
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

# Type check
uv run mypy llmscan/

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/my-feature)
Make your changes and add tests
Ensure all checks pass: uv run pytest && uv run ruff check . && uv run mypy llmscan/
Open a pull request

License

MIT — see LICENSE for details.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

aditya005

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.0

Apr 18, 2026

0.1.0

Mar 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmscan-0.2.0.tar.gz (68.3 kB view details)

Uploaded Apr 18, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llmscan-0.2.0-py3-none-any.whl (31.3 kB view details)

Uploaded Apr 18, 2026 Python 3

File details

Details for the file llmscan-0.2.0.tar.gz.

File metadata

Download URL: llmscan-0.2.0.tar.gz
Upload date: Apr 18, 2026
Size: 68.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llmscan-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`085780c25a0ff06de19a62adebdd005cae65e958ceb707606c97d676634de676`
MD5	`54f75884bc6493947bbfcd73ba8c3e00`
BLAKE2b-256	`1cb2bf086b597ecbb438ce2d30f3c66c685818cf77697deb4bf2884e17f62ecf`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llmscan-0.2.0.tar.gz:

Publisher: ci.yml on adityaarakeri/llmscan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmscan-0.2.0.tar.gz
- Subject digest: 085780c25a0ff06de19a62adebdd005cae65e958ceb707606c97d676634de676
- Sigstore transparency entry: 1339570274
- Sigstore integration time: Apr 18, 2026
Source repository:
- Permalink: adityaarakeri/llmscan@fd3400aaf7c6a041b4b286795492f84739bb69fc
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/adityaarakeri
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@fd3400aaf7c6a041b4b286795492f84739bb69fc
- Trigger Event: push

File details

Details for the file llmscan-0.2.0-py3-none-any.whl.

File metadata

Download URL: llmscan-0.2.0-py3-none-any.whl
Upload date: Apr 18, 2026
Size: 31.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llmscan-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`faf39b341d3ab42083cb7e2666f1c77184ce3d47003a392463ec9f0c8babbb69`
MD5	`e52d727adfa13a79f698bd48c8c8096a`
BLAKE2b-256	`e8eeaadb174b216d5e95a1dfeeeba06f7a1278aac25b8d0336958aecacf66074`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llmscan-0.2.0-py3-none-any.whl:

Publisher: ci.yml on adityaarakeri/llmscan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmscan-0.2.0-py3-none-any.whl
- Subject digest: faf39b341d3ab42083cb7e2666f1c77184ce3d47003a392463ec9f0c8babbb69
- Sigstore transparency entry: 1339570275
- Sigstore integration time: Apr 18, 2026
Source repository:
- Permalink: adityaarakeri/llmscan@fd3400aaf7c6a041b4b286795492f84739bb69fc
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/adityaarakeri
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@fd3400aaf7c6a041b4b286795492f84739bb69fc
- Trigger Event: push

llmscan 0.2.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Features

Install

Demo

Quick Start

Usage

scan — Inspect your hardware

list — List compatible models

explain — Deep-dive on a specific model

search — Find GGUF models on Hugging Face

add — Add a model to your local catalog

remove — Remove a model from your local catalog

doctor — Diagnose hardware detection

catalog update — Refresh the model list

version — Show version and check for updates

Global flags

Example Output

Rating System

Backend-Aware Scoring

Ollama Integration

Supported Hardware

Custom Catalogs

Required Fields

Development

Contributing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

`scan` — Inspect your hardware

`list` — List compatible models

`explain` — Deep-dive on a specific model

`search` — Find GGUF models on Hugging Face

`add` — Add a model to your local catalog

`remove` — Remove a model from your local catalog

`doctor` — Diagnose hardware detection

`catalog update` — Refresh the model list

`version` — Show version and check for updates