# fitmyllm

Find the best local AI model for your GPU — terminal UI. Run the right LLM locally. Automatically.
## Install

```shell
pip install fitmyllm
```

Or run without installing:

```shell
pipx run fitmyllm
```
## Setup

Get your free API key at fitmyllm.com/?tab=mcp, then:

```shell
fitmyllm setup
# Paste your API key (starts with fml_)
```

Or set it as an environment variable:

```shell
export FITMYLLM_API_KEY=fml_your_key_here
```
## Run

```shell
fitmyllm                   # Interactive TUI (9 modes)
fitmyllm chat <model>      # Chat directly with a model
fitmyllm benchmark         # Run a speed benchmark
fitmyllm my-benchmarks     # View your submitted benchmarks
fitmyllm telemetry on|off  # Toggle anonymous speed telemetry
```
## Features

### Main screens
| Screen | Description |
|---|---|
| Quick Run | Zero-config: detect GPU → recommend best model → download GGUF → start server → chat. No decisions needed |
| Find Models | Auto-detect GPU, 18+ filters (use case, context, size, family, quant, speed, KV cache, capabilities, 14 benchmark minimums, 19 sort options including per-benchmark ranking), multi-GPU support |
| Find GPU | GPU recommendations for any model with budget, speed, vendor, and quant filters |
| Enterprise | 10-tab deployment analysis: overview, risk, checklist, TCO, scaling, SLA, GPU matrix, performance, fine-tuning, architecture |
| Model Library | Browse all installed models from every backend (Ollama, llama-server, local GGUF). Chat, delete, disk usage |
| Tier List | Models and GPUs ranked S-F with cloud GPU alternatives |
| Benchmarks | Leaderboard sortable by 8 benchmark metrics |
| GPU Prices | Search and compare GPU pricing with vendor filter |
| Run Benchmark | Select from installed/recommended models, backend-agnostic speed test with community comparison |
### Live Speed Metrics

Chat shows real-time tok/s during streaming and a summary after each response:

```
42.3 tok/s · 210ms TTFT · 156 tokens
```
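These numbers can be derived from per-token timestamps collected during streaming. A minimal sketch of one common convention (function name and return shape are illustrative, not fitmyllm's internal API):

```python
def stream_metrics(token_times: list[float], start: float) -> dict:
    """Derive speed metrics from a token-timestamp stream.

    token_times: one monotonic timestamp per received token.
    start: timestamp of when the request was sent.
    """
    if not token_times:
        return {"tokens": 0, "ttft_ms": None, "tok_per_s": 0.0}
    # Time to first token: request send -> first token arrival.
    ttft_ms = (token_times[0] - start) * 1000
    # Generation window: first token -> last token (guard against zero).
    elapsed = (token_times[-1] - token_times[0]) or 1e-9
    return {
        "tokens": len(token_times),
        "ttft_ms": round(ttft_ms),
        "tok_per_s": round(len(token_times) / elapsed, 1),
    }
```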
### Community Speed Telemetry

When opted in (`fitmyllm telemetry on`), the CLI silently collects anonymous speed metrics (tok/s, TTFT) during chat sessions and uploads them to improve predictions. No message content is ever sent.

Community speed data feeds back into the CLI and the web UI:
- Find Models detail panel: `Community 42 tok/s (12 reports)` alongside predicted speed
- Model Detail: per-quant breakdown with median, range, and report count
- Benchmark results: your speed vs community median comparison
- Web model pages: community speed section on fitmyllm.com model detail pages
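A record of this kind can be sketched as follows; the field names are assumptions, not fitmyllm's actual wire format. The point is that only timings and hardware/model identifiers leave the machine:

```python
def build_telemetry_record(model: str, quant: str, gpu: str,
                           tok_per_s: float, ttft_ms: int) -> dict:
    """Assemble an anonymous speed report (hypothetical schema).

    Carries only model/hardware identifiers and timing numbers --
    deliberately no prompt text, response text, or user identity.
    """
    return {
        "model": model,
        "quant": quant,
        "gpu": gpu,
        "tok_per_s": round(tok_per_s, 1),
        "ttft_ms": ttft_ms,
    }
```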
### Available from within screens
| Feature | Access | Description |
|---|---|---|
| Compare | `Space` to mark, `c` to compare | Side-by-side comparison of up to 4 models with all metrics |
| Install | `i` on any model | Choose quantization, pick engine (8 supported), or download GGUF from HuggingFace with progress bar |
| Chat | `c` from Model Library | Talk to models via any backend with real-time streaming and collapsible thinking blocks |
| Charts | `v` from Find Models | ASCII score/speed/VRAM bars and quality-vs-speed scatter plot |
| Command Simulator | `t` from model detail | Interactive parameter tuning for 8 engines (context, batch size, KV quant, GPU layers) |
| Export | `e` from Find Models | Export results as Markdown |
## Multi-Backend Support
The CLI auto-detects running inference backends and works with any of them:
| Backend | Port | Notes |
|---|---|---|
| Ollama | 11434 | Full support: pull, run, chat, model listing |
| llama-server | 8080 | llama.cpp HTTP server — auto-started or manual |
| OpenAI-compatible | 8080 | vLLM, LM Studio, or any /v1/chat/completions server |
Quick Run can auto-start llama-server with optimal parameters (GPU layers, context length, batch size) calculated from your hardware.
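Because every backend in the table exposes the same `/v1/chat/completions` protocol, a client needs no backend-specific code. A minimal stdlib-only sketch (base URL, model name, and function names are placeholders, not fitmyllm's own API):

```python
import json
import urllib.request

# Default port for llama-server and other OpenAI-compatible servers (see table above).
BASE_URL = "http://localhost:8080"

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Payload for the standard /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat(model: str, prompt: str) -> str:
    """One non-streaming chat turn against any OpenAI-compatible backend."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```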
## GGUF Model Management

Download and manage GGUF models without Ollama:
- Download from any HuggingFace repo by quantization level
- Inventory tracked in `~/.fitmyllm/models/inventory.json`
- Storage in `~/.fitmyllm/models/` (configurable)
- No extra dependencies — uses httpx for downloads
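The download itself follows HuggingFace's standard file-resolve URL scheme. The CLI uses httpx; the sketch below uses only the standard library so it runs anywhere, and the repo/filename shown are hypothetical:

```python
import urllib.request
from pathlib import Path

def gguf_url(repo: str, filename: str, revision: str = "main") -> str:
    """HuggingFace's standard resolve URL for a file in a repo."""
    return f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"

def download_gguf(repo: str, filename: str,
                  dest_dir: str = "~/.fitmyllm/models") -> Path:
    """Stream a GGUF to disk with coarse progress output."""
    dest = Path(dest_dir).expanduser() / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(gguf_url(repo, filename)) as resp, \
            open(dest, "wb") as out:
        total = int(resp.headers.get("Content-Length") or 0)
        done = 0
        while chunk := resp.read(1 << 20):  # 1 MiB chunks
            out.write(chunk)
            done += len(chunk)
            if total:
                print(f"\r{done / total:6.1%}", end="")
    return dest
```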
## Keyboard Shortcuts
| Key | Action |
|---|---|
| `f` | Toggle filter panel |
| `g` | Search/change GPU |
| `Space` | Mark model for comparison |
| `c` | Compare marked models / Chat from library |
| `d` | Delete model (in Model Library) |
| `i` | Install model |
| `m` | Manual input (in Run Benchmark) |
| `t` | Command simulator / Toggle thinking |
| `s` | Save/unsave model |
| `r` | Refresh / Show HuggingFace README |
| `e` | Export results as Markdown |
| `v` | Show ASCII charts |
| `Ctrl+S` | Save current filters as defaults |
| `Ctrl+T` | Toggle thinking blocks in chat |
| `Esc` | Go back |
| `q` | Quit |
## Supported Engines
Ollama, llama-server, vLLM, LM Studio, llama.cpp, KoboldCpp, Jan, Docker Model Runner
Data Storage
~/.fitmyllm/
config.json Preferences, API key, saved models, backend preference, telemetry opt-in
cache/ API response cache (24h TTL, offline fallback)
models/ Downloaded GGUF files + inventory.json
## Requirements
- Python 3.10+
- API key from fitmyllm.com
- Ollama or llama-server (optional — for chat/benchmark features)