Find the best local AI model for your GPU — terminal UI
fitmyllm
Run the right LLM locally. Automatically.
Install
pip install fitmyllm
Or run without installing:
pipx run fitmyllm
Setup
Get your free API key at fitmyllm.com/?tab=mcp, then:
fitmyllm setup
# Paste your API key (starts with fml_)
Or set it as an environment variable:
export FITMYLLM_API_KEY=fml_your_key_here
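For scripts that wrap the CLI, it can help to check the variable before launching. The following is a minimal sketch, not fitmyllm's own code: the helper name `resolve_api_key` is hypothetical, and the only facts taken from this README are the variable name `FITMYLLM_API_KEY` and the `fml_` prefix.

```python
import os


def resolve_api_key(env=None):
    """Return the API key from the environment, or None if unset or malformed.

    Hypothetical helper: fitmyllm's real lookup logic is not documented here.
    """
    env = os.environ if env is None else env
    key = env.get("FITMYLLM_API_KEY", "").strip()
    # The README states that keys start with "fml_"; anything else is rejected.
    if key.startswith("fml_") and len(key) > len("fml_"):
        return key
    return None


if __name__ == "__main__":
    if resolve_api_key() is None:
        print("FITMYLLM_API_KEY is missing or does not start with fml_")
    else:
        print("API key found in environment")
```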
Run
fitmyllm
Features
| Screen | Description |
|---|---|
| Quick Run | Zero-config: detect GPU → recommend best model → download GGUF → start server → chat. No decisions needed |
| Find Models | Auto-detect GPU, 11 filters (use case, context, size, family, quant, speed...), 30+ models ranked by score |
| Find GPU | GPU recommendations for any model with budget, speed, vendor, and quant filters |
| Enterprise | 10-tab deployment analysis: overview, risk, checklist, TCO, scaling, SLA, GPU matrix, performance, fine-tuning, architecture |
| Model Library | Browse all installed models from every backend (Ollama, llama-server, local GGUF). Chat, delete, disk usage |
| Compare | Side-by-side comparison of up to 4 models with all metrics |
| Install | Choose quantization, pick engine (8 supported), or download GGUF directly from HuggingFace with progress bar |
| Chat | Talk to models via any backend with real-time streaming and collapsible thinking blocks |
| Run Benchmark | Select from installed/recommended models, backend-agnostic speed test with delta vs predicted speed |
| Tier List | Models and GPUs ranked S-F with cloud GPU alternatives |
| Benchmarks | Leaderboard sortable by 8 benchmark metrics |
| GPU Prices | Search and compare GPU pricing with vendor filter |
| Command Simulator | Interactive parameter tuning for 8 engines |
| Charts | ASCII score/speed/VRAM bars and quality-vs-speed scatter plot |
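The Run Benchmark screen reports a delta against the predicted speed. The arithmetic behind such a figure can be sketched as follows; the function names are hypothetical, and the exact formula fitmyllm uses is an assumption.

```python
def tokens_per_second(tokens, seconds):
    """Throughput of a generation run: tokens produced divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


def speed_delta_percent(measured, predicted):
    """Signed percentage difference of measured speed relative to predicted speed."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return (measured - predicted) * 100.0 / predicted


if __name__ == "__main__":
    measured = tokens_per_second(256, 8.0)
    print(f"{measured:.1f} tok/s, {speed_delta_percent(measured, 40.0):+.1f}% vs predicted")
```

A negative delta means the model ran slower than predicted on this machine.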
Multi-Backend Support
The CLI auto-detects running inference backends and works with any of them:
| Backend | Port | Notes |
|---|---|---|
| Ollama | 11434 | Full support: pull, run, chat, model listing |
| llama-server | 8080 | llama.cpp HTTP server — auto-started or manual |
| OpenAI-compatible | 8080 | vLLM, LM Studio, or any /v1/chat/completions server |
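One simple way to approximate this detection is a TCP connect to each default port from the table. This is a sketch under that assumption only; how fitmyllm actually probes backends is not described here, and the names `port_open` and `detect_backends` are hypothetical.

```python
import socket

# Default ports from the table above. llama-server and generic
# OpenAI-compatible servers share 8080, so they are grouped together.
BACKEND_PORTS = {
    "ollama": 11434,
    "llama-server / OpenAI-compatible": 8080,
}


def port_open(host, port, timeout=0.25):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_backends(host="127.0.0.1", ports=BACKEND_PORTS, probe=port_open):
    """Return the names of backends whose default port is reachable."""
    return [name for name, port in ports.items() if probe(host, port)]


if __name__ == "__main__":
    print(detect_backends())
```

An open port alone cannot tell llama-server apart from another server on 8080; distinguishing them would need an HTTP request to the server itself.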
Quick Run can auto-start llama-server with optimal parameters (GPU layers, context length, batch size) calculated from your hardware.
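For a manual start, the same three parameters map onto llama-server command-line options. The sketch below only assembles the argument list; the helper name is hypothetical, the values in the usage example are placeholders, and how Quick Run derives them from your hardware is not covered here.

```python
import shlex


def llama_server_command(model_path, gpu_layers, ctx_size, batch_size, port=8080):
    """Build an argv list for llama-server using its documented long options."""
    return [
        "llama-server",
        "--model", str(model_path),
        "--n-gpu-layers", str(gpu_layers),  # layers offloaded to the GPU
        "--ctx-size", str(ctx_size),        # context length in tokens
        "--batch-size", str(batch_size),    # prompt-processing batch size
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Placeholder values, not recommendations.
    cmd = llama_server_command("model.gguf", gpu_layers=33, ctx_size=8192, batch_size=512)
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.Popen` as-is, which avoids shell quoting problems with model paths that contain spaces.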
GGUF Model Management
Download and manage GGUF models without Ollama:
- Download from any HuggingFace repo by quantization level
- Inventory tracked in ~/.fitmyllm/models/inventory.json
- Storage in ~/.fitmyllm/models/ (configurable)
- No extra dependencies: downloads use httpx
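Selecting a file by quantization level usually comes down to matching the quant label in the GGUF filename and building a HuggingFace download URL. The sketch below illustrates that idea under stated assumptions: `pick_gguf` and `resolve_url` are hypothetical helpers, the filename matching is a plain substring test, and nothing here reflects fitmyllm's internal implementation.

```python
import re


def pick_gguf(filenames, quant):
    """Return the first .gguf filename containing the quant label, or None."""
    pattern = re.compile(re.escape(quant), re.IGNORECASE)
    for name in sorted(filenames):
        if name.lower().endswith(".gguf") and pattern.search(name):
            return name
    return None


def resolve_url(repo_id, filename, revision="main"):
    """Build the HuggingFace 'resolve' URL for a file in a model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


if __name__ == "__main__":
    # Example listing; the repo id and filenames are placeholders.
    files = ["README.md", "model-Q8_0.gguf", "model-Q4_K_M.gguf"]
    chosen = pick_gguf(files, "q4_k_m")
    print(resolve_url("org/repo", chosen))
```

Because the match is a substring test, a short label can match several files; a stricter pattern would be needed when a repo ships many closely named quants.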
Keyboard Shortcuts
| Key | Action |
|---|---|
| f | Toggle filter panel |
| g | Search/change GPU |
| Space | Mark model for comparison |
| c | Compare marked models / Chat from library |
| d | Delete model (in Model Library) |
| i | Install model |
| m | Manual input (in Run Benchmark) |
| t | Command simulator / Toggle thinking |
| s | Save/unsave model |
| r | Refresh / Show HuggingFace README |
| e | Export results as Markdown |
| v | Show ASCII charts |
| Ctrl+S | Save current filters as defaults |
| Ctrl+T | Toggle thinking blocks in chat |
| Esc | Go back |
| q | Quit |
Supported Engines
Ollama, llama-server, vLLM, LM Studio, llama.cpp, KoboldCpp, Jan, Docker Model Runner
Data Storage
~/.fitmyllm/
- config.json: Preferences, API key, saved models, backend preference
- cache/: API response cache (24h TTL, offline fallback)
- models/: Downloaded GGUF files + inventory.json
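The cache behaviour (24h TTL with offline fallback) follows a common pattern: serve a fresh entry directly, refetch when it is stale, and fall back to the stale entry if the fetch fails. A minimal sketch of that pattern is shown below; the on-disk layout (`saved_at` and `data` keys) and all function names are assumptions, not fitmyllm's actual cache format.

```python
import json
import time
from pathlib import Path

TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as stated above


def read_cache(path, now=None, ttl=TTL_SECONDS):
    """Return (data, is_fresh), or (None, False) if no usable entry exists."""
    try:
        entry = json.loads(Path(path).read_text())
        data, saved_at = entry["data"], entry["saved_at"]
    except (OSError, ValueError, KeyError, TypeError):
        return None, False
    now = time.time() if now is None else now
    return data, (now - saved_at) < ttl


def write_cache(path, data, now=None):
    """Store data together with the time it was saved."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"saved_at": time.time() if now is None else now, "data": data}
    path.write_text(json.dumps(entry))


def get_with_cache(path, fetch, now=None):
    """Serve fresh cache, else call fetch(); on network failure use stale cache."""
    data, fresh = read_cache(path, now)
    if fresh:
        return data
    try:
        result = fetch()
    except OSError:
        if data is not None:
            return data  # offline fallback: stale entry is better than nothing
        raise
    write_cache(path, result, now)
    return result
```

Here `fetch` stands for any callable that performs the API request and raises `OSError` (or a subclass) when the network is unavailable.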
Requirements
- Python 3.10+
- API key from fitmyllm.com
- Ollama or llama-server (optional — for chat/benchmark features)
Download files
File details
Details for the file fitmyllm-0.3.20.tar.gz.
File metadata
- Download URL: fitmyllm-0.3.20.tar.gz
- Upload date:
- Size: 63.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 82a1cc4019aa2d6b4d4acc4342094cc552d33ee0a8b71ed300938f9291eb915d |
| MD5 | b49c14a0e26a1e5aa9c39946d75dd70b |
| BLAKE2b-256 | 37aede1d781224b4b36cf391f0a06506f68c2168ac9a228c1492b28af9c427ee |
File details
Details for the file fitmyllm-0.3.20-py3-none-any.whl.
File metadata
- Download URL: fitmyllm-0.3.20-py3-none-any.whl
- Upload date:
- Size: 88.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f0a058200b25ccdabf71154aca430518323364eef4f8a083da7edaed72176a5 |
| MD5 | 4394e70d6fe4c6f14572ef43f0574d09 |
| BLAKE2b-256 | ca5b28fe29c71c3b6e3a7550ae5339971c69e93e555e50da9253eff9f1b3fd62 |