Will it fit? GPU toolkit for AI models — MLX + GGUF + Cloud in one menu.
# localfit

```text
██╗      ██████╗  ██████╗ █████╗ ██╗     ███████╗██╗████████╗
██║     ██╔═══██╗██╔════╝██╔══██╗██║     ██╔════╝██║╚══██╔══╝
██║     ██║   ██║██║     ███████║██║     █████╗  ██║   ██║
██║     ██║   ██║██║     ██╔══██║██║     ██╔══╝  ██║   ██║
███████╗╚██████╔╝╚██████╗██║  ██║███████╗██║     ██║   ██║
╚══════╝ ╚═════╝  ╚═════╝╚═╝  ╚═╝╚══════╝╚═╝     ╚═╝   ╚═╝
```

Will it fit? Say what model you want — localfit figures out the rest.
Fits locally? Run it via MLX or llama.cpp. Doesn't fit? Use Kaggle's free GPU. Still too big? Use RunPod cloud. Need a custom quant? Quantize remotely and upload to HuggingFace. You never have to think about hardware.
```bash
pip install localfit
```
## Quick Start

```bash
localfit                          # GPU dashboard + trending models
localfit run gemma4:e4b           # interactive menu: pick MLX, GGUF, or Cloud
localfit run qwen3:14b            # doesn't fit? menu shows Kaggle/RunPod options
localfit --launch claude          # start model + launch Claude Code
localfit makeitfit llama-4-scout  # quantize remotely → upload to your HuggingFace
```
## The Run Menu

When you run `localfit run MODEL`, you get an interactive menu with arrow-key navigation — pick your backend before anything downloads:

```text
╭──────────────────────── Qwen2.5-7B-Instruct ─────────────────────────╮
│  LOCAL                                                               │
│  › MLX    Qwen2.5-7B-Instruct-4bit    3.5GB                          │
│    MLX    Qwen2.5-7B-Instruct-8bit    7.0GB  ⭐                      │
│    GGUF   Q4_K_M                      4.4GB                          │
│    GGUF   Q8_0                        7.5GB                          │
│                                                                      │
│  REMOTE                                                              │
│    Kaggle   T4 16GB         free                                     │
│    RunPod   RTX A5000 24GB  $0.16/hr                                 │
╰──────────────────────────── Apple Silicon 16GB ──────────────────────╯
```
- MLX — Native Apple Silicon, fastest on Mac. Auto-discovers mlx-community models.
- GGUF — llama.cpp with Metal/CUDA. Works everywhere.
- Kaggle — Free 30h/week GPU. One click to deploy.
- RunPod — Paid cloud GPU. Auto-picks cheapest that fits.
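The LOCAL/REMOTE split above boils down to a fit check. Here is a minimal, hypothetical sketch of that decision — the ~20% overhead factor and the option list are illustrative assumptions, not localfit's actual internals:

```python
# Rough "will it fit?" check: compare a quant's file size plus runtime
# overhead (KV cache, activations) against available memory or VRAM.
# The 1.2x overhead factor is an illustrative assumption only.

def fits_locally(quant_size_gb: float, vram_gb: float, overhead: float = 1.2) -> bool:
    """A quant roughly fits if its weights plus runtime overhead
    stay under the available unified memory or VRAM."""
    return quant_size_gb * overhead <= vram_gb

# The quants from the menu above, on a 16GB Apple Silicon machine:
options = [
    ("MLX 4-bit", 3.5),
    ("MLX 8-bit", 7.0),
    ("GGUF Q4_K_M", 4.4),
    ("GGUF Q8_0", 7.5),
]

vram = 16.0
for name, size in options:
    tag = "LOCAL" if fits_locally(size, vram) else "REMOTE"
    print(f"{tag:6} {name:12} {size:.1f}GB")
```

On 16GB every option fits; shrink `vram` to 8.0 and the 7GB+ quants flip to REMOTE, which is where the Kaggle and RunPod rows come in.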
## Backends

### MLX (Apple Silicon)

localfit auto-detects if you have mlx-lm installed and finds MLX models on HuggingFace:

```bash
pip install mlx-lm          # one-time setup
localfit run gemma-3-4b-it  # auto-picks mlx-community/gemma-3-4b-it-8bit
```

If no mlx-community model exists, localfit can convert any HuggingFace model to MLX locally:

```bash
localfit run bytedance-research/UI-TARS-7B-DPO
# → No mlx-community model found
# → "Convert locally? mlx_lm needs ~14GB RAM (you have 24GB)"
# → Converts to MLX 4-bit → serves immediately
```
### GGUF (llama.cpp)

The default for all platforms. localfit downloads the best GGUF quant for your GPU and serves via llama-server:

```bash
localfit run gemma4:26b   # MoE, 12GB, best quality on 24GB Mac
localfit show gemma4:26b  # show all quants + fit analysis + cloud pricing
```
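Once a model is serving, llama-server exposes an OpenAI-compatible HTTP API, so any OpenAI-style client can talk to it. A standard-library sketch, assuming the default `127.0.0.1:8080` address (check `localfit ps` for where your model is actually serving):

```python
# Build an OpenAI-style chat request for a local llama-server.
# The base URL and model name here are assumptions; llama-server
# accepts requests on /v1/chat/completions regardless of model name.
import json
import urllib.request

def build_chat_request(prompt: str, base_url: str = "http://127.0.0.1:8080"):
    payload = {
        "model": "local",  # served model is fixed; this field is informational
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Say hello in five words.")
# urllib.request.urlopen(req) would send it once a model is serving
```

Because the API is OpenAI-compatible, the official `openai` Python client also works by pointing its `base_url` at the local server.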
### Remote Kaggle (Free)

30 hours/week of free T4 GPU. localfit auto-deploys via Cloudflare tunnel:

```bash
localfit run qwen3:14b --remote kaggle
localfit --remote-status  # check active session
localfit --remote-stop    # stop + free quota
```
### Remote RunPod (Paid)

Any GPU size. Live pricing from the API. Auto-stop when budget runs out:

```bash
localfit login runpod            # save API key
localfit run gemma4:27b --cloud  # auto-provision + tunnel
localfit --stop                  # kill pod + stop billing
```
## Make It Fit — Remote Quantization

Can't find the right quant? Create your own and upload to HuggingFace:

```bash
localfit makeitfit Qwen2.5-7B-Instruct
```

```text
Your GPU: Apple Silicon 16GB
Model:    Qwen/Qwen2.5-7B-Instruct (14GB BF16)

  1  Quantize on Kaggle (free)  → Q4_K_M GGUF  ~7 min
  2  Quantize on RunPod         → Q5_K_M GGUF  ~$0.10
  3  Serve remotely (no quant)

Pick option:
```
How it works:

- Picks a Kaggle GPU (free) or RunPod (cheapest available)
- Downloads the model from HuggingFace
- Converts to F16 GGUF via llama.cpp
- Quantizes to your chosen method (Q4_K_M, Q5_K_M, Q8_0, etc.)
- Uploads to your HuggingFace repo

Then run it:

```bash
localfit run yourname/model-Q4_K_M-GGUF-localfit
```
Uses llama.cpp native tools — no Unsloth dependency, works reliably on Kaggle and RunPod.
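The size difference between quant methods is simple arithmetic: parameters × bits per weight. A back-of-envelope estimator — the bits-per-weight figures are rough community approximations, not exact GGUF on-disk sizes:

```python
# Estimate GGUF file size: params x bits-per-weight / 8.
# The BPW values below are approximate; real files also contain
# non-weight tensors and metadata, so actual sizes run slightly higher.

BPW = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

def gguf_size_gb(n_params_b: float, quant: str) -> float:
    """Approximate file size in GB for a model with n_params_b
    billion parameters at the given quantization."""
    return n_params_b * 1e9 * BPW[quant] / 8 / 1e9

# A 7B model: F16 is ~14GB, Q4_K_M ~4.2GB -- in the ballpark of the
# 14GB BF16 source and ~4.4GB Q4_K_M quant shown in the menu above.
print(round(gguf_size_gb(7, "F16"), 1))     # 14.0
print(round(gguf_size_gb(7, "Q4_K_M"), 1))  # 4.2
```

This is why remote quantization pays off on small machines: the same 7B model drops from 14GB to under 5GB at Q4_K_M with modest quality loss.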
## Launch Any Tool

Start a model and launch your coding tool in one command:

```bash
localfit --launch claude           # Claude Code
localfit --launch claude --model gemma4:26b
localfit --launch codex            # OpenAI Codex CLI
localfit --launch opencode         # OpenCode
localfit --launch aider            # aider
localfit --launch webui            # Open WebUI (ChatGPT-style browser)
localfit --launch webui --tunnel   # + public URL via Cloudflare
```
Works with both local and remote models. Env vars are scoped to the subprocess only — your normal tool setup is never touched.
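Scoping environment variables to a subprocess, rather than exporting them into your shell, looks roughly like this in Python. The variable names are illustrative assumptions, not necessarily the ones localfit sets:

```python
# Pass an OpenAI-style endpoint to a child process only: the parent
# environment is copied and extended for the subprocess, never mutated.
# OPENAI_BASE_URL / OPENAI_API_KEY are illustrative names here.
import os
import subprocess
import sys

child_env = {
    **os.environ,
    "OPENAI_BASE_URL": "http://127.0.0.1:8080/v1",
    "OPENAI_API_KEY": "local",
}

out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['OPENAI_BASE_URL'])"],
    env=child_env, capture_output=True, text=True,
)
print(out.stdout.strip())  # the child sees the override; the parent shell does not
```

When the child exits, nothing lingers: the parent's environment and the tool's own config files were never written to.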
## All Commands

### Model Management

```bash
localfit run MODEL                  # interactive menu → pick backend → serve
localfit run MODEL --remote kaggle  # serve on free Kaggle GPU
localfit run MODEL --cloud          # serve on RunPod (paid)
localfit pull MODEL                 # download only
localfit list                       # installed models
localfit ps                         # running models
localfit stop                       # stop local server
localfit show MODEL                 # all quants + fit analysis + pricing
```

### Quantization

```bash
localfit makeitfit MODEL    # quantize remotely → upload to HuggingFace
localfit login huggingface  # save HF write token (for uploads)
```

### GPU & Hardware

```bash
localfit           # GPU dashboard + trending models
localfit health    # GPU VRAM, temp, processes
localfit specs     # full machine specs
localfit simulate  # interactive "will this model fit?"
localfit bench     # benchmark installed models
localfit arena     # leaderboard on YOUR hardware
localfit trending  # top models with fit/cloud tags
```

### Tool Integration

```bash
localfit --launch TOOL  # start model + launch tool
localfit --config TOOL  # show safe launch command
localfit doctor         # check all tool configs
localfit restore        # restore configs from backup
```

### Cloud & Remote

```bash
localfit login kaggle       # save Kaggle credentials
localfit login runpod       # save RunPod API key
localfit login huggingface  # save HF token
localfit --remote-status    # check active Kaggle session
localfit --remote-stop      # stop Kaggle session
localfit --stop             # stop RunPod pod
```

### System

```bash
localfit check    # check prerequisites (llama-server, CUDA, etc.)
localfit cleanup  # free GPU memory
localfit debloat  # disable macOS services stealing GPU
```
## Supported Platforms
| Platform | GPU Detection | Backends |
|---|---|---|
| macOS Apple Silicon | Metal | MLX + llama.cpp + Ollama |
| Linux NVIDIA | CUDA (nvidia-smi) | llama.cpp + Ollama |
| Linux AMD | ROCm (rocm-smi) | llama.cpp + Ollama |
| Windows (WSL2) | CUDA (nvidia-smi) | llama.cpp + Ollama |
## Dynamic VRAM Context Sizing
localfit auto-calculates the optimal context window:
| Machine | Model | Context |
|---|---|---|
| M4 Pro 24GB | Gemma 4 26B (12GB) | 32K |
| M4 Pro 24GB | Gemma 4 E4B (4.6GB) | 128K |
| M4 Max 64GB | Gemma 4 26B (12GB) | 128K |
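The table above falls out of KV-cache arithmetic: the cache grows linearly with context length, so whatever memory remains after the weights bounds the usable context. A sketch with illustrative architecture numbers — not Gemma's actual config:

```python
# KV cache size grows linearly with context: 2 tensors (K and V) per
# layer, each n_kv_heads x head_dim per token, at 2 bytes for F16.
# Layer/head counts below are illustrative, not a real model's config.

def kv_cache_gb(ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_el: int = 2) -> float:
    """K+V cache size in GB for a given context length."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_el / 1e9

def max_context(free_gb: float, step: int = 1024) -> int:
    """Largest context (in `step` increments) whose KV cache fits."""
    ctx = step
    while kv_cache_gb(ctx + step) <= free_gb:
        ctx += step
    return ctx

print(round(kv_cache_gb(32_768), 2))  # ~4.29 GB of cache at 32K context
```

With these numbers, roughly 4.3GB of headroom caps the context near 32K, while a machine with several times that headroom can open the window to 128K — the same pattern as the 24GB vs 64GB rows above.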
## Cloud Setup

### Kaggle (Free)

```bash
# 1. Get your Legacy API Key at https://www.kaggle.com/settings
#    → "Legacy API Credentials" → "Create Legacy API Key" → downloads kaggle.json
# 2. Save it:
localfit login kaggle
# 3. Run any model:
localfit run gemma4:e4b --remote kaggle
```

### RunPod (Paid)

```bash
# 1. Get API key at https://www.runpod.io/console/user/settings
# 2. Save it:
localfit login runpod
# 3. Run any model:
localfit run gemma4:27b --cloud
```

### HuggingFace (For Uploads)

```bash
# 1. Create a write token at https://huggingface.co/settings/tokens
# 2. Save it:
localfit login huggingface
# 3. Quantize + upload:
localfit makeitfit Qwen2.5-7B-Instruct
```
## Requirements

```bash
pip install localfit         # core
pip install 'localfit[all]'  # + TUI dashboard + HF downloads
pip install mlx-lm           # + MLX backend (Mac only)
```
## License
Apache-2.0
## File details

Details for the file `localfit-1.1.0.tar.gz`.

### File metadata

- Download URL: localfit-1.1.0.tar.gz
- Upload date:
- Size: 3.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e0e9371b8484ccfe470880fa23d615dc5f78a2a59e0e66d621a7ac4b4de48641` |
| MD5 | `0e6723baef194ccc35c38fce8736328c` |
| BLAKE2b-256 | `d0306f5e3b4565a96b91f9eee282adfe0628dc0a982fa5cb238337a42e40291c` |
## File details

Details for the file `localfit-1.1.0-py3-none-any.whl`.

### File metadata

- Download URL: localfit-1.1.0-py3-none-any.whl
- Upload date:
- Size: 149.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `bc5f7470b506a83f1de3b603cbecd15a257b67e169ed07240a7f556e17ec2e38` |
| MD5 | `bf66eaab26ff3a34ab9c06cb14b0c7f0` |
| BLAKE2b-256 | `924323d7ca4af06f3cd40bdfed1822bdb5faafdaf2b5ec3d1f7ce5a11b8fe6c2` |