
Will it fit? GPU toolkit for AI models — MLX + GGUF + Cloud in one menu.


localfit

██╗      ██████╗  ██████╗ █████╗ ██╗     ███████╗██╗████████╗
██║     ██╔═══██╗██╔════╝██╔══██╗██║     ██╔════╝██║╚══██╔══╝
██║     ██║   ██║██║     ███████║██║     █████╗  ██║   ██║
██║     ██║   ██║██║     ██╔══██║██║     ██╔══╝  ██║   ██║
███████╗╚██████╔╝╚██████╗██║  ██║███████╗██║     ██║   ██║
╚══════╝ ╚═════╝  ╚═════╝╚═╝  ╚═╝╚══════╝╚═╝     ╚═╝   ╚═╝

Will it fit? Say what model you want — localfit figures out the rest.

Fits locally? Run it via MLX or llama.cpp. Doesn't fit? Kaggle free GPU. Still too big? RunPod cloud. Need a custom quant? Quantize remotely and upload to HuggingFace. You never think about hardware.

pip install localfit

Quick Start

localfit                              # GPU dashboard + trending models
localfit run gemma4:e4b               # interactive menu: pick MLX, GGUF, or Cloud
localfit run qwen3:14b                # doesn't fit? menu shows Kaggle/RunPod options
localfit --launch claude              # start model + launch Claude Code
localfit makeitfit llama-4-scout      # quantize remotely → upload to your HuggingFace

The Run Menu

When you run localfit run MODEL, you get an interactive menu with arrow-key navigation, so you pick your backend before anything downloads:

╭──────────────────────── Qwen2.5-7B-Instruct ─────────────────────────╮
│   LOCAL                                                              │
│   › MLX   Qwen2.5-7B-Instruct-4bit             3.5GB                │
│     MLX   Qwen2.5-7B-Instruct-8bit             7.0GB  ⭐             │
│     GGUF  Q4_K_M                                4.4GB                │
│     GGUF  Q8_0                                  7.5GB                │
│                                                                      │
│   REMOTE                                                             │
│     Kaggle  T4 16GB                               free               │
│     RunPod  RTX A5000 24GB                    $0.16/hr               │
╰──────────────────────────── Apple Silicon 16GB ──────────────────────╯
  • MLX — Native Apple Silicon, fastest on Mac. Auto-discovers mlx-community models.
  • GGUF — llama.cpp with Metal/CUDA. Works everywhere.
  • Kaggle — Free 30h/week GPU. One click to deploy.
  • RunPod — Paid cloud GPU. Auto-picks cheapest that fits.
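The fit analysis behind this menu can be sketched as a simple VRAM budget check. This is a minimal sketch, not localfit's actual algorithm: the runtime-overhead factor and KV-cache cost below are illustrative assumptions.

```python
def fits(model_gb: float, vram_gb: float,
         ctx_tokens: int = 8192, kv_gb_per_8k: float = 1.0,
         overhead: float = 1.1) -> bool:
    """Rough VRAM budget: weights * runtime overhead + KV cache must fit.

    overhead and kv_gb_per_8k are assumed illustrative constants.
    """
    kv_gb = kv_gb_per_8k * ctx_tokens / 8192
    return model_gb * overhead + kv_gb <= vram_gb

# Against the 16GB machine in the menu above:
print(fits(3.5, 16.0))   # 4-bit MLX quant -> True
print(fits(30.0, 16.0))  # a 30GB model -> False, so remote options appear
```

A real implementation would also account for OS memory pressure and the model's actual KV-cache geometry, but the decision boundary is the same shape.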

Backends

MLX (Apple Silicon)

localfit auto-detects if you have mlx-lm installed and finds MLX models on HuggingFace:

pip install mlx-lm                    # one-time setup
localfit run gemma-3-4b-it            # auto-picks mlx-community/gemma-3-4b-it-8bit

If no mlx-community model exists, localfit can convert any HuggingFace model to MLX locally:

localfit run bytedance-research/UI-TARS-7B-DPO
# → No mlx-community model found
# → "Convert locally? mlx_lm needs ~14GB RAM (you have 24GB)"
# → Converts to MLX 4-bit → serves immediately

GGUF (llama.cpp)

The default for all platforms. localfit downloads the best GGUF quant for your GPU and serves via llama-server:

localfit run gemma4:26b               # MoE, 12GB, best quality on 24GB Mac
localfit show gemma4:26b              # show all quants + fit analysis + cloud pricing

Remote Kaggle (Free)

30 hours/week of free T4 GPU. localfit auto-deploys via Cloudflare tunnel:

localfit run qwen3:14b --remote kaggle
localfit --remote-status              # check active session
localfit --remote-stop                # stop + free quota

Remote RunPod (Paid)

Any GPU size. Live pricing from the API. Auto-stop when budget runs out:

localfit login runpod                 # save API key
localfit run gemma4:27b --cloud       # auto-provision + tunnel
localfit --stop                       # kill pod + stop billing

Make It Fit — Remote Quantization

Can't find the right quant? Create your own and upload to HuggingFace:

localfit makeitfit Qwen2.5-7B-Instruct
  Your GPU: Apple Silicon 16GB
  Model: Qwen/Qwen2.5-7B-Instruct (14GB BF16)

  1  Quantize on Kaggle (free) → Q4_K_M GGUF     ~7 min
  2  Quantize on RunPod         → Q5_K_M GGUF     ~$0.10
  3  Serve remotely (no quant)

  Pick option:

How it works:

  1. Picks Kaggle GPU (free) or RunPod (cheapest available)
  2. Downloads model from HuggingFace
  3. Converts to F16 GGUF via llama.cpp
  4. Quantizes to your chosen method (Q4_K_M, Q5_K_M, Q8_0, etc.)
  5. Uploads to your HuggingFace repo
  6. Run it: localfit run yourname/model-Q4_K_M-GGUF-localfit

Uses llama.cpp's native tools with no Unsloth dependency, so it works reliably on both Kaggle and RunPod.
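The size estimates in the menu above can be approximated from parameter count and bits per weight. The bits-per-weight figures below are rough llama.cpp conventions, not exact file sizes (real GGUF files add metadata and mixed-precision tensors):

```python
# Approximate bits-per-weight for common llama.cpp quant types (assumed values).
BPW = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.50, "F16": 16.0}

def gguf_gb(params_b: float, quant: str) -> float:
    """Estimate GGUF file size in GB for a model with params_b billion weights."""
    bits = params_b * 1e9 * BPW[quant]
    return bits / 8 / 1e9

for q in ("Q4_K_M", "Q8_0", "F16"):
    print(f"7B @ {q}: ~{gguf_gb(7.0, q):.1f} GB")
```

For a 7B model this lands near the menu's numbers: roughly 4.2 GB at Q4_K_M, 7.4 GB at Q8_0, and 14 GB at F16/BF16.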

Launch Any Tool

Start a model and launch your coding tool in one command:

localfit --launch claude              # Claude Code
localfit --launch claude --model gemma4:26b
localfit --launch codex               # OpenAI Codex CLI
localfit --launch opencode            # OpenCode
localfit --launch aider               # aider
localfit --launch webui               # Open WebUI (ChatGPT-style browser)
localfit --launch webui --tunnel      # + public URL via Cloudflare

Works with both local and remote models. Env vars are scoped to the subprocess only — your normal tool setup is never touched.
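The scoped-env behavior can be sketched with a plain subprocess call. The variable names (OPENAI_BASE_URL, OPENAI_API_KEY) are illustrative assumptions about what a tool launcher might set, not necessarily what localfit uses:

```python
import os
import subprocess
import sys

def launch_scoped(cmd: list[str], base_url: str) -> None:
    """Run a tool with API variables set only for the child process."""
    env = os.environ.copy()            # copy, so the parent env is never mutated
    env["OPENAI_BASE_URL"] = base_url  # illustrative variable name
    env["OPENAI_API_KEY"] = "local"    # local servers typically ignore the key
    subprocess.run(cmd, env=env, check=True)

# Child process sees the variable; this shell's environment is untouched.
launch_scoped(
    [sys.executable, "-c", "import os; print(os.environ['OPENAI_BASE_URL'])"],
    "http://localhost:8080/v1",
)
assert "OPENAI_BASE_URL" not in os.environ
```

Passing a copied `env` dict to `subprocess.run` is the standard way to get this isolation: the child gets the extra variables, and nothing leaks back into your shell or your tool's saved config.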

All Commands

Model Management

localfit run MODEL                    # interactive menu → pick backend → serve
localfit run MODEL --remote kaggle    # serve on free Kaggle GPU
localfit run MODEL --cloud            # serve on RunPod (paid)
localfit pull MODEL                   # download only
localfit list                         # installed models
localfit ps                           # running models
localfit stop                         # stop local server
localfit show MODEL                   # all quants + fit analysis + pricing

Quantization

localfit makeitfit MODEL              # quantize remotely → upload to HuggingFace
localfit login huggingface            # save HF write token (for uploads)

GPU & Hardware

localfit                              # GPU dashboard + trending models
localfit health                       # GPU VRAM, temp, processes
localfit specs                        # full machine specs
localfit simulate                     # interactive "will this model fit?"
localfit bench                        # benchmark installed models
localfit arena                        # leaderboard on YOUR hardware
localfit trending                     # top models with fit/cloud tags

Tool Integration

localfit --launch TOOL                # start model + launch tool
localfit --config TOOL                # show safe launch command
localfit doctor                       # check all tool configs
localfit restore                      # restore configs from backup

Cloud & Remote

localfit login kaggle                 # save Kaggle credentials
localfit login runpod                 # save RunPod API key
localfit login huggingface            # save HF token
localfit --remote-status              # check active Kaggle session
localfit --remote-stop                # stop Kaggle session
localfit --stop                       # stop RunPod pod

System

localfit check                        # check prerequisites (llama-server, CUDA, etc.)
localfit cleanup                      # free GPU memory
localfit debloat                      # disable macOS services that compete for GPU resources

Supported Platforms

| Platform            | GPU Detection      | Backends                 |
|---------------------|--------------------|--------------------------|
| macOS Apple Silicon | Metal              | MLX + llama.cpp + Ollama |
| Linux NVIDIA        | CUDA (nvidia-smi)  | llama.cpp + Ollama       |
| Linux AMD           | ROCm (rocm-smi)    | llama.cpp + Ollama       |
| Windows (WSL2)      | CUDA (nvidia-smi)  | llama.cpp + Ollama       |

Dynamic VRAM Context Sizing

localfit auto-calculates the optimal context window:

| Machine     | Model               | Context |
|-------------|---------------------|---------|
| M4 Pro 24GB | Gemma 4 26B (12GB)  | 32K     |
| M4 Pro 24GB | Gemma 4 E4B (4.6GB) | 128K    |
| M4 Max 64GB | Gemma 4 26B (12GB)  | 128K    |
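One way such a table could arise is from a leftover-memory calculation: subtract the model and a system reserve from total memory, divide by a per-token KV-cache cost, and round down to a power of two with a cap at the model's maximum context. The constants below are assumptions tuned to reproduce the table, not localfit's real numbers:

```python
def max_context(vram_gb: float, model_gb: float,
                kv_mb_per_token: float = 0.1,
                reserve_gb: float = 6.0, cap: int = 131072) -> int:
    """Largest power-of-two context the leftover memory can hold (sketch).

    kv_mb_per_token, reserve_gb, and cap are illustrative assumptions.
    """
    free_mb = (vram_gb - model_gb - reserve_gb) * 1024
    tokens = free_mb / kv_mb_per_token
    ctx = 1
    while ctx * 2 <= tokens and ctx * 2 <= cap:
        ctx *= 2
    return ctx

for vram, model in [(24, 12.0), (24, 4.6), (64, 12.0)]:
    print(f"{vram}GB, {model}GB model -> {max_context(vram, model) // 1024}K")
# -> 32K, 128K, 128K
```

With these constants the three rows of the table fall out directly; the cap reflects that many models top out at a 128K context regardless of available memory.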

Cloud Setup

Kaggle (Free)

# 1. Get your Legacy API Key at https://www.kaggle.com/settings
#    → "Legacy API Credentials" → "Create Legacy API Key" → downloads kaggle.json
# 2. Save it:
localfit login kaggle
# 3. Run any model:
localfit run gemma4:e4b --remote kaggle

RunPod (Paid)

# 1. Get API key at https://www.runpod.io/console/user/settings
# 2. Save it:
localfit login runpod
# 3. Run any model:
localfit run gemma4:27b --cloud

HuggingFace (For Uploads)

# 1. Create a write token at https://huggingface.co/settings/tokens
# 2. Save it:
localfit login huggingface
# 3. Quantize + upload:
localfit makeitfit Qwen2.5-7B-Instruct

Requirements

pip install localfit                  # core
pip install 'localfit[all]'           # + TUI dashboard + HF downloads
pip install mlx-lm                    # + MLX backend (Mac only)

License

Apache-2.0
