Will it fit? GPU toolkit for AI models — MLX + GGUF + Cloud in one menu.

localfit

██╗      ██████╗  ██████╗ █████╗ ██╗     ███████╗██╗████████╗
██║     ██╔═══██╗██╔════╝██╔══██╗██║     ██╔════╝██║╚══██╔══╝
██║     ██║   ██║██║     ███████║██║     █████╗  ██║   ██║
██║     ██║   ██║██║     ██╔══██║██║     ██╔══╝  ██║   ██║
███████╗╚██████╔╝╚██████╗██║  ██║███████╗██║     ██║   ██║
╚══════╝ ╚═════╝  ╚═════╝╚═╝  ╚═╝╚══════╝╚═╝     ╚═╝   ╚═╝

Will it fit? Say what model you want — localfit figures out the rest.

Text or image generation. Fits locally? Run it via MLX or llama.cpp. Doesn't fit? Kaggle free GPU. Still too big? RunPod cloud. Need a custom quant? Quantize remotely and upload to HuggingFace. You never think about hardware.

pip install localfit

Quick Start

localfit                              # GPU dashboard + trending models
localfit run gemma4:e4b               # interactive menu: pick MLX, GGUF, or Cloud
localfit run qwen3:14b                # doesn't fit? menu shows Kaggle/RunPod options
localfit launch openwebui --model gemma4:e4b                    # serve + launch tool
localfit launch openwebui --model gemma4:e4b --remote kaggle    # serve on free Kaggle GPU + launch
localfit launch claude --model gemma4:26b --remote runpod --budget $2
localfit makeitfit llama-4-scout      # quantize remotely → upload to your HuggingFace

The Run Menu

When you run localfit run MODEL, you get an interactive menu with arrow-key navigation — pick your backend before anything downloads:

╭──────────────────────── Qwen2.5-7B-Instruct ─────────────────────────╮
│   LOCAL                                                              │
│   › MLX   Qwen2.5-7B-Instruct-4bit             3.5GB                │
│     MLX   Qwen2.5-7B-Instruct-8bit             7.0GB  ⭐             │
│     GGUF  Q4_K_M                                4.4GB                │
│     GGUF  Q8_0                                  7.5GB                │
│                                                                      │
│   REMOTE                                                             │
│     Kaggle  T4 16GB                               free               │
│     RunPod  RTX A5000 24GB                    $0.16/hr               │
╰──────────────────────────── Apple Silicon 16GB ──────────────────────╯
  • MLX — Native Apple Silicon, fastest on Mac. Auto-discovers mlx-community models.
  • GGUF — llama.cpp with Metal/CUDA. Works everywhere.
  • Kaggle — Free 30h/week GPU. One click to deploy.
  • RunPod — Paid cloud GPU. Auto-picks cheapest that fits.
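The menu's ordering comes down to a simple fit check: weights plus headroom against free VRAM, then free Kaggle, then paid cloud. A minimal sketch of that logic (illustrative only — the function name and the 1.2× headroom factor are assumptions, and localfit's real selector also weighs quant quality and live pricing):

```python
def pick_backend(model_gb: float, free_vram_gb: float, headroom: float = 1.2) -> str:
    """Illustrative backend choice: local if the weights (plus headroom
    for KV cache and activations) fit in free VRAM, else remote."""
    if model_gb * headroom <= free_vram_gb:
        return "local"             # MLX on Apple Silicon, GGUF elsewhere
    if model_gb * headroom <= 16:  # Kaggle's free T4 has 16GB
        return "kaggle"
    return "runpod"                # paid cloud, any GPU size
```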

Backends

MLX (Apple Silicon)

localfit auto-detects if you have mlx-lm installed and finds MLX models on HuggingFace:

pip install mlx-lm                    # one-time setup
localfit run gemma-3-4b-it            # auto-picks mlx-community/gemma-3-4b-it-8bit

If no mlx-community model exists, localfit can convert any HuggingFace model to MLX locally:

localfit run bytedance-research/UI-TARS-7B-DPO
# → No mlx-community model found
# → "Convert locally? mlx_lm needs ~14GB RAM (you have 24GB)"
# → Converts to MLX 4-bit → serves immediately

GGUF (llama.cpp)

The default for all platforms. localfit downloads the best GGUF quant for your GPU and serves via llama-server:

localfit run gemma4:26b               # MoE, 12GB, best quality on 24GB Mac
localfit show gemma4:26b              # show all quants + fit analysis + cloud pricing

Remote Kaggle (Free)

30 hours/week of free T4 GPU. localfit auto-deploys via Cloudflare tunnel:

localfit run qwen3:14b --remote kaggle
localfit --remote-status              # check active session
localfit --remote-stop                # stop + free quota

Remote RunPod (Paid)

Any GPU size. Live pricing from the API. Auto-stop when budget runs out:

localfit login runpod                 # save API key
localfit run gemma4:27b --cloud       # auto-provision + tunnel
localfit --stop                       # kill pod + stop billing

Image Generation

localfit serves image generation models via an OpenAI-compatible API — same endpoint as DALL-E. Works with Open WebUI, Claude Code, or any OpenAI client.

python -m localfit.image_server                          # default: schnell, 4-bit
python -m localfit.image_server 8189 z-image-turbo 4     # Z-Image-Turbo, 4-bit
python -m localfit.image_server 8189 flux2-klein-4b 4    # Flux2 Klein 4B, 4-bit

API:

curl http://127.0.0.1:8189/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a cyberpunk city at sunset", "size": "512x512", "steps": 4}'

Open WebUI: Settings → Images → OpenAI URL → http://127.0.0.1:8189
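From Python, the same endpoint can be hit with the standard library. This sketch assumes the server returns base64 image data in the OpenAI-style b64_json field (a common default for local servers — check your server's response format):

```python
import base64
import json
import urllib.request

ENDPOINT = "http://127.0.0.1:8189/v1/images/generations"

def decode_images(response: dict, prefix: str = "out") -> list[str]:
    """Write each base64-encoded image in an OpenAI-style response to disk."""
    paths = []
    for i, item in enumerate(response.get("data", [])):
        path = f"{prefix}_{i}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths

if __name__ == "__main__":
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps({"prompt": "a cyberpunk city at sunset",
                         "size": "512x512", "steps": 4}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(decode_images(json.load(resp)))
```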

Supported Image Models

| Model | Pipeline | Params | Local Mac | Kaggle T4 | RunPod |
|---|---|---|---|---|---|
| FLUX.2 Klein 4B | Flux2KleinPipeline | 4B | mflux, 24s | 65s | 2s (3090) |
| FLUX.2 Klein 9B | Flux2KleinPipeline | 9B | mflux, cpu_offload | — | 3090+ |
| FLUX.1 Schnell | FluxPipeline | 12B | mflux, 79s | cpu_offload | 3090 |
| FLUX.1 Dev | FluxPipeline | 12B | mflux, cpu_offload | — | 3090+ |
| FLUX.2 Dev | Flux2Pipeline | 32B | — | — | A100 |
| Z-Image-Turbo | ZImagePipeline | 6B | mflux, 90s | T4 | 3090 |
| Z-Image | ZImagePipeline | 6B | mflux | T4 | 3090 |
| Qwen-Image | QwenImagePipeline | 20B | — | — | A6000+ |
| Qwen-Image-Edit | QwenImageEditPlusPipeline | 20B | — | — | A100 |
| SDXL | StableDiffusionXLPipeline | 6.6B | diffusers | T4 | 3090 |
| SDXL-Turbo | AutoPipelineForText2Image | 6.6B | diffusers | T4 | 3090 |
| SD 3.5 Large | StableDiffusion3Pipeline | 8B | diffusers | T4 | 3090 |

On the diffusers path, all models load via DiffusionPipeline.from_pretrained(), which auto-detects the correct pipeline class from the model's configuration.

Remote (Kaggle free GPU): Image models that don't fit locally can run on Kaggle's free T4 GPU via diffusers + Cloudflare tunnel — same flow as LLM remote serving:

localfit serve schnell --remote kaggle --image        # FLUX.1 Schnell on free T4
localfit serve flux-dev --remote kaggle --image       # FLUX.1 Dev (higher quality)
localfit serve flux2-klein-9b --remote kaggle --image # FLUX.2 Klein 9B

The remote API is OpenAI-compatible — Open WebUI, Claude Code, or any client can connect to the tunnel URL.

MCP Server for Claude Code

localfit includes an MCP server that gives Claude Code (and other MCP clients) image generation and editing tools:

// ~/.claude/settings.json
{
  "mcpServers": {
    "localfit-image": {
      "command": "python3",
      "args": ["-m", "localfit.mcp_image"]
    }
  }
}

Remote endpoint (Kaggle/RunPod tunnel):

{
  "mcpServers": {
    "localfit-image": {
      "command": "python3",
      "args": ["-m", "localfit.mcp_image", "--endpoint", "https://xxx.trycloudflare.com"]
    }
  }
}

Available MCP tools:

| Tool | Description |
|---|---|
| generate_image | Text-to-image generation (prompt, size, steps, seed) |
| edit_image | Image-to-image editing (source image + prompt + strength) |
| list_image_models | List loaded models on the server |
| image_server_status | Check server health and model info |

Start the image server first, then Claude Code can generate images:

python -m localfit.image_server                  # start local server
# Claude Code can now use generate_image tool

Image Generation Benchmarks

| Platform | GPU | Model | Resolution | Steps | Time |
|---|---|---|---|---|---|
| Mac local | M4 Pro (mflux) | Klein 4B | 1024x1024 | 4 | 24s |
| Mac local | M4 Pro (mflux) | Schnell | 512x512 | 4 | 79s |
| Mac local | M4 Pro (mflux) | Z-Image-Turbo | 512x512 | 9 | 3.5min |
| Kaggle | T4 16GB | Klein 4B | 512x512 | 4 | 65s |
| RunPod | RTX 3090 24GB | Klein 4B | 512x512 | 4 | 2s |

First run downloads model weights (~8-23GB). Subsequent runs load from cache.

Make It Fit — Remote Quantization

Can't find the right quant? Create your own and upload to HuggingFace:

localfit makeitfit Qwen2.5-7B-Instruct
  Your GPU: Apple Silicon 16GB
  Model: Qwen/Qwen2.5-7B-Instruct (14GB BF16)

  1  Quantize on Kaggle (free) → Q4_K_M GGUF     ~7 min
  2  Quantize on RunPod         → Q5_K_M GGUF     ~$0.10
  3  Serve remotely (no quant)

  Pick option:

How it works:

  1. Picks Kaggle GPU (free) or RunPod (cheapest available)
  2. Downloads model from HuggingFace
  3. Converts to F16 GGUF via llama.cpp
  4. Quantizes to your chosen method (Q4_K_M, Q5_K_M, Q8_0, etc.)
  5. Uploads to your HuggingFace repo
  6. Run it: localfit run yourname/model-Q4_K_M-GGUF-localfit

Uses llama.cpp native tools — no Unsloth dependency, works reliably on Kaggle and RunPod.
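Which quant to ask for is mostly size arithmetic. A rough estimator (the bits-per-weight figures are approximations of typical llama.cpp quant densities, not exact values):

```python
# Approximate bits-per-weight for common llama.cpp quant methods.
BPW = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5, "F16": 16.0}

def gguf_size_gb(params_b: float, quant: str) -> float:
    """Estimated GGUF file size in GB for a model with params_b billion weights."""
    return round(params_b * BPW[quant] / 8, 1)
```

For a 7B model this lands near the Q4_K_M file size shown in the run menu.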

Launch Any Tool — Local or Remote

One command: pick a model, serve it (locally or cloud), launch your tool connected to it.

# Local (model fits your GPU)
localfit launch openwebui --model gemma4:e4b
localfit launch claude --model gemma4:26b
localfit launch codex --model qwen3:8b
localfit launch opencode --model gemma4:e4b
localfit launch aider --model gemma4:26b

# Remote Kaggle (free 30h/week GPU)
localfit launch openwebui --model gemma4:e4b --remote kaggle --budget 1h
localfit launch claude --model gemma4:31b --remote kaggle --budget 2h

# Remote RunPod (paid, any GPU)
localfit launch openwebui --model gemma4:31b --remote runpod --budget $2
localfit launch claude --model llama3:70b --remote runpod --budget $5

Budget: 30m, 1h, 2h (time) or $1, $2, $5 (money → auto-calculates time on cheapest GPU).
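Parsing such a budget is straightforward; this hypothetical helper mirrors the rule above (dollar amounts divide by the hourly GPU rate to get time):

```python
def budget_minutes(budget: str, gpu_rate_per_hr: float) -> int:
    """Convert a budget string ('30m', '2h', or '$2') to minutes.
    Dollar budgets divide by the hourly GPU rate."""
    if budget.startswith("$"):
        return int(float(budget[1:]) / gpu_rate_per_hr * 60)
    if budget.endswith("h"):
        return int(float(budget[:-1]) * 60)
    if budget.endswith("m"):
        return int(budget[:-1])
    raise ValueError(f"unrecognized budget: {budget!r}")
```

At $0.16/hr, a $2 budget buys 12.5 hours of GPU time.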

Shows remaining quota/balance before launch:

  Kaggle GPU quota: 17h remaining (of 30h/week)
  Duration: 60min
  
  ✓ Endpoint ready: https://xxx.trycloudflare.com
  ✓ Open WebUI launched: http://localhost:8080

Supported Tools

| Tool | Command |
|---|---|
| Open WebUI | localfit launch openwebui |
| Claude Code | localfit launch claude |
| OpenAI Codex | localfit launch codex |
| OpenCode | localfit launch opencode |
| aider | localfit launch aider |
| Open WebUI + tunnel | localfit launch webui --tunnel |

Works with both local and remote models. Env vars are scoped to the subprocess only — your normal tool setup is never touched.
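The scoping works the way any subprocess launcher can do it: copy the environment, add the endpoint variables, and pass the copy to the child only. A sketch (OPENAI_BASE_URL and OPENAI_API_KEY are the common convention for OpenAI-compatible tools; localfit's exact variable names may differ):

```python
import os
import subprocess
import sys

def launch_with_endpoint(cmd: list[str], base_url: str) -> subprocess.CompletedProcess:
    """Run a tool with OpenAI-style env vars set for the child process only;
    the parent environment is left untouched."""
    env = os.environ.copy()
    env.update({"OPENAI_BASE_URL": base_url, "OPENAI_API_KEY": "localfit"})
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```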

All Commands

Model Management

localfit run MODEL                    # interactive menu → pick backend → serve
localfit run MODEL --remote kaggle    # serve on free Kaggle GPU
localfit run MODEL --cloud            # serve on RunPod (paid)
localfit pull MODEL                   # download only
localfit list                         # installed models
localfit ps                           # running models
localfit stop                         # stop local server
localfit show MODEL                   # all quants + fit analysis + pricing

Quantization

localfit makeitfit MODEL              # quantize remotely → upload to HuggingFace
localfit login huggingface            # save HF write token (for uploads)

GPU & Hardware

localfit                              # GPU dashboard + trending models
localfit health                       # GPU VRAM, temp, processes
localfit specs                        # full machine specs
localfit simulate                     # interactive "will this model fit?"
localfit bench                        # benchmark installed models
localfit arena                        # leaderboard on YOUR hardware
localfit trending                     # top models with fit/cloud tags

Tool Integration

localfit --launch TOOL                # start model + launch tool
localfit --config TOOL                # show safe launch command
localfit doctor                       # check all tool configs
localfit restore                      # restore configs from backup

Cloud & Remote

localfit login kaggle                 # save Kaggle credentials
localfit login runpod                 # save RunPod API key
localfit login huggingface            # save HF token
localfit --remote-status              # check active Kaggle session
localfit --remote-stop                # stop Kaggle session
localfit --stop                       # stop RunPod pod

System

localfit check                        # check prerequisites (llama-server, CUDA, etc.)
localfit cleanup                      # free GPU memory
localfit debloat                      # disable macOS services stealing GPU

Supported Platforms

| Platform | GPU Detection | LLM Backends | Image Gen |
|---|---|---|---|
| macOS Apple Silicon | Metal | MLX + llama.cpp + Ollama | mflux (MLX native) |
| Linux NVIDIA | CUDA (nvidia-smi) | llama.cpp + Ollama | diffusers (CUDA) |
| Linux AMD | ROCm (rocm-smi) | llama.cpp + Ollama | diffusers (ROCm) |
| Windows (WSL2) | CUDA (nvidia-smi) | llama.cpp + Ollama | diffusers (CUDA) |
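Detection along the lines of the table can be approximated by probing for the vendor CLI tools; a minimal sketch (the real probing logic is more involved — this only mirrors the detection column):

```python
import platform
import shutil

def detect_gpu_backend() -> str:
    """Illustrative detection order: Apple Silicon, then NVIDIA, then AMD."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "metal"
    if shutil.which("nvidia-smi"):
        return "cuda"
    if shutil.which("rocm-smi"):
        return "rocm"
    return "cpu"
```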

Dynamic VRAM Context Sizing

localfit auto-calculates the optimal context window:

| Machine | Model | Context |
|---|---|---|
| M4 Pro 24GB | Gemma 4 26B (12GB) | 32K |
| M4 Pro 24GB | Gemma 4 E4B (4.6GB) | 128K |
| M4 Max 64GB | Gemma 4 26B (12GB) | 128K |
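The sizing is driven by KV-cache arithmetic: whatever VRAM remains after weights and runtime overhead is divided by the per-token cache cost. An illustrative formula (the constants here — fp16 cache, a flat 2GB overhead — are assumptions for the sketch, not localfit's exact model, and real sizing also rounds to standard context lengths):

```python
def max_context_tokens(vram_gb, model_gb, n_layers, n_kv_heads, head_dim,
                       kv_bytes=2, overhead_gb=2.0):
    """Tokens of KV cache that fit in the VRAM left after weights
    and a fixed runtime overhead."""
    free = (vram_gb - model_gb - overhead_gb) * 1024**3
    # Leading 2 covers both K and V; kv_bytes=2 assumes an fp16 cache.
    per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes
    return int(free // per_token)
```

With a 12GB model on 24GB vs 64GB of unified memory, the same arithmetic yields the roughly 4x context jump shown in the table.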

Cloud Setup

Kaggle (Free)

# 1. Get your Legacy API Key at https://www.kaggle.com/settings
#    → "Legacy API Credentials" → "Create Legacy API Key" → downloads kaggle.json
# 2. Save it:
localfit login kaggle
# 3. Run any model:
localfit run gemma4:e4b --remote kaggle

RunPod (Paid)

# 1. Get API key at https://www.runpod.io/console/user/settings
# 2. Save it:
localfit login runpod
# 3. Run any model:
localfit run gemma4:27b --cloud

HuggingFace (For Uploads)

# 1. Create a write token at https://huggingface.co/settings/tokens
# 2. Save it:
localfit login huggingface
# 3. Quantize + upload:
localfit makeitfit Qwen2.5-7B-Instruct

Requirements

  • Python 3.10+
  • llama.cpp or Ollama (auto-installed)
  • Optional: mlx-lm for Apple Silicon MLX backend
  • Optional: mflux for image generation on Mac
  • Optional: diffusers for image generation on Linux

pip install localfit                  # core
pip install 'localfit[all]'           # + TUI dashboard + HF downloads
pip install mlx-lm                    # + MLX backend (Mac only)
pip install mflux                     # + image generation (Mac only)
pip install diffusers torch           # + image generation (Linux/Windows)

License

Apache-2.0
