Fine-tune, experiment with, and run LLMs locally on your Mac
MLX Forge
Fine-tune LLMs on your Mac with MLX. No cloud, no CUDA required.
MLX Forge is a complete LLM fine-tuning toolkit that runs entirely on your Mac. Pick a model, upload your data, and start training — all from a browser-based UI or CLI. Supports LoRA, DoRA, Full FT, QLoRA, DPO, GRPO, ORPO, KTO, SimPO, 25+ architectures, speculative decoding, vision model training, streaming datasets, GGUF quantized export, and OpenAI-compatible serving.
pip install mlx-forge
mlx-forge studio
Why MLX Forge?
- One command to start — pip install mlx-forge && mlx-forge studio.
- Browser-based Studio UI — Guided training wizard, real-time loss charts, model library with memory estimates, interactive playground, one-click HuggingFace upload.
- 8 training methods — LoRA/QLoRA, DoRA, Full FT, DPO, GRPO, ORPO, KTO, SimPO.
- 25+ model architectures — Llama, Qwen, Gemma, Phi, Mixtral, DeepSeek V2/V3, Mamba, Cohere, and 17 more.
- Speculative decoding — 1.5-2x faster inference with draft models.
- Vision model support — Fine-tune and run VLMs via mlx-vlm integration.
- OpenAI-compatible API — mlx-forge serve works with Cursor, Continue.dev, Open WebUI, LangChain, and any OpenAI SDK client.
- Runs on Apple Silicon — Built on MLX. Your data stays on your machine. Auto-adjusts memory settings per hardware (M1 8GB through M4 Max 128GB).
- Full ecosystem — HuggingFace datasets (200k+), Hub upload, GGUF quantized export (Q4_0, Q8_0) for Ollama/llama.cpp.
Quick Start
Studio UI (recommended)
mlx-forge studio
# Opens at http://127.0.0.1:8741
Pick a recipe, choose a model, upload your data, and start training — all from the browser.
CLI
# Browse and download a dataset
mlx-forge data catalog
mlx-forge data download alpaca-cleaned --max-samples 5000
# Or import from HuggingFace (200k+ datasets)
mlx-forge data hf-import tatsu-lab/alpaca --max-samples 5000
# Train
mlx-forge train --config train.yaml
# Generate with speculative decoding (1.5-2x faster)
mlx-forge generate --model Qwen/Qwen3-0.6B --draft-model Qwen/Qwen3-0.6B-draft --prompt "Hello"
# Serve with OpenAI-compatible API
mlx-forge serve --model Qwen/Qwen3-0.6B --port 8000
# Export as quantized GGUF for Ollama
mlx-forge export --run-id <id> --format gguf --quantize q4_0
mlx-forge export --run-id <id> --push-to-hub username/my-model
Models are downloaded from Hugging Face on first run and cached locally. All subsequent runs work offline.
Studio UI
- New Training — Guided wizard: pick a recipe (chat, instruction, DPO, writing style), choose a model, configure, and launch
- Model Library — Browse 18+ curated models with memory estimates for your hardware
- Experiments — Compare runs, view loss curves in real time, export and push to Hub
- Datasets — Manage your training data, import from HuggingFace Hub
- Playground — Chat with your fine-tuned models interactively
Supported Architectures
25+ architectures, all using the same interface. Any HF model with a supported model_type works out of the box:
| Architecture | Examples | Notes |
|---|---|---|
| Llama | Llama 2/3/3.1/4, Mistral, CodeLlama | Mistral auto-remaps to Llama |
| Qwen | Qwen 2/2.5/3/3.5 | Qwen3.5 is hybrid DeltaNet+Attention |
| Gemma | Gemma 1/2/3 | Gemma 2/3 auto-remap |
| Phi | Phi-3, Phi-4 | |
| Mixtral | Mixtral 8x7B, 8x22B | Sparse MoE with top-k routing |
| DeepSeek | DeepSeek V2, V3, R1 | MoE + Multi-Latent Attention |
| Cohere | Command R, Command R+ v2 | Parallel attention+MLP, sliding window |
| Mamba | Mamba, Mamba-2 | Pure SSM (no attention) |
| Jamba | Jamba | Hybrid Mamba+Attention+MoE |
| Falcon H1 | Falcon H1 | Hybrid SSM+Attention |
| OLMo | OLMo 2 | AI2 open model |
| InternLM | InternLM 2 | Fused QKV projection |
| StarCoder | StarCoder 2 | Code models with GQA |
| GLM | ChatGLM-4 | RMSNorm + SwiGLU |
| Granite | IBM Granite | Multiplier-based scaling |
| StableLM | StableLM | Partial RoPE |
| OpenELM | Apple OpenELM | Per-layer head scaling |
Features
Training Methods
- LoRA and QLoRA (4-bit) — Low-rank adaptation; 4-bit QLoRA cuts memory use by 67%
- DoRA — Weight-Decomposed Low-Rank Adaptation for better quality
- Full Fine-Tuning — All parameters trainable for small models
- DPO — Direct Preference Optimization for alignment
- GRPO — Group Relative Policy Optimization (DeepSeek-R1 style RL)
- ORPO — Odds Ratio Preference Optimization (no reference model needed)
- KTO — Kahneman-Tversky Optimization (unpaired preference data)
- SimPO — Simple Preference Optimization (length-normalized, no reference model)
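The preference objectives listed above differ mainly in how they score a chosen/rejected pair. The sketch below writes out the per-pair DPO and SimPO losses as given in their papers, using plain floats for summed sequence log-probabilities. The function names and the default `beta` and `gamma` values are illustrative assumptions, not MLX Forge's API or defaults.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO: reward is the policy's log-prob gain over a frozen reference model."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def simpo_loss(pi_chosen: float, pi_rejected: float,
               len_chosen: int, len_rejected: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO: length-normalized log-probs with a target margin, no reference model."""
    margin = beta * (pi_chosen / len_chosen - pi_rejected / len_rejected) - gamma
    return -log_sigmoid(margin)


# A policy identical to the reference has zero margin, so the DPO loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

Because SimPO needs no reference log-probabilities, it avoids the second forward pass (and the memory for a second model) that DPO requires.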
Training Features
- Sequence packing for 2-5x speedup on short sequences
- Gradient checkpointing for 40-60% memory savings (auto-enabled when needed)
- Compiled training loop with gradient accumulation
- Cosine, linear, step, and exponential LR schedules with warmup
- Resume from any checkpoint
- Streaming data pipeline for datasets that don't fit in RAM
- Auto memory safety — batch size and checkpointing adjusted per hardware
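To illustrate why sequence packing helps on short sequences, here is a minimal sketch of greedy first-fit-decreasing packing. It groups example indices so that each group fits within `max_seq_length`. This is one common strategy, assumed here for illustration; the source does not say which algorithm MLX Forge uses, and `pack_sequences` is a hypothetical helper.

```python
from typing import List


def pack_sequences(lengths: List[int], max_seq_length: int) -> List[List[int]]:
    """Group example indices into bins whose total token count fits max_seq_length."""
    # Place longest examples first; this usually leaves less wasted padding.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    free: List[int] = []  # remaining token budget of each bin
    for i in order:
        n = lengths[i]
        if n > max_seq_length:
            raise ValueError(f"example {i} has {n} tokens, more than {max_seq_length}")
        for b, space in enumerate(free):
            if n <= space:  # first bin with enough room
                bins[b].append(i)
                free[b] -= n
                break
        else:  # no existing bin fits: open a new one
            bins.append([i])
            free.append(max_seq_length - n)
    return bins


token_counts = [1200, 300, 800, 500, 1900, 100]
packed = pack_sequences(token_counts, max_seq_length=2048)
print(f"{len(token_counts)} examples packed into {len(packed)} rows")
```

A real packed batch also needs attention masking (or position resets) so that examples sharing a row do not attend to each other.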
Inference
- Speculative decoding with draft models for 1.5-2x speedup
- Prompt caching — save/load KV cache state for reuse
- Vision model inference via mlx-vlm integration
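The idea behind speculative decoding can be sketched with toy models. A cheap draft model proposes several tokens, the target model checks them, and the longest agreeing prefix is kept. The code below is the greedy variant, which reproduces the target's greedy output exactly. Sampling-based variants use a rejection step instead, and the source does not state which one MLX Forge implements. Both "models" are hypothetical stand-in functions.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a context to its greedy next token


def greedy_generate(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens[len(prompt):]


def speculative_generate(target: NextToken, draft: NextToken, prompt: List[int],
                         max_new_tokens: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target verifies each position. A real implementation scores
        #    all k positions in a single batched forward pass.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if expected != tok:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
            accepted.append(tok)
        else:
            # Every proposal matched: the same pass yields one bonus token.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[len(prompt):]


def target_model(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7


def draft_model(ctx: List[int]) -> int:
    # Agrees with the target except after token 5, to force some rejections.
    return 0 if ctx[-1] == 5 else (ctx[-1] * 3 + 1) % 7


fast = speculative_generate(target_model, draft_model, [1], max_new_tokens=10)
slow = greedy_generate(target_model, [1], max_new_tokens=10)
print(fast == slow)
```

The speedup comes from step 2: when the draft is usually right, several tokens are confirmed for the cost of one target forward pass.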
Data
- 20+ curated datasets across 7 categories (general, code, math, conversation, reasoning, safety, domain)
- 200k+ HuggingFace datasets via hf_dataset config or mlx-forge data hf-import
- Streaming mode for large datasets (data.streaming: true)
- Auto-detection of 8 formats: chat, completions, text, preference, KTO, Alpaca, ShareGPT, Q&A
- Multi-dataset mixing with weighted sampling
- Data validation with train/val overlap detection
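Train/val overlap detection, mentioned in the last item, can be approximated by hashing a canonical form of each record. The sketch below is a simple exact-match check written for illustration. `record_key` and `find_overlap` are hypothetical helpers, and the check MLX Forge actually performs may differ (for example, it may normalize whitespace or compare only prompts).

```python
import hashlib
import json
from typing import Iterable, List


def record_key(record: dict) -> str:
    """Hash a record in canonical form so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_overlap(train: Iterable[dict], valid: Iterable[dict]) -> List[int]:
    """Return indices of validation records that also appear in the training set."""
    seen = {record_key(r) for r in train}
    return [i for i, r in enumerate(valid) if record_key(r) in seen]


train_rows = [
    {"prompt": "Translate to French: Hello", "completion": "Bonjour"},
    {"text": "The quick brown fox jumps over the lazy dog."},
]
valid_rows = [
    {"completion": "Bonjour", "prompt": "Translate to French: Hello"},  # leaked
    {"text": "A sentence that only appears in validation."},
]
print(find_overlap(train_rows, valid_rows))
```

Leaked validation rows make the validation loss look better than it is, so it is worth removing them before training.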
Serving & Export
- OpenAI-compatible API server (/v1/chat/completions, /v1/completions, /v1/models)
- GGUF export with quantization: --quantize q4_0 (4x smaller) or q8_0 (2x smaller)
- One-command HuggingFace Hub upload with auto-generated model cards
- Vision model serving (image+text input via OpenAI message format)
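The size figures for GGUF export follow from the block layout of the two formats. In ggml, Q4_0 and Q8_0 both store weights in blocks of 32 with one fp16 scale per block. The estimate below assumes an fp16 baseline and a hypothetical 600M-parameter model, and it ignores file metadata and any tensors kept at higher precision.

```python
BLOCK_SIZE = 32  # weights per quantization block

# Bytes per block: 2-byte fp16 scale + packed weights (4-bit or 8-bit each).
BYTES_PER_BLOCK = {
    "q4_0": 2 + BLOCK_SIZE // 2,
    "q8_0": 2 + BLOCK_SIZE,
}


def bits_per_weight(fmt: str) -> float:
    return BYTES_PER_BLOCK[fmt] * 8 / BLOCK_SIZE


def estimated_size_gb(n_params: int, fmt: str) -> float:
    return n_params * bits_per_weight(fmt) / 8 / 1e9


n_params = 600_000_000  # hypothetical model size
for fmt in ("q4_0", "q8_0"):
    ratio = 16 / bits_per_weight(fmt)  # compression relative to fp16
    print(f"{fmt}: {bits_per_weight(fmt)} bits/weight, "
          f"~{estimated_size_gb(n_params, fmt):.2f} GB, {ratio:.1f}x smaller than fp16")
```

The per-block scale is why the ratios land slightly under the round 4x and 2x figures.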
CLI Reference
| Command | Description |
|---|---|
| mlx-forge studio | Launch the Studio UI |
| mlx-forge train --config FILE | Run training (SFT/DPO/GRPO/ORPO/KTO/SimPO) |
| mlx-forge generate --model MODEL | Generate text or interactive chat |
| mlx-forge generate --model M --draft-model D | Speculative decoding (1.5-2x faster) |
| mlx-forge serve --model MODEL | Start OpenAI-compatible API server |
| mlx-forge export --run-id ID --format gguf --quantize q4_0 | Export quantized GGUF |
| mlx-forge export --run-id ID --push-to-hub USER/REPO | Upload to HuggingFace Hub |
| mlx-forge prepare --data FILE --model MODEL | Pre-tokenize a dataset |
| mlx-forge data catalog | Browse 20+ curated datasets |
| mlx-forge data download DATASET | Download a dataset from the catalog |
| mlx-forge data hf-import DATASET | Import from HuggingFace Hub |
| mlx-forge data import FILE --name NAME | Import a local JSONL file |
| mlx-forge data validate FILE | Validate JSONL data |
Configuration
schema_version: 1
model:
  path: "Qwen/Qwen3-0.6B" # HF model ID or local path
  quantization: # Optional: QLoRA (67% memory savings)
    bits: 4
    group_size: 64
  # vision: true # Enable vision model support
adapter:
  method: lora # lora | dora | full
  preset: "attention-qv" # attention-qv | attention-all | mlp | all-linear
  rank: 16
  scale: 32.0
data:
  train: "./train.jsonl"
  valid: "./val.jsonl"
  # OR: hf_dataset: "tatsu-lab/alpaca" # Load from HuggingFace
  # streaming: true # Stream large datasets
  packing: false # Sequence packing (2-5x speedup)
  max_seq_length: 2048
training:
  training_type: sft # sft | dpo | grpo | orpo | kto | simpo
  optimizer: adamw # adam | adamw | sgd | adafactor
  learning_rate: 1.0e-5
  num_iters: 1000
  batch_size: 4
  gradient_checkpointing: false # 40-60% memory savings (auto-enabled if needed)
runtime:
  seed: 42
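To get a feel for what the adapter settings above imply, the following sketch counts trainable LoRA parameters. Adapting a weight of shape (d_out, d_in) at rank r adds two matrices with r * (d_in + d_out) parameters in total. The code assumes the attention-qv preset targets the query and value projections, and the layer count and dimensions are illustrative placeholders rather than values read from any model.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 16          # adapter.rank from the config above
NUM_LAYERS = 28    # assumption: illustrative layer count
HIDDEN = 1024      # assumption: illustrative hidden size
Q_OUT = 2048       # assumption: query projection output size
V_OUT = 1024       # assumption: value projection output size

per_layer = lora_params(HIDDEN, Q_OUT, RANK) + lora_params(HIDDEN, V_OUT, RANK)
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")
```

Doubling the rank doubles the adapter size, so rank is the first setting to lower when memory is tight.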
OpenAI-Compatible API
mlx-forge serve --model Qwen/Qwen3-0.6B --port 8000
Works with any OpenAI SDK client:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
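If the OpenAI SDK is not installed, the same endpoint can be called with the standard library alone. The sketch below only builds the request for /v1/chat/completions. Sending it is left commented out because it needs a running server, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:8000/v1", "Qwen/Qwen3-0.6B", "Hello!")

# With `mlx-forge serve` running, send the request like this:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```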
Data Formats
MLX Forge auto-detects JSONL formats:
Chat — Multi-turn conversations (loss on assistant turns only):
{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi!"}]}
Completions — Prompt-completion pairs:
{"prompt": "Translate to French: Hello", "completion": "Bonjour"}
Text — Raw text for continued pretraining:
{"text": "The quick brown fox jumps over the lazy dog."}
Preference — For DPO/ORPO/SimPO alignment training:
{"chosen": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "good"}], "rejected": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "bad"}]}
KTO — Unpaired preference data (desirable/undesirable):
{"text": "A helpful response about Python.", "label": 1}
{"text": "An unhelpful or harmful response.", "label": 0}
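Auto-detection of these formats can be pictured as a check on each record's keys. The sketch below covers only the five formats shown above and is an assumption about how such detection might work, not MLX Forge's actual logic. The key names for the Alpaca, ShareGPT, and Q&A formats are not given here, so they are left out.

```python
import json


def detect_format(record: dict) -> str:
    """Guess the data format of one JSONL record from its keys."""
    keys = set(record)
    if {"chosen", "rejected"} <= keys:
        return "preference"
    if {"text", "label"} <= keys:  # must be checked before plain "text"
        return "kto"
    if "messages" in keys:
        return "chat"
    if {"prompt", "completion"} <= keys:
        return "completions"
    if "text" in keys:
        return "text"
    raise ValueError(f"unrecognized record keys: {sorted(keys)}")


lines = [
    '{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi!"}]}',
    '{"prompt": "Translate to French: Hello", "completion": "Bonjour"}',
    '{"text": "The quick brown fox jumps over the lazy dog."}',
    '{"text": "A helpful response about Python.", "label": 1}',
]
for line in lines:
    print(detect_format(json.loads(line)))
```

The order of the checks matters: KTO records also contain a text key, so the more specific test has to come first.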
Library API
All CLI commands are backed by Python functions:
from mlx_forge import prepare, train
from mlx_forge.config import TrainingConfig

# Train from a config file
config = TrainingConfig.from_yaml("train.yaml")
result = train(config=config)
print(f"Best val loss: {result.best_val_loss:.4f}")

from mlx_forge import generate

# Generate text with a fine-tuned adapter
generate(
    model="Qwen/Qwen3-0.6B",
    adapter="~/.mlxforge/runs/my-run/checkpoints/best",
    prompt="Explain quantum computing in simple terms.",
)
Contributing
See CONTRIBUTING.md for development setup, coding standards, and how to submit changes.
License