MLX-powered LLM/VLM/TTS/STT/Embedding/OCR fine-tuning for Apple Silicon - Unsloth-compatible API for Mac
Fine-tune LLMs, Vision, Audio, and OCR models on your Mac
SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.
[!NOTE] Name Change: This project was originally called unsloth-mlx. Since it's not an official Unsloth project and to avoid any confusion, it has been renamed to mlx-tune. The vision remains the same: bringing the Unsloth experience to Mac users via MLX. If you were using unsloth-mlx, simply switch to pip install mlx-tune and update your imports from unsloth_mlx to mlx_tune.
[!NOTE] Why I Built This (A Personal Note)
I rely on Unsloth for my daily fine-tuning on cloud GPUs—it's the gold standard for me. But recently, I started working on a MacBook M4 and hit a friction point: I wanted to prototype locally on my Mac, then scale up to the cloud without rewriting my entire training script.
Since Unsloth relies on Triton (which Macs don't have, yet), I couldn't use it locally. I built mlx-tune to solve this specific "Context Switch" problem. It wraps Apple's native MLX framework in an Unsloth-compatible API.
The goal isn't to replace Unsloth or claim superior performance. The goal is code portability: allowing you to write FastLanguageModel code once on your Mac, test it, and then push that exact same script to a CUDA cluster. It solves a workflow problem, not just a hardware one.
This is an "unofficial" project built by a fan, for fans who happen to use Macs. It's helping me personally, and if it helps others like me, then I'll have my satisfaction.
Why MLX-Tune?
Bringing the Unsloth experience to Mac users via Apple's MLX framework.
- 🚀 Fine-tune LLMs, VLMs, TTS, STT & Embeddings locally on your Mac (M1/M2/M3/M4/M5)
- 💾 Leverage unified memory (up to 512GB on Mac Studio)
- 🔄 Unsloth-compatible API - your existing training scripts just work!
- 📦 Export anywhere - HuggingFace format, GGUF for Ollama/llama.cpp
- 🎙️ Audio fine-tuning - 5 TTS models (Orpheus, OuteTTS, Spark, Sesame, Qwen3-TTS) + 5 STT models (Whisper, Moonshine, Qwen3-ASR, NVIDIA Canary, Voxtral)
# Unsloth (CUDA)                        # MLX-Tune (Apple Silicon)
from unsloth import FastLanguageModel   from mlx_tune import FastLanguageModel
from trl import SFTTrainer              from mlx_tune import SFTTrainer

# Rest of your code stays exactly the same!
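Since only the import lines differ between the two columns, one way to keep a single script portable is to choose the imports at runtime. The following is a minimal sketch under that assumption, not something shipped by mlx-tune: `select_backend` and `load_training_api` are hypothetical helper names, and only the import statements themselves come from the comparison above.

```python
import importlib.util


def select_backend(candidates=("mlx_tune", "unsloth")):
    """Return the first importable package name from candidates, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_training_api():
    """Import FastLanguageModel and SFTTrainer from whichever backend is installed."""
    backend = select_backend()
    if backend == "mlx_tune":
        from mlx_tune import FastLanguageModel, SFTTrainer
    elif backend == "unsloth":
        from unsloth import FastLanguageModel
        from trl import SFTTrainer
    else:
        raise ImportError("Install mlx-tune (Mac) or unsloth (CUDA) first.")
    return FastLanguageModel, SFTTrainer
```

A script would then start with `FastLanguageModel, SFTTrainer = load_training_api()` and leave the rest of the training code unchanged.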
What This Is (and Isn't)
This is NOT a replacement for Unsloth or an attempt to compete with it. Unsloth is incredible - it's the gold standard for efficient LLM fine-tuning on CUDA.
This IS a bridge for Mac users who want to:
- 🧪 Prototype locally - Experiment with fine-tuning before committing to cloud GPU costs
- 📚 Learn & iterate - Develop your training pipeline with fast local feedback loops
- 🔄 Then scale up - Move to cloud NVIDIA GPUs + original Unsloth for production training
Local Mac (MLX-Tune)     →   Cloud GPU (Unsloth)
Prototype & experiment       Full-scale training
Small datasets               Large datasets
Quick iterations             Production runs
Project Status
🚀 v0.4.17 - Batched RL training; OCR fine-tuning (DeepSeek-OCR, GLM-OCR, Qwen-VL + CER/WER metrics)
| Feature | Status | Notes |
|---|---|---|
| SFT Training | ✅ Stable | Native MLX training |
| Model Loading | ✅ Stable | Any HuggingFace model (quantized & non-quantized) |
| Save/Export | ✅ Stable | HF format, GGUF (see limitations) |
| DPO Training | ✅ Stable | Full DPO loss |
| ORPO Training | ✅ Stable | Full ORPO loss |
| GRPO Training | ✅ Stable | Multi-generation + reward |
| KTO Training | ✅ Stable | Binary feedback + KTOConfig |
| SimPO Training | ✅ Stable | No ref model + SimPOConfig |
| Chat Templates | ✅ Stable | 15 models (llama, gemma, qwen, phi, mistral) |
| Response-Only Training | ✅ Stable | train_on_responses_only() |
| Multi-turn Merging | ✅ Stable | to_sharegpt() + conversation_extension |
| Column Mapping | ✅ Stable | apply_column_mapping() auto-rename |
| Dataset Config | ✅ Stable | HFDatasetConfig structured loading |
| Vision Models | ✅ Stable | Full VLM fine-tuning via mlx-vlm |
| MoE Fine-Tuning | ✅ Stable | Qwen3.5-35B-A3B, Phi-3.5-MoE, Mixtral, DeepSeek, 39+ architectures |
| TTS Fine-Tuning | ✅ Stable | Orpheus, OuteTTS, Spark-TTS, Sesame/CSM, Qwen3-TTS |
| STT Fine-Tuning | ✅ Stable | Whisper, Moonshine, Qwen3-ASR, Canary, Voxtral |
| convert() | ✅ Stable | HF → MLX conversion (LLM, TTS, STT) |
| Embedding Fine-Tuning | ✅ Stable | BERT, ModernBERT, Qwen3-Embedding, Harrier (InfoNCE/contrastive) |
| OCR Fine-Tuning | ✅ Stable | DeepSeek-OCR, GLM-OCR, olmOCR, Qwen-VL, Pixtral + CER/WER metrics |
| push_to_hub() | ✅ Stable | Upload to HuggingFace Hub |
| PyPI Package | ✅ Available | uv pip install mlx-tune |
Installation
# Using uv (recommended - faster and more reliable)
uv pip install mlx-tune
# With audio support (TTS/STT fine-tuning)
uv pip install 'mlx-tune[audio]'
brew install ffmpeg # system dependency for audio codecs
# Or using pip
pip install mlx-tune
# From source (for development)
git clone https://github.com/ARahim3/mlx-tune.git
cd mlx-tune
uv pip install -e .
Quick Start
from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset
# Load any HuggingFace model (1B model for quick start)
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
max_seq_length=2048,
load_in_4bit=True,
)
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
model,
r=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
lora_alpha=16,
)
# Load a dataset (or create your own)
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")
# Train with SFTTrainer (same API as TRL!)
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
tokenizer=tokenizer,
args=SFTConfig(
output_dir="outputs",
per_device_train_batch_size=2,
learning_rate=2e-4,
max_steps=50,
),
)
trainer.train()
# Save (same API as Unsloth!)
model.save_pretrained("lora_model") # Adapters only
model.save_pretrained_merged("merged", tokenizer) # Full model
model.save_pretrained_gguf("model", tokenizer) # GGUF (see note below)
[!NOTE] GGUF Export: Works with non-quantized base models. If using a 4-bit model (like above), see Known Limitations for workarounds.
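To get a feel for what `r=16` and `lora_alpha=16` mean in the Quick Start, the sketch below counts trainable LoRA parameters and computes the scaling factor as defined in the original LoRA formulation (`lora_alpha / r`). The layer shapes and layer count are illustrative assumptions for a roughly 1B-parameter Llama-style model, and mlx-tune's internal bookkeeping may differ.

```python
def lora_trainable_params(layer_shapes, r, num_layers=1):
    """Each adapted (d_in, d_out) matrix adds an A (r x d_in) and a B (d_out x r)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes.values())
    return per_layer * num_layers


def lora_scale(lora_alpha, r):
    """Scaling applied to the low-rank update in the standard LoRA formulation."""
    return lora_alpha / r


# Assumed attention projection shapes (d_in, d_out); not read from a real checkpoint.
shapes = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
}

total = lora_trainable_params(shapes, r=16, num_layers=16)
print(total)               # 3407872
print(lora_scale(16, 16))  # 1.0
```

Under these assumptions the adapters add about 3.4M trainable parameters, a small fraction of the base model, which is why LoRA fits comfortably in limited memory.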
Chat Templates & Response-Only Training
from mlx_tune import get_chat_template, train_on_responses_only
# Apply chat template (supports llama-3, gemma, qwen, phi, mistral, etc.)
tokenizer = get_chat_template(tokenizer, chat_template="llama-3")
# Or auto-detect from model name
tokenizer = get_chat_template(tokenizer, chat_template="auto")
# Train only on responses (not prompts) - more efficient!
trainer = train_on_responses_only(
trainer,
instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)
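Conceptually, response-only training means that prompt tokens are excluded from the loss, typically by setting their labels to an ignore value. The sketch below illustrates that masking on plain lists of token ids; it is an assumption about the general technique, not mlx-tune's actual implementation, and the helper names and token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # conventional "skip this position in the loss" label


def find_subsequence(seq, sub, start=0):
    """Return the index of the first occurrence of sub in seq at or after start, else -1."""
    for i in range(start, len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def mask_non_response(input_ids, instruction_ids, response_ids):
    """Keep labels only for tokens that follow a response marker, up to the next instruction marker."""
    labels = [IGNORE_INDEX] * len(input_ids)
    pos = 0
    while True:
        marker = find_subsequence(input_ids, response_ids, pos)
        if marker == -1:
            break
        start = marker + len(response_ids)
        next_turn = find_subsequence(input_ids, instruction_ids, start)
        end = next_turn if next_turn != -1 else len(input_ids)
        labels[start:end] = input_ids[start:end]
        pos = end
    return labels


# Toy conversation: [1, 2] marks a user turn, [3, 4] marks an assistant turn.
ids = [1, 2, 10, 11, 3, 4, 5, 6, 1, 2, 12, 3, 4, 7, 8]
print(mask_non_response(ids, instruction_ids=[1, 2], response_ids=[3, 4]))
```

Only the assistant tokens (5, 6 and 7, 8 in the toy example) keep their labels; everything else, including the markers themselves, is masked.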
Vision Model Fine-Tuning (NEW!)
Fine-tune vision-language models like Qwen3.5 on image+text tasks:
from mlx_tune import FastVisionModel, UnslothVisionDataCollator, VLMSFTTrainer
from mlx_tune.vlm import VLMSFTConfig
# Load a vision model
model, processor = FastVisionModel.from_pretrained(
"mlx-community/Qwen3.5-0.8B-bf16",
)
# Add LoRA (same params as Unsloth!)
model = FastVisionModel.get_peft_model(
model,
finetune_vision_layers=True,
finetune_language_layers=True,
r=16, lora_alpha=16,
)
# Train on image-text data
FastVisionModel.for_training(model)
trainer = VLMSFTTrainer(
model=model,
tokenizer=processor,
data_collator=UnslothVisionDataCollator(model, processor),
train_dataset=dataset,
args=VLMSFTConfig(max_steps=30, learning_rate=2e-4),
)
trainer.train()
See examples/10_qwen35_vision_finetuning.py for the full workflow, examples/11_qwen35_text_finetuning.py for text-only fine-tuning, or examples/26_vision_grpo_training.py for Vision GRPO reasoning.
TTS Fine-Tuning
Fine-tune text-to-speech models on Apple Silicon. Supports Orpheus-3B, OuteTTS-1B, Spark-TTS (0.5B), Sesame/CSM-1B, and Qwen3-TTS:
from mlx_tune import FastTTSModel, TTSSFTTrainer, TTSSFTConfig, TTSDataCollator
from datasets import load_dataset, Audio
# Auto-detects model type, codec, and token format
model, tokenizer = FastTTSModel.from_pretrained("mlx-community/orpheus-3b-0.1-ft-bf16")
# Also works with:
# "mlx-community/Llama-OuteTTS-1.0-1B-8bit" (DAC codec, 24kHz)
# "mlx-community/Spark-TTS-0.5B-bf16" (BiCodec, 16kHz)
model = FastTTSModel.get_peft_model(model, r=16, lora_alpha=16)
dataset = load_dataset("MrDragonFox/Elise", split="train[:100]")
dataset = dataset.cast_column("audio", Audio(sampling_rate=24000))
trainer = TTSSFTTrainer(
model=model, tokenizer=tokenizer,
data_collator=TTSDataCollator(model, tokenizer),
train_dataset=dataset,
args=TTSSFTConfig(output_dir="./tts_output", max_steps=60),
)
trainer.train()
See examples: Orpheus, OuteTTS, Spark-TTS, Qwen3-TTS.
STT Fine-Tuning
Fine-tune speech-to-text models. Supports Whisper (all sizes), Distil-Whisper, Moonshine, Qwen3-ASR, NVIDIA Canary, and Voxtral:
from mlx_tune import FastSTTModel, STTSFTTrainer, STTSFTConfig, STTDataCollator
# Auto-detects model type and preprocessor
model, processor = FastSTTModel.from_pretrained("mlx-community/whisper-tiny-asr-fp16")
# Also works with:
# "mlx-community/distil-whisper-large-v3" (Whisper architecture)
# "UsefulSensors/moonshine-tiny" (raw conv frontend)
model = FastSTTModel.get_peft_model(model, r=8, finetune_encoder=True, finetune_decoder=True)
trainer = STTSFTTrainer(
model=model, processor=processor,
data_collator=STTDataCollator(model, processor, language="en", task="transcribe"),
train_dataset=dataset,
args=STTSFTConfig(output_dir="./stt_output", max_steps=60),
)
trainer.train()
See examples: Whisper, Moonshine, Qwen3-ASR, Canary, Voxtral.
Embedding Fine-Tuning
Fine-tune sentence embedding models for semantic search using contrastive learning (InfoNCE loss). Supports BERT, ModernBERT, Qwen3-Embedding, Harrier, and more:
from mlx_tune import FastEmbeddingModel, EmbeddingSFTTrainer, EmbeddingSFTConfig, EmbeddingDataCollator
# Load embedding model (BERT or Qwen3-Embedding)
model, tokenizer = FastEmbeddingModel.from_pretrained(
"mlx-community/all-MiniLM-L6-v2-bf16", # or Qwen3-Embedding-0.6B-4bit-DWQ
pooling_strategy="mean", # "mean", "cls", or "last_token"
)
model = FastEmbeddingModel.get_peft_model(model, r=16, lora_alpha=16)
# Train with anchor-positive pairs (in-batch negatives via InfoNCE)
trainer = EmbeddingSFTTrainer(
model=model, tokenizer=tokenizer,
data_collator=EmbeddingDataCollator(model, tokenizer),
train_dataset=[{"anchor": "query text", "positive": "relevant passage"}, ...],
args=EmbeddingSFTConfig(
loss_type="infonce", temperature=0.05,
per_device_train_batch_size=32, max_steps=50,
),
)
trainer.train()
# Encode & compare
embeddings = model.encode(["Hello world", "Hi there"])
similarity = (embeddings[0] * embeddings[1]).sum().item()
See examples: BERT, Qwen3-Embedding, Harrier-0.6B, Harrier-270M.
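For readers new to contrastive training, here is a minimal pure-Python sketch of InfoNCE with in-batch negatives: each anchor is scored against every positive in the batch, and the matching positive is treated as the correct class. This shows the textbook loss only; the function names are hypothetical and mlx-tune's own implementation (written against MLX arrays) may differ in details.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def infonce_loss(anchors, positives, temperature=0.05):
    """Mean cross-entropy where row i of the similarity matrix should pick column i."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, pos)) / temperature for pos in p]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


aligned = infonce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = infonce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

A lower temperature sharpens the softmax, so mismatched pairs are penalized more heavily; larger batches supply more negatives per anchor.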
OCR Fine-Tuning
Fine-tune dedicated OCR models or general VLMs for document understanding, handwriting recognition, LaTeX OCR, multilingual receipts, and more. Built-in CER/WER evaluation metrics:
from mlx_tune import FastOCRModel, OCRSFTTrainer, OCRSFTConfig, compute_ocr_metrics
# Load a dedicated OCR model (or any VLM like Qwen3.5)
model, processor = FastOCRModel.from_pretrained(
"mlx-community/DeepSeek-OCR-8bit", # 0.9B dedicated OCR model
)
model = FastOCRModel.get_peft_model(model, r=16, lora_alpha=16)
# Vision layers frozen by default (OCR models have pre-optimized encoders)
# Train on OCR data
trainer = OCRSFTTrainer(
model=model, processor=processor,
train_dataset=ocr_dataset,
args=OCRSFTConfig(max_steps=100, learning_rate=5e-5),
)
trainer.train()
# Transcribe & evaluate
text = model.transcribe(image)
metrics = model.evaluate(test_images, ground_truths) # → {cer, wer, exact_match}
Supported OCR models: DeepSeek-OCR, DeepSeek-OCR-2, GLM-OCR, DOTS-OCR, olmOCR-2, LightOnOCR, Qwen2.5-VL, Qwen3.5, Pixtral, and any VLM supported by mlx-vlm.
See examples: Document OCR, VLM→OCR, Handwriting, OCR GRPO, Multilingual.
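CER and WER are both edit-distance ratios: the number of character-level (or word-level) insertions, deletions, and substitutions needed to turn the prediction into the reference, divided by the reference length. The sketch below implements those standard definitions in plain Python; it is not the source of `compute_ocr_metrics`, whose exact normalization is not shown here, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


print(cer("kitten", "sitting"))               # 0.5
print(wer("the cat sat", "the cat sat down"))
```

Both rates can exceed 1.0 when the prediction is much longer than the reference, so they are error ratios rather than percentages bounded at 100.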
MoE Fine-Tuning
Fine-tune Mixture of Experts models — 39+ architectures supported automatically. MLX-Tune detects MoE layers and applies per-expert LoRA via LoRASwitchLinear:
from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
# Load any MoE model — same API as dense models!
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="mlx-community/Qwen3.5-35B-A3B-4bit", # 35B total, 3B active
max_seq_length=2048,
load_in_4bit=True,
)
# Same target_modules — MoE paths resolved automatically
model = FastLanguageModel.get_peft_model(
model, r=8,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"],
)
# Prints: "MoE architecture detected — LoRA will target expert layers (SwitchLinear)"
Supported MoE models: Qwen3.5-35B-A3B, Qwen3-30B-A3B, Phi-3.5-MoE, Mixtral, DeepSeek-V2/V3, GLM-MoE, and all other MoE architectures in mlx-lm.
See examples: Qwen3.5 MoE, Phi-3.5 MoE.
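The "35B total, 3B active" description comes from how MoE layers route tokens: a router scores all experts, but only the top few run for each token. The sketch below shows generic top-k routing in plain Python as an illustration of that idea. It is an assumption about the general mechanism, not the routing code of any specific model or of mlx-tune, and real architectures differ in details such as normalization order and shared experts.

```python
import math


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# One token, four experts: only experts 0 and 1 are evaluated.
routed = route_top_k([2.0, 1.0, 0.1, -1.0], k=2)
print([expert for expert, _ in routed])  # [0, 1]
```

Because only the selected experts process a given token, per-expert LoRA adapters receive gradients only from the tokens routed to them.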
Post-Training Workflow
All model types (LLM, VLM, TTS, STT) support the full post-training workflow:
# Save LoRA adapters
model.save_pretrained("./adapters")
# Merge LoRA into base model
model.save_pretrained_merged("./merged")
# Convert HF model to MLX format
FastLanguageModel.convert("model-name", mlx_path="./mlx_model")
# Push to HuggingFace Hub
model.push_to_hub("username/my-model")
Supported Training Methods
| Method | Trainer | Implementation | Use Case |
|---|---|---|---|
| SFT | SFTTrainer | ✅ Native MLX | Instruction fine-tuning |
| DPO | DPOTrainer | ✅ Native MLX | Preference learning (proper log-prob loss) |
| ORPO | ORPOTrainer | ✅ Native MLX | Combined SFT + odds ratio preference |
| GRPO | GRPOTrainer | ✅ Native MLX | Reasoning with multi-generation (DeepSeek R1 style) |
| KTO | KTOTrainer | ✅ Native MLX | Kahneman-Tversky optimization |
| SimPO | SimPOTrainer | ✅ Native MLX | Simple preference optimization |
| VLM SFT | VLMSFTTrainer | ✅ Native MLX | Vision-Language model fine-tuning |
| Vision GRPO | VLMGRPOTrainer | ✅ Native MLX | Vision-Language GRPO reasoning |
| TTS SFT | TTSSFTTrainer | ✅ Native MLX | Orpheus, OuteTTS, Spark-TTS, Sesame/CSM |
| STT SFT | STTSFTTrainer | ✅ Native MLX | Whisper, Moonshine, Qwen3-ASR, Canary, Voxtral |
| Embedding | EmbeddingSFTTrainer | ✅ Native MLX | BERT, ModernBERT, Qwen3-Embedding, Harrier (InfoNCE) |
| OCR SFT | OCRSFTTrainer | ✅ Native MLX | DeepSeek-OCR, GLM-OCR, Qwen-VL, Pixtral (CER/WER eval) |
| OCR GRPO | OCRGRPOTrainer | ✅ Native MLX | OCR with character-level RL rewards |
| MoE | SFTTrainer | ✅ Native MLX | Qwen3.5-MoE, Phi-3.5-MoE, Mixtral, DeepSeek (39+ archs) |
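Two of the objectives in the table can be summarized in a few lines each. The sketch below shows the standard DPO loss on sequence log-probabilities and the group-relative advantage normalization commonly used in GRPO. These are the textbook formulas written in plain Python for illustration; the function names and the `beta` default are assumptions, and the trainers listed above may compute them differently on MLX arrays.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log(sigmoid(margin)), where margin compares policy and reference log-ratios."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # softplus(-margin), written to avoid overflow for large |margin|
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def grpo_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of generations for the same prompt."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# A policy identical to the reference gives a margin of 0, so the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))

# Four generations for one prompt; two earned the reward.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

In GRPO the group mean acts as the baseline, which is why multiple generations per prompt are needed: a single sample would always receive an advantage of zero.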
Examples
Check examples/ for working code:
- Basic model loading and inference (01–07)
- Complete SFT fine-tuning pipeline (08)
- RL training overview (09)
- Vision model fine-tuning — Qwen3.5 (10-11)
- RL E2E training — DPO (21), GRPO (22), ORPO (23), KTO (24), SimPO (25), Vision GRPO (26)
- TTS fine-tuning — Orpheus-3B (12), OuteTTS (14), Spark-TTS (15), Qwen3-TTS (20)
- STT fine-tuning — Whisper (13), Moonshine (16), Qwen3-ASR (17), Canary (18), Voxtral (19)
- Embedding fine-tuning — BERT/MiniLM (27), Qwen3-Embedding (28), Harrier-0.6B (31), Harrier-270M (32)
- OCR fine-tuning — Document OCR (33), VLM→OCR (34), Handwriting (35), OCR GRPO (36), Multilingual (37)
- MoE fine-tuning — Qwen3.5-35B-A3B (29), Phi-3.5-MoE (30)
Requirements
- Hardware: Apple Silicon Mac (M1/M2/M3/M4/M5)
- OS: macOS 13.0+
- Memory: 8GB+ unified RAM (16GB+ recommended)
- Python: 3.9+
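The hardware, OS, and Python requirements above can be checked before installing. The following is a small preflight sketch using only the standard library; `check_requirements` is a hypothetical helper, not part of mlx-tune, and it does not check the memory requirement.

```python
import platform
import sys


def check_requirements(system=None, machine=None, mac_version=None, python_version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    mac_version = platform.mac_ver()[0] if mac_version is None else mac_version
    python_version = sys.version_info[:2] if python_version is None else tuple(python_version)

    problems = []
    if system != "Darwin" or machine != "arm64":
        problems.append(f"Apple Silicon Mac required, found {system}/{machine}")
    elif mac_version:
        # Pad so that a bare "13" is read as 13.0
        parts = (mac_version.split(".") + ["0"])[:2]
        if tuple(int(p) for p in parts) < (13, 0):
            problems.append(f"macOS 13.0+ required, found {mac_version}")
    if python_version < (3, 9):
        found = ".".join(str(p) for p in python_version)
        problems.append(f"Python 3.9+ required, found {found}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

Calling `check_requirements()` with no arguments inspects the current machine; the keyword arguments exist only so the logic can be exercised on any platform.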
Comparison with Unsloth
| Feature | Unsloth (CUDA) | MLX-Tune |
|---|---|---|
| Platform | NVIDIA GPUs | Apple Silicon |
| Backend | Triton Kernels | MLX Framework |
| Memory | VRAM (limited) | Unified (up to 512GB) |
| API | Original | 100% Compatible |
| Best For | Production training | Local dev, large models |
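To relate the unified-memory figures to model sizes, a rough lower bound on weight storage is parameters times bits per weight. The sketch below does only that arithmetic; it is a back-of-the-envelope estimate with a hypothetical helper name, and it deliberately ignores quantization scales, activations, optimizer state, and the KV cache, all of which add to real usage.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate raw weight storage in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


print(weight_memory_gb(1e9, 4))    # 0.5  (1B model, 4-bit)
print(weight_memory_gb(35e9, 4))   # 17.5 (35B model, 4-bit)
print(weight_memory_gb(35e9, 16))  # 70.0 (35B model, 16-bit)
```

By this estimate a 4-bit 35B model needs well over 16GB for weights alone, so the 16GB recommendation is best read as a starting point for small models.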
Known Limitations
GGUF Export from Quantized Models
The Issue: GGUF export (save_pretrained_gguf) doesn't work directly with quantized (4-bit) base models. This is a known limitation in mlx-lm, not an mlx-tune bug.
What Works:
- ✅ Training with quantized models (QLoRA) - works perfectly
- ✅ Saving adapters (save_pretrained) - works
- ✅ Saving merged model (save_pretrained_merged) - works
- ✅ Inference with trained model - works
- ❌ GGUF export from quantized base model - mlx-lm limitation
Workarounds:
- Use a non-quantized base model (recommended for GGUF export):
# Use fp16 model instead of 4-bit
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct",  # NOT -4bit
    max_seq_length=2048,
    load_in_4bit=False,  # Train in fp16
)
# Train normally, then export
model.save_pretrained_gguf("model", tokenizer)  # Works!
- Dequantize during export (results in large fp16 file):
model.save_pretrained_gguf("model", tokenizer, dequantize=True)
# Then re-quantize with llama.cpp:
# ./llama-quantize model.gguf model-q4_k_m.gguf Q4_K_M
- Skip GGUF, use MLX format: If you only need the model for MLX/Python inference, just use save_pretrained_merged() - no GGUF needed.
Related Issues:
- mlx-lm #353 - MLX to GGUF conversion
- mlx-examples #1382 - Quantized to GGUF
Contributing
Contributions welcome! Areas that need help:
- Custom MLX kernels for even faster training
- More test coverage (especially E2E and edge cases)
- Testing on different M-series chips (M1, M2, M3, M4, M5)
- Batched audio training (currently batch_size=1)
- Batched RL training (currently single-sample)
License
Apache 2.0 - See LICENSE file.
Acknowledgments
- Unsloth - The original, incredible CUDA library
- MLX - Apple's ML framework
- MLX-LM - LLM utilities for MLX
- MLX-VLM - Vision model support
- MLX-Audio - Audio inference (TTS/STT) for MLX
- MLX-Embeddings - Embedding models for MLX
Community project, not affiliated with Unsloth AI or Apple.
⭐ Star this repo if you find it useful!