MLX-powered LLM/VLM/TTS/STT/Embedding fine-tuning for Apple Silicon - Unsloth-compatible API for Mac

Fine-tune LLMs, Vision, and Audio models on your Mac
SFT, DPO, GRPO, Vision, TTS, STT, and Embedding fine-tuning — natively on MLX. Unsloth-compatible API.


Documentation · Quick Start · Training Methods · Examples · Status


[!NOTE] Name Change: This project was originally called unsloth-mlx. Since it's not an official Unsloth project and to avoid any confusion, it has been renamed to mlx-tune. The vision remains the same — bringing the Unsloth experience to Mac users via MLX. If you were using unsloth-mlx, simply switch to pip install mlx-tune and update your imports from unsloth_mlx to mlx_tune.

[!NOTE] Why I Built This (A Personal Note)

I rely on Unsloth for my daily fine-tuning on cloud GPUs—it's the gold standard for me. But recently, I started working on a MacBook M4 and hit a friction point: I wanted to prototype locally on my Mac, then scale up to the cloud without rewriting my entire training script.

Since Unsloth relies on Triton (which Macs don't have yet), I couldn't use it locally. I built mlx-tune to solve this specific "context switch" problem: it wraps Apple's native MLX framework in an Unsloth-compatible API.

The goal isn't to replace Unsloth or claim superior performance. The goal is code portability: allowing you to write FastLanguageModel code once on your Mac, test it, and then push that exact same script to a CUDA cluster. It solves a workflow problem, not just a hardware one.

This is an "unofficial" project built by a fan, for fans who happen to use Macs. It's helping me personally, and if it helps others like me, then I'll have my satisfaction.

Why MLX-Tune?

Bringing the Unsloth experience to Mac users via Apple's MLX framework.

  • 🚀 Fine-tune LLMs, VLMs, TTS, STT & Embeddings locally on your Mac (M1/M2/M3/M4/M5)
  • 💾 Leverage unified memory (up to 512GB on Mac Studio)
  • 🔄 Unsloth-compatible API - your existing training scripts just work!
  • 📦 Export anywhere - HuggingFace format, GGUF for Ollama/llama.cpp
  • 🎙️ Audio fine-tuning - 5 TTS models (Orpheus, OuteTTS, Spark, Sesame, Qwen3-TTS) + 5 STT models (Whisper, Moonshine, Qwen3-ASR, NVIDIA Canary, Voxtral)
# Unsloth (CUDA)                        # MLX-Tune (Apple Silicon)
from unsloth import FastLanguageModel   from mlx_tune import FastLanguageModel
from trl import SFTTrainer              from mlx_tune import SFTTrainer

# Rest of your code stays exactly the same!

What This Is (and Isn't)

This is NOT a replacement for Unsloth or an attempt to compete with it. Unsloth is incredible - it's the gold standard for efficient LLM fine-tuning on CUDA.

This IS a bridge for Mac users who want to:

  • 🧪 Prototype locally - Experiment with fine-tuning before committing to cloud GPU costs
  • 📚 Learn & iterate - Develop your training pipeline with fast local feedback loops
  • 🔄 Then scale up - Move to cloud NVIDIA GPUs + original Unsloth for production training
Local Mac (MLX-Tune)       →     Cloud GPU (Unsloth)
   Prototype & experiment          Full-scale training
   Small datasets                  Large datasets
   Quick iterations                Production runs

Project Status

🚀 v0.4.13 - Embedding fine-tuning (BERT, Qwen3-Embedding); Vision GRPO; E2E RL training

| Feature | Status | Notes |
|---|---|---|
| SFT Training | ✅ Stable | Native MLX training |
| Model Loading | ✅ Stable | Any HuggingFace model (quantized & non-quantized) |
| Save/Export | ✅ Stable | HF format, GGUF (see limitations) |
| DPO Training | ✅ Stable | Full DPO loss |
| ORPO Training | ✅ Stable | Full ORPO loss |
| GRPO Training | ✅ Stable | Multi-generation + reward |
| KTO Training | ✅ Stable | Binary feedback + KTOConfig |
| SimPO Training | ✅ Stable | No ref model + SimPOConfig |
| Chat Templates | ✅ Stable | 15 models (llama, gemma, qwen, phi, mistral) |
| Response-Only Training | ✅ Stable | train_on_responses_only() |
| Multi-turn Merging | ✅ Stable | to_sharegpt() + conversation_extension |
| Column Mapping | ✅ Stable | apply_column_mapping() auto-rename |
| Dataset Config | ✅ Stable | HFDatasetConfig structured loading |
| Vision Models | ✅ Stable | Full VLM fine-tuning via mlx-vlm |
| TTS Fine-Tuning | ✅ Stable | Orpheus, OuteTTS, Spark-TTS, Sesame/CSM, Qwen3-TTS |
| STT Fine-Tuning | ✅ Stable | Whisper, Moonshine, Qwen3-ASR, Canary, Voxtral |
| convert() | ✅ Stable | HF → MLX conversion (LLM, TTS, STT) |
| Embedding Fine-Tuning | ✅ Stable | BERT, ModernBERT, Qwen3-Embedding (InfoNCE/contrastive) |
| push_to_hub() | ✅ Stable | Upload to HuggingFace Hub |
| PyPI Package | ✅ Available | uv pip install mlx-tune |

Installation

# Using uv (recommended - faster and more reliable)
uv pip install mlx-tune

# With audio support (TTS/STT fine-tuning)
uv pip install 'mlx-tune[audio]'
brew install ffmpeg  # system dependency for audio codecs

# Or using pip
pip install mlx-tune

# From source (for development)
git clone https://github.com/ARahim3/mlx-tune.git
cd mlx-tune
uv pip install -e .
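
To verify the install, a quick sanity check that MLX sees the Apple GPU (this uses plain MLX, nothing mlx-tune-specific):

import mlx.core as mx

# MLX should report the Apple GPU as the default device
print(mx.default_device())  # e.g. Device(gpu, 0)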

Quick Start

from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset

# Load any HuggingFace model (1B model for quick start)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Load a dataset (or create your own)
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

# Train with SFTTrainer (same API as TRL!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=50,
    ),
)
trainer.train()

# Save (same API as Unsloth!)
model.save_pretrained("lora_model")  # Adapters only
model.save_pretrained_merged("merged", tokenizer)  # Full model
model.save_pretrained_gguf("model", tokenizer)  # GGUF (see note below)

[!NOTE] GGUF Export: Works with non-quantized base models. If using a 4-bit model (like above), see Known Limitations for workarounds.

Chat Templates & Response-Only Training

from mlx_tune import get_chat_template, train_on_responses_only

# Apply chat template (supports llama-3, gemma, qwen, phi, mistral, etc.)
tokenizer = get_chat_template(tokenizer, chat_template="llama-3")

# Or auto-detect from model name
tokenizer = get_chat_template(tokenizer, chat_template="auto")

# Train only on responses (not prompts) - more efficient!
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)
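
Once the template is applied, conversations are formatted through the standard HuggingFace apply_chat_template API, so you can inspect exactly what the model will see:

# Inspect the formatted prompt produced by the llama-3 template
messages = [{"role": "user", "content": "What is MLX?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(text)  # user turn wrapped in <|start_header_id|> headers, ready for generation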

Vision Model Fine-Tuning (NEW!)

Fine-tune vision-language models like Qwen3.5 on image+text tasks:

from mlx_tune import FastVisionModel, UnslothVisionDataCollator, VLMSFTTrainer
from mlx_tune.vlm import VLMSFTConfig

# Load a vision model
model, processor = FastVisionModel.from_pretrained(
    "mlx-community/Qwen3.5-0.8B-bf16",
)

# Add LoRA (same params as Unsloth!)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    r=16, lora_alpha=16,
)

# Train on image-text data
FastVisionModel.for_training(model)
trainer = VLMSFTTrainer(
    model=model,
    tokenizer=processor,
    data_collator=UnslothVisionDataCollator(model, processor),
    train_dataset=dataset,
    args=VLMSFTConfig(max_steps=30, learning_rate=2e-4),
)
trainer.train()

See examples/10_qwen35_vision_finetuning.py for the full workflow, examples/11_qwen35_text_finetuning.py for text-only fine-tuning, or examples/26_vision_grpo_training.py for Vision GRPO reasoning.
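
The collator consumes conversation-style records with interleaved image content. A hypothetical record following the Unsloth vision convention (the exact schema mlx-tune expects is shown in the linked examples):

from PIL import Image

# Hypothetical record shape (Unsloth-style); check the linked examples for the exact schema
sample = {
    "messages": [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": "A cat sitting on a windowsill."},
        ]},
    ],
    "images": [Image.open("cat.jpg")],
}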

TTS Fine-Tuning

Fine-tune text-to-speech models on Apple Silicon. Supports Orpheus-3B, OuteTTS-1B, Spark-TTS (0.5B), Sesame/CSM-1B, and Qwen3-TTS:

from mlx_tune import FastTTSModel, TTSSFTTrainer, TTSSFTConfig, TTSDataCollator
from datasets import load_dataset, Audio

# Auto-detects model type, codec, and token format
model, tokenizer = FastTTSModel.from_pretrained("mlx-community/orpheus-3b-0.1-ft-bf16")
# Also works with:
#   "mlx-community/Llama-OuteTTS-1.0-1B-8bit"   (DAC codec, 24kHz)
#   "mlx-community/Spark-TTS-0.5B-bf16"          (BiCodec, 16kHz)
model = FastTTSModel.get_peft_model(model, r=16, lora_alpha=16)

dataset = load_dataset("MrDragonFox/Elise", split="train[:100]")
dataset = dataset.cast_column("audio", Audio(sampling_rate=24000))

trainer = TTSSFTTrainer(
    model=model, tokenizer=tokenizer,
    data_collator=TTSDataCollator(model, tokenizer),
    train_dataset=dataset,
    args=TTSSFTConfig(output_dir="./tts_output", max_steps=60),
)
trainer.train()

See examples: Orpheus, OuteTTS, Spark-TTS, Qwen3-TTS.

STT Fine-Tuning

Fine-tune speech-to-text models. Supports Whisper (all sizes), Distil-Whisper, Moonshine, Qwen3-ASR, NVIDIA Canary, and Voxtral:

from mlx_tune import FastSTTModel, STTSFTTrainer, STTSFTConfig, STTDataCollator
from datasets import load_dataset, Audio

# Auto-detects model type and preprocessor
model, processor = FastSTTModel.from_pretrained("mlx-community/whisper-tiny-asr-fp16")
# Also works with:
#   "mlx-community/distil-whisper-large-v3"   (Whisper architecture)
#   "UsefulSensors/moonshine-tiny"             (raw conv frontend)
model = FastSTTModel.get_peft_model(model, r=8, finetune_encoder=True, finetune_decoder=True)

# Any dataset with audio + transcript columns works; a tiny public sample for illustration:
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))  # Whisper expects 16 kHz input

trainer = STTSFTTrainer(
    model=model, processor=processor,
    data_collator=STTDataCollator(model, processor, language="en", task="transcribe"),
    train_dataset=dataset,
    args=STTSFTConfig(output_dir="./stt_output", max_steps=60),
)
trainer.train()

See examples: Whisper, Moonshine, Qwen3-ASR, Canary, Voxtral.

Embedding Fine-Tuning

Fine-tune sentence embedding models for semantic search using contrastive learning (InfoNCE loss). Supports BERT, ModernBERT, Qwen3-Embedding, and more:

from mlx_tune import FastEmbeddingModel, EmbeddingSFTTrainer, EmbeddingSFTConfig, EmbeddingDataCollator

# Load embedding model (BERT or Qwen3-Embedding)
model, tokenizer = FastEmbeddingModel.from_pretrained(
    "mlx-community/all-MiniLM-L6-v2-bf16",  # or Qwen3-Embedding-0.6B-4bit-DWQ
    pooling_strategy="mean",                  # "mean", "cls", or "last_token"
)
model = FastEmbeddingModel.get_peft_model(model, r=16, lora_alpha=16)

# Train with anchor-positive pairs (in-batch negatives via InfoNCE)
trainer = EmbeddingSFTTrainer(
    model=model, tokenizer=tokenizer,
    data_collator=EmbeddingDataCollator(model, tokenizer),
    train_dataset=[{"anchor": "query text", "positive": "relevant passage"}, ...],
    args=EmbeddingSFTConfig(
        loss_type="infonce", temperature=0.05,
        per_device_train_batch_size=32, max_steps=50,
    ),
)
trainer.train()

# Encode & compare
embeddings = model.encode(["Hello world", "Hi there"])
# Dot product; this equals cosine similarity when the embeddings are L2-normalized
similarity = (embeddings[0] * embeddings[1]).sum().item()
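
For intuition, InfoNCE with in-batch negatives treats each anchor's own positive as the correct class among all positives in the batch. A minimal sketch in plain MLX (an illustration, not mlx-tune's actual implementation):

import mlx.core as mx
import mlx.nn as nn

def infonce_loss(anchors, positives, temperature=0.05):
    # Normalize so the pairwise dot products are cosine similarities
    a = anchors / mx.linalg.norm(anchors, axis=-1, keepdims=True)
    p = positives / mx.linalg.norm(positives, axis=-1, keepdims=True)
    logits = (a @ p.T) / temperature      # (B, B): row i scored against every positive
    targets = mx.arange(logits.shape[0])  # the diagonal holds each anchor's true positive
    return nn.losses.cross_entropy(logits, targets).mean()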

See examples: BERT, Qwen3-Embedding.

Post-Training Workflow

All model types (LLM, VLM, TTS, STT) support the full post-training workflow:

# Save LoRA adapters
model.save_pretrained("./adapters")

# Merge LoRA into base model
model.save_pretrained_merged("./merged")

# Convert HF model to MLX format
FastLanguageModel.convert("model-name", mlx_path="./mlx_model")

# Push to HuggingFace Hub
model.push_to_hub("username/my-model")
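
After merging, the result is a standard MLX checkpoint, so it loads with stock mlx-lm for inference (a minimal sketch; assumes mlx-lm is installed):

from mlx_lm import load, generate

# Load the merged checkpoint produced by save_pretrained_merged()
model, tokenizer = load("./merged")
print(generate(model, tokenizer, prompt="Hello!", max_tokens=50))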

Supported Training Methods

| Method | Trainer | Implementation | Use Case |
|---|---|---|---|
| SFT | SFTTrainer | ✅ Native MLX | Instruction fine-tuning |
| DPO | DPOTrainer | ✅ Native MLX | Preference learning (proper log-prob loss) |
| ORPO | ORPOTrainer | ✅ Native MLX | Combined SFT + odds ratio preference |
| GRPO | GRPOTrainer | ✅ Native MLX | Reasoning with multi-generation (DeepSeek R1 style) |
| KTO | KTOTrainer | ✅ Native MLX | Kahneman-Tversky optimization |
| SimPO | SimPOTrainer | ✅ Native MLX | Simple preference optimization |
| VLM SFT | VLMSFTTrainer | ✅ Native MLX | Vision-language model fine-tuning |
| Vision GRPO | VLMGRPOTrainer | ✅ Native MLX | Vision-language GRPO reasoning |
| TTS SFT | TTSSFTTrainer | ✅ Native MLX | Orpheus, OuteTTS, Spark-TTS, Sesame/CSM |
| STT SFT | STTSFTTrainer | ✅ Native MLX | Whisper, Moonshine, Qwen3-ASR, Canary, Voxtral |
| Embedding | EmbeddingSFTTrainer | ✅ Native MLX | BERT, ModernBERT, Qwen3-Embedding (InfoNCE) |
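
As a concrete example of the preference trainers, here is a minimal DPO run using the standard TRL-style prompt/chosen/rejected records (DPOConfig is assumed here to mirror TRL's config, in line with KTOConfig and SimPOConfig above):

from mlx_tune import FastLanguageModel, DPOTrainer, DPOConfig  # DPOConfig assumed, as with KTOConfig

model, tokenizer = FastLanguageModel.from_pretrained(
    "mlx-community/Llama-3.2-1B-Instruct-4bit", load_in_4bit=True
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# TRL-style preference records: prompt / chosen / rejected
train_dataset = [{
    "prompt": "What is the capital of France?",
    "chosen": "The capital of France is Paris.",
    "rejected": "I'm not sure.",
}]

trainer = DPOTrainer(
    model=model, tokenizer=tokenizer,
    train_dataset=train_dataset,
    args=DPOConfig(output_dir="dpo_outputs", beta=0.1, max_steps=10),
)
trainer.train()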

Examples

Check examples/ for working code:

  • Basic model loading and inference (01–07)
  • Complete SFT fine-tuning pipeline (08)
  • RL training overview (09)
  • Vision model fine-tuning — Qwen3.5 (10-11)
  • RL E2E training — DPO (21), GRPO (22), ORPO (23), KTO (24), SimPO (25), Vision GRPO (26)
  • TTS fine-tuning — Orpheus-3B (12), OuteTTS (14), Spark-TTS (15), Qwen3-TTS (20)
  • STT fine-tuning — Whisper (13), Moonshine (16), Qwen3-ASR (17), Canary (18), Voxtral (19)
  • Embedding fine-tuning — BERT/MiniLM (27), Qwen3-Embedding (28)

Requirements

  • Hardware: Apple Silicon Mac (M1/M2/M3/M4/M5)
  • OS: macOS 13.0+
  • Memory: 8GB+ unified RAM (16GB+ recommended)
  • Python: 3.9+

Comparison with Unsloth

| Feature | Unsloth (CUDA) | MLX-Tune |
|---|---|---|
| Platform | NVIDIA GPUs | Apple Silicon |
| Backend | Triton kernels | MLX framework |
| Memory | VRAM (limited) | Unified memory (up to 512GB) |
| API | Original | 100% compatible |
| Best For | Production training | Local dev, large models |

Known Limitations

GGUF Export from Quantized Models

The Issue: GGUF export (save_pretrained_gguf) doesn't work directly with quantized (4-bit) base models. This is a known limitation in mlx-lm, not an mlx-tune bug.

What Works:

  • ✅ Training with quantized models (QLoRA) - works perfectly
  • ✅ Saving adapters (save_pretrained) - works
  • ✅ Saving merged model (save_pretrained_merged) - works
  • ✅ Inference with trained model - works
  • ❌ GGUF export from quantized base model - mlx-lm limitation

Workarounds:

  1. Use a non-quantized base model (recommended for GGUF export):

    # Use fp16 model instead of 4-bit
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="mlx-community/Llama-3.2-1B-Instruct",  # NOT -4bit
        max_seq_length=2048,
        load_in_4bit=False,  # Train in fp16
    )
    # Train normally, then export
    model.save_pretrained_gguf("model", tokenizer)  # Works!
    
  2. Dequantize during export (results in large fp16 file):

    model.save_pretrained_gguf("model", tokenizer, dequantize=True)
    # Then re-quantize with llama.cpp:
    # ./llama-quantize model.gguf model-q4_k_m.gguf Q4_K_M
    
  3. Skip GGUF, use MLX format: If you only need the model for MLX/Python inference, just use save_pretrained_merged() - no GGUF needed.

Contributing

Contributions welcome! Areas that need help:

  • Custom MLX kernels for even faster training
  • More test coverage (especially E2E and edge cases)
  • Testing on different M-series chips (M1, M2, M3, M4, M5)
  • Batched audio training (currently batch_size=1)
  • Batched RL training (currently single-sample)

License

Apache 2.0 - See LICENSE file.

Acknowledgments


Community project, not affiliated with Unsloth AI or Apple.
⭐ Star this repo if you find it useful!
