Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.
Project description
MLXSmith
Fine-tune language models on Apple Silicon. SFT, preference optimization, reinforcement learning, distillation, and serving — all native to MLX.
Status: Alpha (v0.1.7) · Validated on Qwen3-4B
Features
- Supervised fine-tuning — LoRA and QLoRA with configurable optimizers
- Preference optimization — DPO, ORPO, IPO, CPO, SimPO, and more
- Reinforcement learning — GRPO with verifier-based rewards
- Knowledge distillation — Offline and online preference distillation
- KTO — Kahneman-Tversky Optimization from binary feedback
- Online DPO — Live preference tuning with LLM judge scoring
- Self-verification training — Policy gradient from self-assessed rewards
- Synthetic data generation — Generate, evolve, and filter training data
- External model backends — Use Codex, Claude, Gemini CLIs or any OpenAI-compatible API for data generation and judging
- Recursive training — Self-improving RLM loop with task generation and gating
- Serving — OpenAI-compatible API with streaming
- Environment plugins — Reusable task and verifier packages for RL training
Requirements
- macOS with Apple Silicon (M1 or later)
- Python 3.10+
Data tools, configuration, and project scaffolding work on any platform.
Install
pip install "mlxsmith[all]"
Selective install
# Core only (data tools, config, scaffolding)
pip install mlxsmith
# Apple Silicon training
pip install "mlxsmith[mlx,llm]"
# Training + serving
pip install "mlxsmith[mlx,llm,serve]"
Quickstart
# 1. Create a project
mlxsmith init myproj && cd myproj
# 2. Verify your environment
mlxsmith doctor
# 3. Pull a model
mlxsmith pull mlx-community/Qwen3-4B-Instruct-2507-4bit
# 4. Pull training data
mlxsmith data pull --preset alpaca
# 5. Fine-tune
mlxsmith sft \
--model cache/mlx/mlx-community__Qwen3-4B-Instruct-2507-4bit \
--data data/sft
# 6. Serve the result
mlxsmith serve --model runs/sft_0001/adapter --port 8080
See Getting Started for a complete walkthrough.
Training Modes
| Mode | Command | Input Format | Use Case |
|---|---|---|---|
| SFT | mlxsmith sft |
{prompt, response} |
Instruction-following via LoRA |
| Preference | mlxsmith pref |
{prompt, chosen, rejected} |
Alignment with DPO, ORPO, and others |
| KTO | mlxsmith kto |
{prompt, response, label} |
Binary good/bad feedback |
| GRPO | mlxsmith rft |
Environment + verifier | Reward-driven reinforcement learning |
| Online DPO | mlxsmith online-dpo |
{prompt} |
Online preference with LLM judge |
| Self-verify | mlxsmith self-verify |
{prompt} |
Self-verification reward signal |
| Distillation | mlxsmith distill |
{prompt} |
Teacher-to-student transfer |
| Judge | mlxsmith judge |
Judge-format data | Train a scoring model |
| Pipeline | mlxsmith pipeline |
Combined | SFT then Pref then RFT then RLM |
See Concepts for an explanation of each training mode.
Tools
| Tool | Command | Description |
|---|---|---|
| Data | mlxsmith data |
Import, split, validate, and pull datasets |
| Synthetic | mlxsmith synthetic |
Generate and evolve training data |
| Eval | mlxsmith eval |
Run evaluation suites with pass@k |
| Bench | mlxsmith bench |
Benchmark inference and training throughput |
| Serve | mlxsmith serve |
OpenAI-compatible model server |
| RLM | mlxsmith rlm |
Recursive training loop + REPL-based inference |
External Model Backends
MLXSmith can use powerful cloud models for synthetic data generation and judging while keeping fine-tuning local on Apple Silicon.
Supported backends:
cli— shell out to Codex/Claude/Gemini CLIs (or any command you provide)openai— call any OpenAI-compatible Chat Completions endpoint
Note: training commands (sft, pref, rft, rlm loop) still require a local training backend like mlx-lm.
CLI Backend — Shell out to Codex, Claude, or Gemini CLIs:
# Use a CLI model for prompt generation
export MLXSMITH__MODEL__BACKEND=cli
export MLXSMITH_CLI_CODEX_CMD='codex exec --full-auto --model gpt-5.2'
# If your CLI expects the prompt as an argument instead of stdin:
# export MLXSMITH_CLI_PROMPT_FLAG='--prompt'
mlxsmith synthetic prompts \
--model codex \
--seed-prompts data/seeds.jsonl \
--num 100 \
--out data/prompts.jsonl
# Use a CLI model as judge for filtering
mlxsmith synthetic sft \
--model codex \
--judge-backend cli \
--judge-model claude \
--prompts data/prompts.jsonl \
--out data/sft.jsonl
OpenAI Backend — Use any OpenAI-compatible API:
export MLXSMITH__MODEL__BACKEND=openai
export OPENAI_API_KEY="sk-..."
export MLXSMITH_API_BASE="https://api.openai.com/v1" # or any compatible endpoint
mlxsmith synthetic prompts \
--model gpt-4o \
--out data/prompts.jsonl
This enables cloud-quality data generation with local training — use frontier models to create and filter training data, then fine-tune efficiently on your Mac.
Documentation
| Section | Description |
|---|---|
| Getting Started | Full setup walkthrough |
| Concepts | Training modes explained |
| CLI Reference | All commands with examples |
| Verifiers | Verifier API and composition |
| Environments | Task environment plugins |
| Project Format | Run artifacts and layout |
| Configuration | Config system and options |
| Compatibility | Tested versions and models |
| Troubleshooting | Common issues and fixes |
| FAQ | Frequently asked questions |
| Contributing | How to contribute and run tests |
| Changelog | Release notes |
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mlxsmith-0.1.7.tar.gz.
File metadata
- Download URL: mlxsmith-0.1.7.tar.gz
- Upload date:
- Size: 167.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a89b57f0466c6055f75b17688165f74d7e268f698cec5da82f1c0d47081b7283
|
|
| MD5 |
34c30f694a1b7954f052756b1de6bc1d
|
|
| BLAKE2b-256 |
e6ebc370f0eca14f9bde30e7457afa83a8c292bf7613d15befa86c4c18aae185
|
Provenance
The following attestation bundles were made for mlxsmith-0.1.7.tar.gz:
Publisher:
publish.yml on Hmbown/MLXSmith
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
mlxsmith-0.1.7.tar.gz -
Subject digest:
a89b57f0466c6055f75b17688165f74d7e268f698cec5da82f1c0d47081b7283 - Sigstore transparency entry: 909264384
- Sigstore integration time:
-
Permalink:
Hmbown/MLXSmith@dceda3719ea7b5ea4aeb8ed63cbad08b1fe4ef1c -
Branch / Tag:
refs/tags/v0.1.7 - Owner: https://github.com/Hmbown
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@dceda3719ea7b5ea4aeb8ed63cbad08b1fe4ef1c -
Trigger Event:
release
-
Statement type:
File details
Details for the file mlxsmith-0.1.7-py3-none-any.whl.
File metadata
- Download URL: mlxsmith-0.1.7-py3-none-any.whl
- Upload date:
- Size: 185.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f67aa28a803e998d90f78e861f510b5fe51dadd9f8e7ef351c40336262506e11
|
|
| MD5 |
0d4416bf09d62136ed0c47eea4392ddb
|
|
| BLAKE2b-256 |
4e22e211acdd5a994370303c855cf1c4119339eedd04f9d66d135b646016aa4c
|
Provenance
The following attestation bundles were made for mlxsmith-0.1.7-py3-none-any.whl:
Publisher:
publish.yml on Hmbown/MLXSmith
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
mlxsmith-0.1.7-py3-none-any.whl -
Subject digest:
f67aa28a803e998d90f78e861f510b5fe51dadd9f8e7ef351c40336262506e11 - Sigstore transparency entry: 909264388
- Sigstore integration time:
-
Permalink:
Hmbown/MLXSmith@dceda3719ea7b5ea4aeb8ed63cbad08b1fe4ef1c -
Branch / Tag:
refs/tags/v0.1.7 - Owner: https://github.com/Hmbown
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@dceda3719ea7b5ea4aeb8ed63cbad08b1fe4ef1c -
Trigger Event:
release
-
Statement type: