Skip to main content

Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.

Project description

MLXSmith

PyPI CI License

Fine-tune language models on Apple Silicon. SFT, preference optimization, reinforcement learning, distillation, and serving — all native to MLX.

Status: Alpha (v0.1.9) · Validated on Qwen3-4B and Qwen3-1.7B


Features

  • Supervised fine-tuning — LoRA and QLoRA with configurable optimizers
  • Preference optimization — DPO, ORPO, IPO, CPO, SimPO, and more
  • Reinforcement learning — GRPO with verifier-based rewards
  • Knowledge distillation — Offline and online preference distillation
  • KTO — Kahneman-Tversky Optimization from binary feedback
  • Online DPO — Live preference tuning with LLM judge scoring
  • Self-verification training — Policy gradient from self-assessed rewards
  • Synthetic data generation — Generate, evolve, and filter training data
  • External model backends — Use Codex, Claude, Gemini CLIs or any OpenAI-compatible API for data generation and judging
  • Recursive training — Self-improving RLM loop with task generation and gating
  • Serving — OpenAI-compatible API with streaming
  • Web dashboard (Next.js) — Models, adapters, training, eval, chat, and serving UI
  • Environment plugins — Reusable task and verifier packages for RL training
  • Experimental mHC adapters — Optional block-local mHC patching for MLX transformer blocks (not a speedup)

Requirements

  • macOS with Apple Silicon (M1 or later)
  • Python 3.10+

Data tools, configuration, and project scaffolding work on any platform.

Install

pip install "mlxsmith[all]"
Selective install
# Core only (data tools, config, scaffolding)
pip install mlxsmith

# Apple Silicon training
pip install "mlxsmith[mlx,llm]"

# Training + serving
pip install "mlxsmith[mlx,llm,serve]"

Quickstart

# 1. Create a project
mlxsmith init myproj && cd myproj

# 2. Verify your environment
mlxsmith doctor

# 3. Pull a model
mlxsmith pull mlx-community/Qwen3-4B-Instruct-2507-4bit

# 4. Pull training data
mlxsmith data pull --preset alpaca

# 5. Fine-tune
mlxsmith sft \
  --model cache/mlx/mlx-community__Qwen3-4B-Instruct-2507-4bit \
  --data data/sft

# 6. Serve the result
mlxsmith serve --model runs/sft_0001/adapter --port 8080

See Getting Started for a complete walkthrough.

End-to-end Smoke (Qwen3-1.7B)

This repo includes an end-to-end smoke run that validates the full pipeline (SFT → Pref → RFT → RLM) on Qwen/Qwen3-1.7B-MLX-4bit.

mlxsmith pull Qwen/Qwen3-1.7B-MLX-4bit
./scripts/exp_qwen3_1.7b_mlx_4bit_e2e_smoke.sh

# Optional: also smoke-test `mlxsmith serve` + OpenAI-compatible endpoint
SMOKE_SERVE=1 ./scripts/exp_qwen3_1.7b_mlx_4bit_e2e_smoke.sh

The smoke run uses qwen3_1.7b_mlx_4bit_smoke.yaml and the tiny datasets in data/sft and data/prefs.

Repo SFT (Qwen3-1.7B)

To build a small “MLXSmith repo assistant” adapter on top of Qwen/Qwen3-1.7B-MLX-4bit, use the repo-grounded SFT script:

# 1) Generate seed prompts from the repo
python3 scripts/make_repo_seed_prompts.py --out data/mlxsmith_prompts.jsonl

# 2) Generate responses (via Codex) + train LoRA
NUM=300 BATCH=4 ITERS=2000 LR=2e-4 ./scripts/exp_qwen3_1.7b_mlx_4bit_repo_sft.sh

Notes:

  • The script uses codex exec by default. Override with MLXSMITH_CLI_CODEX_CMD if needed.
  • Qwen output sanitization is enabled in the included configs via infer.strip_think: true.

Web Dashboard (Optional)

Run the API server, then start the Next.js dashboard:

# Terminal 1: start the OpenAI-compatible API
mlxsmith serve --model cache/mlx/mlx-community__Qwen3-4B-Instruct-2507-4bit --port 8080

# Terminal 2: start the dashboard
cd apps/web
npm install
npm run dev

The dashboard defaults to http://localhost:8080 for the API base URL (change in Settings if needed).

Training Modes

Mode Command Input Format Use Case
SFT mlxsmith sft {prompt, response} Instruction-following via LoRA
Preference mlxsmith pref {prompt, chosen, rejected} Alignment with DPO, ORPO, and others
KTO mlxsmith kto {prompt, response, label} Binary good/bad feedback
GRPO mlxsmith rft Environment + verifier Reward-driven reinforcement learning
Online DPO mlxsmith online-dpo {prompt} Online preference with LLM judge
Self-verify mlxsmith self-verify {prompt} Self-verification reward signal
Distillation mlxsmith distill {prompt} Teacher-to-student transfer
Judge mlxsmith judge Judge-format data Train a scoring model
Pipeline mlxsmith pipeline Combined SFT then Pref then RFT then RLM

See Concepts for an explanation of each training mode.

Tools

Tool Command Description
Data mlxsmith data Import, split, validate, and pull datasets
Synthetic mlxsmith synthetic Generate and evolve training data
Eval mlxsmith eval Run evaluation suites with pass@k
Bench mlxsmith bench Benchmark inference and training throughput
Serve mlxsmith serve OpenAI-compatible model server
RLM mlxsmith rlm Recursive training loop + REPL-based inference

External Model Backends

MLXSmith can use powerful cloud models for synthetic data generation and judging while keeping fine-tuning local on Apple Silicon.

Supported backends:

  • cli — shell out to Codex/Claude/Gemini CLIs (or any command you provide)
  • openai — call any OpenAI-compatible Chat Completions endpoint

Note: training commands (sft, pref, rft, rlm loop) still require a local training backend like mlx-lm.

CLI Backend — Shell out to Codex, Claude, or Gemini CLIs:

# Use a CLI model for prompt generation
export MLXSMITH__MODEL__BACKEND=cli
export MLXSMITH_CLI_CODEX_CMD='codex exec --full-auto --model gpt-5.2'

# If your CLI expects the prompt as an argument instead of stdin:
# export MLXSMITH_CLI_PROMPT_FLAG='--prompt'

mlxsmith synthetic prompts \
  --model codex \
  --seed-prompts data/seeds.jsonl \
  --num 100 \
  --out data/prompts.jsonl

# Use a CLI model as judge for filtering
mlxsmith synthetic sft \
  --model codex \
  --judge-backend cli \
  --judge-model claude \
  --prompts data/prompts.jsonl \
  --out data/sft.jsonl

OpenAI Backend — Use any OpenAI-compatible API:

export MLXSMITH__MODEL__BACKEND=openai
export OPENAI_API_KEY="sk-..."
export MLXSMITH_API_BASE="https://api.openai.com/v1"  # or any compatible endpoint

mlxsmith synthetic prompts \
  --model gpt-4o \
  --out data/prompts.jsonl

This enables cloud-quality data generation with local training — use frontier models to create and filter training data, then fine-tune efficiently on your Mac.

Documentation

Section Description
Getting Started Full setup walkthrough
Concepts Training modes explained
CLI Reference All commands with examples
Verifiers Verifier API and composition
Environments Task environment plugins
Project Format Run artifacts and layout
Configuration Config system and options
Compatibility Tested versions and models
Troubleshooting Common issues and fixes
FAQ Frequently asked questions
Contributing How to contribute and run tests
Changelog Release notes

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlxsmith-0.1.9.tar.gz (177.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mlxsmith-0.1.9-py3-none-any.whl (193.8 kB view details)

Uploaded Python 3

File details

Details for the file mlxsmith-0.1.9.tar.gz.

File metadata

  • Download URL: mlxsmith-0.1.9.tar.gz
  • Upload date:
  • Size: 177.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlxsmith-0.1.9.tar.gz
Algorithm Hash digest
SHA256 6ecdb8c42c86aa893b2c3e7bffbac9d9cd549d0b75e3a66f32d64ea23a7f1849
MD5 3addb6c410f87134bf89239c4934fad8
BLAKE2b-256 5014f47eb398d353202ae46bf206a6e7602cffb879057f7d605d90922e9deb89

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlxsmith-0.1.9.tar.gz:

Publisher: publish.yml on Hmbown/MLXSmith

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlxsmith-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: mlxsmith-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 193.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlxsmith-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 9efa8f224a42e5cb44c2daae69d7f01e0817f752d43a9fb0df376a34e6b65a38
MD5 6be61081c0ea76e19ed44d8b5f487e43
BLAKE2b-256 56f96e667d7c0b1a2fcb849391ba9bc662d317211e7f779bfdb5680148dd1a87

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlxsmith-0.1.9-py3-none-any.whl:

Publisher: publish.yml on Hmbown/MLXSmith

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page