mlxsmith

Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.

Status: alpha (v0.1.2). Full training pipeline validated on Qwen3-4B.

Install

MLX training and serving require macOS on Apple Silicon. Other platforms can use data tools and mock backends.

python -m venv .venv && source .venv/bin/activate
pip install -U pip

# Core CLI (data tools, config, project scaffolding)
pip install mlxsmith

# Apple Silicon training + serving
pip install "mlxsmith[mlx,llm,serve]"

# mlx-lm-lora passthrough (advanced training methods)
pip install "mlxsmith[lora]"

# Everything
pip install "mlxsmith[all]"

Quickstart

mlxsmith init myproj
cd myproj
mlxsmith doctor        # check Python, MLX, Metal

Training

SFT (LoRA/QLoRA)

mlxsmith sft --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 --data data/sft

Produces run artifacts under runs/sft_NNNN/ (adapter weights, metrics.jsonl, config snapshot).

Preference tuning (DPO/ORPO)

mlxsmith pref --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 \
  --data data/prefs --algo dpo

Supports the DPO and ORPO algorithms with configurable beta and KL coefficients. Each training record must contain the fields {prompt, chosen, rejected}.
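To make the {prompt, chosen, rejected} format concrete, here is a minimal sketch of a record check you could run before training. It assumes one JSON object per line and plain string fields; the helper names are hypothetical and not part of mlxsmith.

```python
import json

# Field names come from the format described above.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def validate_pref_record(record):
    """Return a list of problems found in one preference record (empty if OK)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Assumption: every field is a non-empty string.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    if not problems and record["chosen"] == record["rejected"]:
        problems.append("chosen and rejected are identical")
    return problems


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


example = {
    "prompt": "Write a function that reverses a string.",
    "chosen": "def reverse(s):\n    return s[::-1]",
    "rejected": "def reverse(s):\n    return s",
}
```

For real projects, `mlxsmith data validate` is the supported structure check; the sketch only illustrates what a well-formed record looks like.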

Reinforced fine-tuning (GRPO)

mlxsmith rft --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 \
  --env envs/coding.yaml --verifier verifiers/pytest.py

GRPO-style RL training with token-level environment integration and verifier-based rewards. Rollouts pass through an acceptance/rejection gate, and rewards are tracked across the run.
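For readers new to GRPO, the core idea is that each rollout's reward is compared against the other rollouts sampled for the same prompt. The sketch below shows that group-relative normalization in plain Python. It is a generic illustration, not mlxsmith's implementation; the function name and `eps` value are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's rollout group to zero mean, unit scale."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt, scored 1.0 (verifier passed) or 0.0 (failed).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat their group average get a positive advantage and are reinforced; a group where every rollout scores the same contributes no signal.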

Knowledge distillation

# Offline distillation (teacher generates, student learns)
mlxsmith distill --teacher large-model --student small-model --mode offline

# Online preference distillation (OPD)
mlxsmith distill --teacher large-model --student small-model --mode opd

Full pipeline

# Run SFT → Pref → RFT in sequence
mlxsmith pipeline

mlx-lm-lora parity (all methods)

Use the passthrough to access mlx-lm-lora features (DPO variants, GRPO variants, PPO, synthetic datasets, judge training, etc.):

# Train with mlx-lm-lora directly
mlxsmith lora train --model Qwen/Qwen3-4B-Instruct-2507 --data data/prefs --train-mode dpo -- --beta 0.1

# Generate synthetic datasets
mlxsmith lora synthetic prompts -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --num-samples 1000

# Train judge model
mlxsmith lora judge -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --data data/prefs

Serving

OpenAI-compatible /v1/chat/completions endpoint.

mlxsmith serve --model runs/sft_0001/adapter --port 8080
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'

Supports streaming ("stream": true), logprobs, stop sequences, and an optional UI dashboard (serve.ui: true in config).
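The same request can be sent from Python with only the standard library. This is a sketch: it assumes the server is running on localhost:8080 and that the response follows the usual OpenAI chat-completions shape (`choices[0].message.content`), which this README does not spell out. The function names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8080", max_tokens=64, stream=False):
    """Build a POST request equivalent to the curl example."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server from the example running, `chat("Hello")` would return the model's reply. Existing OpenAI client libraries pointed at the same base URL should work as well, since the endpoint is OpenAI-compatible.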

Data tools

mlxsmith data presets                                     # list built-in datasets
mlxsmith data pull alpaca                                 # pull a preset
mlxsmith data import raw.json --out data/sft/train.jsonl  # import ShareGPT → JSONL
mlxsmith data split data/sft/train.jsonl --fractions 0.9 0.05 0.05
mlxsmith data stats data/sft/train.jsonl                  # token counts, field analysis
mlxsmith data validate data/sft/train.jsonl               # structure check

Built-in presets: alpaca, hh-rlhf, ultrachat-200k, ultrafeedback-binarized-prefs, ultrafeedback-binarized-sft.
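As a rough picture of what `--fractions 0.9 0.05 0.05` means, the sketch below splits a list of records into three parts by cumulative fraction. Whether mlxsmith shuffles, how it seeds, and how it names the output files are not stated here, so the shuffle and the `seed` parameter are assumptions.

```python
import random


def split_by_fractions(records, fractions=(0.9, 0.05, 0.05), seed=0):
    """Shuffle records and cut them into len(fractions) consecutive parts."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n = len(shuffled)
    splits, start, cumulative = [], 0, 0.0
    for i, frac in enumerate(fractions):
        cumulative += frac
        # The last part takes whatever remains, so no record is dropped.
        end = n if i == len(fractions) - 1 else round(cumulative * n)
        splits.append(shuffled[start:end])
        start = end
    return splits


train, val, test = split_by_fractions(range(100))
```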

Model management

# Pull + convert HF model to MLX
mlxsmith pull Qwen/Qwen3-4B-Instruct-2507

# With quantization
mlxsmith pull Qwen/Qwen3-4B-Instruct-2507 --quantize --q-bits 4

# Merge adapters
mlxsmith adapters merge runs/sft_0001/adapter runs/pref_0001/adapter --weights 0.7 0.3
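One plausible reading of `--weights 0.7 0.3` is a linear combination of matching adapter tensors. The sketch below illustrates that idea with plain Python lists standing in for flattened tensors. This is an assumption about the merge semantics, not a description of what `mlxsmith adapters merge` actually does, and the function name is hypothetical.

```python
def merge_adapters(adapters, weights):
    """Weighted sum of adapters that share the same tensor names and shapes."""
    if len(adapters) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    keys = set(adapters[0])
    if any(set(a) != keys for a in adapters):
        raise ValueError("adapters must contain the same tensor names")
    merged = {}
    for key in keys:
        # zip(*...) walks the tensors element by element across all adapters.
        columns = zip(*(a[key] for a in adapters))
        merged[key] = [sum(w * v for w, v in zip(weights, col)) for col in columns]
    return merged


sft = {"layer0.lora": [1.0, 2.0]}
pref = {"layer0.lora": [3.0, 4.0]}
merged = merge_adapters([sft, pref], [0.7, 0.3])
```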

HF auth

mlxsmith auth login --token "$HF_TOKEN"
mlxsmith auth status
mlxsmith auth logout

Eval and bench

# Evaluation suite (pass@k with verifier checks)
mlxsmith eval --suite eval/suites/coding.yaml

# Benchmark inference, training, or end-to-end throughput
mlxsmith bench --mode inference
mlxsmith bench --mode trainer
mlxsmith bench --mode end_to_end
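The evaluation suite reports pass@k. For reference, a common way to compute it is the unbiased estimator from code-generation benchmarks: with n samples per task, of which c pass the verifier, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that formula; whether mlxsmith uses this exact estimator is an assumption.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k samples drawn from n passes, given c passing."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# 4 samples, 1 passes: drawing 2 of them includes the passing one half the time.
score = pass_at_k(n=4, c=1, k=2)  # 0.5
```

Per-task scores are typically averaged over the suite to get the headline number.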

Verifiers

Built-in verifiers for eval, RFT, and preference tuning:

  • regex — pattern matching on completions
  • jsonschema — JSON structure validation
  • pytest — sandboxed test execution
  • docker — containerized verification
  • compose — multi-verifier composition (AND/OR/weighted)
  • llm_judge — LLM-based self-verification / ThinkPRM-style verifier

See docs/VERIFIERS.md for the verifier API.
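To illustrate how AND/OR/weighted composition can work, the sketch below models a verifier as a function from a completion string to a score in [0, 1]. That interface and every name in it are hypothetical; the real verifier API is the one described in docs/VERIFIERS.md.

```python
import json
import re


def regex_verifier(pattern):
    """Score 1.0 if the pattern occurs anywhere in the completion, else 0.0."""
    compiled = re.compile(pattern)
    return lambda completion: 1.0 if compiled.search(completion) else 0.0


def json_verifier(completion):
    """Score 1.0 if the completion parses as JSON, else 0.0."""
    try:
        json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


def compose_all(*verifiers):
    """AND: the weakest verifier decides the score."""
    return lambda completion: min(v(completion) for v in verifiers)


def compose_any(*verifiers):
    """OR: the strongest verifier decides the score."""
    return lambda completion: max(v(completion) for v in verifiers)


def compose_weighted(weighted):
    """Weighted mean over (verifier, weight) pairs."""
    total = sum(weight for _, weight in weighted)
    return lambda completion: sum(w * v(completion) for v, w in weighted) / total


has_answer = regex_verifier(r'"answer"')
strict = compose_all(has_answer, json_verifier)
lenient = compose_any(has_answer, json_verifier)
```

Treating scores as numbers rather than booleans lets the same composed verifier serve both as a pass/fail check for eval and as a reward for RFT.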

Environment plugin system

mlxsmith env list                  # list available environments
mlxsmith env info envs/coding.yaml # show manifest (tasks, verifier, version)
mlxsmith env init my_env           # scaffold a new environment
mlxsmith env install ./my_env      # install from directory
mlxsmith env package ./my_env      # create distributable tarball
mlxsmith env run envs/coding.yaml  # execute RFT with this environment

Environments define tasks, verifiers, and reward functions for RFT training. See docs/ENVIRONMENTS.md.

Config system

mlxsmith config show              # display merged config (YAML/JSON/TOML)
mlxsmith config show --sources    # show where each value comes from
mlxsmith config init              # create default mlxsmith.yaml
mlxsmith config validate          # check config structure
mlxsmith config env               # show environment variable mapping

Config sources, highest priority first: CLI flags > environment variables (MLXSMITH__SECTION__KEY) > config file > defaults.
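The precedence rule can be pictured as layered dictionary merging, where later layers overwrite earlier ones. The sketch below is illustrative only: it assumes two-level section/key names, lowercases the parts of the environment variable name, and keeps values as strings. mlxsmith's actual parsing and type coercion may differ, and both function names are hypothetical.

```python
def env_overrides(environ, prefix="MLXSMITH__"):
    """Turn MLXSMITH__SECTION__KEY=value entries into {section: {key: value}}."""
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        parts = name[len(prefix):].lower().split("__")
        if len(parts) != 2:
            continue  # this sketch handles only SECTION__KEY
        section, key = parts
        overrides.setdefault(section, {})[key] = value
    return overrides


def merge_config(*layers):
    """Merge layers given lowest priority first; later layers win."""
    merged = {}
    for layer in layers:
        for section, values in layer.items():
            merged.setdefault(section, {}).update(values)
    return merged


defaults = {"train": {"lr": "1e-4", "optimizer": "adamw"}, "serve": {"port": "8080"}}
config_file = {"train": {"lr": "2e-4"}}
env = env_overrides({"MLXSMITH__TRAIN__LR": "3e-4", "HOME": "/home/me"})
cli_flags = {"serve": {"port": "9000"}}

config = merge_config(defaults, config_file, env, cli_flags)
```

Here the environment variable overrides the learning rate from the config file, the CLI flag overrides the default port, and the optimizer falls through from the defaults. To see the real resolution for a project, use `mlxsmith config show --sources`.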

Training optimizers are configurable via train.optimizer and train.optimizer_kwargs (for example adamw, adam, qhadam, muon when available in MLX).

SDK (programmatic API)

For building custom training loops:

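from mlxsmith.sdk import load_model, SamplingClient, TrainingClient, TrainingBatch

# `config` is not defined in this snippet; load or construct your project config first.
loaded = load_model("path/to/model", config)

# Sampling with logprobs
sampler = SamplingClient(loaded.backend)
result = sampler.sample("prompt", logprobs_k=5)

# Training operations
trainer = TrainingClient(loaded.backend)
trainer.create_optimizer(lr=1e-4, weight_decay=0.01)
# `batch` is not defined in this snippet; build a TrainingBatch from your data first.
fb = trainer.forward_backward(batch)
# fb.result() exposes the computed gradients, which the optimizer step applies.
trainer.optim_step(fb.result().grads)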

Loss functions: DPO, ORPO, GRPO, CISPO, DRO, PPO, importance sampling, cross-entropy.
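As a reference point for the first entry in that list, the standard DPO loss for one preference pair is -log sigmoid(beta * ((log pi(chosen) - log ref(chosen)) - (log pi(rejected) - log ref(rejected)))). The sketch below computes it from four summed log-probabilities in plain Python. It illustrates the published formula, not the SDK's implementation, and the function and parameter names are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The loss falls below log(2) when the policy favors the chosen response more strongly than the reference model does, and rises above it in the opposite case.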

Research

RLM self-play loop

RLM (Recursive Language Model) is a research feature — the infrastructure runs but has not produced measured gains yet.

mlxsmith rlm                       # single-process RLM
mlxsmith pipeline --orchestrated   # multi-process orchestrated RLM
mlxsmith rlm status                # check iteration state
mlxsmith rlm history               # view history

Includes task generation, mutation for data diversity, corpus management, EMA-based gating, and weight pointer IPC for multi-process coordination. See docs/orchestrator.md.

Docs

  • docs/PROJECT_FORMAT.md — project layout and artifacts
  • docs/VERIFIERS.md — verifier API and sandbox behavior
  • docs/COMPATIBILITY.md — tested versions and model families
  • docs/ENVIRONMENTS.md — environment plugin system
  • docs/orchestrator.md — multi-process RLM orchestrator
  • docs/rlm-ctl.md — RLM training guide
  • docs/ROADMAP.md — product direction and milestones
  • docs/README.md — full docs index

License

MIT

