mlxsmith

Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.

Status: alpha (v0.1.0). Full training pipeline validated on Qwen3-4B.

Install

MLX training and serving require macOS on Apple Silicon. Other platforms can use data tools and mock backends.

python -m venv .venv && source .venv/bin/activate
pip install -U pip

# Core CLI (data tools, config, project scaffolding)
pip install mlxsmith

# Apple Silicon training + serving
pip install "mlxsmith[mlx,llm,serve]"

# Everything
pip install "mlxsmith[all]"

Quickstart

mlxsmith init myproj
cd myproj
mlxsmith doctor        # check Python, MLX, Metal, ZMLX

Training

SFT (LoRA/QLoRA)

mlxsmith sft --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 --data data/sft

Produces run artifacts under runs/sft_NNNN/ (adapter weights, metrics.jsonl, config snapshot).

Preference tuning (DPO/ORPO)

mlxsmith pref --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 \
  --data data/prefs --algo dpo

Supports DPO and ORPO, with configurable beta and KL coefficients. Each training record is expected to provide prompt, chosen, and rejected fields.
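
As a rough illustration of that record shape, the sketch below checks that each line of a JSONL preference file carries the three fields. The helper name check_pref_records and the requirement that every value is a non-empty string are assumptions made for this example, not part of mlxsmith.

```python
import json

REQUIRED_FIELDS = ("prompt", "chosen", "rejected")

def check_pref_records(lines):
    """Return a list of (line_number, problem) tuples for JSONL preference data."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "record is not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            # Assumption for this sketch: each field is a non-empty string.
            if not isinstance(value, str) or not value:
                problems.append((lineno, f"missing or empty field: {field}"))
    return problems

sample = [
    '{"prompt": "2+2?", "chosen": "4", "rejected": "5"}',
    '{"prompt": "Capital of France?", "chosen": "Paris"}',
]
print(check_pref_records(sample))
```

An empty result means every record has all three fields; the second sample line is reported because it lacks rejected.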

Reinforced fine-tuning (GRPO)

mlxsmith rft --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 \
  --env envs/coding.yaml --verifier verifiers/pytest.py

GRPO-style RL training with token-level environment integration and verifier-based rewards. Rollouts pass through an accept/reject gate, and their rewards are tracked.
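
For readers new to GRPO, the core idea is that several rollouts for the same prompt are scored and each reward is compared against the rest of its group. The sketch below shows that group-relative normalization in plain Python. It is a generic illustration, and whether mlxsmith divides by the standard deviation or uses this epsilon is an assumption.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when all rewards in the group are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one prompt, scored 1.0 (verifier passed) or 0.0 (failed).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Rollouts that beat the group average get a positive advantage and the rest a negative one, so no separate value model is needed to form a baseline.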

Knowledge distillation

# Offline distillation (teacher generates, student learns)
mlxsmith distill --teacher large-model --student small-model --mode offline

# Online preference distillation (OPD)
mlxsmith distill --teacher large-model --student small-model --mode opd

Full pipeline

# Run SFT → Pref → RFT in sequence
mlxsmith pipeline

Serving

OpenAI-compatible /v1/chat/completions endpoint.

mlxsmith serve --model runs/sft_0001/adapter --port 8080
# From a second terminal, once the server is running:
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'

Supports streaming ("stream": true), logprobs, stop sequences, and an optional UI dashboard (serve.ui: true in config).
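
The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names are hypothetical, and the streaming parser assumes the server emits OpenAI-style server-sent events ("data: {...}" lines ending with "data: [DONE]"), which the text above does not spell out.

```python
import json
import urllib.request

def build_chat_request(base_url, messages, max_tokens=64, stream=False):
    """Build a POST request for an OpenAI-style chat completions endpoint."""
    payload = {"messages": messages, "max_tokens": max_tokens, "stream": stream}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_sse_line(line):
    """Return the JSON chunk from one 'data: ...' line, or None for other lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":  # assumed end-of-stream marker (OpenAI convention)
        return None
    return json.loads(body)

def chat(base_url, messages, **kwargs):
    """Send a non-streaming request and return the parsed JSON response."""
    request = build_chat_request(base_url, messages, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Building the request does not touch the network; chat() does.
request = build_chat_request(
    "http://localhost:8080", [{"role": "user", "content": "Hello"}]
)
print(request.full_url)
```

With a server running, chat("http://localhost:8080", [{"role": "user", "content": "Hello"}]) would return the decoded response body.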

Data tools

mlxsmith data presets                                     # list built-in datasets
mlxsmith data pull alpaca                                 # pull a preset
mlxsmith data import raw.json --out data/sft/train.jsonl  # import ShareGPT → JSONL
mlxsmith data split data/sft/train.jsonl --fractions 0.9 0.05 0.05
mlxsmith data stats data/sft/train.jsonl                  # token counts, field analysis
mlxsmith data validate data/sft/train.jsonl               # structure check
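
To make the import step above concrete, here is a sketch of a ShareGPT-to-JSONL conversion. It assumes the common ShareGPT layout (a conversations list of from/value turns) and writes chat-style messages records. The output schema mlxsmith actually produces is not documented here, so treat the field names as assumptions.

```python
import json

# Assumed mapping from ShareGPT speaker labels to chat roles.
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def sharegpt_to_messages(record):
    """Convert one ShareGPT-style record into a chat 'messages' record."""
    messages = []
    for turn in record.get("conversations", []):
        role = ROLE_MAP.get(turn.get("from"))
        if role is None:
            continue  # skip turns from unknown speakers
        messages.append({"role": role, "content": turn.get("value", "")})
    return {"messages": messages}

def to_jsonl(records):
    """Serialize converted records as one JSON object per line."""
    return "\n".join(
        json.dumps(sharegpt_to_messages(r), ensure_ascii=False) for r in records
    )

raw = [{"conversations": [
    {"from": "human", "value": "Hi"},
    {"from": "gpt", "value": "Hello!"},
]}]
print(to_jsonl(raw))
```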

Built-in presets: alpaca, hh-rlhf, ultrachat-200k, ultrafeedback-binarized-prefs, ultrafeedback-binarized-sft.
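
The --fractions 0.9 0.05 0.05 argument to data split reads as train/validation/test shares. A minimal sketch of such a split follows; the shuffling, the seed, and the rounding rule are assumptions, not a description of what mlxsmith does internally.

```python
import random

def split_by_fractions(items, fractions, seed=0):
    """Shuffle items and cut them into consecutive parts sized by fractions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    parts, start, cumulative = [], 0, 0.0
    for i, fraction in enumerate(fractions):
        cumulative += fraction
        # The last part takes whatever remains so no item is dropped.
        is_last = i == len(fractions) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    return parts

train, valid, test = split_by_fractions(range(100), [0.9, 0.05, 0.05])
print(len(train), len(valid), len(test))
```

Cutting on cumulative boundaries rather than rounding each part separately keeps the parts disjoint and their sizes summing to the input length.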

Model management

# Pull + convert HF model to MLX
mlxsmith pull Qwen/Qwen3-4B-Instruct-2507

# With quantization
mlxsmith pull Qwen/Qwen3-4B-Instruct-2507 --quantize --q-bits 4

# Merge adapters
mlxsmith adapters merge runs/sft_0001/adapter runs/pref_0001/adapter --weights 0.7 0.3
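
One plausible reading of --weights 0.7 0.3 is a per-parameter linear combination of the two adapters. The sketch below shows that idea on plain lists of floats. The parameter names are made up, and whether mlxsmith merges adapters this way is an assumption.

```python
def merge_adapters(adapters, weights):
    """Linearly combine adapters given as {parameter_name: list_of_floats} dicts."""
    if len(adapters) != len(weights):
        raise ValueError("need one weight per adapter")
    names = set(adapters[0])
    if any(set(a) != names for a in adapters):
        raise ValueError("adapters must share the same parameter names")
    merged = {}
    for name in adapters[0]:
        # Walk the adapters element by element and take the weighted sum.
        columns = zip(*(a[name] for a in adapters))
        merged[name] = [
            sum(w * v for w, v in zip(weights, values)) for values in columns
        ]
    return merged

# Hypothetical flattened parameters for two adapters.
sft = {"layer0.delta": [1.0, 0.0], "layer1.delta": [2.0, 2.0]}
pref = {"layer0.delta": [0.0, 1.0], "layer1.delta": [4.0, 0.0]}
print(merge_adapters([sft, pref], [0.7, 0.3]))
```

Note that for LoRA adapters, averaging the low-rank factors separately is not the same as averaging their products, so real merge tools may differ from this sketch.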

HF auth

mlxsmith auth login --token "$HF_TOKEN"
mlxsmith auth status
mlxsmith auth logout

Eval and bench

# Evaluation suite (pass@k with verifier checks)
mlxsmith eval --suite eval/suites/coding.yaml

# Benchmark inference, trainer, or end-to-end throughput
mlxsmith bench --mode inference
mlxsmith bench --mode trainer
mlxsmith bench --mode end_to_end
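
For reference, pass@k is commonly estimated from n sampled completions of which c pass the verifier, using pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that standard unbiased estimator. Whether the eval suite uses this exact estimator is an assumption.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # Probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

With k=1 the estimate reduces to the plain pass rate c/n; larger k rewards a model that succeeds at least once across several attempts.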

Verifiers

Built-in verifiers for eval, RFT, and preference tuning:

  • regex — pattern matching on completions
  • jsonschema — JSON structure validation
  • pytest — sandboxed test execution
  • docker — containerized verification
  • compose — multi-verifier composition (AND/OR/weighted)

See docs/VERIFIERS.md for the verifier API.
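
To give a feel for the regex and compose verifiers listed above, here is a small sketch that treats a verifier as a function from a completion string to a score in [0, 1]. That interface and every name in the block are assumptions; docs/VERIFIERS.md describes the real API.

```python
import re

def regex_verifier(pattern):
    """Return a verifier scoring 1.0 when the pattern is found in the completion."""
    compiled = re.compile(pattern)
    return lambda completion: 1.0 if compiled.search(completion) else 0.0

def all_of(*verifiers):
    """AND composition: the score is that of the weakest verifier."""
    return lambda completion: min(v(completion) for v in verifiers)

def any_of(*verifiers):
    """OR composition: the score is that of the strongest verifier."""
    return lambda completion: max(v(completion) for v in verifiers)

def weighted(pairs):
    """Weighted composition over (verifier, weight) pairs, normalized by total weight."""
    total = sum(weight for _, weight in pairs)
    return lambda completion: sum(w * v(completion) for v, w in pairs) / total

has_def = regex_verifier(r"\bdef \w+\(")
has_return = regex_verifier(r"\breturn\b")
completion = "def add(a, b):\n    print(a + b)"

print(all_of(has_def, has_return)(completion))
print(any_of(has_def, has_return)(completion))
print(weighted([(has_def, 3.0), (has_return, 1.0)])(completion))
```

The sample completion defines a function but never returns, so the AND composition fails, the OR composition passes, and the weighted score lands in between.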

Environment plugin system

mlxsmith env list                  # list available environments
mlxsmith env info envs/coding.yaml # show manifest (tasks, verifier, version)
mlxsmith env init my_env           # scaffold a new environment
mlxsmith env install ./my_env      # install from directory
mlxsmith env package ./my_env      # create distributable tarball
mlxsmith env run envs/coding.yaml  # execute RFT with this environment

Environments define tasks, verifiers, and reward functions for RFT training. See docs/ENVIRONMENTS.md.

Config system

mlxsmith config show              # display merged config (YAML/JSON/TOML)
mlxsmith config show --sources    # show where each value comes from
mlxsmith config init              # create default mlxsmith.yaml
mlxsmith config validate          # check config structure
mlxsmith config env               # show environment variable mapping

Config sources, highest priority first: CLI flags > environment variables (MLXSMITH__SECTION__KEY) > config file > defaults.
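
The precedence rule can be pictured as layered dictionaries merged from lowest to highest priority. The sketch below is illustrative only: lowercasing the section and key names, leaving environment values as strings, and the example settings are all assumptions.

```python
def env_to_overrides(environ, prefix="MLXSMITH__"):
    """Turn MLXSMITH__SECTION__KEY variables into a nested {section: {key: value}} dict."""
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        parts = name[len(prefix):].lower().split("__")
        if len(parts) != 2:
            continue  # this sketch only handles the SECTION__KEY shape
        section, key = parts
        overrides.setdefault(section, {})[key] = value
    return overrides

def merge_layers(*layers):
    """Merge config layers given lowest priority first; later layers win per key."""
    merged = {}
    for layer in layers:
        for section, values in layer.items():
            merged.setdefault(section, {}).update(values)
    return merged

# Hypothetical values for each source.
defaults = {"serve": {"port": 8080, "ui": False}}
config_file = {"serve": {"ui": True}}
env = env_to_overrides({"MLXSMITH__SERVE__PORT": "9000", "HOME": "/tmp"})
cli_flags = {"serve": {"port": 9100}}

print(merge_layers(defaults, config_file, env, cli_flags))
```

Here the CLI flag overrides the port set through the environment, while the ui value from the config file survives because no higher layer sets it.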

SDK (programmatic API)

For building custom training loops:

from mlxsmith.sdk import load_model, SamplingClient, TrainingClient, TrainingBatch

# `config` must be defined by the caller; its construction is not shown here.
loaded = load_model("path/to/model", config)

# Sampling with logprobs
sampler = SamplingClient(loaded.backend)
result = sampler.sample("prompt", logprobs_k=5)

# Training operations
# `batch` must be prepared by the caller (presumably a TrainingBatch); not shown here.
trainer = TrainingClient(loaded.backend)
trainer.create_optimizer(lr=1e-4, weight_decay=0.01)
fb = trainer.forward_backward(batch)
trainer.optim_step(fb.result().grads)

Loss functions: DPO, ORPO, GRPO, CISPO, DRO, PPO, importance sampling, cross-entropy.
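
As a reference point for the first entry in that list, the standard DPO loss for one preference pair can be written in a few lines of plain Python. This is the textbook formula, not mlxsmith's code. The function name, the argument names, and the default beta are assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair from summed sequence log-probabilities."""
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy has shifted probability toward the chosen response: the loss drops.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0, beta=0.1))
```

Larger beta makes the loss more sensitive to the gap between the policy and the reference model.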

Research

RLM self-play loop

RLM (Recursive Language Model) is a research feature — the infrastructure runs but has not produced measured gains yet.

mlxsmith rlm                       # single-process RLM
mlxsmith pipeline --orchestrated   # multi-process orchestrated RLM
mlxsmith rlm status                # check iteration state
mlxsmith rlm history               # view history

Includes task generation, mutation for data diversity, corpus management, EMA-based gating, and weight pointer IPC for multi-process coordination. See docs/orchestrator.md.
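
The text does not say what the EMA-based gate compares, so the following is only one plausible shape: accept a candidate when its score beats an exponential moving average of past scores. The class name, the decay, the margin, and the acceptance rule are all assumptions made for illustration.

```python
class EmaGate:
    """Accept a candidate only if its score beats an EMA baseline of past scores."""

    def __init__(self, decay=0.9, margin=0.0):
        self.decay = decay
        self.margin = margin
        self.baseline = None  # initialized from the first observed score

    def update(self, score):
        """Return True if the candidate is accepted, then fold the score into the EMA."""
        if self.baseline is None:
            self.baseline = score
            return True
        accepted = score > self.baseline + self.margin
        # Update the baseline whether or not the candidate was accepted.
        self.baseline = self.decay * self.baseline + (1.0 - self.decay) * score
        return accepted

gate = EmaGate(decay=0.5)
decisions = [gate.update(s) for s in [0.4, 0.6, 0.45, 0.7]]
print(decisions, gate.baseline)
```

A moving baseline of this kind smooths out noisy single evaluations, so one lucky or unlucky iteration does not flip the gate on its own.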

ZMLX acceleration

Optional zero-copy MLX acceleration backend.

mlxsmith accel status

Docs

  • docs/PROJECT_FORMAT.md — project layout and artifacts
  • docs/VERIFIERS.md — verifier API and sandbox behavior
  • docs/COMPATIBILITY.md — tested versions and model families
  • docs/ENVIRONMENTS.md — environment plugin system
  • docs/orchestrator.md — multi-process RLM orchestrator
  • docs/rlm-ctl.md — RLM training guide
  • docs/ROADMAP.md — product direction and milestones
  • docs/README.md — full docs index

License

MIT
