Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.

Maintainers: hmbown

mlxsmith

Status: alpha (v0.1.2). Full training pipeline validated on Qwen3-4B.

Install

MLX training and serving require macOS on Apple Silicon. Other platforms can use data tools and mock backends.

python -m venv .venv && source .venv/bin/activate
pip install -U pip

# Core CLI (data tools, config, project scaffolding)
pip install mlxsmith

# Apple Silicon training + serving
pip install "mlxsmith[mlx,llm,serve]"

# mlx-lm-lora passthrough (advanced training methods)
pip install "mlxsmith[lora]"

# Everything
pip install "mlxsmith[all]"

Quickstart

mlxsmith init myproj
cd myproj
mlxsmith doctor        # check Python, MLX, Metal

Training

SFT (LoRA/QLoRA)

mlxsmith sft --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 --data data/sft

Produces run artifacts under runs/sft_NNNN/ (adapter weights, metrics.jsonl, config snapshot).

Preference tuning (DPO/ORPO)

mlxsmith pref --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 \
  --data data/prefs --algo dpo

Supports DPO and ORPO algorithms with configurable beta and KL coefficients. Expects {prompt, chosen, rejected} data format.
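For reference, the standard DPO objective measures how much the policy raises the chosen completion's log-probability relative to a frozen reference model, compared with the rejected completion, and scales that margin by beta. The sketch below computes the per-example loss from summed sequence log-probabilities and serializes one record in the {prompt, chosen, rejected} shape. It is a plain-Python illustration of the textbook formula, not mlxsmith's implementation. Storing preference data as JSONL (for example a hypothetical data/prefs/train.jsonl) is an assumption based on the data tools below.

```python
import json
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-example DPO loss from summed sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch on the sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# One preference record in the {prompt, chosen, rejected} format.
record = {
    "prompt": "What is 2 + 2?",
    "chosen": "2 + 2 = 4.",
    "rejected": "2 + 2 = 5.",
}
# One line of a JSONL file (the file name data/prefs/train.jsonl is hypothetical).
line = json.dumps(record)
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2. Raising beta sharpens the penalty for drifting toward the rejected answer.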

Reinforced fine-tuning (GRPO)

mlxsmith rft --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 \
  --env envs/coding.yaml --verifier verifiers/pytest.py

GRPO-style RL training with token-level environment integration and verifier-based rewards. Rollouts are gated (accepted or rejected) and their rewards are tracked.
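The core of GRPO-style training is a group-relative advantage. You sample several completions per prompt, score each with the verifier, and normalize each reward against its own group's mean and standard deviation, so no learned value model is needed. Below is a minimal sketch of that step, assuming rewards arrive as one list per prompt. It shows the textbook formulation, not necessarily mlxsmith's exact normalization.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO advantage for one prompt's group: (r_i - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt, scored 1.0 (verifier passed) or 0.0 (failed).
advantages = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Passing rollouts receive a positive advantage and failing ones a negative advantage. A group where every rollout scores the same gets all-zero advantages and contributes no learning signal.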

Knowledge distillation

# Offline distillation (teacher generates, student learns)
mlxsmith distill --teacher large-model --student small-model --mode offline

# Online preference distillation (OPD)
mlxsmith distill --teacher large-model --student small-model --mode opd

Full pipeline

# Run SFT → Pref → RFT in sequence
mlxsmith pipeline

mlx-lm-lora parity (all methods)

Use the passthrough to access mlx-lm-lora features (DPO variants, GRPO variants, PPO, synthetic datasets, judge training, etc.):

# Train with mlx-lm-lora directly
mlxsmith lora train --model Qwen/Qwen3-4B-Instruct-2507 --data data/prefs --train-mode dpo -- --beta 0.1

# Generate synthetic datasets
mlxsmith lora synthetic prompts -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --num-samples 1000

# Train judge model
mlxsmith lora judge -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --data data/prefs

Serving

mlxsmith serve exposes an OpenAI-compatible /v1/chat/completions endpoint.

mlxsmith serve --model runs/sft_0001/adapter --port 8080

curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'

Supports streaming ("stream": true), logprobs, stop sequences, and an optional UI dashboard (serve.ui: true in config).
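You can send the same request from Python using only the standard library. This sketch assumes the server follows the usual OpenAI response shape: choices[0].message.content for normal replies, and for streaming, server-sent `data:` lines carrying choices[0].delta.content and ending with `data: [DONE]`. The URL and helper names are illustrative.

```python
import json
import urllib.request

# Illustrative URL matching `--port 8080` in the serve example.
BASE_URL = "http://localhost:8080/v1/chat/completions"


def build_request(prompt: str, max_tokens: int = 64, stream: bool = False) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style chat completion."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_text(body: dict) -> str:
    """Extract the assistant message from a non-streaming response body."""
    return body["choices"][0]["message"]["content"]


def stream_text(lines) -> str:
    """Concatenate delta content from server-sent 'data:' lines (bytes or str)."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)


def chat(prompt: str) -> str:
    """Send one prompt and return the reply (requires `mlxsmith serve` to be running)."""
    with urllib.request.urlopen(build_request(prompt), timeout=60) as resp:
        return reply_text(json.load(resp))
```

For streaming, pass `stream=True` to `build_request` and feed the response object (which iterates line by line) to `stream_text`.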

Data tools

mlxsmith data presets                                     # list built-in datasets
mlxsmith data pull alpaca                                 # pull a preset
mlxsmith data import raw.json --out data/sft/train.jsonl  # import ShareGPT → JSONL
mlxsmith data split data/sft/train.jsonl --fractions 0.9 0.05 0.05
mlxsmith data stats data/sft/train.jsonl                  # token counts, field analysis
mlxsmith data validate data/sft/train.jsonl               # structure check

Built-in presets: alpaca, hh-rlhf, ultrachat-200k, ultrafeedback-binarized-prefs, ultrafeedback-binarized-sft.
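The three values passed to data split are fractions, presumably train, validation, and test shares that sum to 1. Below is a minimal sketch of fraction-based splitting over a JSONL file's lines. It assumes a seeded shuffle and lets the last split absorb any rounding remainder. mlxsmith's own shuffling and output file names are not documented here, and `split_records` is a hypothetical helper.

```python
import random


# Hypothetical helper (not an mlxsmith API).
def split_records(records: list, fractions: list[float], seed: int = 0) -> list[list]:
    """Shuffle records deterministically and cut them into len(fractions) parts."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    parts, start = [], 0
    for frac in fractions[:-1]:
        end = start + round(frac * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # the remainder goes to the last split
    return parts


# 1000 lines with --fractions 0.9 0.05 0.05 -> 900 / 50 / 50
train, valid, held_out = split_records(list(range(1000)), [0.9, 0.05, 0.05])
```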

Model management

# Pull + convert HF model to MLX
mlxsmith pull Qwen/Qwen3-4B-Instruct-2507

# With quantization
mlxsmith pull Qwen/Qwen3-4B-Instruct-2507 --quantize --q-bits 4

# Merge adapters
mlxsmith adapters merge runs/sft_0001/adapter runs/pref_0001/adapter --weights 0.7 0.3
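The `--weights 0.7 0.3` flag suggests a weighted combination of the two adapters. One common approach is a per-tensor weighted sum of parameters with matching names. It is shown here as an assumption, not as mlxsmith's documented behavior, and tensors are plain float lists so the sketch runs without MLX. For LoRA, averaging the A and B factors separately is not the same as averaging the low-rank updates B·A, so real merge tools may differ.

```python
# Hypothetical helper (not an mlxsmith API): weighted per-tensor merge.
def merge_adapters(adapters: list[dict[str, list[float]]],
                   weights: list[float]) -> dict[str, list[float]]:
    """Weighted element-wise sum of adapters that share the same parameter names."""
    if len(adapters) != len(weights):
        raise ValueError("need one weight per adapter")
    names = set(adapters[0])
    if any(set(a) != names for a in adapters[1:]):
        raise ValueError("adapters must have identical parameter names")
    merged = {}
    for name in sorted(names):
        tensors = [a[name] for a in adapters]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise sum over adapters of weight * value.
        merged[name] = [
            sum(w * t[j] for w, t in zip(weights, tensors))
            for j in range(len(tensors[0]))
        ]
    return merged


sft = {"layers.0.delta": [1.0, 2.0]}
pref = {"layers.0.delta": [3.0, 4.0]}
merged = merge_adapters([sft, pref], [0.7, 0.3])
```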

HF auth

mlxsmith auth login --token "$HF_TOKEN"
mlxsmith auth status
mlxsmith auth logout

Eval and bench

# Evaluation suite (pass@k with verifier checks)
mlxsmith eval --suite eval/suites/coding.yaml

# Benchmark inference or training throughput
mlxsmith bench --mode inference
mlxsmith bench --mode trainer
mlxsmith bench --mode end_to_end
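pass@k is commonly reported with the unbiased estimator from the Codex paper (Chen et al., 2021). You generate n samples per task, count the c that pass the verifier, and compute 1 - C(n-c, k) / C(n, k). A sketch of that formula follows. Whether mlxsmith's eval suite uses this estimator or a simpler empirical rate is not stated here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples drawn without
    replacement from n generations (c of them correct) passes."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) for each task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```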

Verifiers

Built-in verifiers for eval, RFT, and preference tuning:

regex — pattern matching on completions
jsonschema — JSON structure validation
pytest — sandboxed test execution
docker — containerized verification
compose — multi-verifier composition (AND/OR/weighted)
llm_judge — LLM-based self-verification / ThinkPRM-style verifier

See docs/VERIFIERS.md for the verifier API.

Environment plugin system

mlxsmith env list                  # list available environments
mlxsmith env info envs/coding.yaml # show manifest (tasks, verifier, version)
mlxsmith env init my_env           # scaffold a new environment
mlxsmith env install ./my_env      # install from directory
mlxsmith env package ./my_env      # create distributable tarball
mlxsmith env run envs/coding.yaml  # execute RFT with this environment

Environments define tasks, verifiers, and reward functions for RFT training. See docs/ENVIRONMENTS.md.

Config system

mlxsmith config show              # display merged config (YAML/JSON/TOML)
mlxsmith config show --sources    # show where each value comes from
mlxsmith config init              # create default mlxsmith.yaml
mlxsmith config validate          # check config structure
mlxsmith config env               # show environment variable mapping

Config sources (in priority order): CLI flags > environment variables (MLXSMITH__SECTION__KEY) > config file > defaults.
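The precedence rule works like a series of dictionary merges, applied from lowest to highest priority. The sketch below maps a variable such as the hypothetical MLXSMITH__TRAIN__LR onto nested keys (train.lr). It then overlays defaults, file values, environment values, and CLI values in that order. Lowercasing the key parts and keeping environment values as strings are assumptions. Use `mlxsmith config env` and `mlxsmith config show --sources` to see the real mapping.

```python
import os
from typing import Any, Mapping

PREFIX = "MLXSMITH__"


def env_overrides(environ: Mapping[str, str]) -> dict[str, Any]:
    """Turn MLXSMITH__SECTION__KEY=value into {"section": {"key": "value"}} (assumed mapping)."""
    result: dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        *sections, key = name[len(PREFIX):].lower().split("__")
        node = result
        for section in sections:
            node = node.setdefault(section, {})
        node[key] = value  # kept as a string; type coercion is not shown
    return result


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base updated by override, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(defaults: dict, file_cfg: dict, cli: dict,
            environ: Mapping[str, str] = os.environ) -> dict:
    """Apply layers lowest priority first: defaults < config file < env vars < CLI flags."""
    config = defaults
    for layer in (file_cfg, env_overrides(environ), cli):
        config = deep_merge(config, layer)
    return config
```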

Training optimizers are configurable via train.optimizer and train.optimizer_kwargs (for example adamw, adam, qhadam, or muon, where available in your MLX version).

SDK (programmatic API)

For building custom training loops:

from mlxsmith.sdk import load_model, SamplingClient, TrainingClient, TrainingBatch

# `config` must already hold your mlxsmith configuration (its construction is not shown here)
loaded = load_model("path/to/model", config)

# Sampling with logprobs
sampler = SamplingClient(loaded.backend)
result = sampler.sample("prompt", logprobs_k=5)

# Training operations
trainer = TrainingClient(loaded.backend)
trainer.create_optimizer(lr=1e-4, weight_decay=0.01)
# `batch` is presumably a TrainingBatch built from your data (its fields are not documented here)
fb = trainer.forward_backward(batch)
trainer.optim_step(fb.result().grads)

Loss functions: DPO, ORPO, GRPO, CISPO, DRO, PPO, importance sampling, cross-entropy.
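Several of these objectives (PPO, GRPO, importance sampling) share one building block: the probability ratio between the current policy and the policy that generated the rollout, optionally clipped. Below is a per-token sketch of the standard PPO clipped surrogate. The value epsilon=0.2 is a conventional default, not an mlxsmith setting, and the SDK's own loss signatures are not shown here.

```python
import math


def ppo_clip_loss(logp_new: list[float], logp_old: list[float],
                  advantages: list[float], epsilon: float = 0.2) -> float:
    """Mean over tokens of -min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = exp(logp_new - logp_old) is the importance ratio."""
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(token) / pi_old(token)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the min makes the objective pessimistic: no credit for pushing
        # the ratio past the clip range when the advantage is positive.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)
```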

Research

RLM self-play loop

RLM (Recursive Language Model) is a research feature: the infrastructure runs, but it has not yet produced measured gains.

mlxsmith rlm                       # single-process RLM
mlxsmith pipeline --orchestrated   # multi-process orchestrated RLM
mlxsmith rlm status                # check iteration state
mlxsmith rlm history               # view history

Includes task generation, mutation for data diversity, corpus management, EMA-based gating, and weight pointer IPC for multi-process coordination. See docs/orchestrator.md.

Docs

docs/PROJECT_FORMAT.md — project layout and artifacts
docs/VERIFIERS.md — verifier API and sandbox behavior
docs/COMPATIBILITY.md — tested versions and model families
docs/ENVIRONMENTS.md — environment plugin system
docs/orchestrator.md — multi-process RLM orchestrator
docs/rlm-ctl.md — RLM training guide
docs/ROADMAP.md — product direction and milestones
docs/README.md — full docs index

License

MIT

Release history

0.1.9 (Feb 5, 2026)
0.1.8 (Feb 4, 2026)
0.1.7 (Feb 3, 2026)
0.1.3 (Feb 2, 2026; this version)
0.1.2 (Feb 2, 2026)
0.1.1 (Feb 2, 2026)
0.1.0 (Feb 2, 2026)

Download files

Source distribution: mlxsmith-0.1.3.tar.gz (143.1 kB), uploaded Feb 2, 2026

Built distribution: mlxsmith-0.1.3-py3-none-any.whl (158.4 kB, Python 3), uploaded Feb 2, 2026

File details

Details for the file mlxsmith-0.1.3.tar.gz.

File metadata

Download URL: mlxsmith-0.1.3.tar.gz
Upload date: Feb 2, 2026
Size: 143.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlxsmith-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`b45994d2b2da36ee56e6a098c3645eb875bd6f44e1f46ad3115e7ad640d1f3ee`
MD5	`2fa4328008b41c1c22be2f9d1c30a0df`
BLAKE2b-256	`e7f81afe40921761a0c47360fa36efc799354f01049d32164abec3602402161a`

Provenance

The following attestation bundles were made for mlxsmith-0.1.3.tar.gz:

Publisher: publish.yml on Hmbown/MLXSmith

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlxsmith-0.1.3.tar.gz
- Subject digest: b45994d2b2da36ee56e6a098c3645eb875bd6f44e1f46ad3115e7ad640d1f3ee
- Sigstore transparency entry: 906302369
- Sigstore integration time: Feb 2, 2026
Source repository:
- Permalink: Hmbown/MLXSmith@76bffe1cbf7f98f97cee4037d08124d5554f7104
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/Hmbown
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@76bffe1cbf7f98f97cee4037d08124d5554f7104
- Trigger Event: release

File details

Details for the file mlxsmith-0.1.3-py3-none-any.whl.

File metadata

Download URL: mlxsmith-0.1.3-py3-none-any.whl
Upload date: Feb 2, 2026
Size: 158.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlxsmith-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6353c11e048a63084da6eaff4268a140d11a3272572d7ba47af5a088e671fde0`
MD5	`8f56aca0482681da2d81a1f7fd44c78b`
BLAKE2b-256	`76a188d64ad63a30e51055b2e9d6cd7749880a15c0463e2bfee8a55300b9a375`

Provenance

The following attestation bundles were made for mlxsmith-0.1.3-py3-none-any.whl:

Publisher: publish.yml on Hmbown/MLXSmith

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlxsmith-0.1.3-py3-none-any.whl
- Subject digest: 6353c11e048a63084da6eaff4268a140d11a3272572d7ba47af5a088e671fde0
- Sigstore transparency entry: 906302428
- Sigstore integration time: Feb 2, 2026
Source repository:
- Permalink: Hmbown/MLXSmith@76bffe1cbf7f98f97cee4037d08124d5554f7104
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/Hmbown
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@76bffe1cbf7f98f97cee4037d08124d5554f7104
- Trigger Event: release
