lmxlab
A research platform for language model experimentation on Apple Silicon.
Why lmxlab?
Most transformer implementations optimize for production at the cost of readability. lmxlab takes the opposite approach: every layer is implemented from scratch in MLX and written to be read, so you can see exactly what each component does and iterate on ideas quickly.
The core insight is that GPT, LLaMA, DeepSeek, Mamba, and dozens of other architectures are not fundamentally different models. They are different configurations of the same building blocks: attention, SSMs, feed-forward networks, normalization, and positional encoding. lmxlab makes this concrete by using config factories instead of class hierarchies.
```python
from lmxlab.models.llama import llama_config
from lmxlab.models.deepseek import deepseek_config
from lmxlab.models.base import LanguageModel

# Same LanguageModel class, different configs
llama = LanguageModel(llama_config(d_model=512, n_heads=8, n_kv_heads=4, n_layers=6))
deepseek = LanguageModel(deepseek_config(d_model=512, n_heads=8, n_layers=6, kv_lora_rank=64))
```
No subclassing. No LlamaModel vs DeepSeekModel. One LanguageModel class, assembled from registry components based on what the config asks for.
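To make the registry idea concrete, here is a schematic sketch of what a config factory amounts to. The dataclass shape and field names below are illustrative assumptions for exposition, not lmxlab's actual config schema:

```python
from dataclasses import dataclass

# Illustrative only — lmxlab's real config schema may differ.
@dataclass
class SketchConfig:
    d_model: int
    n_layers: int
    attention: str  # registry key, e.g. "gqa", "mla", "mamba2"
    ffn: str        # registry key, e.g. "swiglu", "moe"
    norm: str       # registry key, e.g. "rmsnorm"

def llama_style_config(d_model: int, n_layers: int) -> SketchConfig:
    """LLaMA flavor = GQA attention + SwiGLU FFN + RMSNorm."""
    return SketchConfig(d_model, n_layers, attention="gqa", ffn="swiglu", norm="rmsnorm")

def deepseek_style_config(d_model: int, n_layers: int) -> SketchConfig:
    """DeepSeek flavor swaps in MLA attention; everything else is shared."""
    return SketchConfig(d_model, n_layers, attention="mla", ffn="swiglu", norm="rmsnorm")
```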
What's included
- 24 architectures as config factories: GPT, LLaMA, Gemma, Gemma 3 (sliding window), Qwen, Qwen 3 MoE, Qwen 3.5 (hybrid DeltaNet), Qwen-Next (gated attention), Mixtral (MoE), DeepSeek V2/V3 (MLA + MoE), Nemotron (hybrid Mamba-Transformer MoE), Llama 4 Scout/Maverick (iRoPE + chunked attention), Mistral Small (sliding window), OLMo 2 (QK-norm), GPT-OSS (QK-norm), Grok (SharedExpertMoE), Kimi K2.5 (DeltaNet + MoE), SmolLM3 (iRoPE), Falcon H1 (hybrid Mamba-2), Jamba (Mamba-2 + MoE), Bamba (hybrid Mamba-2), GLM-4.5 (MLA NoPE)
- Building blocks: MHA, GQA, MLA, GatedGQA, SlidingWindowGQA, ChunkedGQA, SparseGQA (DSA), Mamba-2 SSD, Mamba-3 (trapezoidal), GatedDeltaNet, MoE, SharedExpertMoE, LatentMoE, QK-norm, SwiGLU, squared ReLU
- Compiled training: mx.compile, functional gradients, gradient clipping, cosine schedules, dropout, muP parameterization (the underlying MLX idiom is sketched after this list)
- Advanced training: DPO, GRPO, multi-token prediction, curriculum learning, knowledge distillation
- LoRA & QLoRA: parameter-efficient fine-tuning with optional 4-bit quantization
- Inference: autoregressive generation, speculative decoding, best-of-N sampling, beam search, reward model scoring
- HuggingFace integration: load pretrained weights from the Hub
- Experiment framework: time/FLOP-budgeted runs, MLflow tracking, results logging, hyperparameter sweeps, MLX profiling
- 35 recipe scripts: training, fine-tuning, DPO, GRPO, MTP, distillation, curriculum learning, DeltaNet hybrid, MoE, best-of-N sampling, evaluation, quantization, callbacks, optimizer comparison, KV cache analysis, experiment sweeps, benchmarking
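For readers new to MLX, the compiled-training bullet above maps onto a standard MLX idiom: a functional loss, gradients via nn.value_and_grad, and a train step captured by mx.compile with model and optimizer state as implicit inputs/outputs. The sketch below shows that generic pattern (as in the MLX documentation) with a stand-in model; it is not lmxlab's actual Trainer code:

```python
from functools import partial
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))  # stand-in model
optimizer = optim.Adam(learning_rate=1e-3)
mx.eval(model.parameters())

def loss_fn(model, x, y):
    return nn.losses.cross_entropy(model(x), y).mean()

# Functional gradients with respect to the model's trainable parameters.
loss_and_grad_fn = nn.value_and_grad(model, loss_fn)

# Compiling the step fuses forward, backward, and optimizer update;
# model and optimizer state flow through as implicit inputs and outputs.
state = [model.state, optimizer.state]

@partial(mx.compile, inputs=state, outputs=state)
def train_step(x, y):
    loss, grads = loss_and_grad_fn(model, x, y)
    optimizer.update(model, grads)
    return loss

x = mx.random.normal((8, 16))
y = mx.random.randint(0, 16, (8,))
loss = train_step(x, y)
mx.eval(state)  # materialize the lazily evaluated update
print(loss.item())
```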
Quick start
```
pip install lmxlab
```
```python
import mlx.core as mx

from lmxlab.models.llama import llama_config
from lmxlab.models.base import LanguageModel
from lmxlab.training.config import TrainConfig
from lmxlab.training.trainer import Trainer

# Build a small LLaMA
config = llama_config(vocab_size=256, d_model=128, n_heads=4, n_kv_heads=2, n_layers=4)
model = LanguageModel(config)
mx.eval(model.parameters())
print(f"Parameters: {model.count_parameters():,}")

# Train
trainer = Trainer(model, TrainConfig(learning_rate=1e-3, max_steps=100))
```
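Before training, a quick shape check is useful. The snippet below assumes LanguageModel follows the usual MLX nn.Module convention of mapping token ids of shape (batch, seq) to logits of shape (batch, seq, vocab); treat the call signature as an assumption rather than documented API:

```python
# Assumed convention: (batch, seq) token ids -> (batch, seq, vocab) logits.
x = mx.random.randint(0, 256, (2, 16))  # dummy batch of token ids
logits = model(x)
print(logits.shape)  # expected: (2, 16, 256)
```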
See the Quickstart guide for a complete walkthrough.
Recipes
Ready-to-run scripts in recipes/:
```
uv run python recipes/train_tiny_gpt.py # Train a tiny GPT
uv run python recipes/train_llama_shakespeare.py # LLaMA on Shakespeare
uv run python recipes/compare_training.py # Compare architectures
uv run python recipes/compare_architectures.py # Side-by-side architecture comparison
uv run python recipes/ablation_gpt_to_llama.py # Feature ablation study
uv run python recipes/finetune_lora.py --rank 8 # LoRA fine-tuning
uv run python recipes/finetune_qlora.py --bits 4 # QLoRA (4-bit + LoRA)
uv run python recipes/train_dpo.py # DPO preference optimization
uv run python recipes/train_grpo.py # GRPO reward optimization
uv run python recipes/train_curriculum.py # Curriculum learning
uv run python recipes/train_mtp.py --n-predict 2 # Multi-token prediction
uv run python recipes/train_deltanet.py # Hybrid DeltaNet vs GQA
uv run python recipes/train_moe.py --experts 4 # Mixture of Experts
uv run python recipes/advanced_sampling.py # Best-of-N and majority vote
uv run python recipes/speculative_decoding.py # Draft-then-verify generation
uv run python recipes/evaluate_model.py # Evaluate with perplexity/BPB
uv run python recipes/interactive_generate.py # Streaming token-by-token generation
uv run python recipes/checkpoint_resume.py # Save and resume training
uv run python recipes/run_experiment.py # Structured experiment with logging
uv run python recipes/sweep_learning_rate.py # Hyperparameter sweep
uv run python recipes/load_pretrained.py # Load HuggingFace model
uv run python recipes/profile_models.py # Architecture profiling
uv run python recipes/benchmark_compile.py # mx.compile speedup benchmark
uv run python recipes/distill_model.py # Knowledge distillation
uv run python recipes/quantize_and_generate.py # 4-bit/8-bit quantization
uv run python recipes/train_with_callbacks.py # Logging, throughput, early stopping
uv run python recipes/train_with_datasets.py # TextDataset vs TokenDataset
uv run python recipes/compare_schedules.py # LR schedules and optimizers
uv run python recipes/compare_optimizers.py # Optimizer comparison (Experiment 3)
uv run python recipes/compare_kv_cache.py # MLA vs GQA KV cache (Experiment 4)
uv run python recipes/analyze_experiments.py # Statistical analysis tools
```
CLI
```
lmxlab list # List all architectures
lmxlab info llama --tiny # Show config details
lmxlab count deepseek --detail # Parameter breakdown
```
Design principles
- Clarity for rapid iteration. Code is written to be read and modified quickly, not for maximum production performance.
- MLX-native. Uses MLX idioms directly: nn.value_and_grad, mx.compile, unified memory.
- Config factories, not subclasses. Architecture variants are configs, not class hierarchies.
- Progressive complexity. Start with GPT-style, swap in LLaMA-style, then try MLA or Mamba. Same model class throughout (sketched after this list).
- Reproducible experiments. Time/FLOP budgets, train/val splits, MLflow tracking, and structured results logging.
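A minimal sketch of that progression, reusing the factories shown earlier. llama_config and deepseek_config appear above; gpt_config is assumed to exist by analogy with the same naming pattern, so check lmxlab.models before copying:

```python
from lmxlab.models.base import LanguageModel
from lmxlab.models.llama import llama_config
from lmxlab.models.deepseek import deepseek_config
from lmxlab.models.gpt import gpt_config  # assumed by analogy with the factories above

# One class, progressively more modern configs:
# GPT-style -> LLaMA-style (GQA) -> DeepSeek-style (MLA).
configs = [
    gpt_config(d_model=256, n_heads=8, n_layers=4),
    llama_config(d_model=256, n_heads=8, n_kv_heads=4, n_layers=4),
    deepseek_config(d_model=256, n_heads=8, n_layers=4, kv_lora_rank=32),
]
models = [LanguageModel(cfg) for cfg in configs]
```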
Requirements
- Python 3.12+
- Apple Silicon Mac (M1 or later) for GPU acceleration
- MLX also runs on Intel Macs and Linux, using the CPU only
Development
```
git clone https://github.com/michaelellis003/lmxlab.git
cd lmxlab
uv sync --extra dev
uv run pre-commit install
uv run pre-commit install --hook-type commit-msg

# Run tests
uv run pytest

# Lint
uv run ruff check src/ tests/ recipes/

# Build docs
uv run mkdocs serve
```
Documentation
Full documentation at michaelellis003.github.io/lmxlab.
- Quickstart
- Architecture Overview
- MLX Idioms
- Models Comparison
- Data Pipeline
- Training
- Inference
- Recipes
- Production Optimizations
- Experiment Methodology
- Developer Log
- API Reference
License
MIT
File details
Details for the file lmxlab-0.3.0.tar.gz.
File metadata
- Download URL: lmxlab-0.3.0.tar.gz
- Upload date:
- Size: 537.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 79899933b0fb41dde6a23ba7bce74ff1cff1fc645aca71f65e268b6b4f37e7a4 |
| MD5 | 87b0e6b4fc3257a5b76139cc9e4c04ab |
| BLAKE2b-256 | c40b74f04704d4cfc54d9d70ba54d0f88cc82d259c056de2ccc55fdfccc30f36 |
Provenance
The following attestation bundles were made for lmxlab-0.3.0.tar.gz:
Publisher: publish.yml on michaelellis003/lmxlab
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmxlab-0.3.0.tar.gz
- Subject digest: 79899933b0fb41dde6a23ba7bce74ff1cff1fc645aca71f65e268b6b4f37e7a4
- Sigstore transparency entry: 1108152552
- Sigstore integration time:
- Permalink: michaelellis003/lmxlab@9458bddda6394de0b553aeb19882face0a269621
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/michaelellis003
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9458bddda6394de0b553aeb19882face0a269621
- Trigger Event: release
File details
Details for the file lmxlab-0.3.0-py3-none-any.whl.
File metadata
- Download URL: lmxlab-0.3.0-py3-none-any.whl
- Upload date:
- Size: 142.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 57597373276dec06df07ea6a21ac36f635a4b69e9b1e810268959fb340ae580e |
| MD5 | e6d5d9b10b39c1bf60a97176db126497 |
| BLAKE2b-256 | c0ad5ba0844b139fdaefedec3579782dccce194450ec7cd9f0b0d8d9782570ca |
Provenance
The following attestation bundles were made for lmxlab-0.3.0-py3-none-any.whl:
Publisher: publish.yml on michaelellis003/lmxlab
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmxlab-0.3.0-py3-none-any.whl
- Subject digest: 57597373276dec06df07ea6a21ac36f635a4b69e9b1e810268959fb340ae580e
- Sigstore transparency entry: 1108152556
- Sigstore integration time:
- Permalink: michaelellis003/lmxlab@9458bddda6394de0b553aeb19882face0a269621
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/michaelellis003
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9458bddda6394de0b553aeb19882face0a269621
- Trigger Event: release