
lmxlab

A research platform for language model experimentation on Apple Silicon.


Why lmxlab?

Most transformer implementations optimize for production at the cost of readability. lmxlab takes the opposite approach: every layer is implemented from scratch in MLX and written for clarity, so you can iterate on ideas quickly and understand what each component does.

The core insight is that GPT, LLaMA, DeepSeek, Mamba, and dozens of other architectures are not fundamentally different models. They are different configurations of the same building blocks: attention, SSMs, feed-forward networks, normalization, and positional encoding. lmxlab makes this concrete by using config factories instead of class hierarchies.

from lmxlab.models.llama import llama_config
from lmxlab.models.deepseek import deepseek_config
from lmxlab.models.base import LanguageModel

# Same LanguageModel class, different configs
llama = LanguageModel(llama_config(d_model=512, n_heads=8, n_kv_heads=4, n_layers=6))
deepseek = LanguageModel(deepseek_config(d_model=512, n_heads=8, n_layers=6, kv_lora_rank=64))

No subclassing. No LlamaModel vs DeepSeekModel. One LanguageModel class, assembled from registry components based on what the config asks for.
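
The registry idea is small enough to sketch. The names below are purely illustrative, not lmxlab's actual internals; the point is that a config names the components it wants and a registry maps those names to classes:

# Illustrative sketch only: hypothetical names, not lmxlab's real registry.
class MultiHeadAttention: ...
class GroupedQueryAttention: ...
class MultiHeadLatentAttention: ...

ATTENTION_REGISTRY = {
    "mha": MultiHeadAttention,
    "gqa": GroupedQueryAttention,
    "mla": MultiHeadLatentAttention,
}

def build_attention(config: dict):
    # The config names a component; the registry supplies the class.
    return ATTENTION_REGISTRY[config["attention"]]()

# A "LLaMA-like" config and a "DeepSeek-like" config differ only in what they ask for.
gqa_block = build_attention({"attention": "gqa"})
mla_block = build_attention({"attention": "mla"})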

What's included

  • 24 architectures as config factories: GPT, LLaMA, Gemma, Gemma 3 (sliding window), Qwen, Qwen 3 MoE, Qwen 3.5 (hybrid DeltaNet), Qwen-Next (gated attention), Mixtral (MoE), DeepSeek V2/V3 (MLA + MoE), Nemotron (hybrid Mamba-Transformer MoE), Llama 4 Scout/Maverick (iRoPE + chunked attention), Mistral Small (sliding window), OLMo 2 (QK-norm), GPT-OSS (QK-norm), Grok (SharedExpertMoE), Kimi K2.5 (DeltaNet + MoE), SmolLM3 (iRoPE), Falcon H1 (hybrid Mamba-2), Jamba (Mamba-2 + MoE), Bamba (hybrid Mamba-2), GLM-4.5 (MLA NoPE)
  • Building blocks: MHA, GQA, MLA, GatedGQA, SlidingWindowGQA, ChunkedGQA, SparseGQA (DSA), Mamba-2 SSD, Mamba-3 (trapezoidal), GatedDeltaNet, MoE, SharedExpertMoE, LatentMoE, QK-norm, SwiGLU, squared ReLU
  • Compiled training with mx.compile, functional gradients, gradient clipping, cosine schedules, dropout, muP parameterization (the sketch after this list shows the underlying MLX pattern)
  • Advanced training: DPO, GRPO, multi-token prediction, curriculum learning, knowledge distillation
  • LoRA & QLoRA: parameter-efficient fine-tuning with optional 4-bit quantization
  • Inference: autoregressive generation, speculative decoding, best-of-N sampling, beam search, reward model scoring
  • HuggingFace integration: load pretrained weights from the Hub
  • Experiment framework: time/FLOP-budgeted runs, MLflow tracking, results logging, hyperparameter sweeps, MLX profiling
  • 35 recipe scripts: training, fine-tuning, DPO, GRPO, MTP, distillation, curriculum learning, DeltaNet hybrid, MoE, best-of-N sampling, evaluation, quantization, callbacks, optimizer comparison, KV cache analysis, experiment sweeps, benchmarking
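
For reference, the MLX pattern behind the compiled-training bullet above looks roughly like this. It is a generic MLX sketch with a toy model, not lmxlab's Trainer, but it shows the idioms the trainer is described as building on: nn.value_and_grad for functional gradients and mx.compile on the training step.

from functools import partial

import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Toy model and optimizer standing in for a LanguageModel and its schedule.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
optimizer = optim.AdamW(learning_rate=1e-3)

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y, reduction="mean")

# Capture model and optimizer state so the compiled step can update it in place.
state = [model.state, optimizer.state]

@partial(mx.compile, inputs=state, outputs=state)
def train_step(x, y):
    loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
    optimizer.update(model, grads)
    return loss

x = mx.random.normal((8, 16))
y = mx.random.normal((8, 16))
for _ in range(5):
    loss = train_step(x, y)
    mx.eval(state)  # force evaluation of the lazily computed updates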

Quick start

pip install lmxlab

import mlx.core as mx
from lmxlab.models.llama import llama_config
from lmxlab.models.base import LanguageModel
from lmxlab.training.config import TrainConfig
from lmxlab.training.trainer import Trainer

# Build a small LLaMA
config = llama_config(vocab_size=256, d_model=128, n_heads=4, n_kv_heads=2, n_layers=4)
model = LanguageModel(config)
mx.eval(model.parameters())
print(f"Parameters: {model.count_parameters():,}")

# Set up the trainer (the full training loop is covered in the Quickstart guide)
trainer = Trainer(model, TrainConfig(learning_rate=1e-3, max_steps=100))

See the Quickstart guide for a complete walkthrough.

Recipes

Ready-to-run scripts in recipes/:

uv run python recipes/train_tiny_gpt.py              # Train a tiny GPT
uv run python recipes/train_llama_shakespeare.py      # LLaMA on Shakespeare
uv run python recipes/compare_training.py             # Compare architectures
uv run python recipes/compare_architectures.py        # Side-by-side architecture comparison
uv run python recipes/ablation_gpt_to_llama.py        # Feature ablation study
uv run python recipes/finetune_lora.py --rank 8       # LoRA fine-tuning
uv run python recipes/finetune_qlora.py --bits 4      # QLoRA (4-bit + LoRA)
uv run python recipes/train_dpo.py                    # DPO preference optimization
uv run python recipes/train_grpo.py                   # GRPO reward optimization
uv run python recipes/train_curriculum.py              # Curriculum learning
uv run python recipes/train_mtp.py --n-predict 2      # Multi-token prediction
uv run python recipes/train_deltanet.py                # Hybrid DeltaNet vs GQA
uv run python recipes/train_moe.py --experts 4        # Mixture of Experts
uv run python recipes/advanced_sampling.py             # Best-of-N and majority vote
uv run python recipes/speculative_decoding.py         # Draft-then-verify generation
uv run python recipes/evaluate_model.py               # Evaluate with perplexity/BPB
uv run python recipes/interactive_generate.py         # Streaming token-by-token generation
uv run python recipes/checkpoint_resume.py            # Save and resume training
uv run python recipes/run_experiment.py               # Structured experiment with logging
uv run python recipes/sweep_learning_rate.py          # Hyperparameter sweep
uv run python recipes/load_pretrained.py              # Load HuggingFace model
uv run python recipes/profile_models.py               # Architecture profiling
uv run python recipes/benchmark_compile.py            # mx.compile speedup benchmark
uv run python recipes/distill_model.py                # Knowledge distillation
uv run python recipes/quantize_and_generate.py        # 4-bit/8-bit quantization
uv run python recipes/train_with_callbacks.py         # Logging, throughput, early stopping
uv run python recipes/train_with_datasets.py          # TextDataset vs TokenDataset
uv run python recipes/compare_schedules.py            # LR schedules and optimizers
uv run python recipes/compare_optimizers.py           # Optimizer comparison (Experiment 3)
uv run python recipes/compare_kv_cache.py             # MLA vs GQA KV cache (Experiment 4)
uv run python recipes/analyze_experiments.py          # Statistical analysis tools

CLI

lmxlab list                    # List all architectures
lmxlab info llama --tiny       # Show config details
lmxlab count deepseek --detail # Parameter breakdown

Design principles

  • Clarity for rapid iteration. Code is written to be read and modified quickly, not for maximum production performance.
  • MLX-native. Uses MLX idioms directly: nn.value_and_grad, mx.compile, unified memory.
  • Config factories, not subclasses. Architecture variants are configs, not class hierarchies.
  • Progressive complexity. Start with GPT-style, swap in LLaMA-style, then try MLA or Mamba. Same model class throughout (see the sketch after this list).
  • Reproducible experiments. Time/FLOP budgets, train/val splits, MLflow tracking, and structured results logging.
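
To make the "same model class throughout" point concrete, here is a small sketch built only from calls already shown on this page: the same LanguageModel instantiated from a LLaMA-style (GQA) config and a DeepSeek-style (MLA) config, then compared by parameter count.

import mlx.core as mx

from lmxlab.models.base import LanguageModel
from lmxlab.models.llama import llama_config
from lmxlab.models.deepseek import deepseek_config

# Two architectures, one model class.
configs = {
    "llama": llama_config(d_model=512, n_heads=8, n_kv_heads=4, n_layers=6),
    "deepseek": deepseek_config(d_model=512, n_heads=8, n_layers=6, kv_lora_rank=64),
}

for name, config in configs.items():
    model = LanguageModel(config)
    mx.eval(model.parameters())
    print(f"{name:>8}: {model.count_parameters():,} parameters")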

Requirements

  • Python 3.12+
  • Apple Silicon Mac (M1 or later) for GPU acceleration
  • MLX also runs on Intel Macs and on Linux, on the CPU only

Development

git clone https://github.com/michaelellis003/lmxlab.git
cd lmxlab
uv sync --extra dev
uv run pre-commit install
uv run pre-commit install --hook-type commit-msg

# Run tests
uv run pytest

# Lint
uv run ruff check src/ tests/ recipes/

# Build docs
uv run mkdocs serve

Documentation

Full documentation at michaelellis003.github.io/lmxlab.

License

MIT

Download files

Source Distribution

lmxlab-0.3.0.tar.gz (537.8 kB)

Built Distribution

lmxlab-0.3.0-py3-none-any.whl (142.4 kB)

File details

Details for the file lmxlab-0.3.0.tar.gz.

File metadata

  • Download URL: lmxlab-0.3.0.tar.gz
  • Size: 537.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lmxlab-0.3.0.tar.gz

Algorithm    Hash digest
SHA256       79899933b0fb41dde6a23ba7bce74ff1cff1fc645aca71f65e268b6b4f37e7a4
MD5          87b0e6b4fc3257a5b76139cc9e4c04ab
BLAKE2b-256  c40b74f04704d4cfc54d9d70ba54d0f88cc82d259c056de2ccc55fdfccc30f36
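
To check a downloaded archive against the SHA256 digest above, Python's standard hashlib is enough; this is a minimal sketch that assumes the file sits in the current directory.

import hashlib

EXPECTED_SHA256 = "79899933b0fb41dde6a23ba7bce74ff1cff1fc645aca71f65e268b6b4f37e7a4"

with open("lmxlab-0.3.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# Refuse to proceed if the digest does not match the published value.
assert digest == EXPECTED_SHA256, "SHA256 mismatch: do not install this file"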


Provenance

The following attestation bundles were made for lmxlab-0.3.0.tar.gz:

Publisher: publish.yml on michaelellis003/lmxlab


File details

Details for the file lmxlab-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: lmxlab-0.3.0-py3-none-any.whl
  • Size: 142.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lmxlab-0.3.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       57597373276dec06df07ea6a21ac36f635a4b69e9b1e810268959fb340ae580e
MD5          e6d5d9b10b39c1bf60a97176db126497
BLAKE2b-256  c0ad5ba0844b139fdaefedec3579782dccce194450ec7cd9f0b0d8d9782570ca


Provenance

The following attestation bundles were made for lmxlab-0.3.0-py3-none-any.whl:

Publisher: publish.yml on michaelellis003/lmxlab

