A performance-optimized Muon optimizer with foreach support, auto-routing, and composite optimizer patterns.

optimuon

A performance-optimized Muon optimizer for PyTorch.

Features:

  • Foreach-native: uses torch._foreach_* ops for momentum, weight decay, and parameter updates.
  • Batched Newton-Schulz: groups matrices by shape for parallel orthogonalization.
  • Auto-parameter routing: automatically partitions model parameters into Muon-eligible (≥2D hidden weights) and auxiliary (embeddings, heads, norms, biases).
  • Composite optimizer: CompositeMuon combines Muon with any auxiliary optimizer (not just AdamW).
  • Three LR modes: Keller Jordan's "original" (with aspect-ratio scaling), Moonshot AI's "match_rms_adamw", and "none" (no scaling).
  • Momentum conventions: "ema" (m = beta*m + (1-beta)*g, default) and "classical" (m = beta*m + g).
  • Corrections: MARS, cautious updates, cautious weight decay, NorMuon, gradient/update clipping (all toggleable).
  • Weight normalization: optional Frobenius-norm clamping to sqrt(fan_out) (from Keller Jordan's original Muon).
  • Half-precision momentum: optional lower-precision momentum buffers for memory savings.
  • Polar Express: optimal per-step Newton-Schulz coefficients (default).
  • Distributed: torch.distributed gradient sharding via all_gather.
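
The Newton-Schulz orthogonalization named in the list above can be sketched in plain Python. This is a minimal sketch, assuming the classic quintic iteration with the fixed coefficients (3.4445, -4.7750, 2.0315) from Keller Jordan's reference Muon, not the Polar Express per-step coefficients that optimuon uses by default. The helper names are hypothetical and not part of optimuon's API, and shape-batching is left out for brevity.

```python
import math

# Fixed quintic coefficients from Keller Jordan's reference Muon.
# Assumption: optimuon's default (Polar Express) uses different, per-step values.
A, B, C = 3.4445, -4.7750, 2.0315


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*x)]


def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g (hypothetical helper, not optimuon's API)."""
    # Work in the wide orientation so that X @ X^T is the smaller Gram matrix.
    flipped = len(g) > len(g[0])
    x = transpose(g) if flipped else [row[:] for row in g]
    # Normalize by the Frobenius norm so every singular value is at most 1.
    norm = math.sqrt(sum(v * v for row in x for v in row)) + eps
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        gram = matmul(x, transpose(x))   # X X^T
        gram2 = matmul(gram, gram)       # (X X^T)^2
        poly = [[B * p + C * q for p, q in zip(r1, r2)]
                for r1, r2 in zip(gram, gram2)]
        px = matmul(poly, x)
        # X <- A*X + (B*XX^T + C*(XX^T)^2) X
        x = [[A * v + w for v, w in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if flipped else x


out = newton_schulz([[3.0, 0.0], [0.0, 1.0]])
print(out)  # both diagonal entries end up near 1, though not exactly 1
```

With these coefficients the singular values are pushed into a band around 1 rather than converging to exactly 1, which is the intended trade-off of the fixed-coefficient iteration.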

Installation

uv pip install git+https://github.com/emaballarin/optimuon

Quick start

Standalone Muon (manual parameter selection)

import torch
from optimuon import Muon

# Muon for ≥2D hidden weight matrices only.
# muon_params and other_params are parameter lists you build yourself;
# model and dataloader are assumed to exist already.
muon = Muon(muon_params, lr=0.02, momentum=0.95, weight_decay=0.01)

# Separate AdamW for everything else (embeddings, heads, norms, biases)
adamw = torch.optim.AdamW(other_params, lr=3e-4)

# Training loop
for batch in dataloader:
    loss = model(batch).loss
    loss.backward()
    muon.step()
    adamw.step()
    muon.zero_grad()
    adamw.zero_grad()
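
How the momentum=0.95 argument is interpreted depends on the momentum convention listed under Features. The sketch below applies the two formulas from that list to a single scalar entry with a constant gradient. The function names are hypothetical, and this is an illustration of the formulas only, not optimuon's internal code.

```python
def momentum_step(buf, grad, beta=0.95, convention="ema"):
    """Return the updated momentum buffer for one scalar entry."""
    if convention == "ema":
        # m = beta*m + (1-beta)*g
        return beta * buf + (1.0 - beta) * grad
    if convention == "classical":
        # m = beta*m + g
        return beta * buf + grad
    raise ValueError(f"unknown convention: {convention!r}")


def run(convention, steps=500, grad=1.0, beta=0.95):
    """Feed a constant gradient for `steps` iterations, starting from zero."""
    buf = 0.0
    for _ in range(steps):
        buf = momentum_step(buf, grad, beta, convention)
    return buf


ema_buf = run("ema")              # approaches grad
classical_buf = run("classical")  # approaches grad / (1 - beta)
print(ema_buf, classical_buf)
```

For a fixed beta and a zero-initialized buffer, the two buffers differ only by the constant factor 1/(1-beta), which is 20 for beta=0.95.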

CompositeMuon with auto-routing (recommended)

import torch
from optimuon import CompositeMuon

optimizer = CompositeMuon(
    model,
    muon_lr=0.02,
    muon_kwargs={"weight_decay": 0.01, "foreach": True},
    aux_optimizer_class=torch.optim.AdamW,
    aux_optimizer_kwargs={"lr": 3e-4, "betas": (0.9, 0.95), "weight_decay": 0.01},
    verbose=True,
)

for batch in dataloader:
    loss = model(batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
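
The gap between muon_lr=0.02 and the AdamW lr of 3e-4 relates to the three LR modes listed under Features. The sketch below shows how such per-matrix scale factors are commonly computed. It assumes that "original" follows Keller Jordan's reference implementation (sqrt(max(1, rows/cols))) and that "match_rms_adamw" follows Moonshot AI's published rule (0.2 * sqrt(max(rows, cols))). Whether optimuon uses exactly these formulas is an assumption, and lr_scale is a hypothetical helper.

```python
import math


def lr_scale(rows, cols, mode="original"):
    """Per-matrix LR multiplier (hypothetical helper; formulas are assumptions)."""
    if mode == "original":
        # Aspect-ratio scaling: only tall matrices get a larger step.
        return math.sqrt(max(1.0, rows / cols))
    if mode == "match_rms_adamw":
        # Scale so the update RMS is comparable to a typical AdamW update.
        return 0.2 * math.sqrt(max(rows, cols))
    if mode == "none":
        return 1.0
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("original", "match_rms_adamw", "none"):
    print(mode, lr_scale(4096, 1024, mode))
```

Under these assumptions, "match_rms_adamw" multiplies the base rate by a much larger factor than "original", so it is usually paired with a smaller, AdamW-like base learning rate.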

With corrections

import torch
from optimuon import CompositeMuon

optimizer = CompositeMuon(
    model,
    muon_lr=0.02,
    muon_kwargs={
        "weight_decay": 0.01,
        "mars": True,           # MARS gradient correction
        "cautious": True,       # cautious update masking
        "grad_clip": 1.0,       # gradient norm clipping
        "weight_norm": True,    # Frobenius-norm clamping
    },
    aux_optimizer_class=torch.optim.AdamW,
    aux_optimizer_kwargs={"lr": 3e-4},
)
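
To give a feel for what the "cautious" flag refers to, here is a minimal sketch of cautious update masking on flat lists: entries whose update and gradient disagree in sign are zeroed, and the remainder is rescaled by the fraction of entries kept. The rescaling rule and the eps floor are assumptions taken from the general cautious-optimizer recipe. Where exactly optimuon applies the mask is not stated in this README, and cautious_mask is a hypothetical name.

```python
def cautious_mask(update, grad, eps=1e-3):
    """Zero entries where update and gradient disagree in sign, then rescale."""
    # 1.0 where the update points the same way as the gradient, else 0.0.
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Rescale by the kept fraction (floored at eps) to preserve overall magnitude.
    kept = max(sum(mask) / len(mask), eps)
    return [u * m / kept for u, m in zip(update, mask)]


update = [0.5, -0.2, 0.1, -0.4]
grad = [1.0, 0.3, -0.2, -0.1]
masked = cautious_mask(update, grad)
print(masked)  # entries 1 and 2 are zeroed; entries 0 and 3 are doubled
```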

With a custom auxiliary optimizer

from optimuon import CompositeMuon

# SomeExoticOptimizer is a placeholder for your own optimizer class.

optimizer = CompositeMuon(
    model,
    muon_lr=0.02,
    aux_optimizer_factory=lambda param_groups: SomeExoticOptimizer(param_groups, lr=1e-3),
)

Manual routing utilities

from optimuon import partition_params

result = partition_params(model)
print(f"Muon: {result.muon_names}")
print(f"Aux:  {result.aux_names}")
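
The routing rule described under Features (≥2D hidden weights go to Muon; embeddings, heads, norms, and biases go to the auxiliary optimizer) can be sketched without PyTorch using (name, shape) pairs. The keyword list and the route function are hypothetical; the actual heuristics inside partition_params may differ.

```python
# Hypothetical name hints for parameters that should NOT go to Muon.
AUX_KEYWORDS = ("embed", "head", "norm", "bias")


def route(named_shapes):
    """Split parameter names into (muon, aux) lists by shape and name."""
    muon, aux = [], []
    for name, shape in named_shapes:
        is_matrix = len(shape) >= 2
        looks_aux = any(key in name.lower() for key in AUX_KEYWORDS)
        (muon if is_matrix and not looks_aux else aux).append(name)
    return muon, aux


params = [
    ("tok_embed.weight", (50257, 768)),
    ("blocks.0.attn.qkv.weight", (2304, 768)),
    ("blocks.0.attn.qkv.bias", (2304,)),
    ("blocks.0.norm1.weight", (768,)),
    ("blocks.0.mlp.fc.weight", (3072, 768)),
    ("lm_head.weight", (50257, 768)),
]
muon_names, aux_names = route(params)
print("Muon:", muon_names)
print("Aux: ", aux_names)
```

Name-based matching is fragile (a layer called "multihead_proj" would be routed to the auxiliary optimizer here), so printing both lists and checking them against the model is a sensible first step.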

License

MIT
