A performance-optimized Muon optimizer with foreach support, auto-routing, and composite optimizer patterns.

Maintainers

emaballarin

Project description

optimuon

A performance-optimized Muon optimizer for PyTorch.

Features:

- Foreach-native: uses torch._foreach_* ops for momentum, weight decay, and parameter updates.
- Batched Newton-Schulz: groups matrices by shape for parallel orthogonalization.
- Auto-parameter routing: automatically partitions model parameters into Muon-eligible (≥2D hidden weights) and auxiliary (embeddings, heads, norms, biases).
- Composite optimizer: CompositeMuon combines Muon with any arbitrary auxiliary optimizer (not just AdamW).
- Three LR modes: Keller Jordan's "original" (with aspect-ratio scaling), Moonshot AI's "match_rms_adamw", and "none" (no scaling).
- Momentum conventions: "ema" (m = beta*m + (1-beta)*g, default) and "classical" (m = beta*m + g).
- Corrections: MARS, cautious updates, cautious weight decay, NorMuon, gradient/update clipping (all toggleable).
- Weight normalization: optional Frobenius-norm clamping to sqrt(fan_out) (from KJ's original Muon).
- Half-precision momentum: optional lower-precision momentum buffers for memory savings.
- Polar Express: optimal per-step Newton-Schulz coefficients (default).
- Distributed: torch.distributed gradient sharding via all_gather.
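
The two momentum conventions listed above differ only in how the incoming gradient is weighted. The following is a minimal sketch on plain Python floats, not optimuon's implementation; the function name `momentum_update` and the sample gradients are made up for illustration. It applies both formulas exactly as written in the feature list and shows that, starting from zero buffers, the "ema" buffer equals the "classical" buffer scaled by (1 - beta).

```python
def momentum_update(m, g, beta=0.95, convention="ema"):
    """Return the new momentum value for one scalar entry (illustrative helper)."""
    if convention == "ema":
        # m = beta*m + (1-beta)*g
        return beta * m + (1.0 - beta) * g
    if convention == "classical":
        # m = beta*m + g
        return beta * m + g
    raise ValueError(f"unknown convention: {convention!r}")


grads = [1.0, 0.5, -0.25, 2.0]  # made-up gradient sequence
beta = 0.95
m_ema = m_cls = 0.0
for g in grads:
    m_ema = momentum_update(m_ema, g, beta, "ema")
    m_cls = momentum_update(m_cls, g, beta, "classical")

# With zero-initialized buffers the two agree up to the factor (1 - beta).
print(m_ema, m_cls * (1.0 - beta))
```

Because the buffers differ only by a constant factor in this setting, the choice mainly affects the magnitude of the momentum term rather than its direction.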

Installation

uv pip install git+https://github.com/emaballarin/optimuon

Quick start

Standalone Muon (manual parameter selection)

import torch
from optimuon import Muon

# muon_params and other_params are parameter lists you have split yourself.
# Muon for ≥2D hidden weight matrices only
muon = Muon(muon_params, lr=0.02, momentum=0.95, weight_decay=0.01)

# Separate AdamW for everything else
adamw = torch.optim.AdamW(other_params, lr=3e-4)

# Training loop
for batch in dataloader:
    loss = model(batch).loss
    loss.backward()
    muon.step()
    adamw.step()
    muon.zero_grad()
    adamw.zero_grad()
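
The standalone example leaves the split into `muon_params` and `other_params` to the user. One way to do it by hand is sketched below. This is a hypothetical heuristic, not optimuon's routing rule: the helper `split_params`, the name hints in `AUX_NAME_HINTS`, and the `FakeParam` stand-in are all assumptions made for illustration. The stand-in exposes only `.ndim` so the sketch runs without PyTorch.

```python
from dataclasses import dataclass

# Hypothetical substrings marking parameters that should NOT go to Muon;
# adjust to the naming scheme of your own model.
AUX_NAME_HINTS = ("embed", "head", "norm", "bias")


def split_params(named_params):
    """Split (name, param) pairs into Muon and auxiliary parameter lists."""
    muon_params, other_params = [], []
    for name, param in named_params:
        is_matrix = param.ndim >= 2
        looks_aux = any(hint in name for hint in AUX_NAME_HINTS)
        if is_matrix and not looks_aux:
            muon_params.append(param)
        else:
            other_params.append(param)
    return muon_params, other_params


@dataclass
class FakeParam:
    """Stand-in exposing only .ndim, so the example needs no PyTorch."""
    ndim: int


named = [
    ("embed.weight", FakeParam(2)),
    ("blocks.0.mlp.weight", FakeParam(2)),
    ("blocks.0.mlp.bias", FakeParam(1)),
    ("norm.weight", FakeParam(1)),
    ("lm_head.weight", FakeParam(2)),
]
muon_params, other_params = split_params(named)
print(len(muon_params), len(other_params))
```

With a real model, `model.named_parameters()` yields the same kind of (name, parameter) pairs, and tensors expose `.ndim`, so the helper can be called on it directly.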

CompositeMuon with auto-routing (recommended)

import torch
from optimuon import CompositeMuon

optimizer = CompositeMuon(
    model,
    muon_lr=0.02,
    muon_kwargs={"weight_decay": 0.01, "foreach": True},
    aux_optimizer_class=torch.optim.AdamW,
    aux_optimizer_kwargs={"lr": 3e-4, "betas": (0.9, 0.95), "weight_decay": 0.01},
    verbose=True,
)

for batch in dataloader:
    loss = model(batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

With corrections

import torch
from optimuon import CompositeMuon

optimizer = CompositeMuon(
    model,
    muon_lr=0.02,
    muon_kwargs={
        "weight_decay": 0.01,
        "mars": True,           # MARS gradient correction
        "cautious": True,       # cautious update masking
        "grad_clip": 1.0,       # gradient norm clipping
        "weight_norm": True,    # Frobenius-norm clamping
    },
    aux_optimizer_class=torch.optim.AdamW,
    aux_optimizer_kwargs={"lr": 3e-4},
)
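
To make the `weight_norm` option more concrete, here is a rough sketch of Frobenius-norm clamping to sqrt(fan_out) on a matrix stored as a list of rows. It is not taken from optimuon: the helper `clamp_frobenius` is hypothetical, "clamping" is read here as "shrink only when the norm exceeds the bound", and fan_out is assumed to be the number of rows (the PyTorch `Linear` weight layout). The library may differ on any of these points.

```python
import math


def clamp_frobenius(weight, max_norm=None):
    """Return a copy of `weight` whose Frobenius norm is at most `max_norm`."""
    fan_out = len(weight)  # assumption: rows index output features
    if max_norm is None:
        max_norm = math.sqrt(fan_out)
    norm = math.sqrt(sum(x * x for row in weight for x in row))
    if norm <= max_norm:
        # Already within the bound: leave values untouched.
        return [row[:] for row in weight]
    scale = max_norm / norm
    return [[x * scale for x in row] for row in weight]


w = [[3.0, 0.0], [0.0, 4.0]]  # Frobenius norm 5.0, fan_out 2
clamped = clamp_frobenius(w)
new_norm = math.sqrt(sum(x * x for row in clamped for x in row))
print(new_norm)  # close to sqrt(2)
```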

With a custom auxiliary optimizer

from optimuon import CompositeMuon

# SomeExoticOptimizer is a placeholder: substitute your own optimizer class.
optimizer = CompositeMuon(
    model,
    muon_lr=0.02,
    aux_optimizer_factory=lambda param_groups: SomeExoticOptimizer(param_groups, lr=1e-3),
)

Manual routing utilities

from optimuon import partition_params

result = partition_params(model)
print(f"Muon: {result.muon_names}")
print(f"Aux:  {result.aux_names}")

References

- Keller Jordan et al., Muon: An optimizer for hidden layers in neural networks (2024)
- Huizhuo Yuan et al., MARS: Unleashing the Power of Variance Reduction for Training Large Models (2024)
- Kaizhao Liang et al., Cautious Optimizers: Improving Training with One Line of Code (2024)
- Moonshot AI, Muon is Scalable for LLM Training (2025)
- Essential AI, Practical Efficiency of Muon for Pretraining (2025)
- Noah Amsel et al., The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm (2025)
- Zichong Li et al., NorMuon: Making Muon more efficient and scalable (2025)
- Lizhang Chen et al., Cautious Weight Decay (2025)

License

MIT

Release history

- 0.1.1 (this version): Apr 1, 2026
- 0.1.0: Mar 23, 2026

Download files

Source Distribution

optimuon-0.1.1.tar.gz (18.8 kB)

Uploaded Apr 1, 2026 Source

Built Distribution

optimuon-0.1.1-py3-none-any.whl (21.2 kB)

Uploaded Apr 1, 2026 Python 3

File details

Details for the file optimuon-0.1.1.tar.gz.

File metadata

Download URL: optimuon-0.1.1.tar.gz
Upload date: Apr 1, 2026
Size: 18.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for optimuon-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`70501ad618fa27a0907d140c539d038ee26b34094e69d8061430d708eca1fb3f`
MD5	`7958c6910c848ffe95ef7ba8c508c25b`
BLAKE2b-256	`3e5aac275fff4b023924ecf6b5799b68064d08de730868d579b0b659cff6213d`
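
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. The sketch below is one way to do it; the helper names `sha256_of` and `matches` are illustrative, not part of optimuon or of any PyPI tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `matches("optimuon-0.1.1.tar.gz", "70501ad618fa27a0907d140c539d038ee26b34094e69d8061430d708eca1fb3f")` should return True if the file is intact, assuming it was saved under that name in the current directory.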

Provenance

The following attestation bundles were made for optimuon-0.1.1.tar.gz:

Publisher: pypipublish.yml on emaballarin/optimuon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: optimuon-0.1.1.tar.gz
- Subject digest: 70501ad618fa27a0907d140c539d038ee26b34094e69d8061430d708eca1fb3f
- Sigstore transparency entry: 1204200022
- Sigstore integration time: Apr 1, 2026
Source repository:
- Permalink: emaballarin/optimuon@a8253e9fefe14574bbda0a13b136fe506b612c42
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/emaballarin
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypipublish.yml@a8253e9fefe14574bbda0a13b136fe506b612c42
- Trigger Event: push

File details

Details for the file optimuon-0.1.1-py3-none-any.whl.

File metadata

Download URL: optimuon-0.1.1-py3-none-any.whl
Upload date: Apr 1, 2026
Size: 21.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for optimuon-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f574251affc7a44fa71b39e27d3466f5b1143c88e02fe05c398d5cc410727c14`
MD5	`ec2125ae947746bba2b56961f1b09c1e`
BLAKE2b-256	`2c2228bc6b3d6400a7d4755e28d3706a93bef334fc594bd25942c1b33d2952f6`

Provenance

The following attestation bundles were made for optimuon-0.1.1-py3-none-any.whl:

Publisher: pypipublish.yml on emaballarin/optimuon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: optimuon-0.1.1-py3-none-any.whl
- Subject digest: f574251affc7a44fa71b39e27d3466f5b1143c88e02fe05c398d5cc410727c14
- Sigstore transparency entry: 1204200108
- Sigstore integration time: Apr 1, 2026
Source repository:
- Permalink: emaballarin/optimuon@a8253e9fefe14574bbda0a13b136fe506b612c42
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/emaballarin
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypipublish.yml@a8253e9fefe14574bbda0a13b136fe506b612c42
- Trigger Event: push
