
A family of highly efficient, lightweight yet powerful optimizers.

Advanced Optimizers (AIO)

An all-in-one collection of optimization algorithms for deep learning, designed for high efficiency and a small memory footprint across diverse model architectures and training scenarios.


📦 Installation

pip install adv_optm

Requires PyTorch 2.3+ for torch.compile support.


What's New

🌟 Version 2.5.x: The Massive Refactor

This major update introduces a complete architectural refactor of the library:

🆕 New Optimizers & Scaling

  • SinkSGD_adv: Added a powerful new optimizer to the lineup.
  • Spectral Scaling: Now available across all optimizers, achieving width/rank-invariant updates for highly stable training.

💾 Memory & State Precision Control

  • Granular State Precision (state_precision): Drastically reduce memory overhead with new optimizer state modes:
    • factored (Rank-2 factored mode)
    • fp32 (Full precision)
    • bf16_sr & int8_sr (BF16/Int8 with Stochastic Rounding)
  • Factored Second Moment (factored_2nd): Available for all Adam variants. Works seamlessly alongside any state_precision setting to further slash memory usage.
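
To make the memory saving of a factored second moment concrete, here is a minimal framework-free sketch. It assumes an Adafactor-style row/column factorization; the source does not say how factored_2nd is implemented, so the function names and the exact factorization are illustrative assumptions, not the library's code. For an n × m gradient it stores n + m running averages instead of n × m.

```python
def update_factored_second_moment(row_ema, col_ema, grad, beta2=0.999):
    """Update per-row and per-column EMAs of squared gradients in place.

    Hypothetical helper (Adafactor-style); not the adv_optm implementation.
    grad is a list of rows, each row a list of floats.
    """
    n_rows, n_cols = len(grad), len(grad[0])
    for i in range(n_rows):
        row_mean = sum(g * g for g in grad[i]) / n_cols
        row_ema[i] = beta2 * row_ema[i] + (1.0 - beta2) * row_mean
    for j in range(n_cols):
        col_mean = sum(grad[i][j] ** 2 for i in range(n_rows)) / n_rows
        col_ema[j] = beta2 * col_ema[j] + (1.0 - beta2) * col_mean


def reconstruct_second_moment(row_ema, col_ema, eps=1e-30):
    """Rank-1 estimate: V[i][j] ~= row_ema[i] * col_ema[j] / mean(row_ema)."""
    scale = max(sum(row_ema) / len(row_ema), eps)
    return [[r * c / scale for c in col_ema] for r in row_ema]


# Usage: a 2 x 2 gradient whose squared entries happen to be rank-1,
# so the factored estimate recovers them exactly when beta2 = 0.
row_state, col_state = [0.0, 0.0], [0.0, 0.0]
update_factored_second_moment(row_state, col_state,
                              [[1.0, 2.0], [2.0, 4.0]], beta2=0.0)
estimate = reconstruct_second_moment(row_state, col_state)
```

For general gradients the reconstruction is only an approximation of the full second moment, which is the trade made for the smaller state.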

⚙️ Advanced Dynamics & Momentum

  • Variance Normalized Momentum (normed_momentum): Applies optimizer normalization before momentum (Normalization then Momentum/NtM). Available for AdamW_adv, SignSGD_adv, and SinkSGD_adv.
  • Universal Nesterov Momentum: Replaced the hard-to-tune Simplified_AdEMAMix with Nesterov momentum (nesterov) and a dedicated coefficient (nesterov_coef) across all optimizers.
  • Preconditioning & Signs:
    • Added Variance/Confidence Preconditioning (snr_cond) for SignSGD_adv and SinkSGD_adv (requires normed_momentum). Read the technical reports: AASS & sink-v.
    • Added Adaptive Stochastic Sign with $L_\infty$ preconditioning (stochastic_sign) for SignSGD_adv and Lion_adv.
  • Improved CANS (accelerated_ns): Enhanced for Muon variants by integrating a dynamic lower bound.
  • New OrthoGrad modes (orthogonal_gradient): flattened, the standard OrthoGrad, and iterative, a new matrix-wise mode.
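
As an illustration of the flattened OrthoGrad idea, the sketch below removes the gradient component parallel to the weights and rescales the result to the original gradient norm, treating the parameter as one flat vector. This follows the commonly described form of OrthoGrad; whether the library rescales in the same way is an assumption, and the function name is hypothetical. The matrix-wise iterative mode is not shown because the source does not describe it.

```python
import math


def orthogonalize_gradient(weights, grad, eps=1e-30):
    """Project grad onto the subspace orthogonal to weights (flattened view).

    Hypothetical helper; weights and grad are flat lists of floats.
    """
    w_dot_g = sum(w * g for w, g in zip(weights, grad))
    w_dot_w = sum(w * w for w in weights)
    coef = w_dot_g / (w_dot_w + eps)

    # Subtract the component of the gradient that points along the weights.
    g_orth = [g - coef * w for w, g in zip(weights, grad)]

    # Rescale so the update keeps the norm of the original gradient.
    g_norm = math.sqrt(sum(g * g for g in grad))
    g_orth_norm = math.sqrt(sum(g * g for g in g_orth))
    scale = g_norm / (g_orth_norm + eps)
    return [scale * g for g in g_orth]


# Usage: the weights point along the first axis, so only the second
# gradient component survives, rescaled to the original norm of 5.
projected = orthogonalize_gradient([1.0, 0.0], [3.0, 4.0])
```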

⚓ Weight Decay Innovations

  • Centered Weight Decay (centered_wd): Pulls weights toward their pre-trained state (the anchor). To save memory, anchor precision (centered_wd_mode) can be set to full, float8, int8, or int4.
  • Fisher Weight Decay (fisher_wd): Now available for Adam variants based on the FAdam paper.
  • Geometric Weight Decay: Added specifically for SinkSGD_adv and SignSGD_adv.
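
The centered weight decay described above can be sketched as a decoupled decay step whose target is the anchor rather than zero. This is a minimal illustration under that assumption; the function name is hypothetical, the anchor is kept in full precision, and the library's actual update and anchor quantization are not shown in the source.

```python
def centered_weight_decay(weights, anchor, lr, weight_decay):
    """Shrink each weight toward its anchor instead of toward zero.

    Hypothetical helper: w <- w - lr * weight_decay * (w - anchor).
    """
    return [w - lr * weight_decay * (w - a) for w, a in zip(weights, anchor)]


# Usage: the first weight already sits on its anchor and does not move;
# the second is pulled toward an anchor of 0, which matches ordinary
# decoupled weight decay.
decayed = centered_weight_decay([1.0, 2.0], [1.0, 0.0], lr=0.1, weight_decay=0.5)
```

With an all-zero anchor the step reduces to standard decoupled weight decay, so the anchor is the only extra state this scheme needs.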

(Note: Lion_Prodigy_adv, Simplified_AdEMAMix, and the heuristic cautious/grams modes have been deprecated in favor of these theoretically grounded features.)

Older release notes (v1.2.x - v2.1.x)

Version 2.1.x

  • New Optimizer: Added Signum (SignSGD with momentum) to the SignSGD_adv family.

Version 2.0.x

  • torch.compile Support: Fully implemented for all advanced optimizers. Enable via compiled_optimizer=True to heavily fuse and optimize the optimizer step path.
  • 📉 1-Bit Factored Mode: Vastly improved implementation via nnmf_factor=True.
  • 🛠️ Broad performance and stability improvements across all optimizers.

Version 1.2.x

  • Advanced Muon Variants: Brought the groundbreaking Muon optimizer into the fold, enriched with features from recent literature.

| Optimizer | Description |
|---|---|
| Muon_adv | Advanced Muon implementation featuring CANS, NorMuon, Low-Rank Orthogonalization, and more. |
| AdaMuon_adv | Combines Muon's geometry with Adam-like adaptive scaling and sign-based orthogonalization. |

  • Prodigy Speedup: Prodigy variants are now 50% faster by eliminating unnecessary CUDA syncs (Shoutout to @dxqb!).
  • Stochastic Rounding for BF16: Parameter updates and weight decay now accumulate in float32 and round once at the end.
  • Cautious Weight Decay: Implemented for all advanced optimizers (Paper).
  • Fused Operations: Transitioned to fused and in-place operations wherever possible.
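
The stochastic rounding note above (accumulate in higher precision, round to BF16 once at the end) can be illustrated with a small bit-level sketch. It uses only the standard library and is an assumption about the general technique, not the library's kernel: BF16 is the top 16 bits of a float32, so adding a uniform random 16-bit integer to the float32 bit pattern and then truncating rounds up with probability proportional to the discarded fraction. All function names are hypothetical.

```python
import math
import random
import struct


def stochastic_round_bf16(x, rng=random):
    """Stochastically round x to a value representable in BF16."""
    if not math.isfinite(x):
        return x  # leave inf/nan untouched
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add random noise to the 16 bits that BF16 drops, then truncate them.
    bits = (bits + rng.getrandbits(16)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def apply_update_bf16(weight, grad, lr, weight_decay, rng=random):
    """Apply update and weight decay in high precision, then round once."""
    # Python floats are float64; a real kernel would accumulate in float32.
    updated = weight - lr * (grad + weight_decay * weight)
    return stochastic_round_bf16(updated, rng)


# Usage: 1 + 2**-9 lies a quarter of the way between the BF16 neighbours
# 1.0 and 1.0078125, so it rounds up about 25% of the time and the
# average of many roundings stays close to the original value.
rng = random.Random(0)
samples = [stochastic_round_bf16(1.0 + 2.0 ** -9, rng) for _ in range(20000)]
mean_value = sum(samples) / len(samples)
```

Rounding once after the full update avoids the systematic loss that occurs when small updates are repeatedly rounded to nearest in BF16.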

💡 Core Innovations

(Documentation expanding on the theory and usage of these features is coming soon!)
