A family of highly efficient, lightweight yet powerful optimizers.

Advanced Optimizers (AIO)

An all-in-one collection of optimization algorithms for deep learning, designed for efficiency, a small memory footprint, and strong performance across diverse model architectures and training scenarios.


📦 Installation

pip install adv_optm

Requires PyTorch 2.3+ for torch.compile support.


What's New

🌟 Version 2.5.x: The Massive Refactor

This major update introduces a complete architectural refactor of the library:

🆕 New Optimizers & Scaling

  • SinkSGD_adv: A new optimizer added to the lineup.
  • Spectral Scaling: Now available across all optimizers, achieving width/rank invariant updates for highly stable training.

💾 Memory & State Precision Control

  • Granular State Precision (state_precision): Drastically reduce memory overhead with new optimizer state modes:
    • factored (Rank-2 factored mode)
    • fp32 (Full precision)
    • bf16_sr & int8_sr (BF16/Int8 with Stochastic Rounding)
  • Factored Second Moment (factored_2nd): Available for all Adam variants. Works seamlessly alongside any state_precision setting to further slash memory usage.
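
To give a rough sense of why factoring the second moment saves memory, the sketch below stores only the row sums and column sums of a non-negative matrix and rebuilds a rank-1 approximation from them, in the style of Adafactor. This is an assumption made for illustration: the function names are hypothetical, and the scheme adv_optm actually uses for factored_2nd may differ.

```python
def factor_second_moment(v):
    """Compress a non-negative matrix into its row sums and column sums."""
    row_sums = [sum(row) for row in v]
    col_sums = [sum(col) for col in zip(*v)]
    return row_sums, col_sums


def reconstruct_second_moment(row_sums, col_sums):
    """Rebuild a rank-1 approximation: v[i][j] ~ row[i] * col[j] / total."""
    total = sum(row_sums)
    if total == 0.0:
        return [[0.0 for _ in col_sums] for _ in row_sums]
    return [[r * c / total for c in col_sums] for r in row_sums]


# Squared gradients of a hypothetical 2x3 weight matrix (illustrative values).
grad = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
v = [[g * g for g in row] for row in grad]

rows, cols = factor_second_moment(v)
v_hat = reconstruct_second_moment(rows, cols)

# Storage drops from n*m values to n+m values.
stored_full = len(v) * len(v[0])
stored_factored = len(rows) + len(cols)
```

For an n x m matrix the factored form keeps n + m numbers instead of n * m, so the saving grows with layer size. The reconstruction is exact only when the stored matrix is itself rank-1, as in this toy example.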

⚙️ Advanced Dynamics & Momentum

  • Variance Normalized Momentum (normed_momentum): Applies optimizer normalization before momentum (Normalization then Momentum/NtM). Available for AdamW_adv, SignSGD_adv, and SinkSGD_adv.
  • Universal Nesterov Momentum: Replaced the hard-to-tune Simplified_AdEMAMix with Nesterov momentum (nesterov) and a dedicated coefficient (nesterov_coef) across all optimizers.
  • Preconditioning & Signs:
    • Added Variance/Confidence Preconditioning (snr_cond) for SignSGD_adv and SinkSGD_adv (requires normed_momentum). Read the technical reports: AASS & sink-v.
    • Added Adaptive Stochastic Sign with $L_\infty$ preconditioning (stochastic_sign) for SignSGD_adv and Lion_adv.
  • Improved CANS (accelerated_ns): Enhanced for Muon variants by integrating a dynamic lower bound.
  • New OrthoGrad modes (orthogonal_gradient): flattened (standard OrthoGrad) and iterative (a new matrix-wise mode).
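
For orientation, the standard (flattened) OrthoGrad idea mentioned above can be sketched as removing the component of the gradient that is parallel to the weights and then restoring the original gradient norm. This is a minimal pure-Python sketch under that assumption; orthogonalize_gradient is a hypothetical helper, not the adv_optm API, and the matrix-wise iterative mode is not shown.

```python
import math


def orthogonalize_gradient(weights, grad, eps=1e-30):
    """Project grad onto the subspace orthogonal to weights, keeping grad's norm."""
    w_dot_g = sum(w * g for w, g in zip(weights, grad))
    w_dot_w = sum(w * w for w in weights)
    scale = w_dot_g / (w_dot_w + eps)
    # Remove the component of grad that points along weights.
    g_orth = [g - scale * w for w, g in zip(weights, grad)]
    # Rescale so the result has the same norm as the original gradient.
    g_norm = math.sqrt(sum(g * g for g in grad))
    g_orth_norm = math.sqrt(sum(g * g for g in g_orth))
    factor = g_norm / (g_orth_norm + eps)
    return [g * factor for g in g_orth]


weights = [3.0, 4.0]  # flattened parameter tensor (illustrative values)
grad = [1.0, 0.0]     # flattened gradient
g_orth = orthogonalize_gradient(weights, grad)
```

With these values the result is approximately [0.8, -0.6]: it has zero dot product with the weights and the same norm as the input gradient, so a step along it does not directly grow or shrink the weight norm to first order.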

⚓ Weight Decay Innovations

  • Centered Weight Decay (centered_wd): Pulls weights toward their pretrained state (the anchor) instead of toward zero. To save memory, the anchor precision (centered_wd_mode) can be set to full, float8, int8, or int4.
  • Fisher Weight Decay (fisher_wd): Now available for Adam variants based on the FAdam paper.
  • Geometric Weight Decay: Added specifically for SinkSGD_adv and SignSGD_adv.
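
The Centered Weight Decay item above can be sketched in a few lines: decoupled weight decay normally shrinks each weight toward zero, while the centered variant shrinks its offset from a stored anchor. This sketch is based only on that description; the function name and arguments are hypothetical, the decay is assumed to be scaled by the learning rate, and the quantized anchor modes are not modeled.

```python
def centered_weight_decay(weights, anchor, lr, weight_decay):
    """Decay each weight toward its anchor value instead of toward zero."""
    return [w - lr * weight_decay * (w - a) for w, a in zip(weights, anchor)]


anchor = [1.0, -2.0, 0.5]   # snapshot of the pretrained weights
weights = [1.5, -2.0, 0.0]  # weights after some fine-tuning steps

decayed = centered_weight_decay(weights, anchor, lr=0.1, weight_decay=0.5)
```

A weight that still equals its anchor is left unchanged, and passing an all-zero anchor recovers ordinary decoupled weight decay.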

(Note: Lion_Prodigy_adv, Simplified_AdEMAMix, and the heuristic cautious/grams modes have been deprecated in favor of these theoretically grounded features.)

Older release notes (v1.2.x - v2.1.x)

Version 2.1.x

  • New Optimizer: Added Signum (SignSGD with momentum) to the SignSGD_adv family.

Version 2.0.x

  • torch.compile Support: Fully implemented for all advanced optimizers. Enable with compiled_optimizer=True to fuse and optimize the optimizer step.
  • 📉 1-Bit Factored Mode: Vastly improved implementation via nnmf_factor=True.
  • 🛠️ Broad performance and stability improvements across all optimizers.

Version 1.2.x

  • Advanced Muon Variants: Brought the groundbreaking Muon optimizer into the fold, enriched with features from recent literature.
    • Muon_adv: Advanced Muon implementation featuring CANS, NorMuon, Low-Rank Orthogonalization, and more.
    • AdaMuon_adv: Combines Muon's geometry with Adam-like adaptive scaling and sign-based orthogonalization.
  • Prodigy Speedup: Prodigy variants are now 50% faster by eliminating unnecessary CUDA syncs (Shoutout to @dxqb!).
  • Stochastic Rounding for BF16: Parameter updates and weight decay now accumulate in float32 and round once at the end.
  • Cautious Weight Decay: Implemented for all advanced optimizers (Paper).
  • Fused Operations: Transitioned to fused and in-place operations wherever possible.
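
The BF16 stochastic rounding item above can be illustrated with the standard library alone. The sketch below treats bf16 as the top 16 bits of a float32, adds uniform noise to the 16 bits that would be discarded, and truncates, so the result is unbiased in expectation. All function names are hypothetical and this is not the adv_optm implementation; Python floats stand in for the float32 accumulator, and inputs are assumed to be finite.

```python
import random
import struct


def float32_bits(x):
    """Return the IEEE 754 float32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits):
    """Inverse of float32_bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def truncate_to_bf16(x):
    """Deterministic baseline: drop the 16 low bits that bf16 cannot store."""
    return bits_to_float32(float32_bits(x) & 0xFFFF0000)


def stochastic_round_to_bf16(x, rng):
    """Round finite x to a neighboring bf16 value, unbiased in expectation."""
    # Uniform noise on the discarded bits carries into the kept bits with
    # probability equal to the discarded fraction.
    noisy = float32_bits(x) + rng.getrandbits(16)
    return bits_to_float32(noisy & 0xFFFF0000)


def sgd_step_bf16(param, grad, lr, weight_decay, rng):
    """Accumulate update and weight decay in high precision, then round once."""
    updated = param - lr * grad - lr * weight_decay * param
    return stochastic_round_to_bf16(updated, rng)


rng = random.Random(0)
x = 1.0 + 2.0 ** -10  # 1/8 of the way between bf16 neighbors 1.0 and 1.0078125
samples = [stochastic_round_to_bf16(x, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

Plain truncation always maps x to 1.0, so a small update of this size would be lost on every step. With stochastic rounding each sample is one of the two bf16 neighbors, and the average over many samples stays close to x.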

💡 Core Innovations

(Documentation expanding on the theory and usage of these features is coming soon!)
