A family of highly efficient, lightweight yet powerful optimizers.

Advanced Optimizers (AIO)

A comprehensive, all-in-one collection of state-of-the-art optimization algorithms for deep learning. It is designed for high efficiency, a minimal memory footprint, and strong performance across diverse model architectures and training scenarios.

📦 Installation

pip install adv_optm

Requires PyTorch 2.3+ for torch.compile support.


What's New

🌟 Version 2.5.x: The Massive Refactor

This major update introduces a complete architectural refactor of the library:

🆕 New Optimizers & Scaling

  • SinkSGD_adv: Added a powerful new optimizer to the lineup.
  • Spectral Scaling: Now available across all optimizers, achieving width/rank invariant updates for highly stable training.

💾 Memory & State Precision Control

  • Granular State Precision (state_precision): Drastically reduce memory overhead with new optimizer state modes:
    • factored (Rank-2 factored mode)
    • fp32 (Full precision)
    • bf16_sr & int8_sr (BF16/Int8 with Stochastic Rounding)
  • Factored Second Moment (factored_2nd): Available for all Adam variants. Works seamlessly alongside any state_precision setting to further slash memory usage.
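
To illustrate why factoring the second moment saves memory, here is a minimal sketch of an Adafactor-style rank-1 factorization in plain Python. It is an assumption-laden illustration, not the library's implementation: the helper names `factor` and `reconstruct` are hypothetical, and the actual `factored_2nd` and `factored` modes may use a different scheme.

```python
def factor(sq_grads):
    """Compress an n x m matrix of squared gradients into n row sums and m column sums."""
    rows = [sum(r) for r in sq_grads]
    cols = [sum(c) for c in zip(*sq_grads)]
    return rows, cols


def reconstruct(rows, cols):
    """Rebuild a rank-1 estimate of the full matrix from the stored sums."""
    total = sum(rows)
    if total == 0:
        return [[0.0 for _ in cols] for _ in rows]
    return [[r * c / total for c in cols] for r in rows]


# Hypothetical 2 x 3 gradient; its element-wise square happens to be rank-1.
grads = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
sq = [[g * g for g in row] for row in grads]

rows, cols = factor(sq)           # 2 + 3 stored values instead of 2 * 3
approx = reconstruct(rows, cols)  # exact here because sq is rank-1
```

For an n x m parameter the stored state shrinks from n*m values to n + m. The reconstruction is exact only when the squared-gradient matrix is rank-1; in general it is an approximation.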

⚙️ Advanced Dynamics & Momentum

  • Variance Normalized Momentum (normed_momentum): Applies optimizer normalization before momentum (Normalization then Momentum/NtM). Available for AdamW_adv, SignSGD_adv, and SinkSGD_adv.
  • Universal Nesterov Momentum: Replaced the hard-to-tune Simplified_AdEMAMix with Nesterov momentum (nesterov) and a dedicated coefficient (nesterov_coef) across all optimizers.
  • Preconditioning & Signs:
    • Added Variance/Confidence Preconditioning (snr_cond) for SignSGD_adv and SinkSGD_adv (requires normed_momentum). Read the technical reports: AASS & sink-v.
    • Added Adaptive Stochastic Sign with $L_\infty$ preconditioning (stochastic_sign) for SignSGD_adv and Lion_adv.
  • Improved CANS (accelerated_ns): Enhanced for Muon variants by integrating a dynamic lower bound.
  • New OrthoGrad modes (orthogonal_gradient): flattened (standard OrthoGrad) and iterative (a new matrix-wise mode).
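
As a rough illustration of the standard OrthoGrad idea, the sketch below removes the gradient component that is parallel to the weights and rescales the result to the original gradient norm, treating the tensor as one flat vector. The function name `orthogonalize_gradient` and the `eps` value are assumptions made for this example. The library's own code, and the matrix-wise iterative mode in particular, are not shown here and may differ.

```python
import math


def orthogonalize_gradient(w, g, eps=1e-30):
    """Project g onto the subspace orthogonal to w, then restore g's norm."""
    dot_wg = sum(wi * gi for wi, gi in zip(w, g))
    dot_ww = sum(wi * wi for wi in w)
    proj = dot_wg / (dot_ww + eps)

    # Remove the component of g that points along w.
    g_perp = [gi - proj * wi for wi, gi in zip(w, g)]

    # Rescale so the update keeps the magnitude of the original gradient.
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    perp_norm = math.sqrt(sum(x * x for x in g_perp))
    scale = g_norm / (perp_norm + eps)
    return [x * scale for x in g_perp]


w = [1.0, 0.0]
g = [3.0, 4.0]
g_orth = orthogonalize_gradient(w, g)
```

In this toy case the resulting gradient has no component along `w`, so a step along it does not change the weight norm to first order.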

⚓ Weight Decay Innovations

  • Centered Weight Decay (centered_wd): Pulls weights toward their pre-train state (anchor). To save memory, anchor precision (centered_wd_mode) can be set to full, float8, int8, or int4.
  • Fisher Weight Decay (fisher_wd): Now available for Adam variants based on the FAdam paper.
  • Geometric Weight Decay: Added specifically for SinkSGD_adv and SignSGD_adv.
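
The difference between standard decoupled weight decay and decay toward an anchor can be sketched in a few lines. This is a conceptual example only: the function names are hypothetical, the anchor is kept in full precision, and the library's `centered_wd` update rule is assumed rather than confirmed to have this form.

```python
def standard_weight_decay(weights, lr, wd):
    """Decoupled decay: shrink every weight toward zero."""
    return [w - lr * wd * w for w in weights]


def centered_weight_decay(weights, anchor, lr, wd):
    """Centered decay: shrink every weight toward its anchor (e.g. the pre-trained value)."""
    return [w - lr * wd * (w - a) for w, a in zip(weights, anchor)]


anchor = [1.0, 1.0]    # snapshot of the pre-trained weights
weights = [1.0, 2.0]   # current weights after some fine-tuning

toward_zero = standard_weight_decay(weights, lr=0.1, wd=0.5)
toward_anchor = centered_weight_decay(weights, anchor, lr=0.1, wd=0.5)
```

A weight that still equals its anchor is left untouched by the centered variant, while the standard variant keeps pulling it toward zero. Storing the anchor costs one extra copy of the parameters, which is why a lower-precision anchor can be attractive.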

(Note: Lion_Prodigy_adv, Simplified_AdEMAMix, and the heuristic cautious/grams modes have been deprecated in favor of these theoretically grounded features.)

Older release notes (v1.2.x - v2.1.x)

Version 2.1.x

  • New Optimizer: Added Signum (SignSGD with momentum) to the SignSGD_adv family.

Version 2.0.x

  • torch.compile Support: Fully implemented for all advanced optimizers. Enable via compiled_optimizer=True to heavily fuse and optimize the optimizer step path.
  • 📉 1-Bit Factored Mode: Vastly improved implementation via nnmf_factor=True.
  • 🛠️ Broad performance and stability improvements across all optimizers.

Version 1.2.x

  • Advanced Muon Variants: Brought the groundbreaking Muon optimizer into the fold, enriched with features from recent literature.
    • Muon_adv: Advanced Muon implementation featuring CANS, NorMuon, Low-Rank Orthogonalization, and more.
    • AdaMuon_adv: Combines Muon's geometry with Adam-like adaptive scaling and sign-based orthogonalization.
  • Prodigy Speedup: Prodigy variants are now 50% faster by eliminating unnecessary CUDA syncs (Shoutout to @dxqb!).
  • Stochastic Rounding for BF16: Parameter updates and weight decay now accumulate in float32 and round once at the end.
  • Cautious Weight Decay: Implemented for all advanced optimizers (Paper).
  • Fused Operations: Transitioned to fused and in-place operations wherever possible.
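
The stochastic rounding mentioned above can be sketched with the standard library alone. The example treats a value as float32, adds uniform random noise to the 16 low bits that BF16 discards, and then truncates, so the value rounds up with probability proportional to how far it sits above the lower BF16 neighbor. The function names are hypothetical, and this is a bit-level illustration under those assumptions rather than the library's fused implementation.

```python
import random
import struct


def float32_bits(x):
    """Return the IEEE-754 float32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits):
    """Inverse of float32_bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def stochastic_round_bf16(x, rng=random):
    """Round a float32 value to a BF16-representable value, unbiased in expectation."""
    bits = float32_bits(x)
    # Random noise in the discarded bits carries into the kept bits
    # with probability (low 16 bits) / 65536.
    bits = (bits + rng.getrandbits(16)) & 0xFFFF0000
    return bits_to_float32(bits)


def apply_update(param, update, rng=random):
    """Accumulate in higher precision, then round once at the end."""
    return stochastic_round_bf16(param + update, rng)


rng = random.Random(0)
x = 1.0 + 2.0 ** -10  # lies between the BF16 neighbors 1.0 and 1.0078125
samples = [stochastic_round_bf16(x, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

Round-to-nearest would map `x` to 1.0 every time, silently dropping small updates. With stochastic rounding, each sample is one of the two neighbors and the average stays close to `x`.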

💡 Core Innovations

(Documentation expanding on the theory and usage of these features is coming soon!)
