
A family of highly efficient, lightweight yet powerful optimizers.

Advanced Optimizers (AIO)

An all-in-one collection of optimization algorithms for deep learning, designed for efficiency, a small memory footprint, and strong performance across model architectures and training scenarios.



📦 Installation

pip install adv_optm

Requires PyTorch 2.3+ for torch.compile support.


What's New

🌟 Version 2.5.x: The Massive Refactor

This major update introduces a complete architectural refactor of the library:

🆕 New Optimizers & Scaling

  • SinkSGD_adv: Added a powerful new optimizer to the lineup.
  • Spectral Scaling: Now available across all optimizers, achieving width/rank invariant updates for highly stable training.

💾 Memory & State Precision Control

  • Granular State Precision (state_precision): Drastically reduce memory overhead with new optimizer state modes:
    • factored (Rank-2 factored mode)
    • fp32 (Full precision)
    • bf16_sr & int8_sr (BF16/Int8 with Stochastic Rounding)
  • Factored Second Moment (factored_2nd): Available for all Adam variants. Works seamlessly alongside any state_precision setting to further slash memory usage.
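To make the stochastic-rounding idea behind the bf16_sr mode concrete, here is a minimal pure-Python sketch of rounding a float32 value to a bfloat16-representable value. It is an illustration of the general technique, not the library's implementation; the helper names (float_to_bits, bits_to_float, bf16_stochastic_round) are hypothetical and chosen for this example only.

```python
import random
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE-754 float32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Inverse of float_to_bits: reinterpret 32 bits as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def bf16_stochastic_round(x: float, rng: random.Random) -> float:
    """Stochastically round x to a value representable in bfloat16.

    Hypothetical helper for illustration. bfloat16 keeps the upper 16 bits
    of a float32. Adding a uniform random 16-bit integer before clearing the
    lower 16 bits rounds away from zero with probability equal to the
    discarded fraction, so the result equals x in expectation.
    """
    bits = float_to_bits(x)
    bits = (bits + rng.getrandbits(16)) & 0xFFFF0000
    return bits_to_float(bits)
```

With plain truncation or round-to-nearest, updates smaller than half a bfloat16 step are lost on every step; with stochastic rounding they survive on average, which is why low-precision states can track small changes over many steps.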

⚙️ Advanced Dynamics & Momentum

  • Variance Normalized Momentum (normed_momentum): Applies optimizer normalization before momentum (Normalization then Momentum/NtM). Available for AdamW_adv, SignSGD_adv, and SinkSGD_adv.
  • Universal Nesterov Momentum: Replaced the hard-to-tune Simplified_AdEMAMix with Nesterov momentum (nesterov) and a dedicated coefficient (nesterov_coef) across all optimizers.
  • Preconditioning & Signs:
    • Added Variance/Confidence Preconditioning (snr_cond) for SignSGD_adv and SinkSGD_adv (requires normed_momentum). Read the technical reports: AASS & sink-v.
    • Added Adaptive Stochastic Sign with $L_\infty$ preconditioning (stochastic_sign) for SignSGD_adv and Lion_adv.
  • Improved CANS (accelerated_ns): Enhanced for Muon variants by integrating a dynamic lower bound.
  • New OrthoGrad modes (orthogonal_gradient): flattened (standard OrthoGrad) and iterative (a new matrix-wise mode).
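As a rough illustration of the flattened OrthoGrad mode mentioned in the list above, the sketch below projects a gradient onto the subspace orthogonal to the weights and rescales it to the original gradient norm. It follows the general OrthoGrad idea on flat Python lists; the function name and the eps default are assumptions for this example, and the library's own code (including the matrix-wise iterative mode) may differ.

```python
import math


def orthogonalize_gradient(weights, grads, eps=1e-30):
    """Remove the component of grads that is parallel to weights.

    Hypothetical helper for illustration. Both inputs are flat lists of
    floats with equal length. The projected gradient is rescaled so that
    its norm matches the norm of the original gradient.
    """
    dot_wg = sum(w * g for w, g in zip(weights, grads))
    dot_ww = sum(w * w for w in weights)
    scale = dot_wg / (dot_ww + eps)

    # g_perp = g - (<w, g> / <w, w>) * w
    projected = [g - scale * w for w, g in zip(weights, grads)]

    grad_norm = math.sqrt(sum(g * g for g in grads))
    proj_norm = math.sqrt(sum(p * p for p in projected))
    rescale = grad_norm / (proj_norm + eps)
    return [p * rescale for p in projected]
```

For example, with weights [1.0, 0.0] and gradient [1.0, 1.0], the component along the weights is dropped and the remaining direction is scaled back to the original gradient length.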

⚓ Weight Decay Innovations

  • Centered Weight Decay (centered_wd): Pulls weights toward their pretrained values (the anchor) instead of toward zero. To save memory, anchor precision (centered_wd_mode) can be set to full, float8, int8, or int4.
  • Fisher Weight Decay (fisher_wd): Now available for Adam variants based on the FAdam paper.
  • Geometric Weight Decay: Added specifically for SinkSGD_adv and SignSGD_adv.
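The centered weight decay idea can be sketched in a few lines: the decay term pulls each weight toward a stored anchor value rather than toward zero. This is a minimal sketch assuming a decoupled (AdamW-style) decay step on flat lists; the function name and signature are hypothetical, and anchor quantization (float8, int8, int4) is not shown.

```python
def centered_weight_decay_step(weights, anchor, lr, weight_decay):
    """Apply one decoupled decay step toward anchor instead of toward zero.

    Hypothetical helper for illustration. With an all-zero anchor this
    reduces to ordinary decoupled weight decay: w <- w - lr * wd * w.
    """
    return [w - lr * weight_decay * (w - a) for w, a in zip(weights, anchor)]
```

A weight that already equals its anchor is left unchanged, so fine-tuning with this kind of decay only penalizes drift away from the starting point.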

(Note: Lion_Prodigy_adv, Simplified_AdEMAMix, and the heuristic cautious/grams modes have been deprecated in favor of these superior, theoretically grounded features.)

Older release notes (v1.2.x - v2.1.x)

Version 2.1.x

  • New Optimizer: Added Signum (SignSGD with momentum) to the SignSGD_adv family.

Version 2.0.x

  • torch.compile Support: Fully implemented for all advanced optimizers. Enable via compiled_optimizer=True to heavily fuse and optimize the optimizer step path.
  • 📉 1-Bit Factored Mode: Vastly improved implementation via nnmf_factor=True.
  • 🛠️ Broad performance and stability improvements across all optimizers.

Version 1.2.x

  • Advanced Muon Variants: Brought the groundbreaking Muon optimizer into the fold, enriched with features from recent literature.
    • Muon_adv: Advanced Muon implementation featuring CANS, NorMuon, Low-Rank Orthogonalization, and more.
    • AdaMuon_adv: Combines Muon's geometry with Adam-like adaptive scaling and sign-based orthogonalization.
  • Prodigy Speedup: Prodigy variants are now 50% faster by eliminating unnecessary CUDA syncs (Shoutout to @dxqb!).
  • Stochastic Rounding for BF16: Parameter updates and weight decay now accumulate in float32 and round once at the end.
  • Cautious Weight Decay: Implemented for all advanced optimizers (Paper).
  • Fused Operations: Transitioned to fused and in-place operations wherever possible.
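For readers unfamiliar with the orthogonalization step that Muon-style optimizers apply to the update matrix, the sketch below shows the classic cubic Newton-Schulz iteration in pure Python. It is only an illustration of the underlying idea: it does not use the tuned polynomial coefficients, CANS, or low-rank variants mentioned above, and all function names here are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(m, steps=30):
    """Approximate the orthogonal polar factor of a full-rank matrix m.

    Hypothetical helper for illustration. The matrix is first divided by
    its Frobenius norm so every singular value is at most 1, then the
    iteration X <- 1.5 * X - 0.5 * X @ X.T @ X pushes each singular value
    toward 1 while leaving the singular vectors unchanged.
    """
    fro = math.sqrt(sum(v * v for row in m for v in row))
    if fro == 0.0:
        return [list(row) for row in m]  # nothing to orthogonalize
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(row_x, row_p)]
             for row_x, row_p in zip(x, xxt_x)]
    return x
```

Practical implementations typically run only a handful of iterations with a higher-order polynomial in low precision; the fixed 30 steps here are chosen for clarity, not speed.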

💡 Core Innovations

(Documentation expanding on the theory and usage of these features is coming soon!)

