A family of highly efficient, lightweight yet powerful optimizers.
Advanced Optimizers (AIO)
A comprehensive, all-in-one collection of state-of-the-art optimization algorithms for deep learning, designed for efficiency, a minimal memory footprint, and strong performance across diverse model architectures and training scenarios.
📦 Installation
pip install adv_optm
Requires PyTorch 2.3+ for torch.compile support.
What's New
🌟 Version 2.5.x: The Massive Refactor
This major update introduces a complete architectural refactor of the library:
🆕 New Optimizers & Scaling
- `SinkSGD_adv`: A powerful new optimizer added to the lineup.
- Spectral Scaling: Now available across all optimizers, providing width/rank-invariant updates for highly stable training.
💾 Memory & State Precision Control
- Granular State Precision (`state_precision`): Drastically reduce memory overhead with new optimizer state modes:
  - `factored` (rank-2 factored mode)
  - `fp32` (full precision)
  - `bf16_sr` & `int8_sr` (BF16/Int8 with stochastic rounding)
- Factored Second Moment (`factored_2nd`): Available for all Adam variants. Works seamlessly alongside any `state_precision` setting to further slash memory usage.
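The notes list `bf16_sr` but do not say how the rounding is done. A common way to stochastically round a float32 value to bfloat16 is to add uniform random bits to the 16 low mantissa bits that bfloat16 discards, then truncate. The sketch below applies that bit trick to a single Python float. It is an assumption for illustration, not adv_optm's code, and the function names are hypothetical.

```python
import random
import struct

# Illustrative sketch (assumption): not adv_optm's implementation.


def f32_to_bits(x: float) -> int:
    """Return the 32-bit pattern of x after conversion to float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def bf16_round_stochastic(x: float, rng: random.Random) -> float:
    """Stochastically round x to bfloat16 precision (finite float32-range inputs).

    bfloat16 keeps the top 16 bits of a float32. Adding 16 uniform random
    bits to the discarded low half carries into the kept half with
    probability equal to the discarded fraction, so E[result] == x.
    """
    bits = f32_to_bits(x)
    bits += rng.getrandbits(16)            # random carry into the kept bits
    return bits_to_f32(bits & 0xFFFF0000)  # drop the low 16 bits
```

Round-to-nearest would always turn `1.0 + 2**-10` into `1.0`, so a small update would be lost. With stochastic rounding, the result becomes `1.0078125` about one time in eight, which preserves the expected value.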
⚙️ Advanced Dynamics & Momentum
- Variance Normalized Momentum (`normed_momentum`): Applies optimizer normalization before momentum (Normalization then Momentum, NtM). Available for `AdamW_adv`, `SignSGD_adv`, and `SinkSGD_adv`.
- Universal Nesterov Momentum: Replaced the hard-to-tune Simplified_AdEMAMix with Nesterov momentum (`nesterov`) and a dedicated coefficient (`nesterov_coef`) across all optimizers.
- Preconditioning & Signs:
  - Improved CANS (`accelerated_ns`): Enhanced for Muon variants by integrating a dynamic lower bound.
  - New OrthoGrad modes (`orthogonal_gradient`): the standard OrthoGrad mode `flattened` and a new matrix-wise mode `iterative`.
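The sketch below makes the Normalization-then-Momentum (NtM) idea concrete. It contrasts NtM with the usual Momentum-then-Normalization order (as in Signum) for a sign-based update on plain Python lists. It assumes EMA-style momentum and the look-ahead form of Nesterov used in common Muon implementations, with the Nesterov coefficient simply set to `beta`. The function and argument names are hypothetical, and adv_optm's actual `normed_momentum`, `nesterov`, and `nesterov_coef` semantics may differ.

```python
# Illustrative sketch (assumption): not adv_optm's implementation.


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0."""
    return float((x > 0) - (x < 0))


def sign_step(params, grads, buf, lr=0.1, beta=0.9,
              normed_momentum=False, nesterov=False):
    """One sign-based optimizer step; mutates params and buf in place.

    MtN (normed_momentum=False): m = beta*m + (1-beta)*g,       update = sign(m)
    NtM (normed_momentum=True):  m = beta*m + (1-beta)*sign(g), update = m
    """
    for i, g in enumerate(grads):
        x = sign(g) if normed_momentum else g       # normalize first under NtM
        buf[i] = beta * buf[i] + (1 - beta) * x     # EMA momentum
        # Nesterov look-ahead (assumed form): blend the fresh input back in.
        d = (1 - beta) * x + beta * buf[i] if nesterov else buf[i]
        update = d if normed_momentum else sign(d)  # normalize last under MtN
        params[i] -= lr * update
```

Under MtN, every coordinate moves by exactly `lr`. Under NtM, the step is a running average of unit signs, so its size starts at `(1-beta)*lr` and grows toward `lr` only while the gradient signs agree.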
⚓ Weight Decay Innovations
- Centered Weight Decay (`centered_wd`): Pulls weights toward their pre-training state (the anchor). To save memory, the anchor precision (`centered_wd_mode`) can be set to full, float8, int8, or int4.
- Fisher Weight Decay (`fisher_wd`): Now available for Adam variants, based on the FAdam paper.
- Geometric Weight Decay: Added specifically for `SinkSGD_adv` and `SignSGD_adv`.
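A natural reading of centered weight decay is that the usual pull toward zero, `w -= lr * wd * w`, is replaced by a pull toward the anchor, `w -= lr * wd * (w - w_anchor)`. The sketch below follows that reading and stores the anchor in int8 with a single per-tensor absmax scale. Both the update form and the quantization scheme are assumptions for illustration, not adv_optm's `centered_wd_mode` implementation, and the function names are hypothetical.

```python
# Illustrative sketch (assumption): not adv_optm's implementation.


def quantize_int8(values):
    """Symmetric per-tensor absmax quantization to int8 codes in [-127, 127]."""
    amax = max(abs(v) for v in values)
    scale = amax / 127 if amax > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def centered_wd_step(params, anchor_codes, anchor_scale, lr, wd):
    """Decoupled decay toward the anchor: w <- w - lr * wd * (w - anchor)."""
    anchor = dequantize_int8(anchor_codes, anchor_scale)
    for i, a in enumerate(anchor):
        params[i] -= lr * wd * (params[i] - a)
```

An int8 anchor costs one byte per parameter plus one float per tensor, compared with four bytes per parameter for a float32 copy. The price is an anchor error of at most half a quantization step.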
(Note: Lion_Prodigy_adv, Simplified_AdEMAMix, and heuristic cautious/grams modes have been deprecated in favor of these superior, theoretically-grounded features).
Older release notes (v1.2.x - v2.1.x)
Version 2.1.x
- New Optimizer: Added Signum (SignSGD with momentum) to the `SignSGD_adv` family.
Version 2.0.x
- ⚡ `torch.compile` Support: Fully implemented for all advanced optimizers. Enable via `compiled_optimizer=True` to heavily fuse and optimize the optimizer step path.
- 📉 1-Bit Factored Mode: Vastly improved implementation via `nnmf_factor=True`.
- 🛠️ Broad performance and stability improvements across all optimizers.
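The notes do not describe how `nnmf_factor=True` works. The sketch below is a rough, assumed illustration of what a 1-bit factored state could look like. It stores each entry's sign in one bit and approximates the magnitudes with a rank-1 non-negative factorization built from row and column sums (the Adafactor-style closed form). The function names are hypothetical, and this is not adv_optm's algorithm.

```python
# Illustrative sketch (assumption): not adv_optm's implementation.


def factor_1bit(m):
    """Compress a matrix (list of rows) to sign bits plus rank-1 magnitude factors."""
    signs = [[v >= 0 for v in row] for row in m]          # 1 bit per entry
    rows = [sum(abs(v) for v in row) for row in m]        # one float per row
    cols = [sum(abs(v) for v in col) for col in zip(*m)]  # one float per column
    return signs, rows, cols


def reconstruct(signs, rows, cols):
    """Rebuild m[i][j] ~ sign * rows[i] * cols[j] / sum(rows)."""
    total = sum(rows)
    if total == 0:
        return [[0.0] * len(cols) for _ in rows]
    return [[(1.0 if s else -1.0) * r * c / total for s, c in zip(srow, cols)]
            for srow, r in zip(signs, rows)]
```

For an n-by-m matrix, this keeps n*m bits plus n+m floats instead of n*m floats. The reconstruction is exact when the magnitudes form a rank-1 matrix and approximate otherwise. In either case, each row's absolute sum is preserved.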
Version 1.2.x
- Advanced Muon Variants: Brought the groundbreaking Muon optimizer into the fold, enriched with features from recent literature.
| Optimizer | Description |
|---|---|
| `Muon_adv` | Advanced Muon implementation featuring CANS, NorMuon, Low-Rank Orthogonalization, and more. |
| `AdaMuon_adv` | Combines Muon's geometry with Adam-like adaptive scaling and sign-based orthogonalization. |
- Prodigy Speedup: Prodigy variants are now 50% faster by eliminating unnecessary CUDA syncs (Shoutout to @dxqb!).
- Stochastic Rounding for BF16: Parameter updates and weight decay now accumulate in float32 and are stochastically rounded to BF16 once at the end.
- Cautious Weight Decay: Implemented for all advanced optimizers (Paper).
- Fused Operations: Transitioned to fused and in-place operations wherever possible.
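Muon-style optimizers orthogonalize the update matrix G = U S Vᵀ, replacing it with approximately U Vᵀ via a Newton-Schulz iteration. The sketch below uses the classic cubic iteration X ← 1.5 X − 0.5 X Xᵀ X on plain lists. This iteration converges to U Vᵀ when all nonzero singular values lie in (0, √3), which dividing by the Frobenius norm guarantees. It is a deliberately simple stand-in. Muon implementations typically run a tuned quintic polynomial for only a few steps, and the CANS option (`accelerated_ns`) is a more elaborate variant that this sketch does not reproduce. The function names are hypothetical.

```python
# Illustrative sketch (assumption): classic cubic Newton-Schulz, not adv_optm's kernel.


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30, eps=1e-7):
    """Approximate the orthogonal factor U V^T of g = U S V^T."""
    tall = len(g) > len(g[0])
    x = transpose(g) if tall else [row[:] for row in g]  # iterate on the wide side
    norm = sum(v * v for row in x for v in row) ** 0.5
    x = [[v / (norm + eps) for v in row] for row in x]    # singular values now <= 1
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return transpose(x) if tall else x
```

Because the result keeps only the singular vectors, every direction in the update receives the same magnitude. For a symmetric positive-definite input, the result is the identity matrix.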
💡 Core Innovations
(Documentation expanding on the theory and usage of these features is coming soon!)
Download files
Source Distribution
Built Distribution
File details
Details for the file adv_optm-2.6.dev1.tar.gz.
File metadata
- Download URL: adv_optm-2.6.dev1.tar.gz
- Upload date:
- Size: 62.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | afba0b1e31ca5823b974ef6a625ef3a39013c54abb4cdd723f0e8f0ecf0ac51c |
| MD5 | 208d0d29261a2127d8ba106b5c4a1e7c |
| BLAKE2b-256 | eeef186a2638ce76f7056caa821f0d50bfa447fcda19679acb2a1a41a46684ce |
File details
Details for the file adv_optm-2.6.dev1-py3-none-any.whl.
File metadata
- Download URL: adv_optm-2.6.dev1-py3-none-any.whl
- Upload date:
- Size: 87.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b1c1842e5aaf990b96c1e164fd9e5fb50abb0630f1a0b99bc95575c2d415549 |
| MD5 | 073f841418aa9b7a802c160c08d3e66f |
| BLAKE2b-256 | fb318f25c48e6dea247ea51b0ed054717ae3fb0f6da5f04cc854de559fd997fa |