Deep Learning optimizers developed in the Distributed Algorithms and Systems group (DASLab) @ Institute of Science and Technology Austria (ISTA)
Project description
ISTA DAS Lab Optimization Algorithms Package
This repository contains optimization algorithms for Deep Learning developed by the Distributed Algorithms and Systems lab at Institute of Science and Technology Austria.
The repository contains code for the following optimizers published by DASLab @ ISTA:
- AC/DC:
- paper: AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks
- official repository: GitHub
- M-FAC:
- paper: M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
- official repository: GitHub
- Sparse M-FAC with Error Feedback:
- paper: Error Feedback Can Accurately Compress Preconditioners
- official repository: GitHub
- MicroAdam:
- paper: MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
- official repository: GitHub
- Trion / DCT-AdamW:
- DASH:
CUDA Kernels
The CUDA kernels for the M-FAC, Sparse M-FAC and MicroAdam optimizers are maintained in a separate repository, ISTA-DASLab-Optimizers-CUDA.
Installation
To use the latest stable version of this repository, you can install via pip:
pip3 install ista-daslab-optimizers
You can also visit the PyPI page for the package.
We also provide a script install.sh that creates a new environment, installs requirements
and then installs the project as a Python package following these steps:
git clone git@github.com:IST-DASLab/ISTA-DASLab-Optimizers.git
cd ISTA-DASLab-Optimizers
source install.sh
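After either installation route, a quick check that the distribution is visible to the current interpreter can help catch environment mix-ups. The sketch below uses only the Python standard library; the helper name installed_version is hypothetical and not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "ista-daslab-optimizers" is the distribution name used in the pip command above.
    found = installed_version("ista-daslab-optimizers")
    print(found if found is not None else "ista-daslab-optimizers is not installed")
```

If this prints a version in one shell but not in another, the two shells are most likely using different Python environments.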
How to use optimizers?
In this repository we provide a minimal working example for CIFAR-10 for optimizers acdc,
dense_mfac, sparse_mfac and micro_adam:
cd examples/cifar10
OPTIMIZER=micro_adam # or any other optimizer listed above
bash run_${OPTIMIZER}.sh
To integrate the optimizers into your own pipeline, you can use the following snippets:
MicroAdam optimizer
from ista_daslab_optimizers import MicroAdam
model = MyCustomModel()
optimizer = MicroAdam(
model.parameters(), # or some custom parameter groups
m=10, # sliding window size (number of gradients)
lr=1e-5, # change accordingly
quant_block_size=100_000, # 32 or 64 also works
k_init=0.01, # float between 0 and 1 meaning percentage: 0.01 means 1%
alpha=0, # 0 means sparse update and 0 < alpha < 1 means we integrate fraction alpha from EF to update and then delete it
)
# from now on, you can use the variable `optimizer` as any other PyTorch optimizer
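Since the optimizer is used like any other PyTorch optimizer, a training step looks the same as with a built-in one. The sketch below is deliberately duck-typed and imports neither torch nor this package: train_one_epoch is a hypothetical helper, and it assumes that batches yields (inputs, targets) pairs and that loss_fn returns a tensor-like object exposing backward() and item().

```python
from typing import Any, Callable, Iterable, Tuple


def train_one_epoch(
    model: Callable[[Any], Any],
    batches: Iterable[Tuple[Any, Any]],
    loss_fn: Callable[[Any, Any], Any],
    optimizer: Any,
) -> float:
    """Run one pass over `batches`; return the mean loss (0.0 if there are no batches)."""
    total, count = 0.0, 0
    for inputs, targets in batches:
        optimizer.zero_grad()                    # clear gradients from the previous step
        loss = loss_fn(model(inputs), targets)   # forward pass and loss
        loss.backward()                          # populate gradients on the parameters
        optimizer.step()                         # optimizer-specific update (e.g. MicroAdam)
        total += float(loss.item())
        count += 1
    return total / count if count else 0.0
```

With PyTorch, this could be called as train_one_epoch(model, train_loader, torch.nn.functional.cross_entropy, optimizer), where optimizer is the MicroAdam instance constructed above and train_loader is your own data loader.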
Versions summary:
- 1.1.12 @ February 15th, 2026:
- refactoring for DASH: separated entities into different files and implemented DashGpu, as well as a Triton kernel to compute L_t = beta * L_{t-1} + (1 - beta) * G @ G.T and R_t = beta * R_{t-1} + (1 - beta) * G.T @ G in-place using the stacked blocks.
- 1.1.11 @ February 6th, 2026:
- added triton as dependency
- 1.1.10 @ February 6th, 2026:
- removed fast-hadamard-transform because 1) it is not used and 2) it raises compilation errors during pip install
- 1.1.9 @ February 6th, 2026:
- added DASH optimizer
- 1.1.8 @ February 5th, 2026:
- moved kernels to ISTA-DASLab-Optimizers-CUDA
- reason: building the package after adding a new optimizer that doesn't require CUDA support would require compiling the kernels from scratch, which is time-consuming and unnecessary
- 1.1.7 @ October 8th, 2025:
- added code for Trion & DCT-AdamW
- 1.1.6 @ February 19th, 2025:
- do not update the parameters that have None gradient in method update_model from tools.py. This is useful when using M-FAC for models with more than one classification head in the Continual Learning framework.
- 1.1.5 @ February 19th, 2025:
- adapted DenseMFAC for a model with multiple classification heads for Continual Learning, where we have one feature extractor block and a list of classification heads. The issue was related to the model size, which included the feature extractor backbone and all classification heads, although in practice only one classification head is used for training and inference. This caused size mismatch errors at runtime in the DenseCoreMFAC module because the gradient at runtime had fewer entries than the entire model. When using DenseMFAC in such settings, set optimizer.model_size to the correct size after calling the constructor, and the DenseCoreMFAC object will be created automatically in the step function.
- 1.1.3 @ September 5th, 2024:
- allow using SparseCoreMFAC with EF separately by importing it in sparse_mfac.__init__.py
- 1.1.2 @ August 1st, 2024:
- [1.1.0]: added support to densify the final update: introduced parameter alpha that controls the fraction of error feedback (EF) to be integrated into the update to make it dense. Finally, the fraction alpha will be discarded from the EF at the expense of another call to Qinv and Q (and implicitly quantization statistics computation).
- [1.0.2]: added FSDP-compatible implementation by initializing the parameter states in the update_step method instead of the MicroAdam constructor
- 1.0.1 @ June 27th, 2024:
- removed version in dependencies to avoid conflicts with llm-foundry
- 1.0.0 @ June 20th, 2024:
- changed minimum required Python version to 3.8+ and torch to 2.3.0+
- 0.0.1 @ June 13th, 2024:
- added initial version of the package for Python 3.9+ and torch 2.3.1+
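The 1.1.12 entry above mentions running statistics of the form L_t = beta * L_{t-1} + (1 - beta) * G @ G.T and R_t = beta * R_{t-1} + (1 - beta) * G.T @ G. As a reference for what that update computes, here is a minimal pure-Python sketch for a single gradient block stored as nested lists. It is not the package's Triton kernel: it does not operate on stacked blocks, it returns new matrices instead of updating in-place, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a as a new nested list."""
    return [list(row) for row in zip(*a)]


def ema_update(prev: Matrix, new: Matrix, beta: float) -> Matrix:
    """Return beta * prev + (1 - beta) * new, element-wise."""
    return [[beta * p + (1.0 - beta) * n for p, n in zip(prev_row, new_row)]
            for prev_row, new_row in zip(prev, new)]


def update_statistics(L: Matrix, R: Matrix, G: Matrix, beta: float) -> Tuple[Matrix, Matrix]:
    """One step of L_t and R_t for a gradient block G of shape (n x m)."""
    Gt = transpose(G)
    L_new = ema_update(L, matmul(G, Gt), beta)  # shape (n x n)
    R_new = ema_update(R, matmul(Gt, G), beta)  # shape (m x m)
    return L_new, R_new


if __name__ == "__main__":
    G = [[1.0, 2.0]]                      # a 1 x 2 gradient block
    L0 = [[0.0]]                          # 1 x 1 left statistic
    R0 = [[0.0, 0.0], [0.0, 0.0]]         # 2 x 2 right statistic
    print(update_statistics(L0, R0, G, beta=0.5))
```

A GPU implementation would typically fuse the matrix product and the moving average into one kernel to avoid materializing G @ G.T separately, which is presumably the motivation for the in-place kernel.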
File details
Details for the file ista_daslab_optimizers-1.1.12.tar.gz.
File metadata
- Download URL: ista_daslab_optimizers-1.1.12.tar.gz
- Upload date:
- Size: 75.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1778a875e44deaf001dda973a511d23a71de556e9507cf49a4e71e62e8394a6b |
| MD5 | 0240e2600159f059d0779748ec32b1c4 |
| BLAKE2b-256 | 1492ebeef9c987570c9d258751579449277bbf2e29d3c4a0255be1b9e9b40262 |
File details
Details for the file ista_daslab_optimizers-1.1.12-py3-none-any.whl.
File metadata
- Download URL: ista_daslab_optimizers-1.1.12-py3-none-any.whl
- Upload date:
- Size: 99.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1a8e8da79cc93937012b8153550d64f592be15bee5f79ce99a789f6db0c84f0 |
| MD5 | 5508d5b303ffb8273cb7878d2d3a5d6f |
| BLAKE2b-256 | 2fb8f96a0af7444d5ad4b7ba7e39c09366bcd11f398152df69df3aa960824635 |