
Deep Learning optimizers developed in the Distributed Algorithms and Systems group (DASLab) @ Institute of Science and Technology Austria (ISTA)


ISTA DAS Lab Optimization Algorithms Package

This repository contains optimization algorithms for Deep Learning developed by the Distributed Algorithms and Systems lab at Institute of Science and Technology Austria.

The repository contains code for the following optimizers published by DASLab @ ISTA:

Installation

To use the latest stable version of this repository, you can install via pip:

pip3 install ista-daslab-optimizers

You can also visit the PyPI page.
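To confirm which version ended up in the active environment, one option is to query the package metadata. The sketch below uses only the standard library. The helper name `installed_version` is hypothetical and not part of the package, and the distribution name is assumed to be `ista-daslab-optimizers`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "ista-daslab-optimizers") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not installed.

    `installed_version` is a hypothetical helper, not part of the package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "ista-daslab-optimizers is not installed")
```

Returning None rather than raising keeps the check usable in setup scripts that want to decide whether to install the package.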

We also provide a script, install.sh, that creates a new environment, installs the requirements and then installs the project as a Python package. To use it, run:

git clone git@github.com:IST-DASLab/ISTA-DASLab-Optimizers.git
cd ISTA-DASLab-Optimizers
source install.sh

How to use optimizers?

The repository provides a minimal working example on CIFAR-10 for the optimizers acdc, dense_mfac, sparse_mfac and micro_adam:

cd examples/cifar10
OPTIMIZER=micro_adam # or any other optimizer listed above
bash run_${OPTIMIZER}.sh

To integrate the optimizers into your own pipeline, you can use the following snippets:

MicroAdam optimizer

from ista_daslab_optimizers import MicroAdam

model = MyCustomModel() # placeholder: replace with your own torch.nn.Module

optimizer = MicroAdam(
    model.parameters(), # or some custom parameter groups
    m=10, # sliding window size (number of gradients)
    lr=1e-5, # change accordingly
    quant_block_size=100_000, # 32 or 64 also works
    k_init=0.01, # fraction between 0 and 1: 0.01 means 1%
    alpha=0, # 0 gives a sparse update; 0 < alpha < 1 integrates a fraction alpha of the error feedback (EF) into the update and then removes it from the EF
)

# from now on, you can use the variable `optimizer` as any other PyTorch optimizer
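As an illustration of that last comment, here is a minimal training-loop sketch that relies only on the standard PyTorch optimizer interface (`zero_grad` and `step`). The function name `train_steps`, the toy model and the random data are assumptions made for this example. A stock `torch.optim.SGD` stands in for the optimizer so the sketch runs without this package installed; an optimizer constructed as shown above would be passed in the same way.

```python
import torch
from torch import nn


def train_steps(model, optimizer, batches, loss_fn):
    """Run one optimizer step per (inputs, targets) batch and return the losses.

    `train_steps` is a hypothetical helper, not part of the package.
    """
    model.train()
    losses = []
    for inputs, targets in batches:
        optimizer.zero_grad()                   # clear gradients from the previous step
        loss = loss_fn(model(inputs), targets)  # forward pass
        loss.backward()                         # backward pass
        optimizer.step()                        # parameter update
        losses.append(loss.item())
    return losses


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(4, 2)  # toy model, stands in for MyCustomModel
    # Stand-in optimizer (assumption): swap in the optimizer built above.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(3)]
    print(train_steps(model, optimizer, batches, nn.CrossEntropyLoss()))
```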

Versions summary:


  • 1.1.11 @ February 6th, 2026:
    • added triton as dependency
  • 1.1.10 @ February 6th, 2026:
    • removed fast-hadamard-transform because 1) it is not used and 2) it raises compilation errors during pip install
  • 1.1.9 @ February 6th, 2026:
    • added DASH optimizer
  • 1.1.8 @ February 5th, 2026:
    • moved kernels to ISTA-DASLab-Optimizers-CUDA
    • rationale: building the package after adding a new optimizer that doesn't require CUDA support would otherwise require compiling the kernels from scratch, which is time-consuming and unnecessary
  • 1.1.7 @ October 8th, 2025:
    • added code for Trion & DCT-AdamW
  • 1.1.6 @ February 19th, 2025:
    • the method update_model in tools.py no longer updates parameters whose gradient is None. This is useful when using M-FAC for models with more than one classification head in the Continual Learning framework.
  • 1.1.5 @ February 19th, 2025:
    • adapted DenseMFAC to Continual Learning models that have one feature extractor block and a list of classification heads. The model size previously counted the feature extractor backbone and all classification heads, although in practice only one classification head is used for training and inference. The gradient at runtime therefore had fewer entries than the entire model, which caused size mismatch errors in the DenseCoreMFAC module. When using DenseMFAC in such settings, set optimizer.model_size to the correct size after calling the constructor; the DenseCoreMFAC object will then be created automatically in the step function.
  • 1.1.3 @ September 5th, 2024:
    • allow using SparseCoreMFACwithEF separately by importing it in sparse_mfac.__init__.py
  • 1.1.2 @ August 1st, 2024:
    • [1.1.0]: added support to densify the final update: introduced parameter alpha that controls the fraction of error feedback (EF) to be integrated into the update to make it dense. Finally, the fraction alpha will be discarded from the EF at the expense of another call to Qinv and Q (and implicitly quantization statistics computation).
    • [1.0.2]: added an FSDP-compatible implementation by initializing the parameter states in the update_step method instead of the MicroAdam constructor
  • 1.0.1 @ June 27th, 2024:
    • removed version in dependencies to avoid conflicts with llm-foundry
  • 1.0.0 @ June 20th, 2024:
    • changed minimum required Python version to 3.8+ and torch to 2.3.0+
  • 0.0.1 @ June 13th, 2024:
    • added initial version of the package for Python 3.9+ and torch 2.3.1+
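The alpha parameter described in the 1.1.2 entry can be illustrated with a toy sketch. This is not the package's implementation: it is pure Python, and it models neither quantization (Q and Qinv) nor any Adam statistics. The function name `topk_with_error_feedback` and the parameter `k_frac` (playing a role similar to `k_init`) are hypothetical. It only shows how a fraction alpha of the error feedback is moved into the update and then removed from the EF.

```python
from typing import List, Tuple


def topk_with_error_feedback(
    grad: List[float], ef: List[float], k_frac: float, alpha: float
) -> Tuple[List[float], List[float]]:
    """Toy Top-K step with error feedback (EF) and optional densification.

    Hypothetical illustration only; returns (update, new_ef).
    """
    acc = [g + e for g, e in zip(grad, ef)]  # gradient plus carried-over error
    k = max(1, int(k_frac * len(acc)))       # number of entries Top-K keeps
    order = sorted(range(len(acc)), key=lambda i: abs(acc[i]), reverse=True)
    keep = set(order[:k])
    sparse = [a if i in keep else 0.0 for i, a in enumerate(acc)]
    residual = [a - s for a, s in zip(acc, sparse)]  # what Top-K dropped
    # alpha = 0 leaves the update sparse; 0 < alpha < 1 moves that
    # fraction of the residual into the update and out of the EF.
    update = [s + alpha * r for s, r in zip(sparse, residual)]
    new_ef = [(1.0 - alpha) * r for r in residual]
    return update, new_ef


if __name__ == "__main__":
    grad = [4.0, -1.0, 0.5, 2.0]
    zeros = [0.0] * len(grad)
    print(topk_with_error_feedback(grad, zeros, k_frac=0.25, alpha=0.0))
    print(topk_with_error_feedback(grad, zeros, k_frac=0.25, alpha=0.5))
```

In both calls the update and the new EF sum back to the accumulated gradient, so nothing is lost; alpha only decides how much of the residual is applied now rather than carried forward.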
