
Adam-mini Optimizer

Project description

Adam-mini

This is the official PyTorch implementation of Adam-mini, a mini-version of Adam that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint.

Paper: Adam-mini: Use Fewer Learning Rates To Gain More.

Github repo: https://github.com/zyushun/Adam-mini

How to use

Install torch (>=1.8.0) and run the following command.

pip install adam-mini

or, if you prefer to install from source:

git clone https://github.com/zyushun/Adam-mini
cd Adam-mini
pip install -e .

Then use the Adam-mini optimizer as follows.

from adam_mini import Adam_mini

optimizer = Adam_mini(
    named_parameters=model.named_parameters(),
    lr=lr,
    betas=(beta1, beta2),
    eps=eps,
    weight_decay=weight_decay,
    dim=model_config.dim,
    n_heads=model_config.n_heads,
    n_kv_heads=model_config.n_kv_heads,
)
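
Adam-mini is used like any other torch.optim optimizer inside the training loop. Below is a minimal sketch; the objects model, dataloader, and loss_fn are placeholders for your own code, not part of Adam-mini.

for inputs, targets in dataloader:
    optimizer.zero_grad()                   # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)  # forward pass
    loss.backward()                         # backward pass
    optimizer.step()                        # Adam-mini parameter update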

Hyperparameter choices: for the learning rate (lr), weight_decay, beta1, beta2, and eps, we recommend using the same values as those used for AdamW.

If you are training Transformers, please also pass the following information to Adam-mini (see the example sketch after this list):

  • dim: dimension of the hidden features. Can be left unspecified if you are training a non-Transformer model.

  • n_heads: number of attention heads. Can be left unspecified if you are training a non-Transformer model.

  • n_kv_heads: number of heads for Key and Value, or equivalently, the number of query groups in Grouped Query Attention (also known as "n_query_groups"). If None, it defaults to n_heads. Can be left unspecified if you are training a non-Transformer model.
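
For illustration only, here is a sketch of how these values might be read from a Hugging Face-style model config; the field names hidden_size, num_attention_heads, and num_key_value_heads, as well as the numeric hyperparameter values, are assumptions about your setup, not part of Adam-mini's API.

from adam_mini import Adam_mini

config = model.config  # assumed Hugging Face-style config, e.g. LlamaConfig

optimizer = Adam_mini(
    named_parameters=model.named_parameters(),
    lr=1e-4,                                 # placeholder; reuse your AdamW value
    betas=(0.9, 0.95),
    eps=1e-8,
    weight_decay=0.1,
    dim=config.hidden_size,                  # hidden feature dimension
    n_heads=config.num_attention_heads,      # number of attention heads
    n_kv_heads=config.num_key_value_heads,   # KV heads / query groups (GQA)
)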

Citation

If you find this code helpful, please cite our paper in the following format.

@article{zhang2024adam,
  title   = {Adam-mini: Use Fewer Learning Rates To Gain More},
  author  = {Zhang, Yushun and Chen, Congliang and Li, Ziniu and Ding, Tian and Wu, Chenwei and Ye, Yinyu and Luo, Zhi-Quan and Sun, Ruoyu},
  journal = {arXiv preprint arXiv:2406.16793},
  year    = {2024},
}


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

adam_mini-1.1.0.tar.gz (12.4 kB)

Uploaded: Source

Built Distribution

adam_mini-1.1.0-py3-none-any.whl (13.1 kB)

Uploaded: Python 3

File details

Details for the file adam_mini-1.1.0.tar.gz.

File metadata

  • Download URL: adam_mini-1.1.0.tar.gz
  • Upload date:
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.12

File hashes

Hashes for adam_mini-1.1.0.tar.gz
  • SHA256: 67d1e287234cd232a470a730b4c69a50d5d8a40423b983f85d03dca3d2906283
  • MD5: 1b791759c32549c7adc0500f79af6ac0
  • BLAKE2b-256: d29aa9bd15ca39489ead341f2b4ed28f4956782e7d8b8a82259aa2d61da8ef9a


File details

Details for the file adam_mini-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: adam_mini-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.12

File hashes

Hashes for adam_mini-1.1.0-py3-none-any.whl
  • SHA256: f46c9cea14fbb995945bc0cb07510dc5f9a3351266b53398b0e82ea1bcb2e75b
  • MD5: f44d20410bd9deeaa0e8f0710ddb2703
  • BLAKE2b-256: 5ad63b6194c4571c6a4a4dfd1a8957677821ad17376bf24f43e6bc9d7ad3900f

