Adam-mini Optimizer

Adam-mini

This is the official PyTorch implementation of Adam-mini, a mini-version of Adam that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint.
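As a rough back-of-the-envelope sketch (not a measurement), the size of this saving can be illustrated by counting optimizer state values. AdamW keeps two values per parameter: a first moment and a second moment. The sketch below assumes an optimizer that still keeps the first moment per parameter but only one second-moment value per block of parameters. The function name, the block count, and the fp32 (4 bytes per value) assumption are illustrative and not taken from the package.

```python
def optimizer_state_bytes(n_params, n_blocks=None, bytes_per_value=4):
    """Estimate optimizer state size in bytes.

    n_blocks=None models AdamW: one first-moment and one second-moment
    value per parameter.
    Otherwise it models a block-wise scheme: one first-moment value per
    parameter and one second-moment value per block.
    """
    first_moment_values = n_params
    second_moment_values = n_params if n_blocks is None else n_blocks
    return (first_moment_values + second_moment_values) * bytes_per_value


n_params = 7_000_000_000   # illustrative model size
n_blocks = 1_000_000       # hypothetical number of parameter blocks

adamw_bytes = optimizer_state_bytes(n_params)
blockwise_bytes = optimizer_state_bytes(n_params, n_blocks)
saving = 1 - blockwise_bytes / adamw_bytes

print(f"AdamW state:      {adamw_bytes / 1e9:.1f} GB")
print(f"Block-wise state: {blockwise_bytes / 1e9:.1f} GB")
print(f"Saving:           {saving:.1%}")
```

Under these assumptions the saving approaches but never reaches 50%, because the first moment is still stored for every parameter. Model weights, gradients, and activations are not counted here.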

Paper: Adam-mini: Use Fewer Learning Rates To Gain More.

GitHub repo: https://github.com/zyushun/Adam-mini

How to use

Install torch (>=1.8.0) and run the following commands.

pip install adam-mini

Or, if you prefer to install from source:

git clone https://github.com/zyushun/Adam-mini
cd Adam-mini
pip install -e .

Then use the Adam-mini optimizer as follows.

from adam_mini import Adam_mini

optimizer = Adam_mini(
    named_parameters=model.named_parameters(),
    lr=lr,
    betas=(beta1, beta2),
    eps=eps,
    weight_decay=weight_decay,
    dim=model_config.dim,
    n_heads=model_config.n_heads,
    n_kv_heads=model_config.n_kv_heads,
)

Hyperparameter choices: For the learning rate (lr), weight_decay, beta1, beta2, and eps, we recommend using the same values as you would use for AdamW.

If you are training Transformers, please also pass the following info to Adam-mini:

  • dim: dimension of the hidden features. Can be left unspecified if you are training non-Transformer models.

  • n_heads: number of attention heads. Can be left unspecified if you are training non-Transformer models.

  • n_kv_heads: number of heads for Key and Value, or equivalently, the number of query groups in Grouped-Query Attention. Also known as "n_query_groups". If it is None, it takes the same value as n_heads. Can be left unspecified if you are training non-Transformer models.
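The following is a minimal sketch of how these arguments might be collected before constructing the optimizer. `ModelConfig` and `build_adam_mini_kwargs` are hypothetical names introduced for illustration and are not part of the adam_mini package. The divisibility checks are sanity checks for standard multi-head and grouped-query attention, assumed here rather than taken from the library. The sketch resolves `n_kv_heads = None` explicitly so the value being passed is visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # Hypothetical stand-in for your own model configuration object.
    dim: int
    n_heads: int
    n_kv_heads: Optional[int] = None  # None means "same as n_heads"


def build_adam_mini_kwargs(config, lr, betas, eps, weight_decay):
    """Collect the keyword arguments used in the usage snippet above."""
    # Resolve the documented default: None falls back to n_heads.
    n_kv_heads = config.n_heads if config.n_kv_heads is None else config.n_kv_heads

    # Assumed sanity checks for standard attention layouts.
    if config.dim % config.n_heads != 0:
        raise ValueError("dim should be divisible by n_heads")
    if config.n_heads % n_kv_heads != 0:
        raise ValueError("n_heads should be divisible by n_kv_heads")

    return dict(
        lr=lr,
        betas=betas,
        eps=eps,
        weight_decay=weight_decay,
        dim=config.dim,
        n_heads=config.n_heads,
        n_kv_heads=n_kv_heads,
    )


# Placeholder values; in practice, reuse whatever you would pass to AdamW.
config = ModelConfig(dim=4096, n_heads=32)
kwargs = build_adam_mini_kwargs(
    config, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01
)
print(kwargs)

# With adam_mini installed and a model defined, the call would then be:
# optimizer = Adam_mini(named_parameters=model.named_parameters(), **kwargs)
```

Keeping the hyperparameters in one dictionary makes it easy to reuse the same lr, betas, eps, and weight_decay that an existing AdamW setup already uses.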

Citation

If you find this code helpful, please cite our paper in the following format.

@article{zhang2024adam,
  title     = {Adam-mini: Use Fewer Learning Rates To Gain More},
  author    = {Zhang, Yushun and Chen, Congliang and Li, Ziniu and Ding, Tian and Wu, Chenwei and Ye, Yinyu and Luo, Zhi-Quan and Sun, Ruoyu},
  journal   = {arXiv preprint arXiv:2406.16793},
  year      = {2024},
}
