Adam-mini Optimizer
Project description
This is the official PyTorch implementation of Adam-mini, a mini-version of Adam that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint.
Paper: Adam-mini: Use Fewer Learning Rates To Gain More.
GitHub repo: https://github.com/zyushun/Adam-mini
How to use
Install torch (>=1.8.0), then run the following command:
pip install adam-mini
or, if you prefer to install from source:
git clone https://github.com/zyushun/Adam-mini
cd Adam-mini
pip install -e .
Then use the Adam-mini optimizer as follows.
from adam_mini import Adam_mini

optimizer = Adam_mini(
    named_parameters = model.named_parameters(),
    lr = lr,
    betas = (beta1, beta2),
    eps = eps,
    weight_decay = weight_decay,
    dim = model_config.dim,
    n_heads = model_config.n_heads,
    n_kv_heads = model_config.n_kv_heads,
)
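Adam-mini is used like any other PyTorch optimizer, so it drops into the standard training loop. A minimal sketch for illustration (the model, dataloader, and loss below are placeholders, not part of this package):

import torch.nn.functional as F

# Standard PyTorch training step with Adam-mini.
for inputs, targets in dataloader:  # placeholder dataloader
    optimizer.zero_grad()
    logits = model(inputs)          # placeholder forward pass
    loss = F.cross_entropy(logits, targets)
    loss.backward()
    optimizer.step()                # applies the Adam-mini update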
Hyperparameter choices: for the learning rate (lr), weight_decay, beta1, beta2, and eps, we recommend using the same values as you would for AdamW.
If you are training a Transformer, please also pass the following information to Adam-mini (for non-Transformer models, see the sketch after this list):

- dim: dimension of the hidden features. Can be left unspecified if you are training a non-Transformer model.
- n_heads: number of attention heads. Can be left unspecified if you are training a non-Transformer model.
- n_kv_heads: number of heads for Key and Value, or equivalently, the number of query groups in Grouped Query Attention (also known as "n_query_groups"). If None, it defaults to n_heads. Can be left unspecified if you are training a non-Transformer model.
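For non-Transformer models, all three arguments can simply be omitted. A minimal sketch under that assumption (the MLP architecture and hyperparameter values below are illustrative only, not a recommendation):

import torch
from adam_mini import Adam_mini

# Illustrative non-Transformer model: a small MLP classifier.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

optimizer = Adam_mini(
    named_parameters = model.named_parameters(),
    lr = 1e-3,              # illustrative values; reuse your AdamW settings
    betas = (0.9, 0.999),
    eps = 1e-8,
    weight_decay = 0.0,
    # dim, n_heads, n_kv_heads omitted: this is not a Transformer.
)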
Citation
If you find this code helpful, please cite our paper in the following format.
@article{zhang2024adam,
  title   = {Adam-mini: Use Fewer Learning Rates To Gain More},
  author  = {Zhang, Yushun and Chen, Congliang and Li, Ziniu and Ding, Tian and Wu, Chenwei and Ye, Yinyu and Luo, Zhi-Quan and Sun, Ruoyu},
  journal = {arXiv preprint arXiv:2406.16793},
  year    = {2024},
}
Project details
File details
Details for the file adam_mini-1.1.0.tar.gz
File metadata
- Download URL: adam_mini-1.1.0.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 67d1e287234cd232a470a730b4c69a50d5d8a40423b983f85d03dca3d2906283
MD5 | 1b791759c32549c7adc0500f79af6ac0
BLAKE2b-256 | d29aa9bd15ca39489ead341f2b4ed28f4956782e7d8b8a82259aa2d61da8ef9a
File details
Details for the file adam_mini-1.1.0-py3-none-any.whl
File metadata
- Download URL: adam_mini-1.1.0-py3-none-any.whl
- Upload date:
- Size: 13.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f46c9cea14fbb995945bc0cb07510dc5f9a3351266b53398b0e82ea1bcb2e75b
MD5 | f44d20410bd9deeaa0e8f0710ddb2703
BLAKE2b-256 | 5ad63b6194c4571c6a4a4dfd1a8957677821ad17376bf24f43e6bc9d7ad3900f