CAME Optimizer - PyTorch Version
ACL 2023 Outstanding Paper Award
Confidence-guided Adaptive Memory Efficient Optimization
This is an official implementation of the CAME optimizer from the paper "CAME: Confidence-guided Adaptive Memory Efficient Optimization". Please cite the paper and star this repo if you find CAME useful. Thanks!
Paper | Twitter | Blog | PyPI Package | Zhihu
Method
In this work, we studied a confidence-guided strategy to reduce the instability of existing memory efficient optimizers. Based on this strategy, we proposed CAME to simultaneously achieve two goals: fast convergence as in traditional adaptive methods, and low memory usage as in memory-efficient methods.
The pseudocode is presented in the figure below, with the differences from Adafactor highlighted in blue.
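The pseudocode in the figure is the authoritative description. Purely for intuition, the snippet below is a heavily simplified sketch of ours of the confidence-guided idea: the squared deviation of the instantaneous update from its momentum is tracked as an instability estimate, and the step is shrunk where instability is high. The factorization of the second-moment and instability statistics, which is what makes CAME memory efficient, is deliberately omitted, so this is not the actual CAME algorithm; use the came_pytorch package for that.

```python
import torch

def confidence_guided_step(param, grad, state, lr=2e-4,
                           betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16)):
    """Simplified, non-factored sketch of a confidence-guided update (not full CAME)."""
    beta1, beta2, beta3 = betas
    for key in ("m", "v", "s"):
        state.setdefault(key, torch.zeros_like(param))

    # Second moment of the gradient (kept full-size here; factored in CAME).
    state["v"].mul_(beta2).add_(grad ** 2 + eps[0], alpha=1 - beta2)
    u = grad / state["v"].sqrt()

    # Momentum (EMA) of the normalized update.
    state["m"].mul_(beta1).add_(u, alpha=1 - beta1)

    # Instability: squared deviation of the update from its momentum.
    # A large deviation means low confidence and therefore a smaller step.
    state["s"].mul_(beta3).add_((u - state["m"]) ** 2 + eps[1], alpha=1 - beta3)

    param.data.add_(state["m"] / state["s"].sqrt(), alpha=-lr)
```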
Install
pip install came-pytorch
Usage
from came_pytorch import CAME

optimizer = CAME(
    model.parameters(),
    lr=2e-4,
    weight_decay=1e-2,
    betas=(0.9, 0.999, 0.9999),
    eps=(1e-30, 1e-16),
)
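CAME is a drop-in replacement for other torch.optim optimizers. The following minimal training-loop sketch is ours for illustration only (the linear model, random data, and hyper-parameters are placeholders, not recommendations):

```python
import torch
from came_pytorch import CAME

# Toy model and data, purely for illustration.
model = torch.nn.Linear(128, 2)
optimizer = CAME(
    model.parameters(),
    lr=2e-4,
    weight_decay=1e-2,
    betas=(0.9, 0.999, 0.9999),
    eps=(1e-30, 1e-16),
)

inputs = torch.randn(32, 128)
targets = torch.randint(0, 2, (32,))

for _ in range(10):
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()  # CAME exposes the standard step() interface
```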
Hyper-parameter Tuning
- Pre-training: based on our experiments with BERT-Large, GPT-2, and T5, a suitable learning rate for CAME is 1-3x smaller than the one you would use for AdamW.
- Consider choosing $\beta_3 \in [0.9995, 0.99995]$ when setting $\beta_1 = 0.9$ and $\beta_2 = 0.999$. Due to computational resource constraints, we did not explore more combinations of the three betas; different training tasks may require different combinations for optimal performance. A configuration following these suggestions is sketched after this list.
- If you have any feedback or comments regarding hyper-parameter tuning, please do not hesitate to provide them to us!
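As a concrete illustration of the two tuning suggestions above, a hypothetical pre-training configuration might scale an AdamW baseline learning rate down and pick $\beta_3$ inside the suggested range (the 6e-5 baseline below is an arbitrary example value of ours, not a recommendation from the paper):

```python
from came_pytorch import CAME

adamw_baseline_lr = 6e-5         # illustrative AdamW learning rate for the same model
came_lr = adamw_baseline_lr / 3  # CAME lr chosen 1-3x smaller than the AdamW one

# `model` is your torch.nn.Module.
optimizer = CAME(
    model.parameters(),
    lr=came_lr,
    weight_decay=1e-2,
    betas=(0.9, 0.999, 0.9999),  # beta3 = 0.9999 lies in the suggested [0.9995, 0.99995]
    eps=(1e-30, 1e-16),
)
```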
Experiments
Apart from the BERT and T5 experiments reported in the paper, we have conducted additional experiments and record the results here.
Fine-tuning LLaMA-7B
| | MMLU | WikiText | HellaSwag | TruthfulQA (MC) | BoolQ | COPA | WSC | WIC |
|---|---|---|---|---|---|---|---|---|
| Alpaca-7B | 40.21 | 6.74 | 59.76 | 38.89 | 79.57 | 88.00 | 46.15 | 49.84 |
| Alpaca-7B-CAME | 40.59 | 6.38 | 59.80 | 38.61 | 79.08 | 88.00 | 49.04 | 50.78 |
We fine-tuned LLaMA-7B with stanford-alpaca (the 52k instruction-tuning dataset). To replicate our result, first register the CAME optimizer with the transformers package, then change the default optimizer in the Alpaca training script from "adamw" to "came".
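If you would rather not patch the transformers package itself, one alternative, sketched here without having been verified against the exact Alpaca script (the argument values are placeholders), is to construct CAME yourself and pass it to the Hugging Face Trainer through its optimizers argument:

```python
from came_pytorch import CAME
from transformers import Trainer, TrainingArguments

# `model` and `train_dataset` are assumed to be prepared as in the
# stanford-alpaca training script; the values below are placeholders.
args = TrainingArguments(output_dir="alpaca-7b-came", per_device_train_batch_size=4)

optimizer = CAME(
    model.parameters(),
    lr=2e-5,
    weight_decay=0.0,
    betas=(0.9, 0.999, 0.9999),
    eps=(1e-30, 1e-16),
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    optimizers=(optimizer, None),  # None lets the Trainer create its default LR scheduler
)
trainer.train()
```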
Alpaca-7B and Alpaca-7B-CAME are evaluated using Instruct-eval and lm-evaluation-harness.
Pre-training GPT-2
The pre-training of GPT-2 (Medium, 345M) is based on Megatron-LM. To replicate our result, add the CAME optimizer in megatron/optimizer/__init__.py and set args.optimizer to "came".
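The exact code differs between Megatron-LM versions, but the change amounts to one extra branch where the base optimizer is constructed. The excerpt below is a hypothetical sketch of ours; the helper name and the args attributes (args.lr, args.weight_decay, args.adam_beta1, args.adam_beta2) follow older Megatron-LM releases and may not match your version:

```python
# Illustrative sketch of the branch added to the optimizer construction in
# megatron/optimizer/__init__.py; not verbatim Megatron-LM code.
from came_pytorch import CAME

def build_came_optimizer(param_groups, args):
    """Construct CAME from Megatron-style arguments (attribute names may differ per version)."""
    return CAME(
        param_groups,
        lr=args.lr,
        weight_decay=args.weight_decay,
        betas=(args.adam_beta1, args.adam_beta2, 0.9999),  # beta3: CAME default
        eps=(1e-30, 1e-16),
    )

# ...and inside the existing optimizer dispatch, next to the 'adam' branch:
#     elif args.optimizer == 'came':
#         optimizer = build_came_optimizer(param_groups, args)
```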
Memory Usage Comparison
To ensure a fair comparison, we set the batch size to 1 for the pre-training of GPT-2 (Medium) to examine the memory footprint of CAME and AdamW.
| | AdamW | CAME |
|---|---|---|
| Memory (GiB) | 8.77 | 7.44 |
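The numbers above are from our GPT-2 (Medium) runs. If you want to check the footprint in your own setup, a generic way to do so (a sketch of ours, not the exact procedure used for the table) is to compare peak allocated CUDA memory after a few optimization steps with each optimizer:

```python
import torch

def peak_memory_gib(model, optimizer, batch, loss_fn, steps=3):
    """Run a few optimization steps and return peak allocated GPU memory in GiB."""
    torch.cuda.reset_peak_memory_stats()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(batch))
        loss.backward()
        optimizer.step()
    return torch.cuda.max_memory_allocated() / 1024 ** 3
```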
Citation
@inproceedings{luo2023came,
  title={CAME: Confidence-guided Adaptive Memory Efficient Optimization},
  author={Luo, Yang and Ren, Xiaozhe and Zheng, Zangwei and Jiang, Zhuo and Jiang, Xin and You, Yang},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={4442--4453},
  year={2023}
}