
CAME Optimizer - PyTorch Version


CAME Optimizer

ACL 2023 Outstanding Paper Award
Confidence-guided Adaptive Memory Efficient Optimization

This is an official implementation of the CAME optimizer from "CAME: Confidence-guided Adaptive Memory Efficient Optimization". Please cite the paper and star this repo if you find CAME useful. Thanks!

Paper | Twitter | Blog | PyPI Package | Zhihu

Method

In this work, we studied a confidence-guided strategy to reduce the instability of existing memory efficient optimizers. Based on this strategy, we proposed CAME to simultaneously achieve two goals: fast convergence as in traditional adaptive methods, and low memory usage as in memory-efficient methods.

The pseudocode is presented in the figure below, with the differences from Adafactor shown in blue.

[Figure: CAME optimizer pseudocode]

Install

pip install came-pytorch

Usage

from came_pytorch import CAME

# `model` is any torch.nn.Module whose parameters you want to optimize.
optimizer = CAME(
    model.parameters(),
    lr=2e-4,
    weight_decay=1e-2,
    betas=(0.9, 0.999, 0.9999),  # (beta1, beta2, beta3); beta3 drives the confidence-guided update
    eps=(1e-30, 1e-16),
)
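
CAME follows the standard torch.optim.Optimizer interface, so the training loop is unchanged from AdamW. Below is a minimal, self-contained sketch; the toy model and random data are placeholders, not part of the package.

import torch
import torch.nn as nn
from came_pytorch import CAME

# Toy model and data, purely illustrative.
model = nn.Linear(128, 10)
optimizer = CAME(model.parameters(), lr=2e-4, weight_decay=1e-2,
                 betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))

inputs = torch.randn(32, 128)
targets = torch.randint(0, 10, (32,))

# Standard PyTorch loop: CAME is used exactly like AdamW here.
for step in range(100):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()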

Hyper-parameter Tuning

  • Pre-training: Based on our experiments on BERT-Large, GPT-2, and T5, a learning rate 1-3x smaller than the one you would use for AdamW is a suitable starting point for CAME.
  • Consider choosing $\beta_3$ in $[0.9995, 0.99995]$ when setting $\beta_1, \beta_2 = 0.9, 0.999$ (see the sketch after this list). Due to computational resource constraints, we did not explore more combinations of the three betas; different training tasks may require different combinations for optimal performance.
  • If you have any feedback or comments regarding hyper-parameter tuning, please do not hesitate to share them with us!
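
As a concrete illustration of the guidance above, the configuration below picks $\beta_3$ from the suggested range and scales the learning rate down relative to a typical AdamW pre-training setting; the specific values are assumptions for illustration, not tuned recommendations.

from came_pytorch import CAME

# Illustrative pre-training configuration (example values only).
# If AdamW would use lr=6e-4, the 1-3x guideline above suggests roughly 2e-4 to 6e-4 for CAME.
optimizer = CAME(
    model.parameters(),            # `model` is your torch.nn.Module
    lr=2e-4,
    weight_decay=1e-2,
    betas=(0.9, 0.999, 0.9995),    # beta3 chosen from the suggested [0.9995, 0.99995] range
    eps=(1e-30, 1e-16),
)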

Experiments

Apart from the BERT and T5 experiments reported in the paper, we have conducted additional experiments and record the results here.

Fine-tuning LLaMA-7B

                MMLU   WikiText  HellaSwag  TruthfulQA (MC)  BoolQ  COPA   WSC    WIC
Alpaca-7B       40.21  6.74      59.76      38.89            79.57  88.00  46.15  49.84
Alpaca-7B-CAME  40.59  6.38      59.80      38.61            79.08  88.00  49.04  50.78

We fine-tuned LLaMA-7B with stanford-alpaca (the 52k instruction-tuning dataset). To replicate our results, first register the CAME optimizer with the transformers package; then, in the Alpaca training script, change the default optimizer from "adamw" to "came" (one possible setup is sketched below).
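
The exact way to register a new optimizer name inside transformers depends on the library version. As an alternative that avoids patching the library, the sketch below passes a CAME instance to the Hugging Face Trainer through its optimizers argument; the model, dataset, and hyper-parameter values are placeholders from a typical Alpaca-style setup, not our exact configuration.

from transformers import Trainer, TrainingArguments, get_cosine_schedule_with_warmup
from came_pytorch import CAME

# `model` and `train_dataset` come from the usual Alpaca fine-tuning pipeline (placeholders here).
training_args = TrainingArguments(
    output_dir="alpaca-7b-came",
    num_train_epochs=3,
    per_device_train_batch_size=4,
)

optimizer = CAME(model.parameters(), lr=2e-5, weight_decay=0.0,
                 betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=10_000)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    optimizers=(optimizer, lr_scheduler),  # overrides the default "adamw" optimizer
)
trainer.train()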

Alpaca-7B and Alpaca-7B-CAME were evaluated using Instruct-eval and lm-evaluation-harness.

Pre-training GPT-2

[Figure: CAME_gpt2]

The pre-training of GPT-2 (Medium, 345M) is based on Megatron-LM. To replicate our results, add the CAME optimizer in megatron/optimizer/__init__.py and set args.optimizer to "came" (an illustrative sketch follows below).
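
The layout of megatron/optimizer/__init__.py differs across Megatron-LM releases, so the snippet below is only an illustrative sketch of the branch you would add to its optimizer factory; the function name and surrounding structure are assumptions, not Megatron-LM's actual code.

# Sketch of an addition to megatron/optimizer/__init__.py (adapt to your Megatron-LM version).
from came_pytorch import CAME

def build_base_optimizer(param_groups, args):
    """Hypothetical stand-in for Megatron's optimizer factory."""
    if args.optimizer == 'came':       # selected by setting args.optimizer to "came"
        return CAME(param_groups,
                    lr=args.lr,
                    weight_decay=args.weight_decay,
                    betas=(0.9, 0.999, 0.9999),
                    eps=(1e-30, 1e-16))
    # ... the existing 'adam' / 'sgd' branches remain unchanged ...
    raise ValueError(f'Unknown optimizer: {args.optimizer}')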

Memory Usage Comparison

To ensure a fair comparison, we set the batch size to 1 for the pre-training of GPT-2 (Medium) to examine the memory footprint of CAME and AdamW.

              AdamW  CAME
Memory (GiB)  8.77   7.44
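
For reference, peak memory for a single training step can be probed with PyTorch's CUDA memory statistics; the helper below is a rough measurement sketch of that approach, not the exact instrumentation we used.

import torch

def peak_step_memory_gib(make_optimizer, model, inputs, targets):
    # Rough probe: run one forward/backward/step and report peak allocated CUDA memory.
    torch.cuda.reset_peak_memory_stats()
    optimizer = make_optimizer(model.parameters())
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()   # optimizer state (the part CAME shrinks) is allocated on the first step
    optimizer.zero_grad(set_to_none=True)
    return torch.cuda.max_memory_allocated() / 1024 ** 3

Calling this helper once with CAME and once with torch.optim.AdamW on the same model and batch gives a comparable apples-to-apples reading of the optimizer-state overhead.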

Citation

@inproceedings{luo2023came,
  title={CAME: Confidence-guided Adaptive Memory Efficient Optimization},
  author={Luo, Yang and Ren, Xiaozhe and Zheng, Zangwei and Jiang, Zhuo and Jiang, Xin and You, Yang},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={4442--4453},
  year={2023}
}
