lion-pytorch

Lion Optimizer - Pytorch

Project description

🦁 Lion - Pytorch

🦁 Lion (EvoLved Sign Momentum) is a new optimizer discovered by Google Brain that is purportedly better than Adam(W), implemented here in PyTorch. This is nearly a straight copy of the original implementation, with a few minor modifications.

It is so simple, we may as well make it accessible and get it used asap by everyone to train some great models, if it really works 🤞
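
To give a sense of how simple it is, the update rule described in the paper can be sketched in plain Python on lists of floats. This is only an illustrative sketch, not the code this package ships: the names `sign` and `lion_step` and the list-based interface are hypothetical, chosen so the example runs without PyTorch.

```python
from typing import List, Tuple


def sign(x: float) -> float:
    """Return -1.0, 0.0 or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    exp_avg: List[float],
    lr: float = 1e-4,
    betas: Tuple[float, float] = (0.9, 0.99),
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Apply one Lion-style update and return (new_params, new_exp_avg)."""
    beta1, beta2 = betas
    new_params, new_exp_avg = [], []
    for p, g, m in zip(params, grads, exp_avg):
        # Decoupled weight decay: shrink the parameter by lr * weight_decay.
        p = p * (1.0 - lr * weight_decay)
        # The update direction is the sign of an interpolation between
        # the momentum and the current gradient, weighted by beta1.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        p = p - lr * update
        # The momentum itself is tracked with beta2.
        m = beta2 * m + (1.0 - beta2) * g
        new_params.append(p)
        new_exp_avg.append(m)
    return new_params, new_exp_avg
```

Because only the sign of the interpolated value is used, every parameter moves by exactly `lr` per step (before weight decay), which is one way to see why the learning rate needs to be smaller than for AdamW.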

Instructions

Learning rate and weight decay: the authors write in Section 5, "Based on our experience, a suitable learning rate for Lion is typically 3-10x smaller than that for AdamW. Since the effective weight decay is lr * λ, the value of decoupled weight decay λ used for Lion is 3-10x larger than that for AdamW in order to maintain a similar strength." The initial, peak, and end values of the learning rate schedule should all be scaled by the same ratio relative to AdamW, as evidenced by a researcher.
Learning rate schedule: the authors use the same learning rate schedule for Lion as for AdamW in the paper. Nevertheless, they observe a larger gain when using a cosine decay schedule to train ViT, compared to a reciprocal square-root schedule.
β1 and β2: the authors write in Section 5, "The default values for β1 and β2 in AdamW are set as 0.9 and 0.999, respectively, with an ε of 1e-8, while in Lion, the default values for β1 and β2 are discovered through the program search process and set as 0.9 and 0.99, respectively." Similar to how people reduce β2 to 0.99 or smaller and increase ε to 1e-6 in AdamW to improve stability, the authors suggest that using β1=0.95, β2=0.98 in Lion can also help mitigate instability during training. This was corroborated by a researcher.
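
The learning rate and schedule guidance above can be sketched as a small helper that starts from an existing AdamW recipe, divides every learning rate in the schedule by the same ratio, multiplies weight decay by that ratio so lr * λ stays the same, and then applies a cosine decay. This is a sketch under assumptions: `Recipe`, `lion_recipe_from_adamw` and `lr_at` are hypothetical names, the linear warmup is an assumption rather than something the paper prescribes here, and the default ratio of 3 is just one point in the 3-10x range.

```python
import math
from dataclasses import dataclass


@dataclass
class Recipe:
    init_lr: float       # learning rate at step 0
    peak_lr: float       # learning rate at the end of warmup
    end_lr: float        # learning rate at the final step
    weight_decay: float  # decoupled weight decay (lambda)


def lion_recipe_from_adamw(adamw: Recipe, ratio: float = 3.0) -> Recipe:
    """Scale all learning rates down and weight decay up by the same ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return Recipe(
        init_lr=adamw.init_lr / ratio,
        peak_lr=adamw.peak_lr / ratio,
        end_lr=adamw.end_lr / ratio,
        weight_decay=adamw.weight_decay * ratio,
    )


def lr_at(step: int, total_steps: int, warmup_steps: int, recipe: Recipe) -> float:
    """Linear warmup from init_lr to peak_lr, then cosine decay to end_lr."""
    if step < warmup_steps:
        frac = step / max(1, warmup_steps)
        return recipe.init_lr + frac * (recipe.peak_lr - recipe.init_lr)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return recipe.end_lr + cosine * (recipe.peak_lr - recipe.end_lr)
```

The value returned by `lr_at` could then be written into each optimizer param group's `lr` before every step, or wrapped in whatever scheduler the training loop already uses.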

Updates

Update: seems to work for my local enwik8 autoregressive language modeling.
Update 2: experiments, seems much worse than Adam if learning rate held constant.
Update 3: Dividing the learning rate by 3, seeing better early results than Adam. Maybe Adam has been dethroned, after nearly a decade.
Update 4: using the 10x smaller learning rate rule of thumb from the paper resulted in the worst run. So I guess it still takes a bit of tuning.

A summary of the previous updates: in these experiments, Lion with a 3x smaller learning rate beats Adam. It still takes a bit of tuning, as a 10x smaller learning rate led to a worse result.
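
Since the best divisor clearly varies between runs, one option is to sweep a few ratios across the 3-10x range instead of committing to one. A minimal sketch, assuming you already have a working AdamW learning rate and weight decay: the name `lion_sweep` and the log-spaced grid are illustrative choices, not something the paper or this package prescribes. Weight decay is scaled up by the same ratio so that lr * λ stays fixed across the sweep.

```python
from typing import Dict, List


def lion_sweep(
    adamw_lr: float,
    adamw_wd: float,
    low: float = 3.0,
    high: float = 10.0,
    num: int = 4,
) -> List[Dict[str, float]]:
    """Return candidate Lion settings for log-spaced ratios in [low, high]."""
    if num < 2:
        raise ValueError("num must be at least 2")
    configs = []
    for i in range(num):
        # Log-spaced ratio between low and high.
        ratio = low * (high / low) ** (i / (num - 1))
        configs.append(
            {
                "ratio": ratio,
                "lr": adamw_lr / ratio,            # smaller learning rate
                "weight_decay": adamw_wd * ratio,  # larger weight decay
            }
        )
    return configs


if __name__ == "__main__":
    for cfg in lion_sweep(adamw_lr=3e-4, adamw_wd=1e-2):
        print(cfg)
```

Each entry can be passed as `Lion(model.parameters(), lr=cfg["lr"], weight_decay=cfg["weight_decay"])` for a short run, keeping whichever does best.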

Update 5: so far hearing all positive results for language modeling, when done right. Also heard positive results for significant text-to-image training, although it takes a bit of tuning. The negative results seem to be with problems and architectures outside of what was evaluated in the paper - RL, feedforward networks, weird hybrid architectures with LSTMs + convolutions etc. Negative anecdata also confirms this technique is sensitive to batch size, amount of data / augmentation. Tbd what optimal learning rate schedule is, and whether cooldown affects results. Also interestingly have a positive result at open-clip, which became negative as the model size was scaled up (but may be resolvable).
Update 6: open clip issue resolved by the author, by setting a higher initial temperature.
Update 7: would only recommend this optimizer in the setting of high batch sizes (64 or above)

Install

$ pip install lion-pytorch

Alternatively, using conda:

$ conda install lion-pytorch

Usage

# toy model

import torch
from torch import nn

model = nn.Linear(10, 1)

# import Lion and instantiate with parameters

from lion_pytorch import Lion

opt = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)

# forward and backwards

loss = model(torch.randn(10))
loss.backward()

# optimizer step

opt.step()
opt.zero_grad()

To use a fused kernel for updating the parameters, first pip install triton -U --pre, then

opt = Lion(
    model.parameters(),
    lr=1e-4,
    weight_decay=1e-2,
    use_triton=True # set this to True to use cuda kernel w/ Triton lang (Tillet et al)
)
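
If the same script has to run on machines with and without Triton or a GPU, the flag can be decided at runtime instead of hard-coded. A minimal sketch, assuming the fused kernel needs both the `triton` package and a CUDA device; `can_use_triton` is a hypothetical helper, not part of this package.

```python
import importlib.util


def can_use_triton() -> bool:
    """Return True only if triton is installed and torch reports a CUDA device."""
    if importlib.util.find_spec("triton") is None:
        return False
    if importlib.util.find_spec("torch") is None:
        return False
    # Import lazily so the check itself works even where torch is missing.
    import torch

    return bool(torch.cuda.is_available())
```

It would then be passed as `use_triton=can_use_triton()` when constructing `Lion`, falling back to the regular PyTorch update otherwise.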

Appreciation

Stability.ai for the generous sponsorship to work and open source cutting edge artificial intelligence research

Citations

@misc{https://doi.org/10.48550/arxiv.2302.06675,
    url     = {https://arxiv.org/abs/2302.06675},
    author  = {Chen, Xiangning and Liang, Chen and Huang, Da and Real, Esteban and Wang, Kaiyuan and Liu, Yao and Pham, Hieu and Dong, Xuanyi and Luong, Thang and Hsieh, Cho-Jui and Lu, Yifeng and Le, Quoc V.},
    title   = {Symbolic Discovery of Optimization Algorithms},
    publisher = {arXiv},
    year = {2023}
}

@article{Tillet2019TritonAI,
    title   = {Triton: an intermediate language and compiler for tiled neural network computations},
    author  = {Philippe Tillet and H. Kung and D. Cox},
    journal = {Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages},
    year    = {2019}
}

@misc{Schaipp2024,
    author  = {Fabian Schaipp},
    url     = {https://fabian-sp.github.io/posts/2024/02/decoupling/}
}

@inproceedings{Liang2024CautiousOI,
    title   = {Cautious Optimizers: Improving Training with One Line of Code},
    author  = {Kaizhao Liang and Lizhang Chen and Bo Liu and Qiang Liu},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:274234738}
}

@misc{chen2026cautiousweightdecay,
    title   = {Cautious Weight Decay}, 
    author  = {Lizhang Chen and Jonathan Li and Kaizhao Liang and Baiyu Su and Cong Xie and Nuo Wang Pierse and Chen Liang and Ni Lao and Qiang Liu},
    year    = {2026},
    eprint  = {2510.12402},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG},
    url     = {https://arxiv.org/abs/2510.12402}, 
}

Release history

Version	Release date
0.2.5 (this version)	Jul 9, 2026
0.2.4	Mar 4, 2026
0.2.3	Nov 27, 2024
0.2.2	Jun 15, 2024
0.2.1	Jun 15, 2024
0.2.0	Jun 15, 2024
0.1.4	Mar 30, 2024
0.1.2	May 10, 2023
0.1.0	May 9, 2023
0.0.8	May 9, 2023
0.0.7	Feb 22, 2023
0.0.6	Feb 17, 2023
0.0.5	Feb 17, 2023
0.0.4	Feb 15, 2023
0.0.3	Feb 15, 2023
0.0.2	Feb 15, 2023
0.0.1	Feb 15, 2023

Download files

Source Distribution

lion_pytorch-0.2.5.tar.gz (7.8 kB)

Uploaded Jul 9, 2026 Source

Built Distribution

lion_pytorch-0.2.5-py3-none-any.whl (10.0 kB)

Uploaded Jul 9, 2026 Python 3

File details

Details for the file lion_pytorch-0.2.5.tar.gz.

File metadata

Download URL: lion_pytorch-0.2.5.tar.gz
Upload date: Jul 9, 2026
Size: 7.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.17

File hashes

Hashes for lion_pytorch-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`fe8609a5a77fa8ba8399ac15921cb2cc041c2ef458d5522ff6eb143ec3e37e2d`
MD5	`87362917c1d8a9d6db3019c0b13910d3`
BLAKE2b-256	`fbbf3bfaa6474a6b8991ef70219293d895a273b0438c1db001c427abf7d95480`

File details

Details for the file lion_pytorch-0.2.5-py3-none-any.whl.

File metadata

Download URL: lion_pytorch-0.2.5-py3-none-any.whl
Upload date: Jul 9, 2026
Size: 10.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.17

File hashes

Hashes for lion_pytorch-0.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`77089f28bda82f1cee829cabab8a104fe8aabf66af1941b0fc6ce400aabb28cc`
MD5	`6f516fd27c6e90791669ae0ebcfc8358`
BLAKE2b-256	`a27ac39e3ad6d7b71b47834a8f60378478d9a2e1812b66c5fb6158a41dc1b4b5`
