Cutting transformer layers
:scissors: Short Transformers
- PyTorch implementation of the layer pruning method proposed in *The Unreasonable Ineffectiveness of the Deeper Layers*.
- The repository reproduces and extends the original method by offering additional layer pruning criteria.
Installation:

```sh
pip install short-transformers
```
Quickstart:

```python
from short_transformers import ShortTransformer

# load from a local path or the Hugging Face Hub
model = ShortTransformer.from_pretrained(model_name)

# or wrap an already loaded Hugging Face model
model = ShortTransformer.from_model(hf_model)

# remove n layers, using a Hugging Face dataset to find the optimal cut
# (other parameters: key, limit, batch_size, return_outputs, distance)
short_model = model.remove_layers(n=5, dataset=dataset)

# continue training to heal the model after the cut
# ...

# save as a Hugging Face model
short_model.save_pretrained(output_path)
```
Both the pruned model and the saved model remain fully compatible with the transformers library.
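For reference, here is a minimal end-to-end sketch of the flow above, using the `key` and `limit` parameters listed in the quickstart comment; the model name, calibration dataset, and `limit` value are illustrative choices, not library defaults:

```python
from datasets import load_dataset
from short_transformers import ShortTransformer

# hypothetical model choice; any decoder-only Hugging Face model should work
model = ShortTransformer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# hypothetical calibration data; `key` names the text column of the dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")

# find the optimal block of 5 consecutive layers to cut,
# scoring candidate cuts on up to 1000 examples
short_model = model.remove_layers(n=5, dataset=dataset, key="text", limit=1000)

short_model.save_pretrained("./llama-3-8b-pruned")
```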
Supported pruning methods:

- based on layer input/output distances (see the sketch below):
  - angular distance of the last token (original)
  - averaged angular distances of all tokens
- based on layer linear replacement loss
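For intuition, a minimal sketch of the original criterion from the paper: the angular distance between the hidden state entering a block of layers and the hidden state leaving it, measured on the last token. The helper names are illustrative, not part of the library API:

```python
import torch
import torch.nn.functional as F

def angular_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # angular distance in [0, 1]: arccos of the cosine similarity, scaled by 1/pi
    cos = F.cosine_similarity(x, y, dim=-1).clamp(-1.0, 1.0)
    return torch.arccos(cos) / torch.pi

def block_distance(hidden_states, start: int, n: int) -> torch.Tensor:
    # hidden_states: tuple of [batch, seq, dim] tensors from a forward pass
    # with output_hidden_states=True; compare the state entering layer `start`
    # with the state after `start + n` layers, last token only
    x = hidden_states[start][:, -1, :]
    y = hidden_states[start + n][:, -1, :]
    return angular_distance(x, y).mean()
```

The block of n consecutive layers with the smallest distance is the cheapest to remove; the all-token variant averages the distance over every sequence position instead of keeping only the last token.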
Citing

If you use Short Transformers in your research, please cite it with the following BibTeX:
```bibtex
@misc{russak2024shorttransformers,
  title  = {ShortTransformers, optimal layer pruning tools},
  author = {Melisa Russak},
  url    = {https://github.com/melisa/short-transformers},
  year   = {2024}
}

@misc{gromov2024unreasonable,
  title         = {The Unreasonable Ineffectiveness of the Deeper Layers},
  author        = {Andrey Gromov and Kushal Tirumala and Hassan Shapourian and Paolo Glorioso and Daniel A. Roberts},
  year          = {2024},
  eprint        = {2403.17887},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```