Cutting transformers layers

✂️ Short Transformers

Installation:

pip install short-transformers

Quickstart:

from short_transformers import ShortTransformer

# load from path/hf_hub
model = ShortTransformer.from_pretrained(model_name)

# or use hf model
model = ShortTransformer.from_model(hf_model)

# remove n layers, use hf dataset to find the optimal cut
short_model = model.remove_layers(n=5, dataset=dataset)  # parameters: n, dataset, key, limit, batch_size, return_outputs, distance

# continue training to heal after the cut
# ...

# save as hf model
short_model.save_pretrained(output_path)

Both the short model and the saved model are fully compatible with transformers.

Supported pruning methods:

  • based on layer input/output distances:

    • angular distance of the last token (original)
    • averaged angular distances of all tokens
  • based on layer linear replacement loss
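
The distance-based methods above can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the helper names `angular_distance` and `best_cut` and the toy hidden states are hypothetical, and it scores a single sample, whereas in practice the distances would be averaged over a dataset. The angular distance is taken to be the arccosine of the cosine similarity divided by π, as defined in the Gromov et al. paper cited below, and the block of n layers whose input and output are closest is treated as the best candidate for removal.

```python
import math

def angular_distance(u, v):
    """Angular distance in [0, 1]: arccos(cosine similarity) / pi.
    Assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos) / math.pi

def best_cut(hidden_states, n, average_tokens=False):
    """Find the block of n consecutive layers that changes its input the least.

    hidden_states[l][t] is the vector of token t at the input of layer l;
    the final entry is the output of the last layer, so there are
    num_layers + 1 entries. Returns (start_layer, distance).
    """
    num_layers = len(hidden_states) - 1
    best = None
    for start in range(num_layers - n + 1):
        x_in, x_out = hidden_states[start], hidden_states[start + n]
        if average_tokens:
            # Variant: average the distance over all token positions.
            dists = [angular_distance(a, b) for a, b in zip(x_in, x_out)]
            d = sum(dists) / len(dists)
        else:
            # Original variant: compare the last token only.
            d = angular_distance(x_in[-1], x_out[-1])
        if best is None or d < best[1]:
            best = (start, d)
    return best

# Toy example: 4 layers, 2 tokens, 2-dimensional hidden states.
# Layer 1 only rescales the last token, so it is the cheapest to remove.
toy = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 2.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]
print(best_cut(toy, n=1))  # (1, 0.0)
```

Because the distance depends only on the angle between vectors, a layer that merely rescales its input scores zero and is selected first.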

Citing

If you use Short Transformers in your research, please cite it with the following BibTeX entries:

@misc{russak2024shorttransformers,
    title  = {ShortTransformers, optimal layer pruning tools},
    author = {Melisa Russak},
    url    = {https://github.com/melisa/short-transformers},
    year   = {2024}
}
@misc{gromov2024unreasonable,
      title={The Unreasonable Ineffectiveness of the Deeper Layers}, 
      author={Andrey Gromov and Kushal Tirumala and Hassan Shapourian and Paolo Glorioso and Daniel A. Roberts},
      year={2024},
      eprint={2403.17887},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
