Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
Introduction
This is an unofficial implementation of the paper *Mixture-of-Depths: Dynamically allocating compute in transformer-based language models*.
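The core idea of Mixture-of-Depths is that a learned router assigns each token a scalar score, and only the top-k tokens (a fixed capacity) pass through a given transformer block; the rest skip it via the residual stream. The following is a minimal NumPy sketch of that routing pattern, not the package's actual implementation; all names and the toy block are illustrative:

```python
import numpy as np

def mod_block(x, w_router, block, capacity=0.5):
    """Toy Mixture-of-Depths routing: only the top-k tokens (by router
    score) are processed by `block`; the rest pass through unchanged."""
    seq_len, d_model = x.shape
    k = max(1, int(capacity * seq_len))

    # One scalar router score per token.
    scores = x @ w_router  # shape: (seq_len,)

    # Indices of the k highest-scoring tokens.
    topk = np.argsort(scores)[-k:]

    # Unselected tokens ride the residual stream unchanged.
    out = x.copy()
    # Selected tokens get the block's output, scaled by their router score
    # so the routing decision stays on the gradient path during training.
    out[topk] = x[topk] + scores[topk, None] * block(x[topk])
    return out, topk

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(16, d))
w_router = rng.normal(size=(d,))
w_block = rng.normal(size=(d, d)) * 0.1

# Route 25% of the 16 tokens (k = 4) through the toy block.
out, topk = mod_block(x, w_router, lambda h: h @ w_block, capacity=0.25)
```

With a capacity of 0.25, only 4 of the 16 tokens are processed by the block, which is where the compute savings come from.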
Currently supported models
| Model | Supported? |
|---|---|
| Mistral | ✅ |
| Mixtral | ❌ |
| LLama2 | ✅ |
| Gemma | ✅ |
| Solar | ❌ |
💾 Installation
```bash
pip install mixture-of-depth
```
Linux, Windows, and macOS are all supported.
🏁 Quick Start
High-level API (transformers-compatible)
```python
from transformers import AutoModelForCausalLM
from MoD import apply_mod_to_hf

# Initialize your model from an available Hugging Face model
model = AutoModelForCausalLM.from_pretrained("some-repo/some-model")

# Convert the model to include the Mixture-of-Depths layers
model = apply_mod_to_hf(model)

# Train the model
# ...

# Save the model
model.save_pretrained('some_local_directory')
```
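Conceptually, the conversion walks the model's decoder layers and wraps them with a routing gate (the paper interleaves routing, applying it to every other block). A plain-Python sketch of that wrapping pattern follows; all names are illustrative and do not reflect the package's internals:

```python
class RoutedLayer:
    """Toy wrapper: only tokens flagged by the router run through the layer."""
    def __init__(self, layer):
        self.layer = layer

    def __call__(self, tokens, selected):
        # Selected tokens are processed; the rest pass through unchanged.
        return [self.layer(t) if sel else t for t, sel in zip(tokens, selected)]

def apply_mod_sketch(layers):
    # Wrap every other layer, mirroring the paper's interleaved configuration.
    return [RoutedLayer(layer) if i % 2 == 1 else layer
            for i, layer in enumerate(layers)]

double = lambda t: 2 * t
wrapped = apply_mod_sketch([double, double, double, double])
```

Here `wrapped[1]` and `wrapped[3]` are routed, while the even-indexed layers still process every token.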
Loading the converted Model
To use the converted model, load it through the provided auto class. Below is an example demonstrating how to load the model from a local directory:
```python
from MoD import AutoMoDModelForCausalLM

# Replace 'path_to_your_model' with the actual path to your model's directory
model = AutoMoDModelForCausalLM.from_pretrained('path_to_your_model')
```
🫱🏼🫲🏽 Contributing
We welcome contributions from the community, whether it's adding new features, improving documentation, or reporting bugs. Please refer to our contribution guidelines before making a pull request.
📜 License
This repo is open-sourced under the Apache-2.0 license.
Citation
If you use our code in your research, please cite it using the following BibTeX entry:
```bibtex
@article{MoD2024,
  title={Unofficial implementation for the paper "Mixture-of-Depths"},
  author={AstraMind AI},
  journal={https://github.com/astramind-ai/Mixture-of-depths},
  year={2024}
}
```
Support
For questions, issues, or support, please open an issue on our GitHub repository.