Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
Introduction
This is an unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models".
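As a rough illustration of the idea (a sketch only, not this package's actual implementation), Mixture-of-Depths lets a learned router send only a top-k fraction of tokens through each transformer block, while the remaining tokens skip it via the residual connection. The function and parameter names below are assumptions for illustration:

```python
import numpy as np

def mod_block(x, router_w, block_fn, capacity=0.125):
    """Illustrative Mixture-of-Depths routing (names/shapes are assumptions).

    Only the top-k highest-scoring tokens are processed by block_fn;
    all other tokens pass through unchanged on the residual path."""
    seq_len, d = x.shape
    k = max(1, int(seq_len * capacity))
    scores = x @ router_w                  # (seq_len,) router logits
    top_idx = np.argsort(scores)[-k:]      # indices of tokens that get compute
    out = x.copy()
    # Scale the block output by the router score so the routing
    # decision stays differentiable during training.
    out[top_idx] = x[top_idx] + scores[top_idx, None] * block_fn(x[top_idx])
    return out
```

With `capacity=0.125` (the fraction used as an example in the paper), only one token in eight pays the cost of the block's attention and MLP.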
Currently supported models
| Model | Supported? |
|---|---|
| Mistral | ✅ |
| Mixtral | ✅ |
| LLaMA | ✅ |
| LLaMA 2 | ✅ |
| Gemma | ✅ |
| BLOOMZ | ✅ |
| BLOOM | ✅ |
| DeepSeek | ✅ |
| Phi (1.5 & 2) | ✅ |
| Qwen2 | ✅ |
| StarCoder2 | ✅ |
| Qwen2-MoE | ❓ |
| Solar | ❓ |
| Baichuan | ❌ |
| ChatGLM3 | ❌ |
| InternLM | ❌ |
| Olmo | ❌ |
| XVERSE | ❌ |
| Yi | ❌ |
| Yuan | ❌ |
💾 Installation
```bash
pip install mixture-of-depth
```
Linux, Windows, and macOS are all supported.
🏁 Quick Start
High-level API (transformers-compatible)
```python
from transformers import AutoModelForCausalLM
from MoD import apply_mod_to_hf

# Initialize your model from an available HF model
model = AutoModelForCausalLM.from_pretrained("some-repo/some-model")
# Convert the model to include the Mixture-of-Depths layers
model = apply_mod_to_hf(model)
# Train the model
# ...
# Save the model
model.save_pretrained('some_local_directory')
```
Loading the converted Model
To use the converted model, load it through the provided AutoClass. Below is an example demonstrating how to load the model from a local directory:
```python
from MoD import AutoMoDModelForCausalLM

# Replace 'path_to_your_model' with the actual path to your model's directory
model = AutoMoDModelForCausalLM.from_pretrained('path_to_your_model')
```
🫱🏼🫲🏽 Contributing
We welcome contributions from the community, whether it's adding new features, improving documentation, or reporting bugs. Please refer to our contribution guidelines before making a pull request.
📜 License
This repo is open-sourced under the Apache-2.0 license.
Citation
If you use our code in your research, please cite it using the following BibTeX entry:
```bibtex
@article{MoD2024,
  title={Unofficial implementation for the paper "Mixture-of-Depths"},
  author={AstraMind AI},
  journal={https://github.com/astramind-ai/Mixture-of-depths},
  year={2024}
}
```
Support
For questions, issues, or support, please open an issue on our GitHub repository.