Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

Introduction

This is an unofficial implementation for the paper Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
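
At a high level, Mixture-of-Depths lets a learned router score every token so that each block only processes the top-scoring fraction of tokens (its "capacity", e.g. 12.5%), while the remaining tokens skip the block through the residual stream. The toy sketch below illustrates just the top-k routing step in plain Python; the function name `mod_route` is made up for illustration and is not part of this package's API.

```python
def mod_route(scores, capacity_fraction=0.125):
    """Pick the indices of the tokens a block should process.

    scores: per-token router scores for one sequence.
    capacity_fraction: fraction of tokens the block processes
    (the paper uses capacities such as 12.5%); all other tokens
    bypass the block via the residual connection.
    """
    k = max(1, int(len(scores) * capacity_fraction))
    # Rank token positions by descending router score.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Keep the top-k, restored to sequence order for the block.
    return sorted(ranked[:k])

# Toy example: 8 tokens, process the top 25%.
scores = [0.1, 0.9, 0.3, 0.7, 0.2, 0.05, 0.8, 0.4]
processed = mod_route(scores, capacity_fraction=0.25)  # -> [1, 6]
```

The real implementation learns the router jointly with the model and applies this selection per block, but the selection logic is the same shape as above.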

Currently supported models

| Model | Supported? |
|---|---|
| Mistral | ✅ |
| Mixtral | ✅ |
| LLama | ✅ |
| LLama2 | ✅ |
| LLama3 | ✅ |
| Gemma | ✅ |
| BLOOMZ | ✅ |
| BLOOM | ✅ |
| DeepSeek | ✅ |
| Phi (1.5 & 2) | ✅ |
| Qwen2 | ✅ |
| StarCoder2 | ✅ |
| Qwen2-MoE | ✅ |
| Solar | ✅ |
| Baichuan | ✅ |
| ChatGLM3 | ✅ |
| InternLM | ✅ |
| Olmo | ✅ |
| XVERSE | ✅ |
| Yi | ✅ |
| Yuan | ✅ |

💾 Installation

pip install mixture-of-depth

Linux, Windows, and macOS are all supported.

🏁 Quick Start

High-level API (transformers-compatible)

from transformers import AutoModelForCausalLM
from MoD import apply_mod_to_hf

# Initialize your model from an available Hugging Face model
model = AutoModelForCausalLM.from_pretrained("some-repo/some-model")
# Convert the model to include the Mixture-of-Depths layers
model = apply_mod_to_hf(model)
# Train the model
# ...
# Save the model
model.save_pretrained('some_local_directory')

Loading the converted Model

To use the converted model, load it through the provided Auto class. Below is an example demonstrating how to load the model from a local directory:

from MoD import AutoMoDModelForCausalLM

# Replace 'path_to_your_model' with the actual path to your model's directory
model = AutoMoDModelForCausalLM.from_pretrained('path_to_your_model')

Using generate()

Before calling the Hugging Face generate() method, explicitly put the model in evaluation mode by calling eval() on it.
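
For example (an illustrative snippet; the model path comes from the loading example above, and the prompt is a placeholder):

```python
from transformers import AutoTokenizer
from MoD import AutoMoDModelForCausalLM

# Load the converted model and its matching tokenizer
model = AutoMoDModelForCausalLM.from_pretrained('path_to_your_model')
tokenizer = AutoTokenizer.from_pretrained('path_to_your_model')

model.eval()  # required before calling generate()
inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```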

🫱🏼‍🫲🏽 Contributing

We welcome contributions from the community, whether it's adding new features, improving documentation, or reporting bugs. Please refer to our contribution guidelines before making a pull request.

📜 License

This repo is open-sourced under the Apache-2.0 license.

Citation

If you use our code in your research, please cite it using the following BibTeX entry:

@article{MoD2024,
  title={Unofficial implementation for the paper "Mixture-of-Depths"},
  author={AstraMind AI},
  journal={https://github.com/astramind-ai/Mixture-of-depths},
  year={2024}
}

Support

For questions, issues, or support, please open an issue on our GitHub repository.

