
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"


Unofficial implementation for the paper "Mixture-of-Depths"

Introduction

This is an unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models".
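The paper's central idea is a per-layer router that selects a fixed fraction of tokens (the "capacity") to pass through a transformer block. All other tokens skip the block and continue along the residual stream unchanged. The sketch below illustrates that routing rule in plain Python. It is a minimal sketch of the paper's idea, not this package's implementation. The names `mod_layer` and `capacity`, and the way `k` is rounded, are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mod_layer(
    tokens: List[Vector],
    router_weight: Vector,
    block: Callable[[List[Vector]], List[Vector]],
    capacity: float,
) -> List[Vector]:
    """Hypothetical sketch: route only the top-k tokens through `block`."""
    seq_len = len(tokens)
    # Number of tokens processed by the block (rounding rule is an assumption)
    k = max(1, int(capacity * seq_len))
    # One scalar router score per token: r_i = w . x_i
    scores = [dot(router_weight, x) for x in tokens]
    # Indices of the k highest-scoring tokens, restored to sequence order
    ranked = sorted(range(seq_len), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:k])
    # The block (attention + MLP in a real model) sees only the selected tokens
    processed = block([tokens[i] for i in top])
    # Unselected tokens pass through unchanged on the residual path
    out = [list(x) for x in tokens]
    for i, f_x in zip(top, processed):
        # x_i' = x_i + r_i * f(x_i); scaling by r_i keeps the router trainable
        out[i] = [xi + scores[i] * fi for xi, fi in zip(tokens[i], f_x)]
    return out


# Toy usage: 4 tokens, capacity 0.5, so 2 tokens go through a "doubling" block
def double(xs: List[Vector]) -> List[Vector]:
    return [[2 * v for v in x] for x in xs]


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
out = mod_layer(tokens, [1.0, 0.0], double, capacity=0.5)
print(out)  # → [[3.0, 0.0], [0.0, 1.0], [10.0, 0.0], [0.0, 3.0]]
```

Because the router picks a fixed number of tokens per layer, the block's compute cost is set by the capacity rather than by the full sequence length.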

Currently supported models

  • Mistral
  • Mixtral
  • LLaMA
  • Llama 2
  • Gemma
  • BLOOMZ
  • BLOOM
  • DeepSeek
  • Phi (1.5 & 2)
  • Qwen2
  • StarCoder2
  • Qwen2-MoE
  • Solar
  • Baichuan
  • ChatGLM3
  • InternLM
  • OLMo
  • XVERSE
  • Yi
  • Yuan

💾 Installation

pip install mixture-of-depth

Linux, Windows, and macOS are all supported.

🏁 Quick Start

High-level API (transformers-compatible)

from transformers import AutoModelForCausalLM
from MoD import apply_mod_to_hf

# Initialize your model from an available hf model
model = AutoModelForCausalLM.from_pretrained("some-repo/some-model")
# Convert the model to include the mixture of depths layers
model = apply_mod_to_hf(model)
# train the model
# ...
# save the model
model.save_pretrained('some_local_directory')

Loading the converted model

To use the converted model, load it with the AutoMoDModelForCausalLM auto class rather than the standard transformers AutoClass. The example below loads a model from a local directory:

from MoD import AutoMoDModelForCausalLM

# Replace 'path_to_your_model' with the actual path to your model's directory
model = AutoMoDModelForCausalLM.from_pretrained('path_to_your_model')

🫱🏼‍🫲🏽 Contributing

We welcome contributions from the community, whether it's adding new features, improving documentation, or reporting bugs. Please refer to our contribution guidelines before making a pull request.

📜 License

This repo is open-sourced under the Apache-2.0 license.

Citation

If you use our code in your research, please cite it using the following BibTeX entry:

@misc{MoD2024,
  title={Unofficial implementation for the paper "Mixture-of-Depths"},
  author={{AstraMind AI}},
  howpublished={https://github.com/astramind-ai/Mixture-of-depths},
  year={2024}
}

Support

For questions, issues, or support, please open an issue on our GitHub repository.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mixture-of-depth-1.1.6.tar.gz (144.3 kB)

Uploaded Source

Built Distribution

mixture_of_depth-1.1.6-py3-none-any.whl (152.8 kB)

Uploaded Python 3

File details

Details for the file mixture-of-depth-1.1.6.tar.gz.

File metadata

  • Download URL: mixture-of-depth-1.1.6.tar.gz
  • Upload date:
  • Size: 144.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for mixture-of-depth-1.1.6.tar.gz
Algorithm Hash digest
SHA256 a5b4b8cccccfc0ff4d9a86c936ea9026a7eb733df31da0ab595791c66fd34b0b
MD5 88b5763c595875fc9cd080136848c6f7
BLAKE2b-256 07e11d55ab16f03266acb648d462d5e2de772700ec43296ab4f86de17ff79c1b


File details

Details for the file mixture_of_depth-1.1.6-py3-none-any.whl.

File metadata

  • Download URL: mixture_of_depth-1.1.6-py3-none-any.whl
  • Upload date:
  • Size: 152.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for mixture_of_depth-1.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 c78cb95464a3b917b79716383fb81209e0645d6ae50ca05896c4cf61fe53152f
MD5 5187a974b7d12fa9acadb6bd76192381
BLAKE2b-256 6bc5d0981d669d5675d5687b37cb3513f68b9b8ca857341300dd6d41839d4b70

