Last released Jun 20, 2024
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
Last released Apr 21, 2024
An efficient implementation for the paper: "The Era of 1-bit LLMs"