3 projects
auralis
A faster implementation of TTS models, intended for highly asynchronous environments
mixture-of-depth
Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
bitmat-tl
An efficient implementation of the paper "The Era of 1-bit LLMs"