E2-TTS - MLX
Implementation of E2-TTS, Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS, with the MLX framework.
This implementation is based on the lucidrains implementation in PyTorch, which differs from the paper in that it uses a multistream transformer for text and audio, with conditioning applied at every transformer block.
Usage
import mlx.core as mx

from e2_tts_mlx.model import E2TTS
from e2_tts_mlx.trainer import E2Trainer
# Assumption: the original snippet does not import load_libritts_r;
# it is assumed here to live in e2_tts_mlx.data.
from e2_tts_mlx.data import load_libritts_r

e2tts = E2TTS(
    tokenizer = "char-utf8",  # or "phoneme_en" for phoneme-based tokenization
    cond_drop_prob = 0.25,
    frac_lengths_mask = (0.7, 0.9),
    transformer = dict(
        dim = 1024,
        depth = 24,
        heads = 16,
        text_depth = 12,
        text_heads = 8,
        text_ff_mult = 4,
        max_seq_len = 4096,
        dropout = 0.1
    )
)
# MLX is lazy: force the parameters to be materialized before training.
mx.eval(e2tts.parameters())

batch_size = 128
max_duration = 30

dataset = load_libritts_r(split = "dev-clean", max_duration = max_duration)

trainer = E2Trainer(model = e2tts, num_warmup_steps = 1000)
trainer.train(train_dataset = dataset, learning_rate = 7.5e-5, batch_size = batch_size)
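The `frac_lengths_mask` and `cond_drop_prob` arguments are not explained here. The sketch below shows one plausible reading, assuming they behave as in the PyTorch reference implementation: a contiguous span covering a random 70% to 90% of each sequence is masked for infilling, and the conditioning is dropped for roughly a quarter of the training samples. The function names `sample_span_mask` and `drop_condition` are hypothetical and are not part of `e2_tts_mlx`; this is an illustration in plain Python, not the library's code.

```python
import random


def sample_span_mask(seq_len, frac_range=(0.7, 0.9), rng=random):
    """Return a list of booleans, True where a frame is masked for infilling.

    Hypothetical helper: assumes frac_lengths_mask is a (low, high) range
    from which the masked fraction of the sequence is drawn uniformly.
    """
    low, high = frac_range
    frac = rng.uniform(low, high)      # fraction of the sequence to mask
    length = int(frac * seq_len)       # masked span length in frames
    max_start = seq_len - length       # latest start that keeps the span in range
    start = int(max_start * rng.random())
    end = start + length
    return [start <= i < end for i in range(seq_len)]


def drop_condition(cond_drop_prob=0.25, rng=random):
    """Return True when the conditioning should be dropped for this sample.

    Hypothetical helper: assumes cond_drop_prob is a per-sample probability.
    """
    return rng.random() < cond_drop_prob


if __name__ == "__main__":
    rng = random.Random(0)
    mask = sample_span_mask(100, rng=rng)
    print("masked frames:", sum(mask), "of", len(mask))
    print("drop conditioning this step:", drop_condition(rng=rng))
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible, which is convenient when checking the masking behaviour in isolation.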
Appreciation
lucidrains for the original implementation in PyTorch.
Citations
@inproceedings{Eskimez2024E2TE,
    title   = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
    author  = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:270738197}
}

@article{Burtsev2021MultiStreamT,
    title   = {Multi-Stream Transformers},
    author  = {Mikhail S. Burtsev and Anna Rumshisky},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2107.10342},
    url     = {https://api.semanticscholar.org/CorpusID:236171087}
}
License
The code in this repository is released under the MIT license as found in the LICENSE file.