E2 TTS — MLX

[Figure: E2 TTS diagram]

An implementation of E2 TTS (Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS) in the MLX framework.

This implementation is based on the lucidrains PyTorch implementation, which differs from the paper in that it uses a multistream transformer for text and audio, with conditioning applied at every transformer block.
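For intuition, here is a minimal sketch of the multistream idea, assuming the text stream has already been aligned (padded) to the audio length. The class and names are illustrative, not this package's actual modules:

import mlx.core as mx
import mlx.nn as nn

class MultistreamBlock(nn.Module):
    # Illustrative only: two parallel streams, with the text stream
    # conditioning the audio stream at every block.
    def __init__(self, dim: int, text_dim: int, heads: int, text_heads: int):
        super().__init__()
        self.audio_attn = nn.MultiHeadAttention(dim, heads)
        self.text_attn = nn.MultiHeadAttention(text_dim, text_heads)
        self.text_to_audio = nn.Linear(text_dim, dim)  # per-block conditioning

    def __call__(self, audio: mx.array, text: mx.array):
        # each stream self-attends independently
        audio = audio + self.audio_attn(audio, audio, audio)
        text = text + self.text_attn(text, text, text)
        # the text stream conditions the audio stream
        # (assumes both streams have equal sequence length)
        audio = audio + self.text_to_audio(text)
        return audio, text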

Installation

pip install mlx-e2-tts
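To verify the install, the model class used below should import cleanly:

python -c "from e2_tts_mlx.model import E2TTS; print('ok')"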

Usage

import mlx.core as mx

from e2_tts_mlx.model import E2TTS
from e2_tts_mlx.trainer import E2Trainer
from e2_tts_mlx.data import load_libritts_r

e2tts = E2TTS(
    tokenizer = "char-utf8",  # or "phoneme_en"
    cond_drop_prob = 0.25,  # conditioning dropout rate; > 0 enables classifier-free guidance at sampling
    frac_lengths_mask = (0.7, 0.9),  # fraction of each sample masked for infilling during training
    transformer = dict(
        dim = 1024,
        depth = 24,
        heads = 16,
        text_depth = 12,
        text_heads = 8,
        text_ff_mult = 4,
        max_seq_len = 4096,
        dropout = 0.1
    )
)
mx.eval(e2tts.parameters())  # MLX evaluates lazily; this forces parameter initialization

batch_size = 128
max_duration = 30

dataset = load_libritts_r(split="dev-clean")  # or any audio/caption dataset

trainer = E2Trainer(model = e2tts, num_warmup_steps = 1000)

trainer.train(
    train_dataset = dataset,  # loaded above
    learning_rate = 7.5e-5,
    batch_size = batch_size,
    total_steps = 1_000_000
)
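As an aside, num_warmup_steps implies a warmup-then-decay learning-rate schedule. A sketch of one such schedule using MLX's built-in schedulers follows; this is an assumption for illustration, not E2Trainer's actual internals:

import mlx.optimizers as optim

# Assumed schedule: linear warmup to the peak learning rate, then cosine decay
# over the remaining steps (1_000_000 total - 1000 warmup).
warmup = optim.linear_schedule(0.0, 7.5e-5, steps=1000)
decay = optim.cosine_decay(7.5e-5, decay_steps=999_000)
lr_schedule = optim.join_schedules([warmup, decay], [1000])
optimizer = optim.AdamW(learning_rate=lr_schedule)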

# ... after much training ...

cond = ...
text = ...
duration = ...  # from a trained DurationPredictor or otherwise

generated_audio = e2tts.sample(
    cond = cond,
    text = text,
    duration = duration,
    steps = 32,
    cfg_strength = 1.0,  # if trained for cfg
    use_vocos = True  # set to False to get mel spectrograms instead of audio
)
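If you don't have a trained DurationPredictor, one rough fallback is to extrapolate the reference audio's frames-per-character. This is purely illustrative; ref_text and the frame arithmetic are assumptions, not part of this package's API:

# Hypothetical duration estimate from the reference audio:
# scale its frames-per-character to cover the reference plus the new text.
cond_frames = cond.shape[1]  # mel frames in the reference audio
frames_per_char = cond_frames / len(ref_text)  # ref_text: transcript of cond
duration = int(frames_per_char * len(ref_text + text))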

Note that the model size specified above (from the paper) is very large. See train_example.py for a more practically sized model you can train on your local device.
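As a rough starting point for local experiments, a downsized configuration might look like the following. The values here are illustrative guesses, not copied from train_example.py:

e2tts_small = E2TTS(
    tokenizer = "char-utf8",
    cond_drop_prob = 0.25,
    transformer = dict(
        dim = 512,
        depth = 12,
        heads = 8,
        text_depth = 6,
        text_heads = 4,
        text_ff_mult = 4,
        max_seq_len = 1024,
        dropout = 0.1
    )
)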

Appreciation

lucidrains for the original implementation in PyTorch.

Citations

@inproceedings{Eskimez2024E2TE,
    title   = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
    author  = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:270738197}
}
@article{Burtsev2021MultiStreamT,
    title     = {Multi-Stream Transformers},
    author    = {Mikhail S. Burtsev and Anna Rumshisky},
    journal   = {ArXiv},
    year      = {2021},
    volume    = {abs/2107.10342},
    url       = {https://api.semanticscholar.org/CorpusID:236171087}
}

License

The code in this repository is released under the MIT license as found in the LICENSE file.
