E2 TTS - Pytorch
Implementation of E2-TTS, Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS, in Pytorch
The repository differs from the paper in that it uses a multistream transformer for text and audio, with conditioning done at every transformer block in the E2 manner.
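To illustrate the multistream idea, here is a minimal, hypothetical sketch (not the repository's actual implementation): text and audio each get their own stream of attention and feedforward layers, and the text stream conditions the audio stream at every block, here via simple additive conditioning under the assumption of equal sequence lengths.

```python
import torch
from torch import nn

class MultistreamBlock(nn.Module):
    # Hypothetical sketch of per-block text conditioning; the real
    # e2-tts-pytorch block differs in details.
    def __init__(self, dim, heads = 8):
        super().__init__()
        self.audio_attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.text_attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.audio_ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.text_ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, audio, text):
        # each stream self-attends independently
        audio = audio + self.audio_attn(audio, audio, audio)[0]
        text = text + self.text_attn(text, text, text)[0]

        # condition the audio stream on the text stream every block
        # (additive conditioning, assuming aligned sequence lengths)
        audio = audio + text

        audio = audio + self.audio_ff(audio)
        text = text + self.text_ff(text)
        return audio, text

block = MultistreamBlock(dim = 64)
audio = torch.randn(2, 100, 64)
text = torch.randn(2, 100, 64)
audio_out, text_out = block(audio, text)
print(audio_out.shape)  # torch.Size([2, 100, 64])
```

Stacking such blocks gives the text stream a chance to steer the audio stream at every depth, rather than only at the input.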
Appreciation

- Manmay for contributing working end-to-end training code!
- Lucas Newman for the code contributions, helpful feedback, and for sharing the first set of positive experiments!
- Jing for sharing the second positive result with a multilingual (English + Chinese) dataset!
- Coice and Manmay for reporting the third and fourth successful runs. Farewell alignment engineering
Install
$ pip install e2-tts-pytorch
Usage
import torch
from e2_tts_pytorch import (
    E2TTS,
    DurationPredictor
)

duration_predictor = DurationPredictor(
    transformer = dict(
        dim = 512,
        depth = 8,
    )
)

mel = torch.randn(2, 1024, 100)
text = ['Hello', 'Goodbye']

loss = duration_predictor(mel, text = text)
loss.backward()

e2tts = E2TTS(
    duration_predictor = duration_predictor,
    transformer = dict(
        dim = 512,
        depth = 8
    ),
)

out = e2tts(mel, text = text)
out.loss.backward()

sampled = e2tts.sample(mel[:, :5], text = text)
Citations
@inproceedings{Eskimez2024E2TE,
title = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
author = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
year = {2024},
url = {https://api.semanticscholar.org/CorpusID:270738197}
}
@inproceedings{Darcet2023VisionTN,
title = {Vision Transformers Need Registers},
author = {Timothée Darcet and Maxime Oquab and Julien Mairal and Piotr Bojanowski},
year = {2023},
url = {https://api.semanticscholar.org/CorpusID:263134283}
}
@article{Bao2022AllAW,
title = {All are Worth Words: A ViT Backbone for Diffusion Models},
author = {Fan Bao and Shen Nie and Kaiwen Xue and Yue Cao and Chongxuan Li and Hang Su and Jun Zhu},
journal = {2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2022},
pages = {22669-22679},
url = {https://api.semanticscholar.org/CorpusID:253581703}
}
@article{Burtsev2021MultiStreamT,
title = {Multi-Stream Transformers},
author = {Mikhail S. Burtsev and Anna Rumshisky},
journal = {ArXiv},
year = {2021},
volume = {abs/2107.10342},
url = {https://api.semanticscholar.org/CorpusID:236171087}
}