F5 TTS — MLX

An implementation of F5-TTS using the MLX framework.

F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).
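As a rough illustration of what flow-matching sampling means, the sketch below integrates a velocity field from noise (t = 0) to data (t = 1) with fixed Euler steps. It is a toy under stated assumptions, not this package's code: `euler_sample` and `make_toy_velocity` are hypothetical names, the closed-form velocity stands in for the trained DiT, and the real sampler may use a different ODE solver and step schedule.

```python
def euler_sample(velocity, x0, steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the toy field never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for the trained network: a field that carries any
    point along a straight line so that it reaches `target` at t=1."""
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


noise = [0.3, -1.2, 0.7]    # plays the role of the initial Gaussian noise
target = [1.0, 2.0, -0.5]   # plays the role of a mel spectrogram frame
sample = euler_sample(make_toy_velocity(target), noise, steps=8)
```

In a real model the velocity comes from a network conditioned on the text and the reference audio, and the number of steps trades generation speed against quality.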

You can listen to a sample here; it was generated in ~11 seconds on an M3 Max MacBook Pro.

F5 is an evolution of E2 TTS and improves performance with ConvNeXt V2 blocks for the learned text alignment. This repository is based on the original PyTorch implementation available here.

Installation

pip install f5-tts-mlx

Pretrained model weights are available on Hugging Face.

You'll also need a vocabulary (see the data folder) and mel filterbanks (see the assets folder) to preprocess the text & audio before generating speech.
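A minimal sketch of the text side of that preprocessing follows. It assumes the vocabulary file lists one token per line, with the line index serving as the token id, and that unknown characters fall back to index 0. Both are assumptions to check against the files in the data folder, and `build_vocab` and `text_to_indices` are hypothetical helper names, not part of the package.

```python
def build_vocab(lines):
    """Map each token to its zero-based line index.

    Only the trailing newline is stripped, because a single space
    can itself be a token in a character-level vocabulary.
    """
    vocab = {}
    for index, line in enumerate(lines):
        vocab[line.rstrip("\n")] = index
    return vocab


def text_to_indices(text, vocab, unknown_index=0):
    """Convert text to token ids one character at a time.

    Assumption: characters missing from the vocabulary map to `unknown_index`.
    """
    return [vocab.get(ch, unknown_index) for ch in text]


# Usage with an in-memory stand-in for a vocabulary file.
vocab = build_vocab([" \n", "a\n", "b\n"])
ids = text_to_indices("ab a", vocab)
```

With a real vocabulary file you would pass the open file object (or its lines) to `build_vocab` instead of the inline list.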

Usage

See examples/generate.py for an example of generation.

You can load a pretrained model from Python like this:

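import mlx.core as mx

from f5_tts_mlx.cfm import CFM

# Placeholder: supply the vocabulary described under Installation.
vocab = ...
# Load the pretrained weights published on Hugging Face.
f5tts = CFM.from_pretrained("lucasnewman/f5-tts-mlx", vocab)
# Placeholder: see examples/generate.py for the arguments to pass.
audio = f5tts.sample(...)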
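If you end up with a one-dimensional sequence of float samples in [-1, 1], you can save it with the Python standard library alone. The sketch below assumes that waveform format and a 24 kHz sample rate; neither is stated on this page, so confirm both against examples/generate.py. `floats_to_pcm16` and `write_wav` are hypothetical helpers, not part of f5-tts-mlx.

```python
import io
import struct
import wave


def floats_to_pcm16(samples):
    """Clip floats to [-1, 1] and pack them as little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def write_wav(target, samples, sample_rate=24000):
    """Write mono 16-bit PCM audio to a file path or a binary file-like object.

    Assumption: 24000 Hz is only a default for this sketch; pass the
    rate your model actually produces.
    """
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(floats_to_pcm16(samples))


# Usage: write four samples to an in-memory buffer instead of a file.
buffer = io.BytesIO()
write_wav(buffer, [0.0, 1.0, -1.0, 0.25])
```

Passing a path such as "output.wav" instead of the buffer writes the file to disk.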

Appreciation

Yushen Chen for the original PyTorch implementation of F5-TTS and the pretrained model.

Phil Wang for the E2 TTS implementation that this model is based on.

Citations

@article{chen-etal-2024-f5tts,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching}, 
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      journal={arXiv preprint arXiv:2410.06885},
      year={2024},
}
@inproceedings{Eskimez2024E2TE,
    title   = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
    author  = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:270738197}
}

License

The code in this repository is released under the MIT license as found in the LICENSE file.
