F5-TTS - MLX

Implementation of F5-TTS, with the MLX framework.

F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).
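To make the flow-matching idea a little more concrete, here is a minimal toy sketch of the sampling loop: start from Gaussian noise and integrate a velocity field from t=0 to t=1 with fixed Euler steps. This is not the code used in this package. In the real model the velocity would come from the DiT, conditioned on the text and the reference audio; here `make_toy_velocity` is a hypothetical closed-form stand-in that points along a straight line toward a known target vector, and `euler_sample` is likewise a name chosen only for this illustration.

```python
import random


def euler_sample(velocity, x0, steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Toy velocity field (an assumption of this sketch, not the DiT):
    the straight-line velocity toward a fixed target, (target - x) / (1 - t)."""
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # starting point: Gaussian noise
target = [0.5, -1.0, 2.0, 0.0]                   # stand-in for one mel spectrogram frame
sample = euler_sample(make_toy_velocity(target), noise, steps=8)
```

Because the toy field always points straight at the target, the final Euler step lands on it. A learned velocity field only approximates this, which is why the number of integration steps typically trades speed against quality.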

You can listen to a sample here; it was generated in about 11 seconds on an M3 Max MacBook Pro.

F5 is an evolution of E2 TTS and improves performance by using ConvNeXt V2 blocks for the learned text alignment. This repository is based on the original PyTorch implementation available here.

Installation

pip install f5-tts-mlx

Usage

python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."

If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file around 5-10 seconds long:

python -m f5_tts_mlx.generate \
--text "The quick brown fox jumped over the lazy dog." \
--ref-audio /path/to/audio.wav \
--ref-text "This is the caption for the reference audio."
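If you would rather drive the command line from a script, the command above can be assembled as an argument list. The following is a minimal sketch: `build_generate_command` is a hypothetical helper, not part of f5-tts-mlx, and it uses only the three flags shown above. The check that the reference audio and its caption are supplied together is an assumption of this sketch, on the reasoning that the caption should describe the audio.

```python
import sys


def build_generate_command(text, ref_audio=None, ref_text=None):
    """Build the argument list for `python -m f5_tts_mlx.generate`.

    Hypothetical helper for illustration; only flags shown in this README are used.
    """
    # Assumption of this sketch: reference audio and its caption go together.
    if (ref_audio is None) != (ref_text is None):
        raise ValueError("ref_audio and ref_text should be given together")
    command = [sys.executable, "-m", "f5_tts_mlx.generate", "--text", text]
    if ref_audio is not None:
        command += ["--ref-audio", str(ref_audio), "--ref-text", ref_text]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Using a list rather than a shell string avoids quoting problems when the text contains spaces or quotation marks.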

You can convert an audio file to the correct format with ffmpeg like this:

ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
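Before passing a file as reference audio, it can be handy to confirm that it matches the format described above. Here is a minimal sketch using only Python's standard `wave` module; `check_reference_wav` is a hypothetical helper, not part of f5-tts-mlx, and the 5-10 second bounds are taken from the guidance above rather than from any limit the package is known to enforce.

```python
import wave


def check_reference_wav(path, expected_rate=24000, min_seconds=5.0, max_seconds=10.0):
    """Return a list of problems found with a reference WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    if not (min_seconds <= seconds <= max_seconds):
        problems.append(f"duration {seconds:.1f} s is outside {min_seconds}-{max_seconds} s")
    return problems
```

An empty list means the file is mono, 24 kHz, and within the suggested length; otherwise the messages say what to fix, for example by re-running the ffmpeg command above.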

See here for more options to customize generation.

You can load a pretrained model from Python like this:

from f5_tts_mlx.generate import generate

audio = generate(text="Hello world.")  # further generation options can be passed as keyword arguments

Pretrained model weights are also available on Hugging Face.

Appreciation

Yushen Chen for the original PyTorch implementation of F5 TTS and pretrained model.

Phil Wang for the E2 TTS implementation that this model is based on.

Citations

@article{chen-etal-2024-f5tts,
    title   = {F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
    author  = {Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
    journal = {arXiv preprint arXiv:2410.06885},
    year    = {2024}
}

@inproceedings{Eskimez2024E2TE,
    title   = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
    author  = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:270738197}
}

License

The code in this repository is released under the MIT license as found in the LICENSE file.
