F5-TTS - MLX

Project description

F5 Quantized so smol

HF: Model weights

F5 TTS — MLX

This repo is a fork of the original f5-tts-mlx implementation that swaps in a quantized flow-matching model only 223 MB in size. It is meant to be used as a component of my blog post on low-VRAM voice generation.

Implementation of F5-TTS, with the MLX framework.

F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).
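For intuition, flow-matching generation amounts to learning a velocity field v(x, t) and producing output by integrating dx/dt = v(x, t) from noise at t=0 toward data at t=1. The toy sketch below (not the repo's code; the field and target are made up for illustration) shows that sampling loop with plain Euler steps:

```python
import numpy as np

def sample_flow(velocity, x0, steps=32):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

# Toy velocity field that transports any point toward a fixed "data" point.
# In F5 TTS the field is a DiT conditioned on text and reference audio, and
# x is a mel spectrogram rather than a 2-vector.
target = np.array([1.0, -2.0])
v = lambda x, t: target - x
x = sample_flow(v, np.zeros(2), steps=1000)
```

With this linear field the exact solution at t=1 is `target * (1 - e**-1)`, so the Euler result should land close to that; fewer steps trade accuracy for speed, which is the same knob real flow-matching samplers expose.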

This repo reduces the VRAM usage of the original model so that it can be easily deployed on any Apple device with MLX. As the demos below show, the result is still very usable.

Demo:

4bit (223MB)

https://github.com/user-attachments/assets/406b4624-8f7c-48a4-a35d-2108fb081744

Original (1.35GB)

https://github.com/user-attachments/assets/c8b6f7c0-65ab-4950-ac96-b10608954174
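The roughly 6x size drop is about what 4-bit quantization predicts. As a back-of-envelope check (our estimate, not from the repo): if the original 1.35 GB file holds float32 weights, that is about 340M parameters; MLX's default 4-bit quantization stores 4 bits per weight plus a float16 scale and bias per group of 64 weights, about 4.5 bits per weight overall.

```python
# Rough size estimate for 4-bit quantization (assumptions: float32 original
# weights, MLX defaults of group_size=64 with float16 scale and bias).
params = 1.35e9 / 4                  # ~337M parameters at 4 bytes each
bits_per_weight = 4 + 2 * 16 / 64    # 4-bit payload + per-group scale/bias
quantized_bytes = params * bits_per_weight / 8
print(f"{quantized_bytes / 1e6:.0f} MB")  # ~190 MB
```

That lands near the observed 223 MB; the gap is consistent with some layers being left unquantized.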

Installation

pip install f5-tts-mlx-quantized

Basic Usage

python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."

You can also use a pipe to generate speech from the output of another process, for instance from a language model:

mlx_lm.generate --model mlx-community/Llama-3.2-1B-Instruct-4bit --verbose false \
 --temp 0 --max-tokens 512 --prompt "Write a concise paragraph explaining wavelets." \
| python -m f5_tts_mlx.generate

Voice Matching

If you want to use your own reference audio sample, make sure it's a mono, 24kHz wav file of around 5-10 seconds:

python -m f5_tts_mlx.generate \
--text "The quick brown fox jumped over the lazy dog." \
--ref-audio /path/to/audio.wav \
--ref-text "This is the caption for the reference audio."

You can convert an audio file to the correct format with ffmpeg like this:

ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
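If you want to verify a clip before passing it as `--ref-audio`, a stdlib-only check like the following works; the helper name is ours, not part of the package:

```python
import wave

def check_ref_audio(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the clip looks OK."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getframerate() != 24000:
            problems.append(f"expected 24000 Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit PCM, got {8 * wav.getsampwidth()}-bit")
        duration = wav.getnframes() / wav.getframerate()
        if not 5.0 <= duration <= 10.0:
            problems.append(f"expected 5-10 s, got {duration:.1f} s")
    return problems
```

An empty return means the file matches the mono, 24 kHz, 16-bit, 5-10 second format described above.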

See here for more options to customize generation.

From Python

You can load a pretrained model from Python:

from f5_tts_mlx.generate import generate

audio = generate(text="Hello world.", ...)

Pretrained model weights are also available on Hugging Face.

Appreciation

Lucas Newman for the original implementation of F5 TTS on MLX.

Yushen Chen for the original PyTorch implementation of F5 TTS and the pretrained model.

Phil Wang for the E2 TTS implementation that this model is based on.

Citations

@article{chen-etal-2024-f5tts,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching}, 
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      journal={arXiv preprint arXiv:2410.06885},
      year={2024},
}
@inproceedings{Eskimez2024E2TE,
    title   = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
    author  = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:270738197}
}

License

The code in this repository is released under the MIT license as found in the LICENSE file.

Project details


Download files

Download the file for your platform.

Source Distribution

f5_tts_mlx_quantized-0.1.1.tar.gz (237.0 kB)

Uploaded Source

Built Distribution


f5_tts_mlx_quantized-0.1.1-py3-none-any.whl (237.5 kB)

Uploaded Python 3

File details

Details for the file f5_tts_mlx_quantized-0.1.1.tar.gz.

File metadata

  • Download URL: f5_tts_mlx_quantized-0.1.1.tar.gz
  • Upload date:
  • Size: 237.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.21

File hashes

Hashes for f5_tts_mlx_quantized-0.1.1.tar.gz
Algorithm Hash digest
SHA256 2be1b3a2149958c3e504e8087b58bdd27f3a8cd03ef24bce9f193b09c47587db
MD5 368ab1b7988651956358a9593f85715c
BLAKE2b-256 54a2959a0fb5f29edc3a275c778e997ad450052b9464198fb55dd525efd76f42

See more details on using hashes here.

File details

Details for the file f5_tts_mlx_quantized-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for f5_tts_mlx_quantized-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 36c9192b2ccc822d33beec319b22c2c23b7dd91cd67747a4ccc034bc64eaaa06
MD5 dc9f620e598fe5e1212818eb323aabf5
BLAKE2b-256 bd6a29c7faaca0ecf4a5a395015a588155a72dda5610395bd61b9e16d4f505e4

