F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
- F5-TTS: Diffusion Transformer with ConvNeXt V2, with faster training and inference.
- E2 TTS: flat-UNet Transformer, the closest reproduction of the paper.
- Sway Sampling: an inference-time flow-step sampling strategy that greatly improves performance.
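Per the paper, sway sampling warps the uniform flow step t ∈ [0, 1] with f(t; s) = t + s·(cos(πt/2) − 1 + t), spending more ODE steps early in the flow when s < 0. A minimal sketch (the function and step count are illustrative, not taken from the codebase):

```python
import math

def sway_sample(t: float, s: float = -1.0) -> float:
    """Warp a uniform flow step t in [0, 1]; for s < 0 more steps land
    near t = 0. The endpoints are preserved: f(0) = 0 and f(1) = 1."""
    return t + s * (math.cos(math.pi / 2 * t) - 1 + t)

# Warp 16 uniform steps, matching the 16-NFE setting used in the benchmarks.
steps = [i / 15 for i in range(16)]
warped = [sway_sample(t, s=-1.0) for t in steps]
```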
Thanks to all the contributors!
News
- 2025/03/12: 🔥 F5-TTS v1 base model with better training and inference performance. A few demos are available.
- 2024/10/08: F5-TTS & E2 TTS base models on 🤗 Hugging Face, 🤖 Model Scope, 🟣 Wisemodel.
Installation
Create a separate environment if needed
# Create a conda env with python_version>=3.10 (you could also use virtualenv)
conda create -n f5-tts python=3.11
conda activate f5-tts
# Install FFmpeg if you haven't yet
conda install ffmpeg
Install PyTorch with matched device
NVIDIA GPU
# Install pytorch with your CUDA version, e.g.
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
# Earlier versions also work, e.g.
pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
AMD GPU
# Install pytorch with your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2
Intel GPU
# Install pytorch with your XPU version, e.g.
# Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit must be installed
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu
# Intel GPU support is also available through IPEX (Intel® Extension for PyTorch)
# IPEX does not require the Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit
# See: https://pytorch-extension.intel.com/installation?request=platform
Apple Silicon
# Install the stable pytorch, e.g.
pip install torch torchaudio
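Whichever backend is installed above, inference code selects it through the same torch device string. A small pure-Python sketch of one possible priority order (the helper and its ordering are my assumption; the availability flags correspond to `torch.cuda.is_available()`, `torch.xpu.is_available()`, and `torch.backends.mps.is_available()`):

```python
def pick_device(cuda: bool = False, xpu: bool = False, mps: bool = False) -> str:
    """Return a torch device string given backend availability flags,
    preferring CUDA, then Intel XPU, then Apple MPS, else CPU."""
    if cuda:
        return "cuda"
    if xpu:
        return "xpu"
    if mps:
        return "mps"
    return "cpu"

# e.g. pick_device(cuda=torch.cuda.is_available(), ...) -> "cuda" on an NVIDIA box
```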
Then choose one of the options below:
1. As a pip package (if just for inference)
pip install f5-tts
2. Local editable (if also doing training or finetuning)
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# git submodule update --init --recursive  # (optional, if using bigvgan as vocoder)
pip install -e .
Docker usage is also available
# Build from Dockerfile
docker build -t f5tts:v1 .
# Run from GitHub Container Registry
docker container run --rm -it --gpus=all --mount 'type=volume,source=f5-tts,target=/root/.cache/huggingface/hub/' -p 7860:7860 ghcr.io/swivid/f5-tts:main
# Quickstart if you want to just run the web interface (not CLI)
docker container run --rm -it --gpus=all --mount 'type=volume,source=f5-tts,target=/root/.cache/huggingface/hub/' -p 7860:7860 ghcr.io/swivid/f5-tts:main f5-tts_infer-gradio --host 0.0.0.0
Runtime
Deployment solution with Triton and TensorRT-LLM.
Benchmark Results
Decoding on a single NVIDIA L20 GPU, using 26 different prompt_audio & target_text pairs, with 16 NFE steps.
| Model | Concurrency | Avg Latency | RTF | Mode |
|---|---|---|---|---|
| F5-TTS Base (Vocos) | 2 | 253 ms | 0.0394 | Client-Server |
| F5-TTS Base (Vocos) | 1 (batch size) | - | 0.0402 | Offline TRT-LLM |
| F5-TTS Base (Vocos) | 1 (batch size) | - | 0.1467 | Offline PyTorch |
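RTF (real-time factor) in the table is wall-clock decoding time divided by the duration of the generated audio, so values below 1 mean faster-than-real-time synthesis. A tiny illustrative helper (not part of the codebase) to make the reading concrete:

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / generated-audio duration.
    RTF < 1 means the model synthesizes faster than real time."""
    return decode_seconds / audio_seconds

# At RTF 0.0394 (Client-Server row), 10 s of audio takes about 0.394 s to decode.
rtf = real_time_factor(0.394, 10.0)
```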
See detailed instructions for more information.
Inference
- To achieve the desired performance, take a moment to read the detailed guidance.
- Searching existing issues with keywords from the problem you encounter is very helpful.
1. Gradio App
Currently supported features:
- Basic TTS with Chunk Inference
- Multi-Style / Multi-Speaker Generation
- Voice Chat powered by Qwen2.5-3B-Instruct
- Custom inference with more language support
# Launch a Gradio app (web interface)
f5-tts_infer-gradio
# Specify the port/host
f5-tts_infer-gradio --port 7860 --host 0.0.0.0
# Launch a share link
f5-tts_infer-gradio --share
Example docker compose file for NVIDIA devices:
services:
f5-tts:
image: ghcr.io/swivid/f5-tts:main
ports:
- "7860:7860"
environment:
GRADIO_SERVER_PORT: 7860
entrypoint: ["f5-tts_infer-gradio", "--port", "7860", "--host", "0.0.0.0"]
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
volumes:
f5-tts:
driver: local
2. CLI Inference
# Run with flags
# Leaving --ref_text "" will have an ASR model transcribe the reference audio (extra GPU memory usage)
f5-tts_infer-cli --model F5TTS_v1_Base \
--ref_audio "provide_prompt_wav_path_here.wav" \
--ref_text "The content, subtitle or transcription of reference audio." \
--gen_text "Some text you want the TTS model to generate for you."
# Run with the default setting in src/f5_tts/infer/examples/basic/basic.toml
f5-tts_infer-cli
# Or with your own .toml file
f5-tts_infer-cli -c custom.toml
# Multi-voice generation. See src/f5_tts/infer/README.md
f5-tts_infer-cli -c src/f5_tts/infer/examples/multi/story.toml
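For scripted batch runs, the CLI invocation above can be assembled programmatically. A hypothetical wrapper (the `build_cmd` helper is mine; only the `f5-tts_infer-cli` flags come from the example above):

```python
import subprocess

def build_cmd(ref_audio: str, ref_text: str, gen_text: str,
              model: str = "F5TTS_v1_Base") -> list[str]:
    """Assemble the f5-tts_infer-cli invocation shown above.
    An empty ref_text triggers ASR transcription of the reference audio."""
    return [
        "f5-tts_infer-cli",
        "--model", model,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]

# subprocess.run(build_cmd("prompt.wav", "", "Hello there."), check=True)
```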
Training
1. With Hugging Face Accelerate
Refer to training & finetuning guidance for best practice.
2. With Gradio App
# Quick start with Gradio web interface
f5-tts_finetune-gradio
Read training & finetuning guidance for more instructions.
Evaluation
Development
Use pre-commit to ensure code quality (will run linters and formatters automatically):
pip install pre-commit
pre-commit install
When making a pull request, before each commit, run:
pre-commit run --all-files
Note: Some model components have linting exceptions for E722 to accommodate tensor notation.
Acknowledgements
- E2-TTS brilliant work, simple and effective
- Emilia, WenetSpeech4TTS, LibriTTS, LJSpeech valuable datasets
- lucidrains initial CFM structure with also bfs18 for discussion
- SD3 & Hugging Face diffusers DiT and MMDiT code structure
- torchdiffeq as ODE solver, Vocos and BigVGAN as vocoder
- FunASR, faster-whisper, UniSpeech, SpeechMOS for evaluation tools
- ctc-forced-aligner for speech edit test
- mrfakename huggingface space demo ~
- f5-tts-mlx Implementation with MLX framework by Lucas Newman
- F5-TTS-ONNX ONNX Runtime version by DakeQQ
- Yuekai Zhang Triton and TensorRT-LLM support ~
Citation
If our work and codebase are useful to you, please cite:
@article{chen-etal-2024-f5tts,
title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
journal={arXiv preprint arXiv:2410.06885},
year={2024},
}
License
Our code is released under the MIT License. The pre-trained models are licensed under CC-BY-NC because of the training data Emilia, which is an in-the-wild dataset. Sorry for any inconvenience this may cause.