FireRedTTS2 - speech generation utilities and model wrapper

Project description

FireRedTTS-2

Official PyTorch code for
FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot

Overview

FireRedTTS‑2 is a long-form streaming TTS system for multi-speaker dialogue generation, delivering stable, natural speech with reliable speaker switching and context-aware prosody.

Highlights 🔥

Long Conversational Speech Generation: It currently supports 3-minute dialogues with 4 speakers and can be scaled to longer conversations with more speakers by extending the training corpus.
Multilingual Support: It supports English, Chinese, Japanese, Korean, French, German, and Russian, with zero-shot voice cloning for cross-lingual and code-switching scenarios.
Ultra-Low Latency: Building on the new 12.5Hz streaming speech tokenizer, we employ a dual-transformer architecture that operates on a text–speech interleaved sequence, enabling flexible sentence-by-sentence generation and reducing first-packet latency. Specifically, on an L20 GPU, first-packet latency is as low as 140ms while maintaining high-quality audio output.
Strong Stability: Our model achieves high similarity and low WER/CER in both monologue and dialogue tests.
Random Timbre Generation: Useful for creating ASR/speech interaction data.
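
To put the 12.5Hz tokenizer rate in perspective, the arithmetic below is a rough sketch that counts tokenizer frames only. The helper names are illustrative and not part of the FireRedTTS2 package, and how many tokens the model emits per frame is not stated here, so treat the numbers as frame counts rather than sequence lengths.

```python
import math

# Frame rate of the streaming speech tokenizer, as stated in the highlights.
TOKENIZER_FRAME_RATE_HZ = 12.5


def frame_duration_ms(frame_rate_hz: float = TOKENIZER_FRAME_RATE_HZ) -> float:
    """Milliseconds of audio covered by one tokenizer frame."""
    return 1000.0 / frame_rate_hz


def frames_for_duration(seconds: float, frame_rate_hz: float = TOKENIZER_FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `seconds` of audio (rounded up)."""
    return math.ceil(seconds * frame_rate_hz)


print(frame_duration_ms())          # 80.0 ms of audio per frame
print(frames_for_duration(3 * 60))  # 2250 frames for a 3-minute dialogue
```

At this rate one frame spans 80 ms of audio, so the quoted 140ms first-packet latency is under two frames' worth of audio.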

Demo Examples

Random Timbre Generation & Multilingual Support

https://github.com/user-attachments/assets/804e9e67-fb15-4557-9715-43cd46a1b3e8

Zero-Shot Podcast Generation

https://github.com/user-attachments/assets/e68b1b7e-1329-47bb-a16f-8589cf227579

Speaker-Specific Finetuned Podcast Generation

⚠️ Speaker voices: hosts "肥杰" and "惠子" from the podcast "肥话连篇". Use without authorization is forbidden.

https://github.com/user-attachments/assets/21f626cb-eaf4-4f5c-920c-3d5d4c8cfa8b

For more examples, see the demo page.

News

[2025/09/12] 🔥 We added a web UI tool for dialogue generation.
[2025/09/08] 🔥 We released the pre-trained checkpoints and inference code.
[2025/09/02] 🔥 We released the technical report and demo page.

Roadmap

2025/09
- Release the pre-trained checkpoints and inference code.
- Add web UI tool.
2025/10
- Release a base model with enhanced multilingual support.
- Provide fine-tuning code & tutorial for specific dialogue/multilingual data.
- End-to-end text-to-blog pipeline.

Install & Model Download

Clone and install

Clone the repo

git clone https://github.com/FireRedTeam/FireRedTTS2.git
cd FireRedTTS2

Create Conda env:

conda create --name fireredtts2 python==3.11
conda activate fireredtts2

# Step 1. PyTorch Installation (if required)
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126

# Step 2. Install Dependencies
pip install -e .
pip install -r requirements.txt

Model download

git lfs install
git clone https://huggingface.co/FireRedTeam/FireRedTTS2 pretrained_models/FireRedTTS2
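
If the clone runs without Git LFS active, large files are left as small text stubs and the model will not load. The check below is a minimal sketch that uses only the standard library: it looks for files that still begin with the Git LFS pointer header. The function name is illustrative and not part of the FireRedTTS2 package.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str) -> list[str]:
    """Return relative paths of files that are still Git LFS pointer stubs."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{model_dir} does not exist; run the clone step first")
    stubs = []
    for path in sorted(root.rglob("*")):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(path.relative_to(root).as_posix())
    return stubs
```

Calling find_lfs_pointers("./pretrained_models/FireRedTTS2") should return an empty list after a complete download. Any paths it returns are files to re-fetch, for example by running git lfs pull inside the cloned directory.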

Basic Usage

Dialogue Generation with Web UI

Generate dialogue through an easy-to-use web interface that supports both voice cloning and randomized voices.

python gradio_demo.py --pretrained-dir "./pretrained_models/FireRedTTS2"

Dialogue Generation

import torchaudio
from fireredtts2.fireredtts2 import FireRedTTS2

device = "cuda"

fireredtts2 = FireRedTTS2(
    pretrained_dir="./pretrained_models/FireRedTTS2",
    gen_type="dialogue",
    device=device,
)

# Dialogue script: one string per turn, each starting with its speaker tag ([S1], [S2], ...).
text_list = [
    "[S1]那可能说对对，没有去过美国来说去去看到美国线下。巴斯曼也好，沃尔玛也好，他们线下不管说，因为深圳出去的还是电子周边的会表达，会发现哇对这个价格真的是很高呀。都是卖三十五美金、四十美金，甚至一个手机壳，就是二十五美金开。",
    "[S2]对，没错，我每次都觉得不不可思议。我什么人会买三五十美金的手机壳？但是其实在在那个target啊，就塔吉特这种超级市场，大家都是这样的，定价也很多人买。",
    "[S1]对对，那这样我们再去看说亚马逊上面卖卖卖手机壳也好啊，贴膜也好，还包括说车窗也好，各种线材也好，大概就是七块九九或者说啊八块九九，这个价格才是卖的最多的啊。因为亚马逊的游戏规则限定的。如果说你卖七块九九以下，那你基本上是不赚钱的。",
    "[S2]那比如说呃除了这个可能去到海外这个调查，然后这个调研考察那肯定是最直接的了。那平时我知道你是刚才建立了一个这个叫做呃rean的这样的一个一个播客，它是一个英文的。然后平时你还听一些什么样的东西，或者是从哪里获取一些这个海外市场的一些信息呢？",
    "[S1]嗯，因为做做亚马逊的话呢，我们会关注很多行业内的东西。就比如说行业有什么样亚马逊有什么样新的游戏规则呀。呃，物流的价格有没有波动呀，包括说有没有什么新的评论的政策呀，广告有什么新的打法呀？那这些我们会会关关注很多行业内部的微信公众号呀，还包括去去查一些知乎专栏的文章呀，以及说我们周边有很多同行。那我们经常会坐在一起聊天，看看信息有什么共享。那这个是关注内内的一个方式。",
]
# Reference audio for voice cloning, one file per speaker.
prompt_wav_list = [
    "examples/chat_prompt/zh/S1.flac",
    "examples/chat_prompt/zh/S2.flac",
]

# Transcripts of the reference audio, in the same order and with the same speaker tags.
prompt_text_list = [
    "[S1]啊，可能说更适合美国市场应该是什么样子。那这这个可能说当然如果说有有机会能亲身的去考察去了解一下，那当然是有更好的帮助。",
    "[S2]比如具体一点的，他觉得最大的一个跟他预想的不一样的是在什么地方。",
]

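# Generate the whole conversation; temperature and topk control sampling.
all_audio = fireredtts2.generate_dialogue(
    text_list=text_list,
    prompt_wav_list=prompt_wav_list,
    prompt_text_list=prompt_text_list,
    temperature=0.9,
    topk=30,
)
# Save the result at a 24 kHz sample rate.
torchaudio.save("chat_clone.wav", all_audio, 24000)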
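
Mismatched speaker tags or prompt lists are easy to introduce when a script is edited by hand. The helper below is a hypothetical pre-flight check and not part of the FireRedTTS2 API. It assumes the convention visible in the example above: every turn starts with a tag such as [S1], and each speaker has one prompt wav plus one tagged prompt transcript.

```python
import re

# Matches a leading speaker tag such as [S1] or [S2].
SPEAKER_TAG = re.compile(r"^\[S(\d+)\]")


def speakers_in(text_list: list[str]) -> list[int]:
    """Return the sorted speaker numbers used in text_list; raise if a turn lacks a tag."""
    found = set()
    for i, turn in enumerate(text_list):
        match = SPEAKER_TAG.match(turn)
        if match is None:
            raise ValueError(f"turn {i} does not start with a speaker tag such as [S1]")
        found.add(int(match.group(1)))
    return sorted(found)


def check_dialogue_inputs(text_list, prompt_wav_list, prompt_text_list) -> None:
    """Check that every speaker in the script has a prompt wav and a tagged prompt text."""
    if len(prompt_wav_list) != len(prompt_text_list):
        raise ValueError("prompt_wav_list and prompt_text_list must have the same length")
    prompt_speakers = speakers_in(prompt_text_list)
    missing = [s for s in speakers_in(text_list) if s not in prompt_speakers]
    if missing:
        raise ValueError(f"no prompt for speaker(s): {missing}")
```

Running check_dialogue_inputs(text_list, prompt_wav_list, prompt_text_list) before generate_dialogue surfaces these mistakes as a clear error message before any audio is generated.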

Monologue Generation

import torchaudio
from fireredtts2.fireredtts2 import FireRedTTS2

device = "cuda"
lines = [
    "Hello everyone, welcome to our newly launched FireRedTTS2. It supports multiple languages including English, Chinese, Japanese, Korean, French, German, and Russian. Additionally, this TTS model features long-context dialogue generation capabilities.",
    "如果你厌倦了千篇一律的AI音色，不满意于其他模型语言支持不够丰富，那么本项目将会成为你绝佳的工具。",
    "ランダムな話者と言語を選択して合成できます",
    "이는 많은 인공지능 시스템에 유용합니다. 예를 들어, 제가 다양한 음성 데이터를 대량으로 생성해 여러분의 ASR 모델이나 대화 모델에 풍부한 데이터를 제공할 수 있습니다.",
    "J'évolue constamment et j'espère pouvoir parler davantage de langues avec plus d'aisance à l'avenir.",
]

fireredtts2 = FireRedTTS2(
    pretrained_dir="./pretrained_models/FireRedTTS2",
    gen_type="monologue",
    device=device,
)

# Random speaker: no prompt is given, so each line gets a randomly generated timbre.
for i, line in enumerate(lines):
    text = line.strip()
    audio = fireredtts2.generate_monologue(text=text)
    # To adjust sampling, pass temperature and topk:
    # audio = fireredtts2.generate_monologue(text=text, temperature=0.8, topk=30)
    torchaudio.save(f"{i}.wav", audio.cpu(), 24000)


# # voice clone
# for i in range(len(lines)):
#     text = lines[i].strip()

#     audio = fireredtts2.generate_monologue(
#         text=text,
#         prompt_wav=<prompt_wav_path>,
#         prompt_text=<prompt_wav_text>,
#     )
#     torchaudio.save(str(i) + ".wav", audio.cpu(), 24000)
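
The two modes above differ only in whether a prompt is passed. The helper below is a hypothetical sketch and not part of the FireRedTTS2 API. It builds the keyword arguments for either mode and assumes that prompt_wav and prompt_text belong together, as in the commented voice-clone example. How the library behaves when only one of them is given is not documented here.

```python
from typing import Optional


def monologue_kwargs(
    text: str,
    prompt_wav: Optional[str] = None,
    prompt_text: Optional[str] = None,
) -> dict:
    """Build keyword arguments for generate_monologue: random voice or voice clone."""
    # Assumption: a reference wav is only useful together with its transcript.
    if (prompt_wav is None) != (prompt_text is None):
        raise ValueError("pass both prompt_wav and prompt_text, or neither")
    kwargs = {"text": text.strip()}
    if prompt_wav is not None:
        kwargs.update(prompt_wav=prompt_wav, prompt_text=prompt_text)
    return kwargs
```

With a loaded model, the call would look like fireredtts2.generate_monologue(**monologue_kwargs(text)) for a random voice. For cloning, pass the path to your own reference recording and its transcript as the second and third arguments.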

Acknowledgements

We thank Moshi and Sesame CSM for their novel dual-transformer approach. Additionally, we adapted Sesame CSM's structure and core inference code.
We referred to the Qwen2.5-1.5B text tokenizer solution.
We referred to the Xcodec2 Vocos-based acoustic decoder.

⚠️ Usage Disclaimer ❗️❗️❗️❗️❗️❗️

The project incorporates zero-shot voice cloning functionality. Please note that this capability is intended solely for academic research purposes.
DO NOT use this model for ANY illegal activities❗️❗️❗️❗️❗️❗️
The developers assume no liability for any misuse of this model.
If you identify any instances of abuse, misuse, or fraudulent activities related to this project, please report them to our team immediately.

Project details

This version

0.1.0

Sep 27, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

tts_webui_fireredtts2-0.1.0-py3-none-any.whl (40.0 kB)

Uploaded Sep 27, 2025 Python 3

File details

Details for the file tts_webui_fireredtts2-0.1.0-py3-none-any.whl.

File metadata

Download URL: tts_webui_fireredtts2-0.1.0-py3-none-any.whl
Upload date: Sep 27, 2025
Size: 40.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for tts_webui_fireredtts2-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d350067c75a9a7e32f9656a932eb263b18005af37c00cb28bb0edd293cf1a860`
MD5	`8a92547e51ff75ac07f658a30b2b29ab`
BLAKE2b-256	`bafd7f0e99a65db5c2adcc886dbf75d53117bb393b09e4281826b941e70db149`

tts-webui.fireredtts2 0.1.0