TTS (Text-to-Speech) wrapper library for Python
SpeechFlow
A unified Python TTS (Text-to-Speech) library that provides a simple interface for multiple TTS engines.
Features
- Multiple TTS Engine Support:
  - OpenAI TTS
  - Google Gemini TTS
  - FishAudio TTS (cloud-based, multi-voice)
  - Kokoro TTS (multi-language, lightweight, local)
  - Style-Bert-VITS2 (local, high-quality Japanese TTS)
- Unified Interface: Switch between different TTS engines without changing your code
- Streaming Support: Real-time audio streaming for supported engines
- Decoupled Architecture: Use TTS engines, audio players, and file writers independently
- Audio Playback: Synchronous audio player with streaming support
- File Export: Save synthesized speech to various audio formats
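The "unified interface" idea can be illustrated with a small sketch. The `TTSEngine` protocol and the two fake engines below are hypothetical stand-ins written for this example; they are not SpeechFlow's actual classes, and they return plain bytes rather than real audio. The only point is that calling code depends on `get` and `stream`, not on a specific engine.

```python
from typing import Iterator, Protocol


class TTSEngine(Protocol):
    """Hypothetical shape shared by engines (not SpeechFlow's real base class)."""

    def get(self, text: str) -> bytes: ...

    def stream(self, text: str) -> Iterator[bytes]: ...


class FakeEngineA:
    """Stand-in engine: tags its output with b"A:" instead of producing audio."""

    def get(self, text: str) -> bytes:
        return b"A:" + text.encode("utf-8")

    def stream(self, text: str) -> Iterator[bytes]:
        # Yield one chunk per whitespace-separated word.
        for word in text.split():
            yield b"A:" + word.encode("utf-8")


class FakeEngineB:
    """Second stand-in engine with the same two methods."""

    def get(self, text: str) -> bytes:
        return b"B:" + text.encode("utf-8")

    def stream(self, text: str) -> Iterator[bytes]:
        # This fake engine returns everything in a single chunk.
        yield self.get(text)


def synthesize(engine: TTSEngine, text: str) -> bytes:
    """Caller code that works with any object exposing get()."""
    return engine.get(text)
```

Swapping `FakeEngineA()` for `FakeEngineB()` changes the output but not the calling code, which is the property the feature list describes.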
Installation
pip install speechflow
For Style-Bert-VITS2 support:
# Install numba>=0.61 first for Python 3.12 compatibility.
# Quote the specifiers so the shell does not treat ">" as a redirect.
pip install "numba>=0.61"
pip install "style-bert-vits2>=2.5.0"
Quick Start
Basic Usage (Decoupled Components)
from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter
# Initialize components
engine = OpenAITTSEngine(api_key="your-api-key")
player = AudioPlayer()
writer = AudioWriter()
# Generate audio
audio = engine.get("Hello, world!")
# Play audio
player.play(audio)
# Save to file
writer.save(audio, "output.wav")
Streaming Audio
Important Notes on Streaming Behavior:
- OpenAI: True streaming with multiple chunks. First call may have 10-20s cold start delay. Uses PCM format for simplicity.
- Gemini: Returns complete audio in a single chunk (as of January 2025). This is a known limitation, not true streaming.
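One way to see the difference between multi-chunk streaming and a single-chunk response is to count chunks and time the first one. The helper below is a minimal sketch that works on any iterable of byte chunks; `measure_stream` and `slow_fake_stream` are hypothetical names introduced here, and the fake stream merely simulates latency with `time.sleep`.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[bytes]) -> Tuple[int, float, float]:
    """Return (chunk_count, seconds_to_first_chunk, total_seconds)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            # Record how long the consumer waited for the first chunk.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    # For an empty stream, report the total time as the first-chunk time.
    return count, (first if first is not None else total), total


def slow_fake_stream() -> Iterator[bytes]:
    """Stand-in for engine.stream(): a short delay before each of three chunks."""
    for _ in range(3):
        time.sleep(0.01)
        yield b"\x00\x00"
```

Passing `engine.stream(text)` instead of the fake stream would report how a real engine behaves; a chunk count of 1 indicates the whole audio arrived at once.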
from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter
# Initialize components
engine = OpenAITTSEngine(api_key="your-api-key")
player = AudioPlayer()
writer = AudioWriter()
# Warmup for OpenAI (recommended for production)
_ = list(engine.stream("Warmup"))
# Stream and play audio (returns combined AudioData)
combined_audio = player.play_stream(engine.stream("This is a long text that will be streamed..."))
# Save the combined audio to file
writer.save(combined_audio, "output.wav")
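For readers curious what "combining" streamed chunks involves, here is a minimal standard-library sketch that concatenates raw PCM chunks and wraps them in a WAV container. It does not reproduce how `AudioPlayer` or `AudioWriter` work internally. The function names are hypothetical, and the sample rate, sample width, and channel count are assumptions that should be checked against the engine's actual output format.

```python
import io
import wave
from typing import Iterable, Iterator


def fake_pcm_stream(n_chunks: int = 3, frames_per_chunk: int = 240) -> Iterator[bytes]:
    """Stand-in for engine.stream(): yields chunks of 16-bit mono silence."""
    for _ in range(n_chunks):
        yield b"\x00\x00" * frames_per_chunk


def collect_pcm(chunks: Iterable[bytes]) -> bytes:
    """Concatenate streamed chunks into one PCM buffer."""
    buffer = bytearray()
    for chunk in chunks:
        # A real player would also hand each chunk to the audio device here.
        buffer.extend(chunk)
    return bytes(buffer)


def pcm_to_wav_bytes(
    pcm: bytes,
    sample_rate: int = 24000,  # assumed rate, not confirmed by the README
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # assumed 16-bit samples
) -> bytes:
    """Wrap raw PCM data in a WAV header and return the file contents."""
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return out.getvalue()
```

Writing the returned bytes to disk with `open("output.wav", "wb")` yields a playable file, provided the assumed format matches the data.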
Engine-Specific Features
OpenAI TTS
from speechflow import OpenAITTSEngine
engine = OpenAITTSEngine(api_key="your-api-key")
audio = engine.get(
    "Hello",
    voice="alloy",  # or: ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer
    model="gpt-4o-mini-tts",  # or: tts-1, tts-1-hd
    speed=1.0
)
# Streaming
for chunk in engine.stream("Long text..."):
    # Process audio chunks in real-time
    pass
Google Gemini TTS
from speechflow import GeminiTTSEngine
engine = GeminiTTSEngine(api_key="your-api-key")
audio = engine.get(
    "Hello",
    model="gemini-2.5-flash-preview-tts",  # or: gemini-2.5-pro-preview-tts
    voice="Leda",  # or: Puck, Charon, Kore, Fenrir, Aoede, and many more
    speed=1.0
)
FishAudio TTS
from speechflow import FishAudioTTSEngine
engine = FishAudioTTSEngine(api_key="your-api-key")
audio = engine.get(
    "Hello world",
    model="s1",  # or: s1-mini, speech-1.6, speech-1.5, agent-x0
    voice="your-voice-id"  # Use your FishAudio voice ID
)
# Streaming
for chunk in engine.stream("Streaming text..."):
    # Process audio chunks
    pass
Kokoro TTS
from speechflow import KokoroTTSEngine
# Default: American English
engine = KokoroTTSEngine()
audio = engine.get(
    "Hello world",
    voice="af_heart"  # Multiple voices available
)
# Japanese (requires additional setup)
engine = KokoroTTSEngine(lang_code="j")
audio = engine.get(
    "こんにちは、世界",
    voice="af_heart"
)
Note for Japanese support: The Japanese dictionary will be automatically downloaded on first use. If you encounter errors, you can manually download it:
python -m unidic download
Style-Bert-VITS2
from speechflow import StyleBertTTSEngine
# Use pre-trained model (automatically downloads on first use)
engine = StyleBertTTSEngine(model_name="jvnv-F1-jp") # Female Japanese voice
audio = engine.get(
    "こんにちは、世界",
    style="Happy",  # Emotion: Neutral, Happy, Sad, Angry, Fear, Surprise, Disgust
    style_weight=5.0,  # Emotion strength (0.0-10.0)
    speed=1.0,  # Speech speed
    pitch=0.0  # Pitch shift in semitones
)
# Available pre-trained models:
# - jvnv-F1-jp, jvnv-F2-jp: Female voices (JP-Extra version)
# - jvnv-M1-jp, jvnv-M2-jp: Male voices (JP-Extra version)
# - jvnv-F1, jvnv-F2, jvnv-M1, jvnv-M2: Legacy versions
# Use custom model
engine = StyleBertTTSEngine(model_path="/path/to/your/model")
# Sentence-by-sentence streaming (not true streaming)
for audio_chunk in engine.stream("長い文章を文ごとに生成します。"):
    # Process each sentence's audio
    pass
Note: Style-Bert-VITS2 is optimized for Japanese text, and a GPU is recommended for best performance.
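Sentence-by-sentence streaming can be pictured as splitting the text at sentence-ending punctuation and synthesizing each piece in turn. The sketch below is an illustration under that assumption, not the library's actual implementation. `split_sentences` and `stream_by_sentence` are hypothetical helpers, and the `synthesize` callback stands in for a call such as `engine.get`.

```python
import re
from typing import Callable, Iterator, List

# Split positions: immediately after a Japanese or ASCII sentence terminator.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the terminating punctuation."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def stream_by_sentence(
    text: str, synthesize: Callable[[str], bytes]
) -> Iterator[bytes]:
    """Yield one synthesized chunk per sentence (not true low-latency streaming)."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

Each chunk becomes available only after its whole sentence has been synthesized, which is why this approach is described as "not true streaming".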
Language Support
Kokoro Languages
- 🇺🇸 American English (a)
- 🇬🇧 British English (b)
- 🇪🇸 Spanish (e)
- 🇫🇷 French (f)
- 🇮🇳 Hindi (h)
- 🇮🇹 Italian (i)
- 🇯🇵 Japanese (j) - requires unidic
- 🇧🇷 Brazilian Portuguese (p)
- 🇨🇳 Mandarin Chinese (z)
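When the language is chosen at runtime, it can help to validate the code before passing it as `lang_code`. The mapping below copies the codes listed above; `KOKORO_LANG_CODES` and `describe_lang_code` are hypothetical names introduced for this sketch and are not part of SpeechFlow.

```python
# Language codes as listed in this README.
KOKORO_LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}


def describe_lang_code(code: str) -> str:
    """Return the language name for a code, or raise ValueError if unknown."""
    try:
        return KOKORO_LANG_CODES[code]
    except KeyError:
        valid = ", ".join(sorted(KOKORO_LANG_CODES))
        raise ValueError(
            f"Unknown language code {code!r}; expected one of: {valid}"
        ) from None
```

A validated code can then be passed on, for example as `KokoroTTSEngine(lang_code=code)`.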
License
MIT