Async-first TTS (Text-to-Speech) wrapper library for Python
SpeechFlow
A unified async-first Python TTS (Text-to-Speech) library with multiple engine support.
Features
- Multiple TTS Engines: OpenAI, Google Gemini, FishAudio, Kokoro (local), Style-Bert-VITS2 (local)
- Async-First Design: Native async/await API with sync wrappers for convenience
- Streaming Support: Real-time audio streaming for supported engines
- Decoupled Architecture: Engines, player, and writer are independent components
- Optional Dependencies: Core requires only numpy; each engine is installable as an extra
Installation
# Core only (no engines)
uv add speechflow
# Install with specific engine
uv add "speechflow[openai]"
# Install with audio playback
uv add "speechflow[openai,player]"
# Install everything
uv add "speechflow[all]"
Available Extras
| Extra | Engine | Type |
|---|---|---|
| openai | OpenAI TTS | Cloud |
| gemini | Google Gemini TTS | Cloud |
| fishaudio | FishAudio TTS | Cloud |
| kokoro | Kokoro TTS (includes PyTorch) | Local |
| stylebert | Style-Bert-VITS2 (includes PyTorch) | Local |
| player | Audio playback via sounddevice | Utility |
| all | All of the above | - |
Using pip instead of uv
pip install "speechflow[openai]"
pip install "speechflow[openai,player]"
pip install "speechflow[all]"
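Because each engine ships as an optional extra, it can be useful to check which backing packages are importable before constructing an engine. The following is a minimal sketch using only the standard library. The mapping from extra name to import name (`openai`, `sounddevice`, `torch`) is an assumption made for illustration and is not read from speechflow's package metadata.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to a top-level module that extra is
# expected to install. These module names are illustrative guesses.
ASSUMED_EXTRA_MODULES = {
    "openai": "openai",
    "player": "sounddevice",
    "kokoro": "torch",
}


def available_extras(extra_modules=ASSUMED_EXTRA_MODULES):
    """Return {extra_name: bool} telling whether each module can be found."""
    return {
        extra: find_spec(module) is not None
        for extra, module in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, installed in available_extras().items():
        print(f"{extra}: {'installed' if installed else 'missing'}")
```

`find_spec` only looks the module up and does not import it, so the check stays cheap even for heavy packages such as PyTorch.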
GPU Support (Kokoro / Style-Bert-VITS2)
Local engines pull PyTorch as a dependency. By default, CPU-only PyTorch is installed. For GPU acceleration, install PyTorch with CUDA before installing speechflow:
# uv
uv add torch torchvision torchaudio --index https://download.pytorch.org/whl/cu121
uv add "speechflow[kokoro]"
# pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install "speechflow[kokoro]"
Replace cu121 with your CUDA version (e.g., cu118, cu124).
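After installing, one way to confirm that the CUDA build is active is to ask PyTorch directly. This is a small sketch, not part of speechflow; the helper name `detect_torch_device` is hypothetical, and it falls back to `"cpu"` when PyTorch is not installed at all.

```python
def detect_torch_device():
    """Return "cuda" if a CUDA-enabled PyTorch build sees a GPU, else "cpu"."""
    try:
        import torch  # only present when a local-engine extra is installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"PyTorch device: {detect_torch_device()}")
```

If this reports `cpu` on a machine with an NVIDIA GPU, the CPU-only wheel was probably installed, or the CUDA version in the index URL does not match the installed driver.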
Quick Start
Async (Primary API)
import asyncio
from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter
async def main():
    engine = OpenAITTSEngine(api_key="your-api-key")
    player = AudioPlayer()
    writer = AudioWriter()
    # Generate audio
    audio = await engine.get("Hello, world!")
    # Play audio
    await player.play(audio)
    # Save to file
    await writer.save(audio, "output.wav")

asyncio.run(main())
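One practical benefit of an async API is that several requests can be in flight at once. The sketch below shows the general pattern with `asyncio.gather`. `FakeEngine` is a stand-in written for this example that only mimics an awaitable `get(text)`; whether the real engines tolerate concurrent calls is not stated here and would need to be checked per engine.

```python
import asyncio


class FakeEngine:
    """Stand-in for a TTS engine; only mimics an awaitable get(text)."""

    async def get(self, text):
        await asyncio.sleep(0.01)  # simulate network or inference latency
        return f"<audio for {text!r}>"


async def synthesize_all(engine, texts):
    """Run one get() call per text concurrently; results keep input order."""
    return await asyncio.gather(*(engine.get(text) for text in texts))


if __name__ == "__main__":
    clips = asyncio.run(synthesize_all(FakeEngine(), ["Hello", "Goodbye"]))
    print(clips)
```

`asyncio.gather` returns results in the order the awaitables were passed, so each clip lines up with its input text regardless of which request finishes first.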
Sync Wrappers
from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter
engine = OpenAITTSEngine(api_key="your-api-key")
player = AudioPlayer()
writer = AudioWriter()
audio = engine.get_sync("Hello, world!")
player.play_sync(audio)
writer.save_sync(audio, "output.wav")
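How the `*_sync` wrappers are implemented is not shown here. A common way to build such a wrapper is to run the coroutine with `asyncio.run`, as in the sketch below. `DemoEngine` is a hypothetical class for illustration only, not speechflow code.

```python
import asyncio


class DemoEngine:
    """Hypothetical engine showing the async-first pattern."""

    async def get(self, text):
        await asyncio.sleep(0)  # placeholder for real async work
        return text.upper()

    def get_sync(self, text):
        # asyncio.run() starts a fresh event loop, so this must not be
        # called from code that is already running inside an event loop.
        return asyncio.run(self.get(text))


if __name__ == "__main__":
    print(DemoEngine().get_sync("hello"))
```

If the wrappers follow this pattern, they are suited to scripts and synchronous applications; inside an `async def` function, awaiting the async method directly is the safer choice.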
Streaming
import asyncio
from speechflow import OpenAITTSEngine, AudioPlayer
async def main():
    engine = OpenAITTSEngine(api_key="your-api-key")
    player = AudioPlayer()
    # Stream and play (returns combined AudioData)
    combined = await player.play_stream(
        engine.stream("This is a long text that will be streamed...")
    )

asyncio.run(main())
Streaming notes:
- OpenAI: True streaming with multiple chunks.
- Gemini: Returns complete audio in a single chunk (API limitation).
- FishAudio: True streaming.
- Kokoro / Style-Bert-VITS2: Sentence-by-sentence streaming.
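The difference between these modes shows up in how many chunks a consumer receives. The sketch below consumes an async stream and counts the chunks. `fake_stream` is a stand-in for `engine.stream(...)`, and treating chunks as `bytes` is an assumption made to keep the example self-contained; the real chunk type may differ.

```python
import asyncio


async def fake_stream(chunks, delay=0.01):
    """Stand-in for engine.stream(): yields byte chunks with a small delay."""
    for chunk in chunks:
        await asyncio.sleep(delay)
        yield chunk


async def collect(stream):
    """Consume an async stream, returning (chunk_count, combined_bytes)."""
    count = 0
    combined = bytearray()
    async for chunk in stream:
        count += 1
        combined.extend(chunk)
    return count, bytes(combined)


if __name__ == "__main__":
    # Several chunks, as with a truly streaming engine.
    print(asyncio.run(collect(fake_stream([b"ab", b"cd", b"ef"]))))
    # One chunk, as described for Gemini above.
    print(asyncio.run(collect(fake_stream([b"abcdef"]))))
```

Code written against the stream interface works in both cases; with a single-chunk engine it simply waits for the full audio before the first chunk arrives.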
Engine-Specific Features
OpenAI TTS
engine = OpenAITTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello",
    voice="alloy",  # ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer
    model="gpt-4o-mini-tts",  # tts-1, tts-1-hd
    speed=1.0,
    instructions="Speak in a cheerful tone",
)
# Streaming
async for chunk in engine.stream("Long text..."):
    pass  # handle each audio chunk as it arrives
Google Gemini TTS
engine = GeminiTTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello",
    model="gemini-2.5-flash-preview-tts",  # gemini-2.5-pro-preview-tts
    voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, ...
)
FishAudio TTS
engine = FishAudioTTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello world",
    model="s1",  # s1-mini, speech-1.6, speech-1.5, agent-x0
    voice="your-voice-id",
    speed=1.0,  # Speech speed
    volume=1.0,  # Volume
)
# Streaming
async for chunk in engine.stream("Streaming text..."):
    pass  # handle each audio chunk as it arrives
Kokoro TTS
# Default: American English
engine = KokoroTTSEngine()
audio = await engine.get(
    "Hello world",
    voice="af_heart",
    speed=1.0,
)
# Japanese (dictionary auto-downloads on first use)
engine = KokoroTTSEngine(lang_code="j")
audio = await engine.get("こんにちは、世界", voice="af_heart")  # "Hello, world"
If the Japanese dictionary download fails, run it manually: python -m unidic download
Supported languages: American English (a), British English (b), Spanish (e), French (f), Hindi (h), Italian (i), Japanese (j), Brazilian Portuguese (p), Mandarin Chinese (z)
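The single-letter codes above are easy to mistype. A small lookup table can validate a code before it is passed as `lang_code`. This is a sketch built from the list above; the names `KOKORO_LANG_CODES` and `describe_lang_code` are hypothetical helpers, not part of speechflow.

```python
# Language codes as listed in the documentation above.
KOKORO_LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}


def describe_lang_code(lang_code):
    """Return the language name for a code, or raise ValueError if unknown."""
    try:
        return KOKORO_LANG_CODES[lang_code]
    except KeyError:
        valid = ", ".join(sorted(KOKORO_LANG_CODES))
        raise ValueError(
            f"unknown lang_code {lang_code!r}; expected one of: {valid}"
        ) from None


if __name__ == "__main__":
    print(describe_lang_code("j"))
```

Checking the code up front gives a clear error message instead of a failure deep inside engine initialization.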
Style-Bert-VITS2
# Pre-trained model (auto-downloads on first use)
engine = StyleBertTTSEngine(model_name="jvnv-F1-jp")
audio = await engine.get(
    "こんにちは、世界",  # "Hello, world"
    style="Happy",  # Neutral, Happy, Sad, Angry, Fear, Surprise, Disgust
    style_weight=5.0,  # Emotion strength (0.0-10.0)
    speed=1.0,
    pitch=0.0,  # Pitch shift in semitones
    speaker_id=0,
)
# Custom model
engine = StyleBertTTSEngine(model_path="/path/to/your/model")
# Sentence-by-sentence streaming
# (the text means "Long text is generated sentence by sentence.")
async for chunk in engine.stream("長い文章を文ごとに生成します。"):
    pass  # handle each sentence's audio as it arrives
Pre-trained models: jvnv-F1-jp, jvnv-F2-jp (female), jvnv-M1-jp, jvnv-M2-jp (male)
Optimized for Japanese. GPU recommended for best performance.
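The style names and the 0.0-10.0 weight range shown in the example can be checked before a request is made. The sketch below is a hypothetical helper, not part of speechflow, and it assumes the seven listed styles are the full set; custom models may define different styles, in which case the set would need adjusting.

```python
# Style names as listed in the example above (assumed to be the full set).
STYLES = {"Neutral", "Happy", "Sad", "Angry", "Fear", "Surprise", "Disgust"}


def check_style_params(style, style_weight):
    """Validate against the documented style names and 0.0-10.0 weight range."""
    if style not in STYLES:
        raise ValueError(f"unknown style {style!r}")
    if not 0.0 <= style_weight <= 10.0:
        raise ValueError(f"style_weight {style_weight} is outside 0.0-10.0")
    return {"style": style, "style_weight": float(style_weight)}


if __name__ == "__main__":
    print(check_style_params("Happy", 5.0))
```

The returned dictionary can be unpacked into a call with `**`, which keeps validation and the request in separate steps.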
License
MIT