Async-first TTS (Text-to-Speech) wrapper library for Python

SpeechFlow

A unified async-first Python TTS (Text-to-Speech) library with multiple engine support.

Features

  • Multiple TTS Engines: OpenAI, Google Gemini, FishAudio, Kokoro (local), Style-Bert-VITS2 (local)
  • Async-First Design: Native async/await API with sync wrappers for convenience
  • Streaming Support: Real-time audio streaming for supported engines
  • Decoupled Architecture: Engines, player, and writer are independent components
  • Optional Dependencies: Core requires only numpy; each engine is installable as an extra

Installation

# Core only (no engines)
uv add speechflow

# Install with specific engine
uv add "speechflow[openai]"

# Install with audio playback
uv add "speechflow[openai,player]"

# Install everything
uv add "speechflow[all]"

Available Extras

Extra      Engine                                Type
openai     OpenAI TTS                            Cloud
gemini     Google Gemini TTS                     Cloud
fishaudio  FishAudio TTS                         Cloud
kokoro     Kokoro TTS (includes PyTorch)         Local
stylebert  Style-Bert-VITS2 (includes PyTorch)   Local
player     Audio playback via sounddevice        Utility
all        All of the above                      -

Using pip instead of uv

pip install "speechflow[openai]"
pip install "speechflow[openai,player]"
pip install "speechflow[all]"

GPU Support (Kokoro / Style-Bert-VITS2)

Local engines pull PyTorch as a dependency. By default, CPU-only PyTorch is installed. For GPU acceleration, install PyTorch with CUDA before installing speechflow:

# uv
uv add torch torchvision torchaudio --index https://download.pytorch.org/whl/cu121
uv add "speechflow[kokoro]"

# pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install "speechflow[kokoro]"

Replace cu121 with your CUDA version (e.g., cu118, cu124).
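
To confirm that the CUDA build was actually picked up, a quick check along these lines may help. This is a minimal sketch: describe_torch_device is a hypothetical helper name and is not part of speechflow. It relies only on the standard library and on PyTorch's torch.cuda.is_available().

```python
import importlib.util


def describe_torch_device() -> str:
    """Report which device PyTorch would use, even if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also works in a core-only install

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this prints "cpu" after a CUDA install, the CPU-only wheel was probably resolved first; reinstalling torch from the CUDA index is the usual remedy.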

Quick Start

Async (Primary API)

import asyncio
from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter

async def main():
    engine = OpenAITTSEngine(api_key="your-api-key")
    player = AudioPlayer()
    writer = AudioWriter()

    # Generate audio
    audio = await engine.get("Hello, world!")

    # Play audio
    await player.play(audio)

    # Save to file
    await writer.save(audio, "output.wav")

asyncio.run(main())
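
Because get() is a coroutine, several texts can be synthesized concurrently. The following is a sketch under assumptions, not a documented speechflow feature: synthesize_all and EchoEngine are hypothetical names, EchoEngine stands in for a real engine so the example runs without an API key, and whether a given engine is safe to call concurrently is not stated here.

```python
import asyncio


async def synthesize_all(engine, texts, max_concurrency=4):
    """Run engine.get() for every text concurrently, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def synthesize_one(text):
        async with semaphore:  # cap the number of in-flight requests
            return await engine.get(text)

    return await asyncio.gather(*(synthesize_one(t) for t in texts))


class EchoEngine:
    """Stand-in for a real engine (hypothetical, for illustration only)."""

    async def get(self, text):
        await asyncio.sleep(0.01)  # simulate network latency
        return f"audio:{text}"


if __name__ == "__main__":
    results = asyncio.run(synthesize_all(EchoEngine(), ["one", "two", "three"]))
    print(results)
```

The semaphore keeps the number of simultaneous requests bounded, which matters for cloud engines with rate limits.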

Sync Wrappers

from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter

engine = OpenAITTSEngine(api_key="your-api-key")
player = AudioPlayer()
writer = AudioWriter()

audio = engine.get_sync("Hello, world!")
player.play_sync(audio)
writer.save_sync(audio, "output.wav")
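
For readers wondering how a sync wrapper can sit on top of an async method, one common pattern is to drive the coroutine with asyncio.run(). How speechflow implements get_sync is not stated here, so treat this as an illustration of the general idea; DemoEngine is a hypothetical class.

```python
import asyncio


class DemoEngine:
    """Hypothetical engine used only to illustrate the wrapper pattern."""

    async def get(self, text: str) -> bytes:
        await asyncio.sleep(0)  # placeholder for real async I/O
        return text.encode("utf-8")

    def get_sync(self, text: str) -> bytes:
        # asyncio.run() starts a fresh event loop, so a wrapper built this way
        # cannot be called from code that is already running inside a loop.
        return asyncio.run(self.get(text))


if __name__ == "__main__":
    print(DemoEngine().get_sync("Hello, world!"))
```

With wrappers of this kind, the async API is the better choice inside async frameworks, and the sync variant suits scripts and REPL use.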

Streaming

import asyncio
from speechflow import OpenAITTSEngine, AudioPlayer

async def main():
    engine = OpenAITTSEngine(api_key="your-api-key")
    player = AudioPlayer()

    # Stream and play (returns combined AudioData)
    combined = await player.play_stream(engine.stream("This is a long text that will be streamed..."))

asyncio.run(main())

Streaming notes:

  • OpenAI: True streaming with multiple chunks.
  • Gemini: Returns complete audio in a single chunk (API limitation).
  • FishAudio: True streaming.
  • Kokoro / Style-Bert-VITS2: Sentence-by-sentence streaming.
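
To make the difference between these modes concrete, the sketch below imitates sentence-by-sentence streaming with a plain async generator and consumes it the same way any engine stream would be consumed. It is an assumption-laden illustration: sentence_stream and collect are hypothetical helpers, strings stand in for audio chunks (the real chunk type is not described here), and the regex split is not necessarily how the local engines segment text.

```python
import asyncio
import re


async def sentence_stream(text):
    """Yield one piece per sentence, imitating sentence-by-sentence streaming."""
    # Split after Latin or Japanese sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?。！？])\s*", text):
        if sentence:
            await asyncio.sleep(0)  # stand-in for per-sentence synthesis
            yield sentence


async def collect(stream):
    """Consume any async iterator and return its chunks as a list."""
    return [chunk async for chunk in stream]


if __name__ == "__main__":
    chunks = asyncio.run(collect(sentence_stream("Hello. World! Bye?")))
    print(len(chunks), "chunks")
```

An engine that returns everything in a single chunk would make the same consumer loop run exactly once, so code written against the streaming interface still works but gains no latency benefit.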

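Engine-Specific Features

The snippets in this section use await, so they must run inside an async function (as in Quick Start). Imports are omitted for brevity.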

OpenAI TTS

engine = OpenAITTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello",
    voice="alloy",           # other voices: ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer
    model="gpt-4o-mini-tts", # other models: tts-1, tts-1-hd
    speed=1.0,
    instructions="Speak in a cheerful tone",
)

# Streaming
async for chunk in engine.stream("Long text..."):
    pass
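
The examples here pass a literal api_key for brevity. In practice the key is usually read from the environment. The sketch below shows one way to do that; require_env is a hypothetical helper, and OPENAI_API_KEY is only a conventional variable name assumed for illustration (nothing here states that speechflow reads it automatically).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# Usage (assumed variable name):
# engine = OpenAITTSEngine(api_key=require_env("OPENAI_API_KEY"))
```

Failing early with a named variable tends to be easier to debug than an authentication error from the remote API.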

Google Gemini TTS

engine = GeminiTTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello",
    model="gemini-2.5-flash-preview-tts",  # other model: gemini-2.5-pro-preview-tts
    voice="Leda",                           # other voices: Puck, Charon, Kore, Fenrir, Aoede, ...
)

FishAudio TTS

engine = FishAudioTTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello world",
    model="s1",                  # other models: s1-mini, speech-1.6, speech-1.5, agent-x0
    voice="your-voice-id",
    speed=1.0,                   # Speech speed
    volume=1.0,                  # Volume
)

# Streaming
async for chunk in engine.stream("Streaming text..."):
    pass

Kokoro TTS

# Default: American English
engine = KokoroTTSEngine()
audio = await engine.get(
    "Hello world",
    voice="af_heart",
    speed=1.0,
)

# Japanese (dictionary auto-downloads on first use)
engine = KokoroTTSEngine(lang_code="j")
audio = await engine.get("こんにちは、世界", voice="af_heart")

If the Japanese dictionary download fails, run it manually: python -m unidic download

Supported languages: American English (a), British English (b), Spanish (e), French (f), Hindi (h), Italian (i), Japanese (j), Brazilian Portuguese (p), Mandarin Chinese (z)
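
The single-letter codes above are easy to mix up, so a small lookup table can help when the language is chosen at runtime. This is a convenience sketch built from the list above; KOKORO_LANG_CODES and lang_code_for are hypothetical names and are not part of speechflow.

```python
# Language codes as listed in this document.
KOKORO_LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}


def lang_code_for(language: str) -> str:
    """Return the one-letter code for a language name (case-insensitive)."""
    wanted = language.strip().lower()
    for code, name in KOKORO_LANG_CODES.items():
        if name.lower() == wanted:
            return code
    raise ValueError(f"Unsupported language: {language!r}")


# Usage:
# engine = KokoroTTSEngine(lang_code=lang_code_for("Japanese"))
```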

Style-Bert-VITS2

# Pre-trained model (auto-downloads on first use)
engine = StyleBertTTSEngine(model_name="jvnv-F1-jp")
audio = await engine.get(
    "こんにちは、世界",
    style="Happy",       # Neutral, Happy, Sad, Angry, Fear, Surprise, Disgust
    style_weight=5.0,    # Emotion strength (0.0-10.0)
    speed=1.0,
    pitch=0.0,           # Pitch shift in semitones
    speaker_id=0,
)

# Custom model
engine = StyleBertTTSEngine(model_path="/path/to/your/model")

# Sentence-by-sentence streaming
async for chunk in engine.stream("長い文章を文ごとに生成します。"):
    pass

Pre-trained models: jvnv-F1-jp, jvnv-F2-jp (female), jvnv-M1-jp, jvnv-M2-jp (male)

Optimized for Japanese. GPU recommended for best performance.
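
Two of the parameters above have a numeric meaning that is easy to sanity-check. In equal temperament, a shift of n semitones corresponds to a frequency ratio of 2^(n/12), and style_weight is documented as lying in 0.0-10.0. The helpers below are a sketch with hypothetical names; whether the engine itself clamps out-of-range values or raises an error is not stated here, and how it applies the pitch shift internally is an assumption.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def clamp_style_weight(value: float, low: float = 0.0, high: float = 10.0) -> float:
    """Keep an emotion-strength value inside the documented 0.0-10.0 range."""
    return max(low, min(high, value))


if __name__ == "__main__":
    print(semitones_to_ratio(12.0))  # one octave up doubles the frequency
    print(clamp_style_weight(12.5))
```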

License

MIT
