Async-first TTS (Text-to-Speech) wrapper library for Python

SpeechFlow

A unified async-first Python TTS (Text-to-Speech) library with multiple engine support.

Features

Multiple TTS Engines: OpenAI, Google Gemini, FishAudio, Kokoro (local), Style-Bert-VITS2 (local)
Async-First Design: Native async/await API with sync wrappers for convenience
Streaming Support: Real-time audio streaming for supported engines
Decoupled Architecture: Engines, player, and writer are independent components
Optional Dependencies: Core requires only numpy; each engine is installable as an extra

Installation

# Core only (no engines)
uv add speechflow

# Install with specific engine
uv add "speechflow[openai]"

# Install with audio playback
uv add "speechflow[openai,player]"

# Install everything
uv add "speechflow[all]"

Available Extras

Extra	Engine	Type
`openai`	OpenAI TTS	Cloud
`gemini`	Google Gemini TTS	Cloud
`fishaudio`	FishAudio TTS	Cloud
`kokoro`	Kokoro TTS (includes PyTorch)	Local
`stylebert`	Style-Bert-VITS2 (includes PyTorch)	Local
`player`	Audio playback via sounddevice	Utility
`all`	All of the above	-
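
Because every engine is an optional extra, it can help to check at runtime which backing packages are importable. The sketch below uses only the standard library. The mapping from extra to module name is an assumption: only sounddevice for the player extra is stated above, while openai and torch are the usual import names for the OpenAI SDK and PyTorch. installed_extras is a hypothetical helper, not part of SpeechFlow.

```python
import importlib.util

# Assumed import names behind some extras (not taken from SpeechFlow's metadata).
EXTRA_MODULES = {
    "openai": "openai",
    "kokoro": "torch",
    "stylebert": "torch",
    "player": "sounddevice",
}


def installed_extras(modules=EXTRA_MODULES):
    """Return {extra: True/False} depending on whether its module can be found."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in modules.items()
    }


for extra, ok in installed_extras().items():
    print(f"{extra:10s} {'installed' if ok else 'missing'}")
```

A check like this only tells you that a module can be located, not that the engine built on it will initialize successfully.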

Using pip instead of uv

pip install "speechflow[openai]"
pip install "speechflow[openai,player]"
pip install "speechflow[all]"

GPU Support (Kokoro / Style-Bert-VITS2)

The local engines pull in PyTorch as a dependency. By default, the CPU-only build of PyTorch is installed. For GPU acceleration, install a CUDA build of PyTorch before installing speechflow:

# uv
uv add torch torchvision torchaudio --index https://download.pytorch.org/whl/cu121
uv add "speechflow[kokoro]"

# pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install "speechflow[kokoro]"

Replace cu121 with the tag that matches your CUDA version (e.g., cu118, cu124).
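
After installation, one way to confirm that a CUDA build is active is to ask PyTorch directly. The sketch below relies on torch.cuda.is_available(), which is part of PyTorch's public API. The describe_torch_device helper name is hypothetical, and the import is guarded so the snippet also runs where PyTorch is absent.

```python
def describe_torch_device():
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    try:
        import torch  # only present if a local-engine extra (or torch itself) is installed
    except ImportError:
        return "torch not installed"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(describe_torch_device())
```

If this prints cpu on a machine with an NVIDIA GPU, the CPU-only wheel was probably installed first, and reinstalling PyTorch from the CUDA index is the usual remedy.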

Quick Start

Async (Primary API)

import asyncio
from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter

async def main():
    engine = OpenAITTSEngine(api_key="your-api-key")
    player = AudioPlayer()
    writer = AudioWriter()

    # Generate audio
    audio = await engine.get("Hello, world!")

    # Play audio
    await player.play(audio)

    # Save to file
    await writer.save(audio, "output.wav")

asyncio.run(main())
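
Because get is a coroutine, several requests can be awaited concurrently. The sketch below shows the pattern with asyncio.gather. FakeEngine is a hypothetical stand-in so the example runs without an API key or the package installed. Whether a particular cloud engine tolerates parallel requests (rate limits, quotas) is not covered here.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in that mimics an engine's async get()."""

    async def get(self, text: str) -> bytes:
        await asyncio.sleep(0.01)  # simulate network or inference latency
        return text.encode("utf-8")  # placeholder for real audio data


async def synthesize_all(engine, texts):
    # Schedule every request at once; gather returns results in input order.
    return await asyncio.gather(*(engine.get(t) for t in texts))


async def main():
    clips = await synthesize_all(FakeEngine(), ["Hello", "world"])
    print([len(c) for c in clips])


asyncio.run(main())
```

With a real engine, the same synthesize_all function would presumably accept an OpenAITTSEngine instance in place of FakeEngine, since only the get method is used.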

Sync Wrappers

from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter

engine = OpenAITTSEngine(api_key="your-api-key")
player = AudioPlayer()
writer = AudioWriter()

audio = engine.get_sync("Hello, world!")
player.play_sync(audio)
writer.save_sync(audio, "output.wav")
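
A common way to provide such wrappers is to drive the coroutine with asyncio.run. The sketch below illustrates that pattern on a hypothetical DemoEngine. It is not SpeechFlow's actual implementation, which is not shown here.

```python
import asyncio


class DemoEngine:
    """Hypothetical engine showing an async method with a sync wrapper."""

    async def get(self, text: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop once
        return f"audio for {text!r}"

    def get_sync(self, text: str) -> str:
        # asyncio.run creates a fresh event loop, runs the coroutine, then closes the loop.
        return asyncio.run(self.get(text))


print(DemoEngine().get_sync("Hello"))
```

One consequence of this pattern is that the sync method cannot be called while an event loop is already running in the same thread, so inside async code the awaitable form is the one to use.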

Streaming

import asyncio
from speechflow import OpenAITTSEngine, AudioPlayer

async def main():
    engine = OpenAITTSEngine(api_key="your-api-key")
    player = AudioPlayer()

    # Stream and play (returns combined AudioData)
    combined = await player.play_stream(engine.stream("This is a long text that will be streamed..."))

asyncio.run(main())

Streaming notes:

OpenAI: True streaming with multiple chunks.
Gemini: Returns complete audio in a single chunk (API limitation).
FishAudio: True streaming.
Kokoro / Style-Bert-VITS2: Sentence-by-sentence streaming.
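
To make the difference concrete, sentence-by-sentence streaming can be pictured as an async generator that yields one chunk per sentence, with the consumer joining chunks as they arrive. This is a sketch under assumptions: split_sentences, stream_sentences, and collect are hypothetical helpers, the chunks are placeholder bytes rather than audio, and the local engines' real splitting rules are not documented here.

```python
import asyncio
import re


def split_sentences(text: str) -> list[str]:
    # Split after ., !, ? or the Japanese full stop, keeping the punctuation.
    parts = re.split(r"(?<=[.!?。])\s*", text)
    return [p for p in parts if p]


async def stream_sentences(text: str):
    """Yield one placeholder chunk per sentence."""
    for sentence in split_sentences(text):
        await asyncio.sleep(0)  # stand-in for per-sentence synthesis
        yield sentence.encode("utf-8")


async def collect(stream) -> bytes:
    # Consume the stream chunk by chunk and return the combined result.
    chunks = [chunk async for chunk in stream]
    return b"".join(chunks)


print(asyncio.run(collect(stream_sentences("First. Second! Third?"))))
```

An engine that returns everything in a single chunk fits the same consumer loop; the loop body simply runs once.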

Engine-Specific Features

The snippets in this section use await and async for at the top level for brevity. Run them inside a coroutine, as in the Quick Start examples, with the engine class imported first.

OpenAI TTS

engine = OpenAITTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello",
    voice="alloy",           # others: ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer
    model="gpt-4o-mini-tts", # or tts-1, tts-1-hd
    speed=1.0,
    instructions="Speak in a cheerful tone",  # per OpenAI's docs, has no effect with tts-1 / tts-1-hd
)

# Streaming: chunks arrive as they are generated
async for chunk in engine.stream("Long text..."):
    pass  # handle each chunk here
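
Rather than hard-coding the key as in the snippets here, it is common to read it from the environment. The sketch below is one way to do that. require_env is a hypothetical helper, and OPENAI_API_KEY is the variable name conventionally used by the OpenAI SDK; whether SpeechFlow reads it automatically is not stated here, so the key is passed explicitly.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


# Usage (assumed variable name):
# engine = OpenAITTSEngine(api_key=require_env("OPENAI_API_KEY"))
```

Failing early with a named variable tends to be easier to debug than an authentication error from the first request.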

Google Gemini TTS

engine = GeminiTTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello",
    model="gemini-2.5-flash-preview-tts",  # or gemini-2.5-pro-preview-tts
    voice="Leda",                           # others: Puck, Charon, Kore, Fenrir, Aoede, ...
)

FishAudio TTS

engine = FishAudioTTSEngine(api_key="your-api-key")
audio = await engine.get(
    "Hello world",
    model="s1",                  # or s1-mini, speech-1.6, speech-1.5, agent-x0
    voice="your-voice-id",
    speed=1.0,                   # Speech speed
    volume=1.0,                  # Volume
)

# Streaming
async for chunk in engine.stream("Streaming text..."):
    pass

Kokoro TTS

# Default: American English
engine = KokoroTTSEngine()
audio = await engine.get(
    "Hello world",
    voice="af_heart",
    speed=1.0,
)

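# Japanese (dictionary auto-downloads on first use)
# Note: af_heart is an American English voice; Kokoro's Japanese voices use the jf_/jm_ prefixes (e.g. jf_alpha).
engine = KokoroTTSEngine(lang_code="j")
audio = await engine.get("こんにちは、世界", voice="af_heart")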

If the Japanese dictionary download fails, run it manually: python -m unidic download

Supported languages: American English (a), British English (b), Spanish (e), French (f), Hindi (h), Italian (i), Japanese (j), Brazilian Portuguese (p), Mandarin Chinese (z)
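
The single-letter codes are easy to mistype, so a small lookup can catch errors before an engine is constructed. The table below copies the codes from the list above. check_lang_code is a hypothetical helper, not part of SpeechFlow.

```python
# Language codes exactly as listed above.
KOKORO_LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}


def check_lang_code(code: str) -> str:
    """Return the language name for a code, or raise ValueError listing valid codes."""
    try:
        return KOKORO_LANG_CODES[code]
    except KeyError:
        valid = ", ".join(sorted(KOKORO_LANG_CODES))
        raise ValueError(f"Unknown lang_code {code!r}; expected one of: {valid}") from None


print(check_lang_code("j"))
```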

Style-Bert-VITS2

# Pre-trained model (auto-downloads on first use)
engine = StyleBertTTSEngine(model_name="jvnv-F1-jp")
audio = await engine.get(
    "こんにちは、世界",
    style="Happy",       # Neutral, Happy, Sad, Angry, Fear, Surprise, Disgust
    style_weight=5.0,    # Emotion strength (0.0-10.0)
    speed=1.0,
    pitch=0.0,           # Pitch shift in semitones
    speaker_id=0,
)

# Custom model
engine = StyleBertTTSEngine(model_path="/path/to/your/model")

# Sentence-by-sentence streaming
async for chunk in engine.stream("長い文章を文ごとに生成します。"):
    pass

Pre-trained models: jvnv-F1-jp and jvnv-F2-jp (female); jvnv-M1-jp and jvnv-M2-jp (male)

Optimized for Japanese. GPU recommended for best performance.

License

MIT

Release history

Version	Release date
0.4.0	Mar 3, 2026
0.3.4 (this version)	Mar 2, 2026
0.3.3	Mar 2, 2026
0.1.7	Sep 8, 2025
0.1.6	Aug 9, 2025
0.1.5	Aug 7, 2025
0.1.4	Aug 7, 2025
0.1.3	Aug 7, 2025
0.1.0	Aug 7, 2025

Download files

Source Distribution

speechflow-0.3.4.tar.gz (34.9 kB)

Uploaded Mar 2, 2026 Source

Built Distribution

speechflow-0.3.4-py3-none-any.whl (31.1 kB)

Uploaded Mar 2, 2026 Python 3

File details

Details for the file speechflow-0.3.4.tar.gz.

File metadata

Download URL: speechflow-0.3.4.tar.gz
Upload date: Mar 2, 2026
Size: 34.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for speechflow-0.3.4.tar.gz
Algorithm	Hash digest
SHA256	`a4e88f0021cdec80156d003d0059c49e71936019c51cd845c4c7e18fc8901732`
MD5	`5f42478cfcc4a358624d17f72990f147`
BLAKE2b-256	`c8720be86d737aeba42c56b5c020aa624b6c51ed8a7c2387a0bda50261cc18ad`
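
A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below shows one way to do this. sha256_of and matches are hypothetical helper names, and the file path in the usage comment assumes the sdist was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches(path, expected_hex):
    # Normalize the expected digest (whitespace, case) before comparing.
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the sdist is in the current directory:
# matches("speechflow-0.3.4.tar.gz",
#         "a4e88f0021cdec80156d003d0059c49e71936019c51cd845c4c7e18fc8901732")
```

pip can perform the same check automatically when a requirements file pins hashes and is installed with --require-hashes.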

Provenance

The following attestation bundles were made for speechflow-0.3.4.tar.gz:

Publisher: publish.yml on sync-dev-org/speechflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: speechflow-0.3.4.tar.gz
- Subject digest: a4e88f0021cdec80156d003d0059c49e71936019c51cd845c4c7e18fc8901732
- Sigstore transparency entry: 1008416806
- Sigstore integration time: Mar 2, 2026
Source repository:
- Permalink: sync-dev-org/speechflow@dcb0e218db932f5cc7eb119fe27f2fe37644484e
- Branch / Tag: refs/tags/v0.3.4
- Owner: https://github.com/sync-dev-org
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dcb0e218db932f5cc7eb119fe27f2fe37644484e
- Trigger Event: push

File details

Details for the file speechflow-0.3.4-py3-none-any.whl.

File metadata

Download URL: speechflow-0.3.4-py3-none-any.whl
Upload date: Mar 2, 2026
Size: 31.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for speechflow-0.3.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`91daa34604ee3b5b329df437731f2799e2adfe864037e504b5c7b1cc42e02469`
MD5	`5e431a1c80e85642f04f2c5e3a40b866`
BLAKE2b-256	`68a09b478f5988bc2c92cfc788bc61fa8a8613783359169a09411e9bae134991`

Provenance

The following attestation bundles were made for speechflow-0.3.4-py3-none-any.whl:

Publisher: publish.yml on sync-dev-org/speechflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: speechflow-0.3.4-py3-none-any.whl
- Subject digest: 91daa34604ee3b5b329df437731f2799e2adfe864037e504b5c7b1cc42e02469
- Sigstore transparency entry: 1008416808
- Sigstore integration time: Mar 2, 2026
Source repository:
- Permalink: sync-dev-org/speechflow@dcb0e218db932f5cc7eb119fe27f2fe37644484e
- Branch / Tag: refs/tags/v0.3.4
- Owner: https://github.com/sync-dev-org
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dcb0e218db932f5cc7eb119fe27f2fe37644484e
- Trigger Event: push
