streaming-tts

Lightweight streaming text-to-speech with Kokoro engine.

Features

  • Streaming TTS: Real-time audio synthesis with callback support
  • Voice Blending: Mix multiple voices with weighted formulas
  • Pause Tags: Insert natural pauses with [pause:1.5s] syntax
  • Text Normalization: Convert URLs, emails, numbers, money to spoken form
  • Smart Chunking: Token-aware text splitting for optimal quality
  • Multi-Format Output: Export to WAV, MP3, Opus, FLAC, AAC (with [audio] extra)
  • 54 Voices: American and British English, plus 7 other languages

Installation

pip install streaming-tts

For a development (editable) install, run this from a checkout of the source:

pip install -e .

Optional Extras

# Japanese language support
pip install streaming-tts[jp]

# Chinese language support
pip install streaming-tts[zh]

# Korean language support
pip install streaming-tts[ko]

# Multi-format audio export
pip install streaming-tts[audio]

# All features
pip install streaming-tts[all]

Note: Non-English languages require espeak-ng to be installed on your system.

Quick Start

from streaming_tts import TextToAudioStream, KokoroEngine

# Initialize the engine
engine = KokoroEngine(voice="af_heart")

# Create stream and play
stream = TextToAudioStream(engine)
stream.feed("Hello, world! This is a test of streaming text to speech.").play()

Pause Tags

Insert natural pauses in your text:

text = "Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."
stream.feed(text).play()
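
To illustrate how tags of this form can be separated from the spoken text, here is a minimal sketch. It is not the library's parser: the function name split_pause_tags and the regular expression are hypothetical, and it assumes only the s and ms units shown in the examples above.

```python
import re

# Hypothetical pattern covering [pause:1s], [pause:1.5s] and [pause:500ms]
_PAUSE_TAG = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)\]")

def split_pause_tags(text):
    """Return text segments (str) interleaved with pause lengths in seconds (float)."""
    items = []
    pos = 0
    for match in _PAUSE_TAG.finditer(text):
        segment = text[pos:match.start()].strip()
        if segment:
            items.append(segment)
        value, unit = float(match.group(1)), match.group(2)
        # Convert milliseconds to seconds so every pause uses one unit
        items.append(value / 1000.0 if unit == "ms" else value)
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        items.append(tail)
    return items
```

Strings in the result would be spoken and floats would be waited out, which mirrors the str/float distinction used by process_text_with_pauses elsewhere in this README.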

Text Normalization

Automatically convert special content to spoken form:

from streaming_tts import normalize_text, NormalizationOptions

options = NormalizationOptions(normalize=True)

# URLs, emails, numbers, money, etc.
text = "Visit https://example.com or email user@test.com. Price: $42.50"
normalized = normalize_text(text, options)
# -> "Visit https example dot com or email user at test dot com. Price: forty-two dollars and fifty cents"
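
As a rough picture of what one normalization rule might look like, the sketch below rewrites email addresses into spoken form. It is an illustration only, not the code behind normalize_text: the names normalize_emails and _speak_email and the regular expression are assumptions, and real-world addresses need a more careful pattern.

```python
import re

# Hypothetical, deliberately simple email pattern: local part, "@", dotted domain
_EMAIL = re.compile(r"\b([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)\b")

def _speak_email(match):
    user, domain = match.group(1), match.group(2)
    # Spell out the separators so a TTS engine reads them as words
    return f"{user.replace('.', ' dot ')} at {domain.replace('.', ' dot ')}"

def normalize_emails(text):
    """Replace each email address in text with a spoken-form equivalent."""
    return _EMAIL.sub(_speak_email, text)
```

The trailing period of a sentence is left alone because the pattern only consumes a dot when more domain characters follow it.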

Smart Chunking

Split long text into optimal chunks for TTS:

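from streaming_tts import TextToAudioStream, KokoroEngine
from streaming_tts import smart_split, process_text_with_pauses
import time

stream = TextToAudioStream(KokoroEngine(voice="af_heart"))
text = "Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."

# Yields text chunks (str) to speak and pause durations (float, in seconds)
for item in process_text_with_pauses(text, normalize=True):
    if isinstance(item, float):
        time.sleep(item)  # Pause
    else:
        stream.feed(item).play()  # Speak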
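
The general idea behind length-aware chunking can be sketched as greedy packing of whole sentences under a size budget. The example below is an assumption-laden illustration rather than the smart_split implementation: chunk_sentences is a hypothetical name, whitespace-separated words stand in for real tokens, and the default budget of 40 is arbitrary.

```python
import re

def chunk_sentences(text, max_words=40):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole in its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting only at sentence boundaries keeps each chunk prosodically complete, at the cost of occasionally exceeding the budget for one very long sentence.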

Multi-Format Audio Export

from streaming_tts import StreamingAudioWriter

# Requires: pip install streaming-tts[audio]
writer = StreamingAudioWriter("mp3", sample_rate=24000)

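# audio_chunks is a placeholder for an iterable of audio chunks obtained elsewhere
for audio_chunk in audio_chunks:
    mp3_data = writer.write_chunk(audio_chunk)
    # Stream or save mp3_data

# Flush whatever the encoder still holds, then release the writer
final_data = writer.write_chunk(finalize=True)
writer.close()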
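
For comparison, plain WAV output needs nothing beyond the Python standard library. The sketch below does not use StreamingAudioWriter. It assumes each chunk is raw mono 16-bit PCM as bytes, which this README does not state, and the function name pcm16_chunks_to_wav is hypothetical. The 24000 Hz default is taken from the example above.

```python
import io
import wave

def pcm16_chunks_to_wav(chunks, sample_rate=24000):
    """Wrap raw mono 16-bit PCM chunks (bytes) in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono (assumed)
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit (assumed)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    # The header is finalized when the wave writer closes; the buffer stays readable
    return buffer.getvalue()
```

If the engine delivers float samples instead, they would need to be scaled and converted to 16-bit integers before being passed in.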

Usage with Callbacks

from streaming_tts import TextToAudioStream, KokoroEngine

def on_audio_chunk(chunk):
    # Process audio chunk (e.g., send over websocket)
    pass

def on_stream_stop():
    print("Audio stream finished")

engine = KokoroEngine(voice="af_heart")
stream = TextToAudioStream(engine, on_audio_stream_stop=on_stream_stop)
stream.feed("Hello world").play(muted=True, on_audio_chunk=on_audio_chunk)
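
A callback often just hands chunks to another part of the program. The helper below is one possible way to do that with a thread-safe queue. ChunkCollector is a hypothetical class, not part of streaming_tts, and the sketch assumes the stop callback fires once after the last chunk has been delivered.

```python
import queue

class ChunkCollector:
    """Buffer audio chunks from callbacks so a consumer can iterate over them."""

    def __init__(self):
        self._queue = queue.Queue()

    def on_audio_chunk(self, chunk):
        # Called once per chunk; queue.Queue is safe to use across threads
        self._queue.put(chunk)

    def on_stream_stop(self):
        # None marks the end of the stream for the consumer
        self._queue.put(None)

    def __iter__(self):
        while True:
            chunk = self._queue.get()
            if chunk is None:
                return
            yield chunk
```

Under those assumptions, collector.on_audio_chunk and collector.on_stream_stop could be passed wherever the example above passes on_audio_chunk and on_stream_stop, while a consumer such as a websocket sender loops over the collector.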

Available Voices

American English (lang_code='a')

  • Female: af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky
  • Male: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa

British English (lang_code='b')

  • Female: bf_alice, bf_emma, bf_isabella, bf_lily
  • Male: bm_daniel, bm_fable, bm_george, bm_lewis

Other Languages

  • Japanese: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo
  • Chinese: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
  • Spanish: ef_dora, em_alex, em_santa
  • French: ff_siwis
  • Hindi: hf_alpha, hf_beta, hm_omega, hm_psi
  • Italian: if_sara, im_nicola
  • Portuguese: pf_dora, pm_alex, pm_santa
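
The voice names above follow a visible pattern: the first letter matches the language and the second letter is f or m. The helper below decodes that pattern. It is a hypothetical convenience, not a library function. The language letters are read off the lists above, and treating f and m as female and male for the non-English voices is an assumption extended from the English lists.

```python
# Prefix letters as they appear in the voice lists above
LANGUAGES = {
    "a": "American English", "b": "British English", "j": "Japanese",
    "z": "Chinese", "e": "Spanish", "f": "French", "h": "Hindi",
    "i": "Italian", "p": "Portuguese",
}
GENDERS = {"f": "female", "m": "male"}

def describe_voice(voice):
    """Split a voice name such as 'af_heart' into (language, gender, name)."""
    prefix, _, name = voice.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice name: {voice!r}")
    lang, gender = prefix[0], prefix[1]
    if lang not in LANGUAGES or gender not in GENDERS:
        raise ValueError(f"unknown voice prefix: {prefix!r}")
    return LANGUAGES[lang], GENDERS[gender], name
```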

Voice Blending

You can blend multiple voices using weighted formulas:

engine = KokoroEngine(voice="0.3*af_sarah + 0.7*am_adam")
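
To make the formula syntax concrete, the sketch below parses a blend string into per-voice weights. It is not the engine's parser: parse_blend is a hypothetical name, and normalizing the weights so they sum to 1 is an assumption, since this README does not say how the engine treats weights that sum to something else.

```python
def parse_blend(formula):
    """Parse '0.3*af_sarah + 0.7*am_adam' into {voice: weight}, weights summing to 1."""
    weights = {}
    for term in formula.split("+"):
        term = term.strip()
        if "*" in term:
            weight_text, voice = term.split("*", 1)
            weight = float(weight_text)
        else:
            # A bare voice name counts with weight 1
            weight, voice = 1.0, term
        voice = voice.strip()
        if not voice or weight <= 0:
            raise ValueError(f"invalid blend term: {term!r}")
        weights[voice] = weights.get(voice, 0.0) + weight
    total = sum(weights.values())
    # Normalization is an assumption of this sketch, not documented behavior
    return {voice: w / total for voice, w in weights.items()}
```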

API Reference

KokoroEngine

KokoroEngine(
    voice="af_heart",        # Voice name or blend formula
    default_speed=1.0,       # Speech speed multiplier
    trim_silence=True,       # Remove silence from audio
    debug=False              # Enable debug output
)

TextToAudioStream

TextToAudioStream(
    engine,                      # KokoroEngine instance
    on_audio_stream_start=None,  # Callback when audio starts
    on_audio_stream_stop=None,   # Callback when audio stops
    on_audio_chunk=None,         # Callback for each audio chunk
    on_word=None,                # Callback for word timing
    muted=False                  # Disable speaker output
)

Requirements

  • Python 3.9-3.12
  • PyAudio (may require system dependencies)
  • Torch

Windows Note

PyAudio on Windows may require Visual C++ Build Tools. If you encounter issues:

pip install pipwin
pipwin install pyaudio

License

MIT
