Stream text into audio with an easy-to-use, highly configurable library delivering voice output with minimal latency.

To install realtimetts, you need to specify the TTS engine(s) you wish to use.

For example, to install all supported engines:

pip install "realtimetts[all]"

To install with the Coqui TTS engine:

pip install "realtimetts[coqui]"

Available engine options include:

  • all: Install all supported engines
  • system: Local system TTS via pyttsx3
  • azure: Azure Speech Services support
  • elevenlabs: ElevenLabs API integration
  • openai: OpenAI TTS services
  • gtts: Google Text-to-Speech
  • edge: Microsoft Edge TTS
  • coqui: Coqui TTS engine
  • camb: CAMB AI MARS TTS
  • minimax: MiniMax Cloud TTS
  • cartesia: Cartesia API integration
  • modelslab: ModelsLab API integration
  • orpheus: Orpheus TTS support
  • qwen: Faster Qwen3 TTS integration
  • omnivoice: Omnivoice TTS integration
  • luxtts: LuxTTS integration
  • chatterbox: Chatterbox Turbo integration
  • sopro: SoproTTS integration
  • soprano: SopranoTTS integration
  • neutts: NeuTTS integration
  • zipvoice: ZipVoice dependency support
  • moss: MOSS-TTS dependency support
  • pockettts: PocketTTS integration
  • parler: Parler TTS integration
  • styletts: StyleTTS integration
  • piper: Piper executable engine support
  • typecast: Typecast API integration
  • minimal: Core package only (for custom engine development)

You can install multiple engines by separating them with commas. For example:

pip install "realtimetts[azure,elevenlabs,openai]"

RealtimeTTS

RealtimeTTS is a Python text-to-speech library for applications that need to turn strings, generators, and LLM token streams into audio with low latency. It can play speech locally, stream chunks to another process, write WAV files, and fall back across multiple engines.

The project supports a broad engine matrix: local system voices, cloud APIs, free service wrappers, local neural models, and voice-cloning stacks.

Support RealtimeTTS

If RealtimeTTS saved you time, one GitHub star is a simple way to help make it more stable.

Stars improve visibility, and visibility brings more users, more real-world testing, more bug reports, more fixes, and better releases for everyone.

Demo

https://github.com/KoljaB/RealtimeTTS/assets/7604638/87dcd9a5-3a4e-4f57-be45-837fc63237e7

Install

For the fastest local smoke test, install the system engine:

pip install "realtimetts[system]"

On Linux, install PortAudio headers before installing PyAudio:

sudo apt-get update
sudo apt-get install python3-dev portaudio19-dev

On macOS:

brew install portaudio

For cloud engines, local neural engines, CUDA, mpv, and current packaging caveats, see docs/installation.md.
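Before a first run, it can help to check whether the optional pieces mentioned above are present. The following is a minimal sketch using only the standard library; the `preflight` helper is hypothetical and not part of RealtimeTTS. It only reports whether the `pyaudio` module can be found and whether an `mpv` binary is on `PATH`.

```python
import importlib.util
import shutil


def preflight() -> dict:
    """Report whether common optional dependencies appear to be present."""
    return {
        # PyAudio is the Python binding that needs the PortAudio headers.
        "pyaudio": importlib.util.find_spec("pyaudio") is not None,
        # mpv is an external player binary, looked up on PATH.
        "mpv": shutil.which("mpv") is not None,
    }


if __name__ == "__main__":
    for name, found in preflight().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" result does not necessarily mean an engine will fail; which engines need which tool is covered in docs/installation.md.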

First Audio

from RealtimeTTS import TextToAudioStream, SystemEngine


if __name__ == "__main__":
    stream = TextToAudioStream(SystemEngine())
    stream.feed("Hello from RealtimeTTS.")
    stream.play()

Use the if __name__ == "__main__": guard in scripts, especially on Windows and when using engines that start worker processes.

Streaming Text

feed() accepts an iterator, so text can arrive while audio is already playing:

from RealtimeTTS import TextToAudioStream, SystemEngine


def text_chunks():
    yield "This starts speaking quickly. "
    yield "More text can arrive while audio is already playing."


if __name__ == "__main__":
    stream = TextToAudioStream(SystemEngine())
    stream.feed(text_chunks())
    stream.play()

Use the same pattern with an LLM client by yielding only non-empty text chunks. See docs/llm-streaming.md.
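As a rough illustration of the "non-empty chunks" advice, the sketch below wraps any iterable of streamed pieces in a filtering generator. Both `non_empty_chunks` and `fake_llm_stream` are hypothetical names introduced here; a real LLM client would replace the fake stream, and how its deltas are shaped depends on the provider.

```python
from typing import Iterable, Iterator, Optional


def non_empty_chunks(chunks: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield only chunks that contain text, skipping None and empty strings."""
    for chunk in chunks:
        if chunk:  # drops None and "" but keeps whitespace such as " "
            yield chunk


def fake_llm_stream() -> Iterator[Optional[str]]:
    """Hypothetical stand-in for an LLM client's streamed deltas."""
    yield from ["Streaming ", None, "", "text ", "arrives ", "in pieces."]


if __name__ == "__main__":
    # With RealtimeTTS installed, the filtered generator could be passed
    # to feed() in place of text_chunks() from the example above:
    #     stream.feed(non_empty_chunks(fake_llm_stream()))
    print("".join(non_empty_chunks(fake_llm_stream())))
```

Whitespace-only chunks are deliberately kept, since streamed tokens often carry the spaces between words.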

Output

Write audio to a WAV file without local speaker playback:

from RealtimeTTS import TextToAudioStream, SystemEngine


if __name__ == "__main__":
    stream = TextToAudioStream(SystemEngine())
    stream.feed("Save this speech to a file.")
    stream.play(output_wavfile="speech.wav", muted=True)

For output devices, mpv playback, muted mode, callbacks, and chunk formats, see docs/output-and-files.md.
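For the audio-chunk callback path, a small collector can gather raw chunks for another process or for later writing. This is a sketch under assumptions: `ChunkCollector` is a hypothetical helper, the `on_audio_chunk` parameter name in the usage comment should be checked against docs/output-and-files.md, and the default WAV parameters (16-bit mono PCM at 16 kHz) are placeholders that must match whatever format the chosen engine actually emits.

```python
import wave


class ChunkCollector:
    """Accumulate raw PCM chunks handed to it as a callback."""

    def __init__(self) -> None:
        self.chunks = []  # raw byte strings in arrival order

    def __call__(self, chunk: bytes) -> None:
        self.chunks.append(chunk)

    def save_wav(self, path: str, channels: int = 1,
                 sample_width: int = 2, rate: int = 16000) -> None:
        # Assumed format: 16-bit mono PCM at 16 kHz; adjust to the engine.
        with wave.open(path, "wb") as wav:
            wav.setnchannels(channels)
            wav.setsampwidth(sample_width)
            wav.setframerate(rate)
            wav.writeframes(b"".join(self.chunks))


# Assumed usage with RealtimeTTS (parameter name not confirmed here):
#     collector = ChunkCollector()
#     stream.play(on_audio_chunk=collector, muted=True)
#     collector.save_wav("collected.wav")
```

For plain file output, the `output_wavfile` argument shown above is simpler; a collector like this is mainly useful when chunks need to be forwarded or inspected.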

Features

  • Low-latency playback from strings, generators, and streamed model output.
  • Multiple engines with local, cloud, free-service, and neural model options.
  • Fallback engines for more resilient synthesis.
  • Sync and async playback with pause, resume, stop, and state inspection.
  • Text, audio, sentence, character, word-timing, and audio-chunk callbacks.
  • WAV output, muted synthesis, selected output devices, and volume control.
  • Voice switching and voice-cloning workflows where supported by the engine.
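The async playback controls listed above can be sketched as a small helper that works on any stream-like object. `play_with_pause` is a hypothetical function, not part of the library, and `is_playing()` is an assumed name for the state-inspection call; see the feed and playback docs for the exact methods.

```python
import time


def play_with_pause(stream, text, pause_after=1.0, pause_for=0.5, poll=0.1):
    """Start async playback, pause briefly, resume, then wait for the end."""
    stream.feed(text)
    stream.play_async()         # expected to return while audio plays in the background
    time.sleep(pause_after)
    stream.pause()
    time.sleep(pause_for)
    stream.resume()
    while stream.is_playing():  # assumed state-inspection method
        time.sleep(poll)


# Assumed usage with RealtimeTTS, inside an `if __name__ == "__main__":` guard:
#     from RealtimeTTS import TextToAudioStream, SystemEngine
#     play_with_pause(TextToAudioStream(SystemEngine()), "A longer sentence to pause in.")
```

Passing the stream in as an argument keeps the helper independent of any particular engine.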

Engine Overview

| Engine | Type | Install/status note | Best first use |
|---|---|---|---|
| SystemEngine | Local | realtimetts[system] | First local audio smoke test. |
| GTTSEngine | Free service | realtimetts[gtts] | Simple network-backed speech. |
| EdgeEngine | Free service | realtimetts[edge], needs mpv | Free streamed voices. |
| OpenAIEngine | Cloud API | realtimetts[openai] | OpenAI TTS voices. |
| AzureEngine | Cloud API | realtimetts[azure] | Azure voices and word timings. |
| ElevenlabsEngine | Cloud API | realtimetts[elevenlabs], needs mpv | High-quality API voices. |
| CambEngine | Cloud API | realtimetts[camb] | CAMB MARS API voices. |
| MiniMaxEngine | Cloud API | realtimetts[minimax] | MiniMax cloud voices. |
| CartesiaEngine | Cloud API | realtimetts[cartesia] | Cartesia API voices. |
| TypecastEngine | Cloud API | realtimetts[typecast] | Typecast API voices. |
| ModelsLabEngine | Cloud API | realtimetts[modelslab], root export pending | ModelsLab API voices. |
| CoquiEngine | Local neural | realtimetts[coqui] | Local XTTS voice cloning. |
| PiperEngine | Local executable | realtimetts[piper], external Piper setup | Fast local executable TTS. |
| StyleTTSEngine | Local neural | realtimetts[styletts], local checkout/assets | StyleTTS experiments. |
| ParlerEngine | Local neural | realtimetts[parler] | GPU local model experiments. |
| KokoroEngine | Local neural | realtimetts[kokoro] | Local voices and timing support. |
| OrpheusEngine | Local/API-style | realtimetts[orpheus] | Orpheus model workflows. |
| FasterQwenEngine | Local neural | realtimetts[qwen] | Qwen voice cloning. |
| OmniVoiceEngine | Local neural | realtimetts[omnivoice] | Multilingual voice cloning. |
| PocketTTSEngine / PocketTTSGpuEngine | Local lightweight | realtimetts[pockettts], realtimetts[pockettts-gpu] plus GPU fork | CPU-oriented voice cloning, optional CUDA fork path. |
| NeuTTSEngine | Local neural | realtimetts[neutts], optional neutts-gguf | Reference-audio voice cloning. |
| ZipVoiceEngine | Local neural | realtimetts[zipvoice], external checkout | ZipVoice cloning/server demos. |
| LuxTTSEngine | Local neural | realtimetts[luxtts] | LuxTTS voice cloning. |
| ChatterboxEngine | Local neural | realtimetts[chatterbox] | Chatterbox prompt-audio voices. |
| SoproTTSEngine | Local neural | realtimetts[sopro] | Sopro reference-audio voices. |
| SopranoEngine | Local neural | realtimetts[soprano] | Soprano local synthesis. |
| MossTTSEngine | Local neural | realtimetts[moss], runtime assets | MOSS-TTS experiments. |

See docs/engine-selection.md before choosing an engine for an application. The engine-specific docs are being split out from the old README and source audit.
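Because some engines carry status notes such as "root export pending", it may be useful to check which engine classes the installed package actually exposes before wiring one into an application. The sketch below is an assumption-laden helper, not a library feature: `exported_names` is hypothetical, and since this README does not say how the package represents engines whose extras are missing, it treats a missing attribute, a `None` placeholder, or an `ImportError` on access as "unavailable".

```python
import importlib


def _available(module, name):
    """True if the module exposes a usable (non-None) attribute called name."""
    try:
        return getattr(module, name, None) is not None
    except ImportError:
        return False  # attribute access triggered a failing lazy import


def exported_names(module_name, names):
    """Return the subset of names that the module exposes at its root."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []  # package not installed, or its import failed
    return [name for name in names if _available(module, name)]


if __name__ == "__main__":
    wanted = ["SystemEngine", "EdgeEngine", "ModelsLabEngine"]
    print(exported_names("RealtimeTTS", wanted))
```

An engine that shows up here may still need external tools, API keys, or model assets, as the status notes in the table indicate.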

Documentation

  • Quick start: shortest working examples.
  • Installation: extras, platform setup, external tools, API keys, and known packaging mismatches.
  • Engine selection: engine matrix and selection guidance.
  • Feed and playback: feed(), play(), play_async(), pause, resume, stop, text state, and inline tags.
  • LLM streaming: provider-neutral streamed text patterns and latency tuning.
  • Output and files: WAV files, audio chunks, muted mode, output devices, mpv, buffering, and volume.
  • Engine setup: one focused page for each concrete engine source.
  • FAQ: legacy troubleshooting page while topic docs are being split out.

Legacy translated docs remain under docs/<locale>/ while English is refactored as the canonical source.

Server Example

The browser and WebSocket server example lives in example_fast_api/:

python -m pip install fastapi uvicorn websockets pyaudio
python example_fast_api/async_server.py

Open http://localhost:8000 or connect to ws://localhost:8000/ws.

Related Project

RealtimeSTT is the speech-to-text counterpart for realtime voice input.

Contributing

Focused docs, tests, and engine fixes are easiest to review. During the docs refactor, keep English docs canonical and note mismatches between source, packaging, examples, and tests rather than hiding them.

License

RealtimeTTS source code is MIT licensed. Engine providers, model weights, voice data, datasets, generated audio, and third-party services can have separate terms. Read LICENSING_ADDENDUM.md and the relevant provider or model licenses before commercial use.

Audio samples derived from the EARS dataset by Meta are licensed under CC BY-NC 4.0. See the original dataset terms for details.

Author

Kolja Beigel
