A fast Voice Activity Detection and Transcription System

Project description

RealtimeSTT lets you choose the transcription and wake-word dependencies you want to install.

Recommended default local Whisper install:

pip install "realtimestt[recommended]"

Main ASR backend only, without the faster packaged Silero ONNX Runtime VAD:

pip install "realtimestt[faster-whisper]"

Core package only, without a transcription engine or wake-word backend:

pip install realtimestt

Install multiple extras by separating them with commas:

pip install "realtimestt[faster-whisper,porcupine]"
pip install "realtimestt[whisper-cpp,openwakeword]"
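The same extras syntax can be assembled in a setup script, for example when different machines need different backends. This is a minimal sketch, not part of RealtimeSTT: `build_requirement` and `install` are hypothetical helper names, and the only thing taken from the instructions above is the `package[extra1,extra2]` format.

```python
import subprocess
import sys


def build_requirement(extras, package="realtimestt"):
    """Return a pip requirement string such as 'realtimestt[a,b]'."""
    # Strip whitespace, drop empty entries, and de-duplicate in caller order.
    unique = list(dict.fromkeys(e.strip() for e in extras if e.strip()))
    if not unique:
        return package
    return f"{package}[{','.join(unique)}]"


def install(extras):
    """Install the package with the given extras into the running interpreter."""
    # sys.executable keeps pip tied to the current Python environment.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", build_requirement(extras)]
    )
```

Calling `build_requirement(["faster-whisper", "porcupine"])` yields the same string as the first command above; `install` is only defined here and runs nothing until you call it.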

Available extras include:

  • faster-whisper: default CTranslate2 Whisper backend
  • whisper-cpp: whisper.cpp backend through pywhispercpp
  • openai-whisper: original OpenAI Whisper Python backend
  • sherpa-onnx: sherpa-onnx CPU backends
  • silero-vad: packaged Silero model assets and PyTorch wrapper
  • silero-onnx/silero-onnx-cpu: fastest Silero VAD CPU ONNX Runtime backend
  • silero-onnx-gpu: installs Silero's ONNX GPU runtime extra for experiments
  • parakeet: NVIDIA NeMo Parakeet backend
  • omnilingual/omnilingual-asr: Meta Omnilingual ASR backend for Linux/WSL2 with Python 3.11.x only; uses omnilingual-asr>=0.2.0 with matching torch/torchaudio builds
  • transformers: shared Transformers dependency for Moonshine, Granite, and Cohere
  • moonshine, granite, cohere: aliases for the Transformers dependency set
  • qwen: Qwen ASR backend
  • qwen-vllm: Qwen ASR with vLLM extras
  • kroko-builder: helper command for building/installing Kroko-ONNX plus Hugging Face model downloads
  • porcupine: Porcupine wake-word backend
  • openwakeword: OpenWakeWord wake-word backend
  • wakewords: both wake-word backends
  • recommended/default: faster-whisper backend plus fast Silero CPU ONNX VAD
  • all: all PyPI-installable optional backends

WebRTC VAD is installed with the core package. AudioToTextRecorder also initializes a Silero VAD path. Install the recommended/default or silero-onnx-cpu extra for a self-contained local Silero ONNX Runtime backend.
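To see which optional backends are importable in the current environment, a quick probe like the following may help. It is a sketch under assumptions: the mapping from backend to import name reflects the usual top-level module names of the upstream projects and is not taken from RealtimeSTT, and `available_backends` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping from backend to top-level import name. These are the
# usual module names of the upstream packages, not RealtimeSTT identifiers.
BACKEND_MODULES = {
    "webrtc-vad": "webrtcvad",
    "silero-onnx": "onnxruntime",
    "faster-whisper": "faster_whisper",
    "porcupine": "pvporcupine",
    "openwakeword": "openwakeword",
}


def available_backends(modules=BACKEND_MODULES):
    """Map each backend name to True if its module can be located."""
    # find_spec locates a top-level module without importing it.
    return {name: find_spec(mod) is not None for name, mod in modules.items()}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

A missing entry only means the module was not found; it says nothing about whether the models that backend needs have been downloaded.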

Meta Omnilingual ASR install note: use Linux or WSL2 with Python 3.11.x. Native Windows cannot run the Omnilingual runtime because fairseq2n has no Windows wheel, and Python 3.12.x currently cannot resolve omnilingual-asr>=0.2.0 from PyPI because the upstream package metadata excludes normal 3.12 patch releases.
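The platform rule in the note above can be checked before attempting the install. This is a minimal sketch of that rule only; `omnilingual_supported` is a hypothetical helper, and it does not verify that matching torch/torchaudio builds are present.

```python
import platform
import sys


def omnilingual_supported(system=None, version=None):
    """Return True on Linux (including WSL2) running Python 3.11.x."""
    system = system or platform.system()
    version = version or sys.version_info
    # WSL2 reports "Linux"; native Windows reports "Windows".
    return system == "Linux" and tuple(version[:2]) == (3, 11)
```

Both arguments default to the running interpreter, so `omnilingual_supported()` answers for the current environment, while explicit values such as `omnilingual_supported("Windows", (3, 11, 9))` let you test other combinations.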

For live Kroko-ONNX usage, install the builder helper and then build Kroko in the same Python environment:

pip install "realtimestt[kroko-builder,silero-onnx-cpu]"
stt-install-kroko --build

The silero-onnx-cpu extra is not needed to build Kroko-ONNX itself, but recorder-based Kroko smoke tests and live AudioToTextRecorder use need a local VAD backend.

On Windows, use Python 3.12 x64 and start Docker Desktop before running the builder. Check that Python, Git, and Docker's Linux engine are available with:

python --version
git --version
docker version

The output of docker version must include a Server section, which appears only when the Docker engine is reachable. docker --version only confirms that the Docker CLI is installed.
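The Server-section check can be scripted before starting a long build. The sketch below assumes the plain-text layout of `docker version`, where the section begins with a line starting with `Server:`; both function names are hypothetical helpers.

```python
import shutil
import subprocess


def has_server_section(output):
    """Return True if `docker version` output contains a Server section."""
    # Assumes the default plain-text layout, where the section header
    # is a line that starts with "Server:".
    return any(line.startswith("Server:") for line in output.splitlines())


def docker_engine_available(timeout=30):
    """Run `docker version` and report whether the engine answered."""
    if shutil.which("docker") is None:
        return False  # Docker CLI is not on PATH.
    try:
        result = subprocess.run(
            ["docker", "version"],
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # The CLI did not answer in time.
    return has_server_section(result.stdout)
```

If `docker_engine_available()` returns False while the CLI is installed, Docker Desktop is most likely not running yet.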

If the default builder cache is not writable, use a project-local work directory:

stt-install-kroko --build --work-dir .\kroko-builder-work

The kroko-builder extra includes huggingface_hub. Download a public Community model after the builder finishes:

mkdir test-model-cache\kroko-onnx
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='Banafo/Kroko-ASR', filename='Kroko-EN-Community-64-L-Streaming-001.data', local_dir='test-model-cache/kroko-onnx')"

RealtimeSTT

RealtimeSTT is a Python speech-to-text library for applications that need voice activity detection, fast transcription, optional realtime text updates, wake words, and direct access to audio streams. It is designed for assistants, dictation tools, browser streaming servers, and prototypes that need to turn speech into text with only a few lines of code.

The recommended default path uses faster_whisper. Other engines are available through install extras when their optional dependencies and models are present.

Demo

https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5

CLI demo code (reproduces the video above)

Featured Integration: Kroko/Banafo ASR

RealtimeSTT 1.0.1 adds native support for kroko_onnx, the local streaming ASR engine from the Kroko/Banafo team.

This integration has been on my wishlist for a long time. Kroko is a strong fit for RealtimeSTT's goals: fast, accurate local speech recognition.

Start with the public Community models for local testing, or see Kroko/Banafo's commercial model options if you need production licensing and higher-end models.

pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"
stt-install-kroko --build

The silero-onnx-cpu extra gives AudioToTextRecorder a local VAD backend for recorder-based smoke tests and live microphone use.

See the Kroko-ONNX engine guide, Kroko ASR docs, and kroko-onnx on GitHub.

Install

Use Python 3.11 or newer for the current pinned dependency set.

pip install "RealtimeSTT[faster-whisper]"

On Debian- or Ubuntu-based Linux, install the Python and PortAudio development headers before installing the package:

sudo apt-get update
sudo apt-get install python3-dev portaudio19-dev

On macOS:

brew install portaudio

For CUDA, platform notes, and optional engine stacks, see docs/installation.md.

Microphone Example

This waits for speech, stops after the detected utterance, and prints the final transcript:

from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    with AudioToTextRecorder() as recorder:
        print("Speak now")
        print(recorder.text())

Use the if __name__ == "__main__": guard when running scripts, especially on Windows, because RealtimeSTT uses multiprocessing for model work.

Automatic Recording Loop

For continuous dictation, pass a callback to text() so transcription work can complete asynchronously while your loop keeps listening:

from RealtimeSTT import AudioToTextRecorder


def process_text(text):
    print(text)


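if __name__ == "__main__":
    recorder = AudioToTextRecorder()

    try:
        # Each call waits for one utterance, then passes the transcript to process_text.
        while True:
            recorder.text(process_text)
    except KeyboardInterrupt:
        # Ctrl+C ends the loop; release the recorder's resources.
        recorder.shutdown()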
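The callback does not have to be a plain function. The sketch below uses a callable object that keeps the transcript and watches for a spoken stop phrase. `TranscriptCollector`, `run_dictation`, and the stop phrase are hypothetical; the only RealtimeSTT calls used are `AudioToTextRecorder()`, `text(callback)`, and `shutdown()` as shown in the examples here.

```python
class TranscriptCollector:
    """Callback object that stores transcripts and watches for a stop phrase."""

    def __init__(self, stop_phrase="stop dictation"):
        self.stop_phrase = stop_phrase.lower()
        self.lines = []
        self.stop_requested = False

    def __call__(self, text):
        cleaned = text.strip()
        if not cleaned:
            return  # Ignore empty transcripts.
        if self.stop_phrase in cleaned.lower():
            self.stop_requested = True
            return
        self.lines.append(cleaned)


def run_dictation():
    """Listen until the stop phrase is heard, then return the transcript."""
    # Imported lazily so the collector can be used and tested without the library.
    from RealtimeSTT import AudioToTextRecorder

    collector = TranscriptCollector()
    recorder = AudioToTextRecorder()
    try:
        while not collector.stop_requested:
            recorder.text(collector)
    finally:
        recorder.shutdown()
    return collector.lines
```

Call `run_dictation()` from inside an `if __name__ == "__main__":` guard. Because transcription completes asynchronously, the loop may listen for one more utterance before it notices that the stop flag was set.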

External Audio

Set use_microphone=False when audio comes from a file, stream, websocket, or another process. Feed 16-bit mono PCM chunks at 16 kHz, or pass the original sample rate so RealtimeSTT can resample:

from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    recorder = AudioToTextRecorder(use_microphone=False)

    with open("audio_chunk.pcm", "rb") as audio_file:
        recorder.feed_audio(audio_file.read(), original_sample_rate=16000)

    print(recorder.text())
    recorder.shutdown()
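When the audio is a WAV file rather than raw PCM, the standard-library `wave` module can supply the chunks and the sample rate. This is a sketch under assumptions: `iter_pcm_chunks` and `feed_wav` are hypothetical helpers, the 100 ms chunk size is an arbitrary choice, and the only RealtimeSTT call used is `feed_audio(chunk, original_sample_rate=...)` from the example above.

```python
import wave


def iter_pcm_chunks(wav_path, chunk_ms=100):
    """Yield (pcm_bytes, sample_rate) chunks from a 16-bit mono WAV file."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = wav.getframerate()
        frames_per_chunk = max(1, rate * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break  # End of file.
            yield chunk, rate


def feed_wav(recorder, wav_path):
    """Feed a WAV file to a recorder created with use_microphone=False."""
    for chunk, rate in iter_pcm_chunks(wav_path):
        # Passing the file's own rate lets RealtimeSTT resample if it is not 16 kHz.
        recorder.feed_audio(chunk, original_sample_rate=rate)
```

Feeding in small chunks mirrors how audio arrives from a stream or websocket, which makes this a closer stand-in for live input than a single large read.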

More examples are in docs/quick-start.md and docs/external-audio.md.

Configuration Reference

Every AudioToTextRecorder constructor parameter is documented in docs/configuration.md, including model/engine selection, realtime transcription, VAD timing, wake words, callbacks, external audio, logging, and executor injection.

Features

  • Voice activity detection with WebRTC VAD and Silero VAD.
  • Final and realtime transcription with selectable engines.
  • Optional wake word activation through Porcupine or OpenWakeWord.
  • Direct microphone input or application-fed audio chunks.
  • Event callbacks for recording, VAD, realtime text, transcription, and wake word state.
  • A FastAPI browser streaming server example with multi-user session isolation, shared inference resources, metrics, and health endpoints.
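As an illustration of the realtime text callbacks listed above, the sketch below keeps the newest partial transcript separate from finalized sentences. The keyword names `enable_realtime_transcription` and `on_realtime_transcription_update` are assumptions based on earlier RealtimeSTT releases and should be checked against docs/configuration.md; `TranscriptState` and `build_recorder` are hypothetical.

```python
class TranscriptState:
    """Tracks the latest partial transcript and the finalized sentences."""

    def __init__(self):
        self.partial = ""
        self.final = []

    def on_partial(self, text):
        # Realtime updates supersede each other; keep only the newest one.
        self.partial = text

    def on_final(self, text):
        self.final.append(text)
        self.partial = ""  # The utterance is complete; clear the preview.


def build_recorder(state):
    """Create a recorder that reports realtime updates to `state`."""
    # Imported lazily so TranscriptState works without the library installed.
    from RealtimeSTT import AudioToTextRecorder

    # Assumed keyword names; verify them in docs/configuration.md.
    return AudioToTextRecorder(
        enable_realtime_transcription=True,
        on_realtime_transcription_update=state.on_partial,
    )
```

With a recorder from `build_recorder(state)`, passing `state.on_final` to `recorder.text()` would collect each finished utterance while `state.partial` holds the in-progress text.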

Documentation

  • Quick start: shortest demos and common recording patterns.
  • Installation: platform setup, CUDA notes, and optional dependencies.
  • Configuration: complete AudioToTextRecorder parameter reference.
  • Transcription engines: engine selection and setup links.
  • Wake words: Porcupine and OpenWakeWord setup.
  • External audio: feeding audio without a microphone.
  • Testing: maintained unit and opt-in golden test workflow.
  • Test scripts: demos, manual tests, regressions, and legacy experiments under tests/.
  • FastAPI server: browser server configuration, protocol, metrics, and deployment notes.
  • Troubleshooting: common install, audio, CUDA, model, dependency, and runtime errors.
  • Engine licenses: license notes for optional engine runtimes and model families.

Server Example

The browser FastAPI reference server lives in example_fastapi_server and is intended for source checkouts. It is not installed by the PyPI wheel; keeping it source-only keeps the wheel lean and avoids adding web-server dependencies for users who only need the recorder/API library.

python -m pip install -r example_fastapi_server/requirements.txt
python example_fastapi_server/server.py --host 0.0.0.0 --port 8010

For pip-only installs, use the Python recorder/API examples instead. If you want the FastAPI reference server, clone the repository or install from Git.

Open http://localhost:8010. See docs/fastapi-server.md for engine recipes, websocket protocol details, health checks, and metrics.
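A script can confirm that the server is answering before opening the browser or running a client. The sketch below only probes the root URL from the command above; it does not use the server's dedicated health or metrics endpoints, whose paths are documented in docs/fastapi-server.md. `server_is_up` is a hypothetical helper.

```python
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the target is a local server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def server_is_up(url="http://localhost:8010", timeout=2.0):
    """Return True if an HTTP GET to `url` receives any HTTP response."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # The server answered, even if with an error status.
    except (urllib.error.URLError, OSError):
        return False  # Connection refused, timed out, or host not found.
```

A True result only shows that something is listening on the port; it does not confirm that models have finished loading.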

Contributing

Focused tests and small changes are easiest to review. The project keeps fast unit tests separate from opt-in real-model tests; see docs/testing.md.

License

MIT

Author

Kolja Beigel
