voicelistener

Real-time voice recognition using Silero VAD and Whisper.

Structure

voicelistener/
├── __init__.py
├── __main__.py              # CLI entry point
├── voicelistener.py         # VoiceListener class (audio + VAD + threading)
├── requirements.txt
└── transcribers/
    ├── __init__.py
    └── whispertranscriber.py  # WhisperTranscriber class

Setup

pip install -r voicelistener/requirements.txt

CLI usage

python -m voicelistener

Listens to your microphone, detects speech, and prints transcriptions to stdout. Press Ctrl+C to stop.

Library usage

from voicelistener import VoiceListener, WhisperTranscriber

transcriber = WhisperTranscriber(model="base.en")
listener = VoiceListener(transcriber=transcriber)

for text in listener:
    print(text)
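
Because the listener is consumed as a plain iterator of strings, ordinary generator helpers can be layered on top of it. The following is a minimal sketch, assuming each yielded item is a `str`; `until_stop_phrase` and the phrase "stop listening" are hypothetical names introduced here for illustration and are not part of voicelistener.

```python
from typing import Iterable, Iterator


def until_stop_phrase(
    texts: Iterable[str], stop_phrase: str = "stop listening"
) -> Iterator[str]:
    """Yield non-empty transcriptions until one contains stop_phrase.

    Hypothetical helper, not part of voicelistener. `texts` can be any
    iterable of strings, such as a listener used in a for loop.
    """
    for text in texts:
        cleaned = text.strip()
        if not cleaned:
            continue  # skip empty transcriptions
        if stop_phrase in cleaned.lower():
            return  # end iteration; the stop phrase itself is not yielded
        yield cleaned
```

With a real listener this would be used as `for text in until_stop_phrase(listener): print(text)`. Whether leaving the loop also stops the listener's background work is not stated in this README, so treat that as something to verify.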

Callback style

def handle(text):
    print(f"Heard: {text}")

listener = VoiceListener(
    transcriber=WhisperTranscriber(),
    on_transcription=handle,
)
listener.start()
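
The structure listing describes VoiceListener as using threading, so the callback may run on a thread other than the main one. That is an assumption; this README does not say which thread invokes `on_transcription`. Under that assumption, one cautious pattern is a callback that only enqueues the text and leaves the real work to a consumer. The `drain` helper below is hypothetical and uses only the standard library.

```python
import queue

# Thread-safe handoff buffer between the callback and the consumer.
transcriptions: "queue.Queue[str]" = queue.Queue()


def handle(text: str) -> None:
    """Callback suitable for on_transcription: enqueue and return quickly."""
    transcriptions.put(text)


def drain(q: "queue.Queue[str]") -> list:
    """Return everything currently queued without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

Passing `on_transcription=handle` keeps the callback short, and the main thread can call `drain(transcriptions)` whenever it is ready to process results.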

VoiceListener options

Parameter           Default     Description
transcriber         (required)  Object with a transcribe(audio) -> str method
silence_timeout_ms  2000        Silence duration (ms) to finalize an utterance
min_utterance_ms    250         Minimum speech length to transcribe
pre_buffer_ms       150         Audio kept before VAD triggers
vad_threshold       0.5         Silero VAD confidence threshold
on_transcription    None        Callback invoked with each transcription
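
To make the interaction of these options concrete, here is a rough sketch of how timing parameters like these are commonly applied to a stream of per-frame VAD probabilities. It is not the package's implementation. It assumes 16 kHz audio split into 512-sample frames (the frame size is an assumption), and `ms_to_frames` and `segment` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000   # Hz; the rate this README states for transcriber input
FRAME_SAMPLES = 512    # assumed VAD frame size (32 ms at 16 kHz)


def ms_to_frames(duration_ms: int) -> int:
    """Whole frames needed to cover duration_ms, rounded up."""
    return math.ceil(duration_ms * SAMPLE_RATE / 1000 / FRAME_SAMPLES)


def segment(
    probs: Sequence[float],
    vad_threshold: float = 0.5,
    silence_timeout_ms: int = 2000,
    min_utterance_ms: int = 250,
    pre_buffer_ms: int = 150,
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive, per utterance."""
    silence_frames = ms_to_frames(silence_timeout_ms)
    min_frames = ms_to_frames(min_utterance_ms)
    pre_frames = ms_to_frames(pre_buffer_ms)

    utterances: List[Tuple[int, int]] = []
    start = first_speech = last_speech = None

    def finalize() -> None:
        # Keep the utterance only if its speech span is long enough.
        if last_speech - first_speech + 1 >= min_frames:
            utterances.append((start, last_speech + 1))

    for i, p in enumerate(probs):
        if p >= vad_threshold:
            if start is None:  # VAD triggers: reach back into the pre-buffer
                first_speech = i
                start = max(0, i - pre_frames)
            last_speech = i
        elif start is not None and i - last_speech >= silence_frames:
            finalize()  # enough trailing silence: close the utterance
            start = None
    if start is not None:  # stream ended mid-utterance
        finalize()
    return utterances
```

Under these assumptions each frame covers 32 ms, so the defaults round up to 63 frames of silence, 8 frames of minimum speech, and 5 frames of pre-buffer. A 3-frame burst of speech would be discarded as too short, while a 20-frame burst would be kept along with the 5 frames before it.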

Custom transcriber

Implement a class with a transcribe method:

from voicelistener import VoiceListener

class MyTranscriber:
    def transcribe(self, audio):
        # audio is a float32 numpy array sampled at 16 kHz
        return "transcribed text"

listener = VoiceListener(transcriber=MyTranscriber())
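
Since the only requirement is a `transcribe(audio) -> str` method, transcribers can also be composed by wrapping one in another. The sketch below is illustrative only: `NormalizingTranscriber` and `EchoTranscriber` are hypothetical classes, and the wrapper assumes nothing about the inner object beyond that method.

```python
class NormalizingTranscriber:
    """Wrap any object with transcribe(audio) -> str and tidy its output."""

    def __init__(self, inner):
        self.inner = inner

    def transcribe(self, audio):
        text = self.inner.transcribe(audio)
        # Collapse runs of whitespace and trim both ends.
        return " ".join(text.split())


class EchoTranscriber:
    """Stand-in transcriber for trying the wrapper without loading a model."""

    def transcribe(self, audio):
        return f"  received   {len(audio)} samples \n"
```

In real use the wrapper would sit around an actual transcriber, for example `VoiceListener(transcriber=NormalizingTranscriber(WhisperTranscriber()))`.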
