voicelistener
Real-time voice recognition using Silero VAD and Whisper.
Structure
voicelistener/
├── __init__.py
├── __main__.py # CLI entry point
├── voicelistener.py # VoiceListener class (audio + VAD + threading)
├── requirements.txt
└── transcribers/
    ├── __init__.py
    ├── whispertranscriber.py # WhisperTranscriber class
    └── elevenlabstranscriber.py # ElevenLabsTranscriber class
Installation
pip install voicelistener
CLI usage
# Default (local Whisper)
python -m voicelistener
# ElevenLabs (requires ELEVENLABS_API_KEY env var)
python -m voicelistener --transcriber elevenlabs
Listens to your microphone, detects speech, and prints transcriptions to stdout. Press Ctrl+C to stop.
| Flag | Default | Description |
|---|---|---|
| --transcriber | whisper | Speech-to-text backend (whisper or elevenlabs) |
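The flag handling described above could be reproduced with argparse roughly as follows. This is a sketch, not the package's actual __main__.py: the names build_parser and missing_api_key are hypothetical, and only the --transcriber flag, its two values, its default, and the ELEVENLABS_API_KEY variable come from this README.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented --transcriber flag."""
    parser = argparse.ArgumentParser(prog="voicelistener")
    parser.add_argument(
        "--transcriber",
        choices=["whisper", "elevenlabs"],
        default="whisper",
        help="Speech-to-text backend (whisper or elevenlabs)",
    )
    return parser


def missing_api_key(backend: str, env=None) -> bool:
    """Return True when the ElevenLabs backend is chosen without an API key."""
    env = os.environ if env is None else env
    return backend == "elevenlabs" and not env.get("ELEVENLABS_API_KEY")


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--transcriber", "elevenlabs"])
print(args.transcriber)
```

Checking for the key before starting the microphone lets a CLI fail early with a clear message instead of failing on the first utterance.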
Library usage
from voicelistener import VoiceListener, WhisperTranscriber, ElevenLabsTranscriber
# Local Whisper
transcriber = WhisperTranscriber(model="base.en")
# Or ElevenLabs (set ELEVENLABS_API_KEY env var)
# transcriber = ElevenLabsTranscriber()
listener = VoiceListener(transcriber=transcriber)
for text in listener:
    print(text)
Callback style
def handle(text):
    print(f"Heard: {text}")

listener = VoiceListener(
    transcriber=WhisperTranscriber(),
    on_transcription=handle,
)
listener.start()
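A callback does not have to print. The sketch below hands each transcription to a queue.Queue so another part of the program can consume it. It assumes the callback may be invoked from a worker thread, which this README does not state explicitly; the names transcriptions and drain are illustrative, and the listener wiring is left as a comment so the block runs without the package or a microphone.

```python
import queue

# Thread-safe buffer between the listener's callback and the consumer.
transcriptions: "queue.Queue[str]" = queue.Queue()


def handle(text: str) -> None:
    """Callback: enqueue non-empty transcriptions instead of printing them."""
    text = text.strip()
    if text:
        transcriptions.put(text)


def drain() -> list:
    """Return everything queued so far without blocking."""
    items = []
    while True:
        try:
            items.append(transcriptions.get_nowait())
        except queue.Empty:
            return items


# Wiring, as in the callback example (requires the package and a microphone):
# listener = VoiceListener(transcriber=WhisperTranscriber(), on_transcription=handle)
# listener.start()

# Simulate two callback invocations to show the flow.
handle(" hello world ")
handle("   ")
print(drain())
```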
VoiceListener options
| Parameter | Default | Description |
|---|---|---|
| transcriber | (required) | Object with a transcribe(audio) -> str method |
| silence_timeout_ms | 2000 | Silence duration (ms) that finalizes an utterance |
| min_utterance_ms | 250 | Minimum speech length (ms) to transcribe |
| pre_buffer_ms | 150 | Audio (ms) kept from before the VAD triggers |
| vad_threshold | 0.5 | Silero VAD confidence threshold |
| energy_threshold | 0.005 | RMS energy below which VAD is skipped |
| on_transcription | None | Callback invoked with each transcription |
| on_speech_start | None | Callback invoked when speech is detected |
| on_speech_end | None | Callback invoked when speech ends (silence timeout) |
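To make the timing and gating parameters concrete, here is a small sketch of how millisecond settings could map to frame counts and how an RMS energy gate could work. It assumes 16 kHz audio (the rate this README gives for transcriber input) and a frame size of 512 samples, which is an assumption and not taken from the package. The helper names are illustrative, and the package's internal logic may differ.

```python
import math

SAMPLE_RATE = 16_000   # Hz; the rate the README gives for transcriber input
FRAME_SAMPLES = 512    # assumed frame size, not taken from the package


def ms_to_frames(ms: int) -> int:
    """Number of whole frames needed to cover `ms` milliseconds."""
    frame_ms = 1000 * FRAME_SAMPLES / SAMPLE_RATE  # 32 ms per frame here
    return math.ceil(ms / frame_ms)


def rms(frame) -> float:
    """Root-mean-square energy of a sequence of float samples in [-1, 1]."""
    if len(frame) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def passes_energy_gate(frame, energy_threshold: float = 0.005) -> bool:
    """True if the frame is loud enough to be worth sending to the VAD."""
    return rms(frame) >= energy_threshold


print(ms_to_frames(2000))  # silence_timeout_ms default
print(ms_to_frames(250))   # min_utterance_ms default
print(ms_to_frames(150))   # pre_buffer_ms default
```

Under these assumptions the defaults correspond to 63, 8, and 5 frames. A cheap energy check of this kind avoids running the VAD model on frames that are effectively silent.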
Custom transcriber
Implement a class with a transcribe method:
class MyTranscriber:
    def transcribe(self, audio):
        # audio is a float32 numpy array at 16kHz
        return "transcribed text"
listener = VoiceListener(transcriber=MyTranscriber())
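Because the interface is a single transcribe method, transcribers can also be composed. The sketch below wraps any transcriber and normalizes whitespace in its output. Both class names are hypothetical, and FakeTranscriber is a stand-in so the example runs without a model or microphone.

```python
class NormalizingTranscriber:
    """Hypothetical wrapper: delegates to another transcriber, then tidies the text."""

    def __init__(self, inner):
        self.inner = inner  # any object with transcribe(audio) -> str

    def transcribe(self, audio):
        text = self.inner.transcribe(audio)
        # Collapse runs of whitespace and trim the ends.
        return " ".join(text.split())


class FakeTranscriber:
    """Stand-in backend that ignores the audio and returns a fixed string."""

    def transcribe(self, audio):
        return "  hello   world \n"


wrapped = NormalizingTranscriber(FakeTranscriber())
print(wrapped.transcribe(audio=[0.0] * 16000))
```

Given the duck-typed interface described above, passing NormalizingTranscriber(WhisperTranscriber()) as the transcriber argument should work the same way, though that combination is untested here.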