Real-time voice recognition using Silero VAD and Whisper/ElevenLabs
voicelistener
Real-time voice recognition using Whisper or ElevenLabs. Speech detection uses fast RMS energy gating by default; Silero VAD is available as an optional higher-accuracy mode.
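RMS energy gating classifies each short audio frame by its root-mean-square level. The sketch below shows the idea. It assumes mono float32 audio at 16 kHz, as described in the custom transcriber section, and reuses the documented default `energy_threshold` of 0.005. The names `frame_rms` and `is_speech_frame` and the 30 ms frame length are hypothetical and are not taken from voicelistener's internals.

```python
import numpy as np

SAMPLE_RATE = 16_000      # Hz; audio is documented as float32 at 16 kHz
FRAME_MS = 30             # hypothetical frame length, for illustration only
ENERGY_THRESHOLD = 0.005  # documented default for energy_threshold


def frame_rms(frame: np.ndarray) -> float:
    """Root-mean-square level of one frame."""
    # Promote to float64 before squaring to avoid float32 precision loss.
    return float(np.sqrt(np.mean(np.square(frame.astype(np.float64)))))


def is_speech_frame(frame: np.ndarray, threshold: float = ENERGY_THRESHOLD) -> bool:
    """Energy gate: frames whose RMS is below the threshold count as silence."""
    return frame_rms(frame) >= threshold


if __name__ == "__main__":
    n = SAMPLE_RATE * FRAME_MS // 1000  # 480 samples per 30 ms frame
    silence = np.zeros(n, dtype=np.float32)
    t = np.arange(n) / SAMPLE_RATE
    tone = (0.1 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)  # RMS roughly 0.07
    print(is_speech_frame(silence), is_speech_frame(tone))  # False True
```

This gate is much cheaper than a neural VAD, which is presumably why it is the default. The trade-off is that it cannot tell loud background noise from speech.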
Structure
```
voicelistener/
├── __init__.py
├── __main__.py                   # CLI entry point
├── voicelistener.py              # VoiceListener class (audio + VAD + threading)
├── requirements.txt
└── transcribers/
    ├── __init__.py
    ├── whispertranscriber.py     # WhisperTranscriber class
    └── elevenlabstranscriber.py  # ElevenLabsTranscriber class
```
Installation
```bash
pip install voicelistener
```
CLI usage
```bash
# Default (local Whisper)
python -m voicelistener

# ElevenLabs (requires the ELEVENLABS_API_KEY env var)
python -m voicelistener --transcriber elevenlabs
```
Listens to your microphone, detects speech, and prints transcriptions to stdout. Press Ctrl+C to stop.
| Flag | Default | Description |
|---|---|---|
| `--transcriber` | `whisper` | Speech-to-text backend (`whisper` or `elevenlabs`) |
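The sketch below uses the standard `argparse` module to parse a flag like `--transcriber`, with the documented choices and default. It also fails early when `elevenlabs` is selected but `ELEVENLABS_API_KEY` is unset. This is a hypothetical illustration: `build_parser` and `check_backend` are made-up names, and the sketch does not reproduce voicelistener's actual `__main__.py`.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="voicelistener")
    # Mirrors the documented flag: two backends, "whisper" by default.
    parser.add_argument(
        "--transcriber",
        choices=["whisper", "elevenlabs"],
        default="whisper",
        help="Speech-to-text backend",
    )
    return parser


def check_backend(backend: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Exit with a message if ElevenLabs is chosen without an API key."""
    env = os.environ if env is None else env
    if backend == "elevenlabs" and not env.get("ELEVENLABS_API_KEY"):
        raise SystemExit("ELEVENLABS_API_KEY is not set")


if __name__ == "__main__":
    args = build_parser().parse_args()
    check_backend(args.transcriber)
    print(f"Using backend: {args.transcriber}")
```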
Library usage
```python
from voicelistener import VoiceListener, WhisperTranscriber, ElevenLabsTranscriber

# Local Whisper
transcriber = WhisperTranscriber(model_id="base.en")
# Or ElevenLabs (set the ELEVENLABS_API_KEY env var)
# transcriber = ElevenLabsTranscriber(model_id="scribe_v2")

listener = VoiceListener(transcriber=transcriber)
for text in listener:
    print(text)
```
Callback style
```python
from voicelistener import VoiceListener, WhisperTranscriber

def handle(text):
    print(f"Heard: {text}")

listener = VoiceListener(
    transcriber=WhisperTranscriber(),
    on_transcription=handle,
)
listener.start()
```
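The project structure describes `VoiceListener` as handling audio, VAD and threading, and the examples above deliver text either by iteration or by callback. One common way to support both styles is a worker thread that either invokes the callback or pushes results into a `queue.Queue` for an iterator to drain. The sketch below illustrates that pattern. The class `QueueListener` is hypothetical, and it is fed a plain list of strings in place of a microphone. It is an assumption about the general design, not voicelistener's code.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, Optional

_DONE = object()  # sentinel: the producer thread has finished


class QueueListener:
    """Hypothetical sketch: a worker thread produces text; callers iterate or get callbacks."""

    def __init__(
        self,
        source: Iterable[str],
        on_transcription: Optional[Callable[[str], None]] = None,
    ) -> None:
        self._source = source  # stand-in for microphone -> speech detection -> transcriber
        self._on_transcription = on_transcription
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._started = False

    def _run(self) -> None:
        for text in self._source:
            if self._on_transcription is not None:
                self._on_transcription(text)  # callback style
            else:
                self._queue.put(text)  # iterator style: hand off to the consumer
        self._queue.put(_DONE)

    def start(self) -> None:
        # Guard against starting the same thread twice (RuntimeError otherwise).
        if not self._started:
            self._started = True
            self._thread.start()

    def join(self) -> None:
        self._thread.join()

    def __iter__(self) -> Iterator[str]:
        self.start()
        while True:
            item = self._queue.get()  # blocks until the worker produces something
            if item is _DONE:
                return
            yield item  # everything other than _DONE is a str


if __name__ == "__main__":
    for text in QueueListener(["hello", "world"]):
        print(text)
```

The sentinel lets the iterator end cleanly when the producer stops. A real microphone stream would typically run until interrupted instead.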
VoiceListener options
| Parameter | Default | Description |
|---|---|---|
| `transcriber` | (required) | Object with a `transcribe(audio) -> str` method |
| `silence_timeout_ms` | `2000` | Duration of silence (ms) that finalizes an utterance |
| `min_utterance_ms` | `250` | Minimum speech length (ms) for an utterance to be transcribed |
| `pre_buffer_ms` | `150` | Audio (ms) kept from just before speech detection triggers |
| `energy_only` | `True` | Use RMS energy for speech detection (no torch required); set to `False` to enable Silero VAD |
| `vad_threshold` | `0.5` | Silero VAD confidence threshold (used only when `energy_only=False`) |
| `energy_threshold` | `0.005` | RMS energy threshold; frames below it are treated as silence |
| `on_transcription` | `None` | Callback invoked with each transcription |
| `on_speech_start` | `None` | Callback invoked when speech is detected |
| `on_speech_end` | `None` | Callback invoked when speech ends (after the silence timeout) |
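The sketch below suggests how `silence_timeout_ms`, `min_utterance_ms` and `pre_buffer_ms` might interact. It turns a sequence of per-frame speech/silence decisions into utterance boundaries. Several details are assumptions made for illustration: the function `segment`, the 30 ms frame size, the rule that only speech frames count toward the minimum length, and the choice to end an utterance at its last speech frame. The library's actual bookkeeping may differ.

```python
from typing import List, Optional, Sequence, Tuple


def segment(
    speech_flags: Sequence[bool],
    frame_ms: int = 30,
    silence_timeout_ms: int = 2000,
    min_utterance_ms: int = 250,
    pre_buffer_ms: int = 150,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, for each kept utterance."""
    pre_frames = pre_buffer_ms // frame_ms
    silence_frames = max(1, silence_timeout_ms // frame_ms)
    utterances: List[Tuple[int, int]] = []

    start: Optional[int] = None  # first frame of the current utterance (incl. pre-buffer)
    last_speech = -1             # index of the most recent speech frame
    speech_count = 0             # speech frames in the current utterance

    def close() -> None:
        # Keep the utterance only if it contains enough actual speech.
        if start is not None and speech_count * frame_ms >= min_utterance_ms:
            utterances.append((start, last_speech + 1))

    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            if start is None:
                # Speech onset: reach back into the pre-buffer so the start isn't clipped.
                start = max(0, i - pre_frames)
                speech_count = 0
            speech_count += 1
            last_speech = i
        elif start is not None and i - last_speech >= silence_frames:
            # Silence has lasted for the timeout: finalize the utterance.
            close()
            start = None

    if start is not None:  # stream ended mid-utterance
        close()
    return utterances


if __name__ == "__main__":
    # 300 ms of speech, a long pause, then a 90 ms blip that is too short to keep.
    flags = [False] * 10 + [True] * 10 + [False] * 70 + [True] * 3 + [False] * 70
    print(segment(flags))  # [(5, 20)]
```

With these defaults and 30 ms frames, the pre-buffer adds 5 frames before the onset, 66 silent frames end an utterance, and at least 9 speech frames (270 ms) are needed to keep one.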
Custom transcriber
Implement a class with a `transcribe` method:
```python
from voicelistener import VoiceListener

class MyTranscriber:
    def transcribe(self, audio):
        # audio is a float32 NumPy array sampled at 16 kHz
        return "transcribed text"

listener = VoiceListener(transcriber=MyTranscriber())
```
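The only contract a custom transcriber must meet is the `transcribe(audio) -> str` method described above. The sketch below spells that contract out as a `typing.Protocol`. The name `Transcriber` is hypothetical, and the library may not export such a type. The sketch also adds a toy `DurationTranscriber` that returns the clip length instead of real text, which could help when testing a pipeline without loading a model. It assumes mono audio, so `len(audio)` is the sample count.

```python
from typing import Protocol, runtime_checkable

import numpy as np

SAMPLE_RATE = 16_000  # audio is documented as float32 at 16 kHz


@runtime_checkable
class Transcriber(Protocol):
    """Hypothetical structural type for anything with a transcribe(audio) -> str method."""

    def transcribe(self, audio: np.ndarray) -> str: ...


class DurationTranscriber:
    """Toy transcriber: reports how much audio it received instead of words."""

    def transcribe(self, audio: np.ndarray) -> str:
        if audio.dtype != np.float32:
            raise ValueError("expected float32 audio")
        # Assumes mono audio, so len(audio) is the number of samples.
        seconds = len(audio) / SAMPLE_RATE
        return f"[{seconds:.2f} s of audio]"


if __name__ == "__main__":
    t = DurationTranscriber()
    print(isinstance(t, Transcriber))                        # True
    print(t.transcribe(np.zeros(24_000, dtype=np.float32)))  # [1.50 s of audio]
```

An instance could then be passed as `VoiceListener(transcriber=DurationTranscriber())`, just like `MyTranscriber` above.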
Download files
File details
Details for the file voicelistener-1.0.3.tar.gz.
File metadata
- Download URL: voicelistener-1.0.3.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `6375f9821ed8b83db8cb31583eafe1a9d6b95e6226587b7f7e3d28dc20289756` |
| MD5 | `dd20cb23feca02625dc7f8bbcdb9c840` |
| BLAKE2b-256 | `c52b85162635f78f6436bb9a7268b5163b6d509782ad4197b274edcf26d48c79` |
File details
Details for the file voicelistener-1.0.3-py3-none-any.whl.
File metadata
- Download URL: voicelistener-1.0.3-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e95743d1bae9e30d98e570ec854af607bc9c7b2c6a756081fb0170033eb66ca1` |
| MD5 | `b2ae08a0aa94914215ae687e5fde974b` |
| BLAKE2b-256 | `0897ce8b0d4a16a4d2e978d67de6de8247c830b7fb31b90edb46a8af21991537` |
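Python's standard `hashlib` module is enough to check a downloaded file against the SHA256 digests listed above. The sketch below streams the file in chunks and compares the result with the published digest for the source distribution. The helper names `sha256_of` and `verify`, and the assumption that the file sits in the current directory, are illustrative only.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for voicelistener-1.0.3.tar.gz.
EXPECTED_SDIST_SHA256 = "6375f9821ed8b83db8cb31583eafe1a9d6b95e6226587b7f7e3d28dc20289756"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare case-insensitively, since hex digests are sometimes shown in uppercase."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    sdist = Path("voicelistener-1.0.3.tar.gz")
    if sdist.exists():
        print("OK" if verify(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

As an alternative, `pip hash <file>` prints the SHA256 digest of a local archive.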