Real-time voice recognition using Silero VAD and Whisper/ElevenLabs

voicelistener

Real-time voice recognition using Whisper or ElevenLabs. Speech detection uses fast RMS energy gating by default; Silero VAD is available as an optional higher-accuracy mode.
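To make the default detection mode concrete, here is a minimal sketch of RMS energy gating. It is not the package's actual code: the function names `rms` and `is_speech` are hypothetical, and the threshold value is taken from the `energy_threshold` default listed under VoiceListener options. It uses only the standard library so it runs anywhere; a real implementation would likely vectorize the computation with NumPy.

```python
import math


def rms(frame):
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if len(frame) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame, energy_threshold=0.005):
    """Hypothetical gate: frames below the threshold count as silence."""
    return rms(frame) >= energy_threshold
```

A gate like this is cheap because it needs no model, which is presumably why it is the default; the trade-off is that any loud noise passes it, which is where a VAD model such as Silero helps.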

Structure

voicelistener/
├── __init__.py
├── __main__.py              # CLI entry point
├── voicelistener.py         # VoiceListener class (audio + VAD + threading)
├── requirements.txt
└── transcribers/
    ├── __init__.py
    ├── whispertranscriber.py      # WhisperTranscriber class
    └── elevenlabstranscriber.py   # ElevenLabsTranscriber class

Installation

pip install voicelistener

CLI usage

# Default (local Whisper)
python -m voicelistener

# ElevenLabs (requires ELEVENLABS_API_KEY env var)
python -m voicelistener --transcriber elevenlabs

Listens to your microphone, detects speech, and prints transcriptions to stdout. Press Ctrl+C to stop.

| Flag | Default | Description |
| --- | --- | --- |
| --transcriber | whisper | Speech-to-text backend (whisper or elevenlabs) |

Library usage

from voicelistener import VoiceListener, WhisperTranscriber, ElevenLabsTranscriber

# Local Whisper
transcriber = WhisperTranscriber(model="base.en")

# Or ElevenLabs (set ELEVENLABS_API_KEY env var)
# transcriber = ElevenLabsTranscriber()

listener = VoiceListener(transcriber=transcriber)

for text in listener:
    print(text)
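Because the listener is consumed as a plain iterable of strings, ordinary generator helpers can wrap it. The sketch below stops iteration once a spoken stop phrase is heard. The helper name `transcriptions_until` and the phrase itself are hypothetical, and whether leaving the loop also stops audio capture is not documented here, so treat that part as an open question.

```python
def transcriptions_until(texts, stop_phrase="stop listening"):
    """Yield transcriptions until one contains stop_phrase (case-insensitive)."""
    needle = stop_phrase.lower()
    for text in texts:
        if needle in text.lower():
            return  # the stop phrase itself is not yielded
        yield text


# Intended use with a listener (requires a microphone, so left commented out):
# for text in transcriptions_until(listener):
#     print(text)
```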

Callback style

from voicelistener import VoiceListener, WhisperTranscriber

def handle(text):
    print(f"Heard: {text}")

listener = VoiceListener(
    transcriber=WhisperTranscriber(),
    on_transcription=handle,
)
listener.start()
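If callbacks are invoked from a background thread (an assumption; the project structure mentions threading but does not say which thread calls `on_transcription`), it is usually safest to keep the callback short and hand the text to the main thread through a queue. A minimal sketch, with `transcripts` and `drain` as hypothetical names:

```python
import queue

transcripts = queue.Queue()


def handle(text):
    # Keep the callback cheap: enqueue and return immediately.
    transcripts.put(text)


def drain(q, timeout=0.1):
    """Return all queued items, waiting up to `timeout` seconds for the first."""
    items = []
    try:
        items.append(q.get(timeout=timeout))
        while True:
            items.append(q.get_nowait())
    except queue.Empty:
        pass
    return items
```

Passing `handle` as `on_transcription` and calling `drain(transcripts)` from the main loop keeps slow work, such as network calls or UI updates, out of the audio path.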

VoiceListener options

| Parameter | Default | Description |
| --- | --- | --- |
| transcriber | (required) | Object with a transcribe(audio) -> str method |
| silence_timeout_ms | 2000 | Silence duration (ms) to finalize an utterance |
| min_utterance_ms | 250 | Minimum speech length to transcribe |
| pre_buffer_ms | 150 | Audio kept before VAD triggers |
| energy_only | True | Use RMS energy for speech detection (no torch required); set False to enable Silero VAD |
| vad_threshold | 0.5 | Silero VAD confidence threshold (used only when energy_only=False) |
| energy_threshold | 0.005 | RMS energy threshold; frames below this are treated as silence |
| on_transcription | None | Callback invoked with each transcription |
| on_speech_start | None | Callback invoked when speech is detected |
| on_speech_end | None | Callback invoked when speech ends (silence timeout) |
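The three timing parameters interact, so a small simulation may help. The sketch below groups per-frame speech flags into utterances using the documented defaults. It is an illustration under assumptions, not the library's implementation: `segment_utterances` and `frame_ms` are hypothetical, and it assumes the minimum utterance length counts only frames flagged as speech.

```python
def segment_utterances(flags, frame_ms=30, silence_timeout_ms=2000,
                       min_utterance_ms=250, pre_buffer_ms=150):
    """Group per-frame speech flags into (start, end) frame ranges, end exclusive."""
    pre_frames = pre_buffer_ms // frame_ms
    utterances = []
    start = None        # first frame of the current utterance, None when idle
    speech_frames = 0   # frames flagged as speech in the current utterance
    silent_run = 0      # consecutive silent frames since the last speech frame
    last_speech = 0     # index of the most recent speech frame

    def finalize():
        # Drop utterances with too little speech (assumed meaning of min_utterance_ms).
        if speech_frames * frame_ms >= min_utterance_ms:
            utterances.append((start, last_speech + 1))

    for i, speaking in enumerate(flags):
        if speaking:
            if start is None:
                # Reach back to include audio from just before detection.
                start = max(0, i - pre_frames)
                speech_frames = 0
            speech_frames += 1
            silent_run = 0
            last_speech = i
        elif start is not None:
            silent_run += 1
            if silent_run * frame_ms >= silence_timeout_ms:
                finalize()
                start = None
    if start is not None:
        finalize()  # flush an utterance still open when the input ends
    return utterances
```

In this model, lowering `silence_timeout_ms` returns text sooner but splits sentences at natural pauses, while raising `min_utterance_ms` filters out short noises at the cost of dropping very short words.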

Custom transcriber

Implement a class with a transcribe method:

from voicelistener import VoiceListener

class MyTranscriber:
    def transcribe(self, audio):
        # audio is a float32 NumPy array sampled at 16 kHz
        return "transcribed text"

listener = VoiceListener(transcriber=MyTranscriber())
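Since a transcriber is any object with a `transcribe(audio)` method, transcribers can also wrap one another. The sketch below tries one backend and falls back to a second if the first raises. `FallbackTranscriber` is a hypothetical name, and the assumption that the built-in transcribers signal failure by raising an exception is not confirmed by this README.

```python
class FallbackTranscriber:
    """Try `primary`; if it raises, use `secondary` instead."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def transcribe(self, audio):
        try:
            return self.primary.transcribe(audio)
        except Exception:
            # Assumed failure mode: the primary backend raised (e.g. network error).
            return self.secondary.transcribe(audio)


# Possible use (needs the package and an API key, so left commented out):
# transcriber = FallbackTranscriber(ElevenLabsTranscriber(), WhisperTranscriber())
# listener = VoiceListener(transcriber=transcriber)
```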
