voicelistener

Real-time voice recognition using Silero VAD and Whisper.

Structure

voicelistener/
├── __init__.py
├── __main__.py              # CLI entry point
├── voicelistener.py         # VoiceListener class (audio + VAD + threading)
├── requirements.txt
└── transcribers/
    ├── __init__.py
    ├── whispertranscriber.py      # WhisperTranscriber class
    └── elevenlabstranscriber.py   # ElevenLabsTranscriber class

Installation

pip install voicelistener

CLI usage

# Default (local Whisper)
python -m voicelistener

# ElevenLabs (requires ELEVENLABS_API_KEY env var)
python -m voicelistener --transcriber elevenlabs

Listens to your microphone, detects speech, and prints transcriptions to stdout. Press Ctrl+C to stop.

Flag            Default   Description
--transcriber   whisper   Speech-to-text backend (whisper or elevenlabs)

Library usage

from voicelistener import VoiceListener, WhisperTranscriber, ElevenLabsTranscriber

# Local Whisper
transcriber = WhisperTranscriber(model="base.en")

# Or ElevenLabs (set ELEVENLABS_API_KEY env var)
# transcriber = ElevenLabsTranscriber()

listener = VoiceListener(transcriber=transcriber)

for text in listener:
    print(text)
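
Because the listener is consumed with a plain for loop, ordinary generator helpers can be layered on top of it. The following is a minimal sketch, assuming each yielded item is a plain str as the loop above suggests. `until_phrase` is a hypothetical helper, not part of voicelistener, and a list stands in for the listener so the snippet runs without a microphone.

```python
from typing import Iterable, Iterator


def until_phrase(texts: Iterable[str], stop_phrase: str = "stop listening") -> Iterator[str]:
    """Yield transcriptions until one contains stop_phrase (case-insensitive)."""
    needle = stop_phrase.lower()
    for text in texts:
        if needle in text.lower():
            return  # stop consuming; the stop phrase itself is not yielded
        yield text


# Stand-in for a listener: any iterable of strings works here.
demo = ["turn on the lights", "Stop listening please", "never reached"]
result = list(until_phrase(demo))
print(result)
```

With a real listener the loop would read `for text in until_phrase(listener): ...`. Whether leaving the loop also stops the background audio capture is not stated in this README, so treat that as something to verify.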

Callback style

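from voicelistener import VoiceListener, WhisperTranscriber

def handle(text):
    # Called once per finalized utterance with the transcribed text
    print(f"Heard: {text}")

listener = VoiceListener(
    transcriber=WhisperTranscriber(),
    on_transcription=handle,
)
listener.start()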
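
The structure listing mentions threading, so the callback may well run on a thread other than the main one. That is an assumption here, not something this README states. Under that assumption, one cautious pattern is to keep the callback tiny and hand the text to a queue.Queue for the main thread to consume. The sketch below simulates the callback invocations directly. `transcripts` and `drain` are illustrative names, not part of voicelistener.

```python
import queue

# Thread-safe buffer between the callback and whoever consumes the text.
transcripts: "queue.Queue[str]" = queue.Queue()


def handle(text: str) -> None:
    """Callback: do as little as possible, just enqueue the text."""
    transcripts.put(text)


def drain(q: "queue.Queue[str]") -> list:
    """Return everything currently queued without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


# Simulate two callback invocations (no microphone needed).
handle("hello")
handle("world")
drained = drain(transcripts)
print(drained)
```

In real use, `handle` would be passed as `on_transcription=handle`, and the main thread would call `drain(transcripts)` or `transcripts.get()` at its own pace.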

VoiceListener options

Parameter            Default      Description
transcriber          (required)   Object with a transcribe(audio) -> str method
silence_timeout_ms   2000         Silence duration (ms) to finalize an utterance
min_utterance_ms     250          Minimum speech length (ms) to transcribe
pre_buffer_ms        150          Audio (ms) kept before VAD triggers
vad_threshold        0.5          Silero VAD confidence threshold
energy_threshold     0.005        RMS energy below which VAD is skipped
on_transcription     None         Callback invoked with each transcription
on_speech_start      None         Callback invoked when speech is detected
on_speech_end        None         Callback invoked when speech ends (silence timeout)
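
To give a feel for the numbers in the options above, here is a rough sketch of what the millisecond defaults mean in samples and how an RMS energy gate could work. It is not the library's actual implementation. It assumes a 16 kHz sample rate (the rate this README gives for audio passed to transcribers) and samples normalized to [-1.0, 1.0]. The helper names and the 512-sample frame size are illustrative only.

```python
import math

SAMPLE_RATE = 16_000  # Hz; assumed to match the 16 kHz audio passed to transcribers


def ms_to_samples(ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a duration in milliseconds to a sample count."""
    return ms * sample_rate // 1000


def rms(samples) -> float:
    """Root-mean-square amplitude of a sequence of floats."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def passes_energy_gate(samples, energy_threshold: float = 0.005) -> bool:
    """True if the frame is loud enough to be worth running VAD on."""
    return rms(samples) >= energy_threshold


silence_samples = ms_to_samples(2000)       # silence_timeout_ms default
min_utterance_samples = ms_to_samples(250)  # min_utterance_ms default
pre_buffer_samples = ms_to_samples(150)     # pre_buffer_ms default

quiet_frame = [0.001] * 512        # near-silent frame (illustrative size)
loud_frame = [0.1, -0.1] * 256     # clearly audible frame
print(silence_samples, min_utterance_samples, pre_buffer_samples)
print(passes_energy_gate(quiet_frame), passes_energy_gate(loud_frame))
```

At 16 kHz the defaults work out to 32000, 4000, and 2400 samples respectively. Raising energy_threshold would skip VAD on more low-level background noise, at the risk of missing quiet speech.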

Custom transcriber

Implement a class with a transcribe method:

from voicelistener import VoiceListener

class MyTranscriber:
    def transcribe(self, audio):
        # audio is a float32 NumPy array sampled at 16 kHz
        return "transcribed text"  # placeholder: return the recognized text

listener = VoiceListener(transcriber=MyTranscriber())
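
Since the only requirement is a transcribe(audio) -> str method, transcribers can also wrap one another. The sketch below shows two hypothetical classes, neither of which is part of voicelistener: a toy transcriber that reports the clip length instead of recognizing speech, and a wrapper that tidies whitespace in another transcriber's output. A plain list stands in for the NumPy array so the snippet runs without extra dependencies.

```python
class DurationTranscriber:
    """Toy transcriber: reports the clip length instead of recognizing speech."""

    def __init__(self, sample_rate: int = 16_000):
        self.sample_rate = sample_rate  # assumed 16 kHz input

    def transcribe(self, audio) -> str:
        seconds = len(audio) / self.sample_rate
        return f"[{seconds:.2f}s of audio]"


class StrippingTranscriber:
    """Wraps any object with transcribe(audio) and collapses stray whitespace."""

    def __init__(self, inner):
        self.inner = inner

    def transcribe(self, audio) -> str:
        return " ".join(self.inner.transcribe(audio).split())


# Half a second of silence at 16 kHz; a list stands in for the NumPy array.
fake_audio = [0.0] * 8000
label = StrippingTranscriber(DurationTranscriber()).transcribe(fake_audio)
print(label)
```

A toy transcriber like this can be handy for checking utterance segmentation without loading a speech model, since it shows how long each detected utterance was.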
