Robust speech recognition pipeline that prevents audio drops

Hearken

Robust speech recognition pipeline for Python that prevents audio drops during transcription.

The Problem

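In a typical speech recognition loop, audio capture and transcription run in the same thread, so capture is blocked while a segment is being transcribed. When transcription is slow (for example, a network request to a cloud API), the audio that arrives in the meantime is dropped and speech is missed.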

The Solution

Hearken decouples capture, voice activity detection (VAD), and transcription into independent threads with queue-based communication. The capture thread never blocks, preventing audio loss even during slow transcription.
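The decoupling described above can be illustrated with a minimal, standard-library-only sketch. This is not Hearken's implementation: the function names, the integer "chunks", the fixed-size grouping that stands in for VAD, and the sleep that stands in for network I/O are all assumptions made for illustration.

```python
import queue
import threading
import time


def capture(chunks_out: queue.Queue, n_chunks: int) -> None:
    """Producer: pushes chunks without ever waiting on downstream stages."""
    for i in range(n_chunks):
        chunks_out.put(i)      # unbounded queue, so put() returns immediately
    chunks_out.put(None)       # sentinel: end of stream


def detect(chunks_in: queue.Queue, segments_out: queue.Queue,
           chunks_per_segment: int) -> None:
    """Groups chunks into fixed-size segments (a stand-in for real VAD)."""
    buffer = []
    while (chunk := chunks_in.get()) is not None:
        buffer.append(chunk)
        if len(buffer) == chunks_per_segment:
            segments_out.put(buffer)
            buffer = []
    if buffer:                 # flush a trailing partial segment
        segments_out.put(buffer)
    segments_out.put(None)


def transcribe(segments_in: queue.Queue, results: list, delay_s: float) -> None:
    """Consumer: slow on purpose, yet it never stalls the capture thread."""
    while (segment := segments_in.get()) is not None:
        time.sleep(delay_s)    # simulated slow transcription call
        results.append(f"segment of {len(segment)} chunks")


def run_pipeline(n_chunks: int = 20, chunks_per_segment: int = 5,
                 delay_s: float = 0.01) -> list:
    chunks, segments = queue.Queue(), queue.Queue()
    results: list = []
    threads = [
        threading.Thread(target=capture, args=(chunks, n_chunks)),
        threading.Thread(target=detect, args=(chunks, segments, chunks_per_segment)),
        threading.Thread(target=transcribe, args=(segments, results, delay_s)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling run_pipeline() with the defaults yields four results, one per group of five chunks. The trade-off in this sketch is memory: because the queues are unbounded, a slow transcriber makes work pile up in the queues instead of causing dropped audio.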

Installation

# Basic installation (includes EnergyVAD)
pip install hearken

# Extras are quoted so that shells such as zsh do not expand the brackets

# With speech_recognition support
pip install "hearken[sr]"

# With WebRTC VAD support
pip install "hearken[webrtc]"

# With Silero VAD support (neural network)
pip install "hearken[silero]"

# All optional dependencies
pip install "hearken[all]"

Quick Start

import speech_recognition as sr
from hearken import Listener, EnergyVAD
from hearken.adapters.sr import SpeechRecognitionSource, SRTranscriber

# Setup
recognizer = sr.Recognizer()
mic = sr.Microphone()

with mic as source:
    recognizer.adjust_for_ambient_noise(source)

# Create listener
listener = Listener(
    source=SpeechRecognitionSource(mic),
    transcriber=SRTranscriber(recognizer),
    on_transcript=lambda text, seg: print(f"You said: {text}")
)

# Run
listener.start()
try:
    listener.wait()
except KeyboardInterrupt:
    listener.stop()

Features

  • No dropped frames: Capture thread never blocks on downstream processing
  • Two modes: Passive (callbacks) and active (wait_for_speech())
  • Clean abstractions: Bring your own audio source and transcriber
  • Production-ready FSM: Robust 4-state detector filters false starts and handles pauses
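To make the "4-state detector" idea concrete, here is one possible shape for such a state machine. The state names, thresholds, and class name below are hypothetical and are not taken from Hearken's source; the sketch only shows how a short burst can be discarded as a false start and how a brief pause can be kept inside one segment.

```python
from enum import Enum, auto


class State(Enum):          # hypothetical state names
    IDLE = auto()           # waiting for speech
    STARTING = auto()       # speech seen, not yet confirmed
    SPEAKING = auto()       # confirmed speech
    TRAILING = auto()       # silence after speech, may be a pause


class SegmentDetector:
    def __init__(self, min_speech_chunks: int = 3, max_silence_chunks: int = 5):
        self.min_speech = min_speech_chunks
        self.max_silence = max_silence_chunks
        self.state = State.IDLE
        self.run = 0         # consecutive chunks counted in the current state
        self.buffer = []

    def feed(self, chunk, is_speech: bool):
        """Consume one chunk; return a finished segment (list) or None."""
        if self.state is State.IDLE:
            if not is_speech:
                return None
            self.state, self.run, self.buffer = State.STARTING, 1, [chunk]
        elif self.state is State.STARTING:
            if not is_speech:            # false start: discard the buffer
                self.state, self.buffer = State.IDLE, []
                return None
            self.buffer.append(chunk)
            self.run += 1
        elif self.state is State.SPEAKING:
            self.buffer.append(chunk)
            if not is_speech:
                self.state, self.run = State.TRAILING, 1
        else:                            # State.TRAILING
            self.buffer.append(chunk)
            if is_speech:                # it was only a pause
                self.state = State.SPEAKING
            else:
                self.run += 1

        if self.state is State.STARTING and self.run >= self.min_speech:
            self.state = State.SPEAKING
        elif self.state is State.TRAILING and self.run >= self.max_silence:
            segment, self.buffer = self.buffer, []
            self.state = State.IDLE
            return segment
        return None
```

In this sketch the trailing silence is kept in the emitted segment; a real detector might trim it or add padding before the first speech chunk.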

Voice Activity Detection (VAD)

  • EnergyVAD: Simple energy-based detection with dynamic threshold calibration
  • WebRTCVAD: Google WebRTC VAD for improved accuracy in noisy environments
    • Requires a sample rate of 8000, 16000, 32000, or 48000 Hz
    • Configurable aggressiveness (0-3)
    • Install with: pip install "hearken[webrtc]"
  • SileroVAD: Neural network-based VAD for superior accuracy
    • Requires 16 kHz audio
    • Configurable sensitivity threshold
    • Automatic model download and caching
    • Install with: pip install "hearken[silero]"
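As a rough illustration of energy-based detection with a dynamically calibrated threshold, the sketch below computes the RMS of 16-bit PCM audio and lets the threshold drift toward the ambient noise level during quiet chunks. The class name, parameters, and update rule are assumptions for illustration and may differ from how Hearken's EnergyVAD works.

```python
import math
from array import array


def rms(pcm16: bytes) -> float:
    """Root-mean-square amplitude of native-endian signed 16-bit PCM."""
    samples = array("h", pcm16)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SimpleEnergyVAD:
    """Hypothetical energy VAD; not Hearken's EnergyVAD."""

    def __init__(self, threshold: float = 300.0, ratio: float = 1.5,
                 damping: float = 0.9, floor: float = 50.0):
        self.threshold = threshold
        self.ratio = ratio        # threshold target = ratio * ambient energy
        self.damping = damping    # closer to 1.0 means slower adaptation
        self.floor = floor        # keeps the threshold from collapsing to 0

    def is_speech(self, pcm16: bytes) -> bool:
        energy = rms(pcm16)
        if energy > self.threshold:
            return True
        # Quiet chunk: move the threshold toward the ambient noise level.
        target = energy * self.ratio
        blended = self.damping * self.threshold + (1 - self.damping) * target
        self.threshold = max(self.floor, blended)
        return False
```

Only quiet chunks update the threshold here, so sustained speech does not raise it. The cost is that a sudden, lasting increase in background noise would be classified as speech until the threshold is recalibrated some other way.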

Architecture

Microphone → [Capture Thread] → Queue → [Detect Thread] → Queue → [Transcribe Thread] → Callback
                   ↓                          ↓                         ↓
            AudioChunk (30ms)      SpeechSegment (complete)    Text transcription
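The 30 ms chunk size in the diagram translates into a fixed number of bytes once the sample rate and sample width are known. The sketch below works through that arithmetic for 16-bit mono audio; the Chunk dataclass and helper functions are hypothetical stand-ins, not Hearken's AudioChunk type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:                 # hypothetical stand-in for a captured chunk
    data: bytes
    sample_rate: int
    timestamp: float         # seconds from the start of the stream


def chunk_size_bytes(sample_rate: int, chunk_ms: int = 30,
                     sample_width: int = 2) -> int:
    """Bytes per chunk for mono PCM: samples per chunk * bytes per sample."""
    return sample_rate * chunk_ms // 1000 * sample_width


def split_into_chunks(pcm: bytes, sample_rate: int, chunk_ms: int = 30,
                      sample_width: int = 2) -> list:
    """Split raw PCM into full-size chunks, dropping any trailing remainder."""
    size = chunk_size_bytes(sample_rate, chunk_ms, sample_width)
    return [
        Chunk(pcm[i:i + size], sample_rate, (i // size) * chunk_ms / 1000)
        for i in range(0, len(pcm) - size + 1, size)
    ]
```

At 16 kHz, a 30 ms chunk holds 480 samples, or 960 bytes of 16-bit audio, so one second of audio splits into 33 full chunks with a short remainder left over.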

Active Mode

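# Reuses `sr`, `mic`, and `recognizer` from the Quick Start example.
# No on_transcript callback is passed; segments are pulled explicitly.
listener = Listener(
    source=SpeechRecognitionSource(mic),
    transcriber=SRTranscriber(recognizer),
)

listener.start()

try:
    while True:
        print("Waiting for speech...")
        segment = listener.wait_for_speech()

        if segment:
            try:
                text = listener.transcriber.transcribe(segment)
                print(f"You said: {text}")
            except sr.UnknownValueError:
                print("Could not understand")
except KeyboardInterrupt:
    # Stop the pipeline on Ctrl+C, as in Quick Start
    listener.stop()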

Documentation

See examples/ for more usage patterns.

Development

# Clone repository
git clone https://github.com/hipsterbrown/hearken.git
cd hearken

# Install with dev dependencies
uv sync --all-extras

# Run tests
pytest

# Run tests with coverage
pytest --cov=hearken --cov-report=term-missing

# Format code
black hearken/ tests/

# Type checking
mypy hearken/

# Linting
ruff check hearken/ tests/

Roadmap

  • ✅ v0.1: EnergyVAD, core pipeline
  • ✅ v0.2: WebRTC VAD support
  • ✅ v0.3: Silero VAD (neural network)
  • v0.4: Async transcriber support

License

Apache 2.0

Contributing

Contributions welcome! Please open an issue or PR on GitHub.

Credits

Created by Nick Hehr (@hipsterbrown)
