Robust speech recognition pipeline that prevents audio drops

Hearken

Robust speech recognition pipeline for Python that prevents audio drops during transcription.

The Problem

In a typical speech recognition loop, audio capture and transcription share one thread, so capture stalls while each transcription request is in flight. When network I/O is slow, incoming frames are dropped and speech is missed.

The Solution

Hearken decouples capture, voice activity detection (VAD), and transcription into independent threads with queue-based communication. The capture thread never blocks, preventing audio loss even during slow transcription.
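
To make the threading model concrete, here is a minimal sketch of a three-stage pipeline connected by queues, using only the standard library. It is an illustration of the general technique, not hearken's implementation: the function names, the `STOP` sentinel, and the fixed-size "segments" standing in for VAD output are all assumptions made for the example.

```python
import queue
import threading
import time

STOP = object()  # hypothetical sentinel that tells the next stage to shut down


def capture(frames, out_q):
    """Push every frame downstream immediately; never wait on later stages."""
    for frame in frames:
        out_q.put(frame)  # unbounded queue, so put() returns at once
    out_q.put(STOP)


def detect(in_q, out_q, frames_per_segment=3):
    """Stand-in for VAD: group consecutive frames into fixed-size segments."""
    segment = []
    while (frame := in_q.get()) is not STOP:
        segment.append(frame)
        if len(segment) == frames_per_segment:
            out_q.put(segment)
            segment = []
    if segment:
        out_q.put(segment)  # flush the final, shorter segment
    out_q.put(STOP)


def transcribe(in_q, results, delay=0.05):
    """Stand-in for a slow network transcriber."""
    while (segment := in_q.get()) is not STOP:
        time.sleep(delay)  # simulated network latency
        results.append("".join(segment))


def run_pipeline(frames):
    capture_q, segment_q, results = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=capture, args=(frames, capture_q)),
        threading.Thread(target=detect, args=(capture_q, segment_q)),
        threading.Thread(target=transcribe, args=(segment_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the capture stage only ever calls `put()` on an unbounded queue, the simulated latency in `transcribe` delays results but never delays capture. The trade-off of an unbounded queue is memory growth if transcription falls far behind.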

Installation

# Basic installation (includes EnergyVAD)
pip install hearken

# With speech_recognition support
pip install hearken[sr]

# With WebRTC VAD support
pip install hearken[webrtc]

# With Silero VAD support (neural network)
pip install hearken[silero]

# All optional dependencies
pip install hearken[all]

Quick Start

import speech_recognition as sr
from hearken import Listener, EnergyVAD
from hearken.adapters.sr import SpeechRecognitionSource, SRTranscriber

# Setup
recognizer = sr.Recognizer()
mic = sr.Microphone()

with mic as source:
    recognizer.adjust_for_ambient_noise(source)

# Create listener
listener = Listener(
    source=SpeechRecognitionSource(mic),
    transcriber=SRTranscriber(recognizer),
    on_transcript=lambda text, seg: print(f"You said: {text}")
)

# Run
listener.start()
try:
    listener.wait()
except KeyboardInterrupt:
    listener.stop()

Features

  • No dropped frames: Capture thread never blocks on downstream processing
  • Two modes: Passive (callbacks) and active (wait_for_speech())
  • Clean abstractions: Bring your own audio source and transcriber
  • Production-ready FSM: Robust 4-state detector filters false starts and handles pauses
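
As a rough sketch of how a 4-state detector can filter false starts and tolerate pauses, consider the state machine below. The state names, the `SegmentDetector` class, and the frame-count thresholds are hypothetical and chosen for illustration; hearken's own states and parameters may differ.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # waiting for speech
    STARTING = auto()  # speech seen, not yet confirmed
    SPEAKING = auto()  # confirmed speech
    TRAILING = auto()  # silence after speech; may only be a pause


class SegmentDetector:
    """Hypothetical detector; consumes per-frame speech flags from any VAD."""

    def __init__(self, min_speech_frames=3, max_silence_frames=5):
        self.min_speech_frames = min_speech_frames
        self.max_silence_frames = max_silence_frames
        self._reset()

    def _reset(self):
        self.state = State.IDLE
        self._buffer = []
        self._speech_run = 0
        self._silence_run = 0

    def feed(self, frame, is_speech):
        """Consume one frame; return a completed segment (list) or None."""
        if self.state is State.IDLE:
            if is_speech:
                self.state = State.STARTING
                self._buffer = [frame]
                self._speech_run = 1
        elif self.state is State.STARTING:
            if is_speech:
                self._buffer.append(frame)
                self._speech_run += 1
                if self._speech_run >= self.min_speech_frames:
                    self.state = State.SPEAKING
            else:
                self._reset()  # false start: too short to count as speech
        elif self.state is State.SPEAKING:
            self._buffer.append(frame)
            if not is_speech:
                self.state = State.TRAILING
                self._silence_run = 1
        elif self.state is State.TRAILING:
            self._buffer.append(frame)
            if is_speech:
                self.state = State.SPEAKING  # pause ended; same segment
            else:
                self._silence_run += 1
                if self._silence_run >= self.max_silence_frames:
                    # Drop the trailing silence before emitting the segment.
                    segment = self._buffer[: -self._silence_run]
                    self._reset()
                    return segment
        return None
```

In this sketch a single speech frame followed by silence is discarded in `STARTING`, while a short gap inside an utterance only moves the detector to `TRAILING` and back, so the utterance is emitted as one segment.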

Voice Activity Detection (VAD)

  • EnergyVAD: Simple energy-based detection with dynamic threshold calibration
  • WebRTCVAD: Google WebRTC VAD for improved accuracy in noisy environments
    • Requires a sample rate of 8000, 16000, 32000, or 48000 Hz
    • Configurable aggressiveness (0-3)
    • Install with: pip install hearken[webrtc]
  • SileroVAD: Neural network-based VAD for superior accuracy
    • Requires 16 kHz audio
    • Configurable sensitivity threshold
    • Automatic model download and caching
    • Install with: pip install hearken[silero]
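
To illustrate what energy-based detection with a dynamic threshold can look like, the sketch below computes the RMS energy of a frame and lets the threshold drift toward the background noise level. `SimpleEnergyVAD`, its parameters, and the damped-average update rule are assumptions for this example and do not describe hearken's `EnergyVAD`; the input is assumed to be 16-bit little-endian mono PCM.

```python
import math
import struct


def rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class SimpleEnergyVAD:
    """Hypothetical illustration; not hearken's EnergyVAD."""

    def __init__(self, threshold=300.0, ratio=1.5, damping=0.9):
        self.threshold = threshold
        self.ratio = ratio      # threshold aims this far above the noise floor
        self.damping = damping  # closer to 1.0 means slower adaptation

    def is_speech(self, pcm: bytes) -> bool:
        energy = rms(pcm)
        if energy > self.threshold:
            return True
        # Frame looks like background noise: nudge the threshold toward it.
        target = energy * self.ratio
        self.threshold = (
            self.damping * self.threshold + (1 - self.damping) * target
        )
        return False
```

The threshold only adapts on frames classified as non-speech, so a long utterance does not raise it. A quiet room gradually lowers the threshold, and a noisy one raises it, up to the point where noise frames start to exceed it.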

Architecture

Microphone → [Capture Thread] → Queue → [Detect Thread] → Queue → [Transcribe Thread] → Callback
                   ↓                          ↓                         ↓
            AudioChunk (30ms)      SpeechSegment (complete)    Text transcription
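
For a sense of scale, the 30 ms chunk size in the diagram translates into sample and byte counts as follows. This is a small worked calculation, assuming 16-bit mono PCM; `chunk_size` is a helper written for this example, not part of hearken.

```python
def chunk_size(sample_rate_hz, chunk_ms=30, sample_width_bytes=2, channels=1):
    """Return (samples_per_channel, total_bytes) for one chunk."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples, samples * sample_width_bytes * channels


# At 16 kHz, a 30 ms chunk is 480 samples, or 960 bytes of 16-bit mono audio.
print(chunk_size(16000))
```

Under the same assumptions, 8 kHz audio gives 240 samples (480 bytes) per chunk and 48 kHz gives 1440 samples (2880 bytes).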

Active Mode

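# Reuses `sr`, `recognizer`, and `mic` from the Quick Start.
# No on_transcript callback: segments are pulled with wait_for_speech().
listener = Listener(
    source=SpeechRecognitionSource(mic),
    transcriber=SRTranscriber(recognizer),
)

listener.start()

try:
    while True:
        print("Waiting for speech...")
        segment = listener.wait_for_speech()

        if segment:
            try:
                text = listener.transcriber.transcribe(segment)
                print(f"You said: {text}")
            except sr.UnknownValueError:
                print("Could not understand")
except KeyboardInterrupt:
    # Stop the pipeline threads on Ctrl+C, as in the Quick Start.
    listener.stop()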

Documentation

See examples/ for more usage patterns.

Development

# Clone repository
git clone https://github.com/hipsterbrown/hearken.git
cd hearken

# Install with dev dependencies
uv sync --all-extras

# Run tests
pytest

# Run tests with coverage
pytest --cov=hearken --cov-report=term-missing

# Format code
black hearken/ tests/

# Type checking
mypy hearken/

# Linting
ruff check hearken/ tests/

Roadmap

  • ✅ v0.1: EnergyVAD, core pipeline
  • ✅ v0.2: WebRTC VAD support
  • ✅ v0.3: Silero VAD (neural network)
  • v0.4: Async transcriber support

License

Apache 2.0

Contributing

Contributions welcome! Please open an issue or PR on GitHub.

Credits

Created by Nick Hehr (@hipsterbrown)
