Robust speech recognition pipeline that prevents audio drops
Hearken
Robust speech recognition pipeline for Python that prevents audio drops during transcription.
The Problem
In a typical speech recognition loop, audio capture is blocked while a segment is being transcribed. When transcription is slow (for example, while waiting on a network API), frames that arrive in the meantime are dropped and speech is missed.
The Solution
Hearken decouples capture, voice activity detection (VAD), and transcription into independent threads with queue-based communication. The capture thread never blocks, preventing audio loss even during slow transcription.
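To illustrate the idea, here is a minimal standard-library sketch of three threads linked by unbounded `queue.Queue`s. It is not Hearken's source: the function names, the `None` end-of-stream sentinel, the fixed-size "segments", and the timings are all hypothetical. The capture thread only ever calls `put_nowait`, so a deliberately slow transcriber delays results but never causes chunks to be lost.

```python
import queue
import threading
import time

SENTINEL = None  # hypothetical end-of-stream marker


def capture(chunks: queue.Queue, n_chunks: int) -> None:
    """Producer: emits numbered 'audio chunks' at a steady cadence."""
    for i in range(n_chunks):
        chunks.put_nowait(i)  # unbounded queue: never blocks, never raises Full
        time.sleep(0.001)     # stand-in for waiting on the next audio buffer
    chunks.put_nowait(SENTINEL)


def detect(chunks: queue.Queue, segments: queue.Queue, segment_len: int) -> None:
    """Groups chunks into fixed-size 'segments'; a real VAD would split on silence."""
    current = []
    while (chunk := chunks.get()) is not SENTINEL:
        current.append(chunk)
        if len(current) == segment_len:
            segments.put(current)
            current = []
    if current:
        segments.put(current)  # flush a partial segment at end of stream
    segments.put(SENTINEL)


def transcribe(segments: queue.Queue, results: list) -> None:
    """Slow consumer standing in for a network-bound transcription call."""
    while (segment := segments.get()) is not SENTINEL:
        time.sleep(0.02)  # far slower than capture
        results.append(segment)


def run_pipeline(n_chunks: int = 50, segment_len: int = 10) -> list:
    chunks: queue.Queue = queue.Queue()  # maxsize=0 means unbounded
    segments: queue.Queue = queue.Queue()
    results: list = []
    threads = [
        threading.Thread(target=capture, args=(chunks, n_chunks)),
        threading.Thread(target=detect, args=(chunks, segments, segment_len)),
        threading.Thread(target=transcribe, args=(segments, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    results = run_pipeline()
    print(sum(len(seg) for seg in results))  # 50: every captured chunk was transcribed
```

An unbounded queue trades memory for zero loss; a bounded queue would cap memory but force a choice between blocking the capture thread and discarding chunks.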
Installation
```bash
# Basic installation (includes EnergyVAD)
pip install hearken

# With speech_recognition support
pip install "hearken[sr]"

# With WebRTC VAD support
pip install "hearken[webrtc]"

# With Silero VAD support (neural network)
pip install "hearken[silero]"

# All optional dependencies
pip install "hearken[all]"
```
Quick Start
```python
import speech_recognition as sr
from hearken import Listener, EnergyVAD
from hearken.adapters.sr import SpeechRecognitionSource, SRTranscriber

# Setup: calibrate the recognizer's energy threshold to the room's ambient noise
recognizer = sr.Recognizer()
mic = sr.Microphone()
with mic as source:
    recognizer.adjust_for_ambient_noise(source)

# Create listener (passive mode: on_transcript is called with each transcript)
listener = Listener(
    source=SpeechRecognitionSource(mic),
    transcriber=SRTranscriber(recognizer),
    on_transcript=lambda text, seg: print(f"You said: {text}"),
)

# Run until interrupted with Ctrl+C
listener.start()
try:
    listener.wait()
except KeyboardInterrupt:
    listener.stop()
```
Features
- No dropped frames: Capture thread never blocks on downstream processing
- Two modes: Passive (callbacks) and active (`wait_for_speech()`)
- Clean abstractions: Bring your own audio source and transcriber
- Production-ready FSM: Robust 4-state detector that filters out false starts and tolerates pauses within an utterance
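The README doesn't name the detector's four states. Purely as an illustration of how such a machine can filter false starts and bridge pauses, here is a hypothetical version that runs over per-frame VAD decisions. The state names `IDLE`, `STARTING`, `SPEAKING`, `TRAILING_SILENCE` and the `min_speech`/`max_silence` parameters are assumptions, not Hearken's API.

```python
from enum import Enum, auto
from typing import Iterable, List, Tuple


class State(Enum):
    IDLE = auto()              # no speech
    STARTING = auto()          # speech seen, not yet long enough to trust
    SPEAKING = auto()          # confirmed speech segment in progress
    TRAILING_SILENCE = auto()  # silence inside a segment; may just be a pause


def detect_segments(frames: Iterable[bool], min_speech: int = 3,
                    max_silence: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of confirmed speech, end exclusive.

    min_speech:  consecutive speech frames needed to confirm onset (drops false starts)
    max_silence: silent frames tolerated inside a segment (bridges short pauses)
    """
    state = State.IDLE
    segments: List[Tuple[int, int]] = []
    start = speech_run = silence_start = n = 0
    for i, is_speech in enumerate(frames):
        n = i + 1  # frames seen so far
        if state is State.IDLE:
            if is_speech:
                start, speech_run = i, 1
                state = State.SPEAKING if min_speech <= 1 else State.STARTING
        elif state is State.STARTING:
            if is_speech:
                speech_run += 1
                if speech_run >= min_speech:
                    state = State.SPEAKING
            else:
                state = State.IDLE  # onset too short: discard as a false start
        elif state is State.SPEAKING:
            if not is_speech:
                state, silence_start = State.TRAILING_SILENCE, i
        else:  # State.TRAILING_SILENCE
            if is_speech:
                state = State.SPEAKING  # pause was short: same segment continues
        # Silence longer than max_silence frames closes the segment
        if state is State.TRAILING_SILENCE and i - silence_start + 1 > max_silence:
            segments.append((start, silence_start))  # end before the silence
            state = State.IDLE
    # Close a segment that is still open when the stream ends
    if state is State.SPEAKING:
        segments.append((start, n))
    elif state is State.TRAILING_SILENCE:
        segments.append((start, silence_start))
    return segments


if __name__ == "__main__":
    frames = [c == "1" for c in "1000011110110001"]
    print(detect_segments(frames))  # [(5, 12)]
```

In the usage example, the lone speech frame at index 0 is discarded as a false start, the one-frame gap at index 9 is bridged, and the final frame never reaches `min_speech`, so only one segment is reported.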
Voice Activity Detection (VAD)
- EnergyVAD: Simple energy-based detection with dynamic threshold calibration
- WebRTCVAD: Google's WebRTC VAD for improved accuracy in noisy environments
  - Requires a sample rate of 8000, 16000, 32000, or 48000 Hz
  - Configurable aggressiveness (0-3; higher values filter out non-speech more aggressively)
  - Install with: `pip install "hearken[webrtc]"`
- SileroVAD: Neural-network-based VAD for superior accuracy
  - Requires 16 kHz audio
  - Configurable sensitivity threshold
  - Automatic model download and caching
  - Install with: `pip install "hearken[silero]"`
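The README doesn't say how EnergyVAD calibrates its threshold. One common approach, sketched here purely as an assumption (the class name `AdaptiveEnergyVAD` and the `ratio`, `alpha`, and `initial_floor` parameters are hypothetical), tracks the background noise floor with an exponential moving average updated on non-speech frames and flags a frame as speech when its RMS energy exceeds a multiple of that floor.

```python
import math
from array import array
from typing import Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class AdaptiveEnergyVAD:
    """Hypothetical energy VAD with a dynamically calibrated threshold."""

    def __init__(self, ratio: float = 3.0, alpha: float = 0.05,
                 initial_floor: float = 100.0) -> None:
        self.ratio = ratio                # speech if energy > ratio * noise floor
        self.alpha = alpha                # EMA smoothing factor for the noise floor
        self.noise_floor = initial_floor  # running estimate of background RMS

    def is_speech(self, samples: Sequence[int]) -> bool:
        energy = rms(samples)
        speech = energy > self.ratio * self.noise_floor
        if not speech:
            # Only non-speech frames move the floor, so speech can't inflate it
            self.noise_floor += self.alpha * (energy - self.noise_floor)
        return speech


if __name__ == "__main__":
    vad = AdaptiveEnergyVAD()
    quiet = array("h", [50] * 480)   # 30 ms of low-level noise at 16 kHz
    loud = array("h", [2000] * 480)  # 30 ms of a much louder signal
    print(vad.is_speech(quiet), vad.is_speech(loud))  # False True
```

Updating the floor only on non-speech frames keeps loud speech from dragging the threshold upward. The trade-off is that a sudden, lasting jump in background noise above the threshold is treated as speech indefinitely, which is one reason a real implementation might also recalibrate periodically.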
Architecture
```
Microphone → [Capture Thread] → Queue → [Detect Thread] → Queue → [Transcribe Thread] → Callback
                    ↓                          ↓                            ↓
            AudioChunk (30ms)      SpeechSegment (complete)       Text transcription
```
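The diagram labels each capture unit as a 30 ms `AudioChunk`, which fits the WebRTC VAD's requirement that frames be 10, 20, or 30 ms long. Assuming 16-bit mono PCM (an assumption; the README doesn't state the sample format), the size of one chunk at each sample rate WebRTCVAD accepts works out as follows. The helper `chunk_size` is hypothetical.

```python
from typing import Tuple


def chunk_size(sample_rate: int, duration_ms: int = 30,
               sample_width: int = 2, channels: int = 1) -> Tuple[int, int]:
    """Return (samples, bytes) for one PCM chunk; sample_width is bytes per sample."""
    samples = sample_rate * duration_ms // 1000
    return samples, samples * sample_width * channels


if __name__ == "__main__":
    # Sample rates the README lists for WebRTCVAD (SileroVAD uses 16 kHz)
    for rate in (8000, 16000, 32000, 48000):
        samples, nbytes = chunk_size(rate)
        print(f"{rate} Hz: {samples} samples, {nbytes} bytes")
    # 8000 Hz: 240 samples, 480 bytes
    # 16000 Hz: 480 samples, 960 bytes
    # 32000 Hz: 960 samples, 1920 bytes
    # 48000 Hz: 1440 samples, 2880 bytes
```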
Active Mode
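```python
import speech_recognition as sr
from hearken import Listener
from hearken.adapters.sr import SpeechRecognitionSource, SRTranscriber

recognizer = sr.Recognizer()
mic = sr.Microphone()

# No on_transcript callback: segments are pulled with wait_for_speech()
listener = Listener(
    source=SpeechRecognitionSource(mic),
    transcriber=SRTranscriber(recognizer),
)
listener.start()

try:
    while True:
        print("Waiting for speech...")
        segment = listener.wait_for_speech()
        if segment:  # skip if no segment was returned
            try:
                # Transcribe on this thread; recognition failures raise here
                text = listener.transcriber.transcribe(segment)
                print(f"You said: {text}")
            except sr.UnknownValueError:
                print("Could not understand")
except KeyboardInterrupt:
    listener.stop()
```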
Documentation
See the `examples/` directory in the repository for more usage patterns.
Development
```bash
# Clone repository
git clone https://github.com/hipsterbrown/hearken.git
cd hearken

# Install with dev dependencies
uv sync --all-extras

# Run tests
pytest

# Run tests with coverage
pytest --cov=hearken --cov-report=term-missing

# Format code
black hearken/ tests/

# Type checking
mypy hearken/

# Linting
ruff check hearken/ tests/
```
Roadmap
- ✅ v0.1: EnergyVAD, core pipeline
- ✅ v0.2: WebRTC VAD support
- ✅ v0.3: Silero VAD (neural network)
- v0.4: Async transcriber support
License
Apache 2.0
Contributing
Contributions welcome! Please open an issue or PR on GitHub.
Credits
Created by Nick Hehr (@hipsterbrown)
File details
Details for the file hearken-0.3.0.tar.gz.
File metadata
- Download URL: hearken-0.3.0.tar.gz
- Upload date:
- Size: 137.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8c8aae87363c7dc25d85b3edb572481c52519ae764dd57b17b89c2566186657f` |
| MD5 | `5013522ba810c094ecbbeb7840ab2409` |
| BLAKE2b-256 | `f91dce12e17b4ccf7d92eb9f9cfd7bd58213b695d558a104f703645d9d211cac` |
Provenance
The following attestation bundles were made for hearken-0.3.0.tar.gz:
Publisher: publish.yml on HipsterBrown/hearken
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hearken-0.3.0.tar.gz
- Subject digest: 8c8aae87363c7dc25d85b3edb572481c52519ae764dd57b17b89c2566186657f
- Sigstore transparency entry: 743469630
- Sigstore integration time:
- Permalink: HipsterBrown/hearken@b26afa97e69e9ab277ec68c4f0babd3346e50656
- Branch / Tag: refs/heads/main
- Owner: https://github.com/HipsterBrown
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b26afa97e69e9ab277ec68c4f0babd3346e50656
- Trigger Event: workflow_dispatch
File details
Details for the file hearken-0.3.0-py3-none-any.whl.
File metadata
- Download URL: hearken-0.3.0-py3-none-any.whl
- Upload date:
- Size: 21.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7c40711182a4c211d4fba3bee269e3e2d9c691aaa14e3eef6c417e1a66d3606e` |
| MD5 | `5f988706d829e5c36995d18eeddd9eab` |
| BLAKE2b-256 | `5b5f897933b609cd992df7e87a5cd4791bd54f4295846423d036230f7006157e` |
Provenance
The following attestation bundles were made for hearken-0.3.0-py3-none-any.whl:
Publisher: publish.yml on HipsterBrown/hearken
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hearken-0.3.0-py3-none-any.whl
- Subject digest: 7c40711182a4c211d4fba3bee269e3e2d9c691aaa14e3eef6c417e1a66d3606e
- Sigstore transparency entry: 743469634
- Sigstore integration time:
- Permalink: HipsterBrown/hearken@b26afa97e69e9ab277ec68c4f0babd3346e50656
- Branch / Tag: refs/heads/main
- Owner: https://github.com/HipsterBrown
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b26afa97e69e9ab277ec68c4f0babd3346e50656
- Trigger Event: workflow_dispatch