Lightweight local wake word detection that recognizes phrases with just a few user-provided recordings. No model training required.
# local-wake
Lightweight wake word detection that runs locally and is suitable for resource-constrained devices like the Raspberry Pi. It requires no model training to support custom wake words and can be fully configured by end users on their devices. The system is based on feature extraction combined with time-warping comparison against a user-defined reference set.
## Installation

### Prerequisites

- Python 3.9 or later
- pip (Python package manager)
- Audio input device (e.g., microphone)

### Steps

```
pip install local-wake
```
### System Dependencies

The sounddevice package depends on PortAudio, which is not installed automatically on Linux. On Ubuntu, install it with:

```
sudo apt install libportaudio2
```
### Install from source

```
git clone https://github.com/st-matskevich/local-wake.git
cd local-wake
pip install .
```
## Usage

### CLI Usage

#### Recording Reference Samples

The reference set is a collection of wake word recordings used as templates during detection. Usually, 3-4 samples are sufficient to achieve reliable detection performance.

```
lwake record ref/sample-1.wav
```

- `ref/sample-1.wav` - Path for the recorded file

Optional arguments:

- `--duration` (default: 3) - Duration in seconds
- `--no-vad` - Skip Voice Activity Detection silence trimming
Alternatively, you may use any recording tool of your choice. However, make sure that appropriate preprocessing is applied: in particular, silence must be trimmed from the recordings to achieve proper detection performance. A simple example of recording on Linux:

```
arecord -d 3 -r 16000 -c 1 -f S16_LE output.wav
```
Notes:

- The current Voice Activity Detection may occasionally be too aggressive; verify your recordings to ensure the wake word is fully captured and not inadvertently trimmed.
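If you record with an external tool, the required silence trimming can be sketched with a simple energy gate. This is an illustrative stand-in, not local-wake's actual VAD; the function name and the `threshold` value are assumptions you would tune for your microphone:

```python
import numpy as np

def trim_silence(audio, samplerate=16000, frame_ms=30, threshold=0.01):
    """Trim leading/trailing low-energy frames from a mono float signal.

    A simple energy-gate sketch, NOT local-wake's VAD. `threshold` is an
    assumed RMS level below which a frame counts as silence.
    """
    frame = int(samplerate * frame_ms / 1000)
    n = len(audio) // frame
    # Per-frame root-mean-square energy
    rms = np.array([np.sqrt(np.mean(audio[i * frame:(i + 1) * frame] ** 2))
                    for i in range(n)])
    voiced = np.where(rms > threshold)[0]
    if voiced.size == 0:
        return audio[:0]  # everything was silence
    return audio[voiced[0] * frame:(voiced[-1] + 1) * frame]
```

A real VAD (like the one behind `lwake record`) is more robust to noise; an energy gate like this only works in quiet conditions.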
#### Audio Comparison

To evaluate comparison and determine a suitable detection threshold:

```
lwake compare ref/sample-1.wav ref/sample-2.wav
```

- `ref/sample-1.wav` - Path to the first file for comparison
- `ref/sample-2.wav` - Path to the second file for comparison

Optional arguments:

- `--method` (default: `embedding`) - Feature extraction method: `embedding` or `mfcc`
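One way to pick a starting threshold is to compare every pair of reference recordings and add a safety margin above the worst same-word distance. The helper below is a hypothetical sketch (the function name and the `margin` factor are assumptions, not part of local-wake); you would pass it `lwake.compare` as the distance function:

```python
from itertools import combinations

def suggest_threshold(sample_paths, compare, margin=1.5):
    """Return a candidate detection threshold from pairwise reference distances.

    sample_paths: paths to reference .wav files
    compare: distance function, e.g. lwake.compare (assumed wiring)
    margin: assumed safety factor above the worst same-word distance
    """
    distances = [compare(a, b) for a, b in combinations(sample_paths, 2)]
    return max(distances) * margin
```

For example, `suggest_threshold(["ref/sample-1.wav", "ref/sample-2.wav", "ref/sample-3.wav"], lwake.compare)`. The result is only a starting point; verify it against real-time tests in your environment.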
#### Real-time Detection

Once the reference set is ready and a threshold value has been identified:

```
lwake listen reference/folder 0.1
```

- `reference/folder` - Directory containing your reference wake word .wav files
- `0.1` - Detection threshold. Adjust this value based on your comparison tests to balance sensitivity and false positives

Optional arguments:

- `--method` (default: `embedding`) - Feature extraction method: `embedding` or `mfcc`
- `--buffer-size` (default: 2.0) - Audio buffer size in seconds
- `--slide-size` (default: 0.25) - Step size in seconds for the sliding window
- `--debug` - Enable debug logs to observe real-time scores for incoming audio chunks

All logs are printed to stderr, while detection events are printed in JSON format to stdout:

```json
{"timestamp": 1754947173771, "wakeword": "sample-01.wav", "distance": 0.00943875619501332}
```
Notes:
- Buffer size should be similar to or slightly larger than your reference recording length
- Slide size can be set lower for better precision at the cost of higher CPU usage
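Because events are newline-delimited JSON on stdout, another process can consume them directly. Below is a minimal sketch of such a consumer (the function name is illustrative; in practice you would feed it lines from a `subprocess.Popen(["lwake", "listen", ...], stdout=subprocess.PIPE)` pipe):

```python
import json

def parse_detections(lines):
    """Collect detection events from `lwake listen` stdout lines.

    Each event is a single JSON object per line; blank or malformed
    lines are skipped. Logs never appear here since they go to stderr.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return events
```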
### Library Usage

You can also use local-wake as a Python library:

```python
import lwake

# Record an audio sample
lwake.record("sample.wav", duration=3, trim_silence=True)

# Compare two audio files
distance = lwake.compare("file1.wav", "file2.wav", method="embedding")
print(f"Distance: {distance}")

# Real-time detection. The callback blocks further listening until it returns.
# The callback also exposes the underlying sounddevice stream if you need to read more audio.
def handle_detection(detection, stream):
    print(f"Detected '{detection['wakeword']}' at {detection['timestamp']}")
    # audio, _ = stream.read(16000)  # Read 1 second of audio
    # soundfile.write("input.wav", audio, samplerate=16000)  # Save the recording

lwake.listen("reference/folder", threshold=0.1, method="embedding", callback=handle_detection)
```
## Examples
This repository includes several pre-recorded examples for experimenting with the project. You can find them in the examples directory. While each example provides a suggested detection threshold, this value may require adjustment based on differences in microphone quality and environment.
## Implementation

Existing solutions for wake word detection can generally be divided into two categories:

1. **Classical deterministic, speaker-dependent approaches** - Typically based on MFCC feature extraction combined with DTW, as used in projects such as Rhasspy Raven or Snips.
   - Advantages: Support for user-defined wake words with minimal development effort.
   - Limitations: Strongly speaker-dependent, requiring sample collection from all intended users. Highly sensitive to background noise.

2. **Modern model-based, speaker-independent approaches** - Use neural models to classify wake words directly, as in openWakeWord or Porcupine.
   - Advantages: High precision across multiple speakers without additional sample collection.
   - Limitations: Do not support arbitrary user-defined wake words. Adapting to product-specific wake words requires model retraining or fine-tuning, which, depending on the solution, can be complex and typically requires at least a basic understanding of machine learning concepts and dataset preparation.
Choosing either category imposes strict limitations: deterministic methods sacrifice robustness, while model-based methods sacrifice adaptability.
local-wake combines neural feature extraction with classical sequence matching to achieve flexible and robust wake word detection. It uses Google's pretrained speech-embedding model (converted to ONNX format for efficient inference) to extract speech features, then applies Dynamic Time Warping (DTW) to compare incoming audio against a user-defined reference set of wake word samples.
This approach merges the advantages of both categories described above: it supports user-defined wake words like traditional deterministic methods, while benefiting from the enhanced feature representations and noise robustness provided by neural models. The result is a system that delivers good precision and flexibility without requiring extensive model training or large datasets.
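The matching step can be illustrated with a plain DTW distance over per-frame feature vectors. This is an educational re-implementation, not local-wake's actual code; the length normalization shown is one common convention among several:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences.

    a, b: arrays of shape (time, features). Classic O(n*m) dynamic
    programming; illustrative only, not local-wake's internal matcher.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-level distance
            # Best of: insertion, deletion, match
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)  # length-normalized total path cost
```

Because DTW stretches and compresses the time axis, the same word spoken slightly faster or slower still aligns frame-by-frame; the neural embedding supplies the noise- and speaker-robust frame features that raw MFCCs lack.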
## Evaluation
local-wake achieves 98.6% detection accuracy on clean same-speaker audio using the Qualcomm Keyword Speech Dataset.
For detailed evaluation results, see the benchmark documentation.
## To do
- Consider using a small model on top of feature extraction for comparison instead of DTW
- Consider using noise suppression for audio preprocessing
## Built with

## License

Distributed under the MIT License. See LICENSE for more information.

## Contributing

Want a new feature added? Found a bug? Go ahead and open a new issue or feel free to submit a pull request.
## File details

Details for the file local_wake-0.1.0.tar.gz.

### File metadata

- Download URL: local_wake-0.1.0.tar.gz
- Upload date:
- Size: 1.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1ba31de25506e639a3b523054773e188b7e3a707f31c246f36b18358070f6a2f` |
| MD5 | `1cc3fd95e990002de8baf183dede76ac` |
| BLAKE2b-256 | `267c63db5b356224fb042c84247afd8ff7ab857d0c21185cd3fc9fdfb332669f` |
## File details

Details for the file local_wake-0.1.0-py3-none-any.whl.

### File metadata

- Download URL: local_wake-0.1.0-py3-none-any.whl
- Upload date:
- Size: 1.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `254eaa6441e11a9ab173f81c53387f4e19696f4dcac67a5e3348d88e77deb44a` |
| MD5 | `4abeacbcf2c341c03d4b0ca0d962de23` |
| BLAKE2b-256 | `ebaa5bf952806c30a4d0dc3c243428770d9c9356068a41d0b69391ec8bf4ce17` |