Open-source wake word detection SDK with training pipeline — privacy-first, on-device, Python-native
ViolaWake SDK
The open-source alternative to Porcupine. A production-tested wake word engine with accessible training, ONNX inference, and a Python-first SDK.
Why ViolaWake?
| | ViolaWake | Porcupine (Picovoice) | openWakeWord |
|---|---|---|---|
| License | Apache 2.0 | Proprietary (metered) | Apache 2.0 |
| Training code open | Yes | No (closed) | Yes |
| Custom wake words | Yes (training CLI) | Yes (paid Console) | Yes (fine-tune) |
| Evaluation tooling | violawake-eval (Cohen's d, EER, FAR/FRR, ROC AUC) | None published | Basic |
| On-device | Yes (ONNX) | Yes (proprietary C lib) | Yes (ONNX) |
| Integrated TTS | Yes (Kokoro-82M, optional extra) | No | No |
| Python SDK | First-class | C wrapper | First-class |
| Price at scale | Free | Paid (free tier available) | Free |
Our moat: open training code, transparent evaluation with reproducible benchmarks, production-hardened data augmentation (gain, time stretch, pitch shift, noise mixing), and a 4-gate decision policy that suppresses false positives during music playback. On a fair head-to-head benchmark against openWakeWord (same corpus, same pipeline, adversarial negatives for both systems, each system tested on its own best wake word), ViolaWake achieves an EER of 5.49% vs. 8.24% for OWW. ViolaWake is running in production, not just in a demo.
A note on accuracy claims: our benchmark uses TTS-generated audio with adversarial confusables, not real-speaker recordings. Real-world accuracy depends on your deployment environment. We publish our benchmark scripts so you can reproduce and extend them. Run `violawake-eval` on your own test data.
Quick Start
```bash
pip install "violawake[audio,download]"
violawake-download --model temporal_cnn
```
Wake Word Detection (5 lines)
```python
from violawake_sdk import WakeDetector
detector = WakeDetector(model="temporal_cnn", threshold=0.80, confirm_count=3)
for audio_chunk in detector.stream_mic():  # 20ms chunks at 16kHz
    if detector.detect(audio_chunk):
        print("Wake word detected!")
        break
```
`confirm_count=3` requires 3 consecutive above-threshold frames before firing, reducing false accepts by ~82-87% depending on threshold. Use `confirm_count=1` for lowest latency.
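As a rough mental model of what `confirm_count` does, the sketch below fires only after N consecutive frame scores reach the threshold, then resets the streak so a single utterance fires once. `ConsecutiveConfirm` is a hypothetical helper, not part of the ViolaWake API; the SDK's own implementation may differ, for example in whether it uses `>=` or `>` and how it resets after firing.

```python
class ConsecutiveConfirm:
    """Hypothetical helper: fire after `confirm_count` consecutive frames
    whose score is at or above `threshold`."""

    def __init__(self, threshold: float = 0.80, confirm_count: int = 3) -> None:
        self.threshold = threshold
        self.confirm_count = confirm_count
        self._run = 0  # length of the current above-threshold streak

    def update(self, score: float) -> bool:
        if score >= self.threshold:
            self._run += 1
        else:
            self._run = 0  # any below-threshold frame breaks the streak
        if self._run >= self.confirm_count:
            self._run = 0  # reset so one utterance produces one detection
            return True
        return False


# A brief spike (frames 0-1) is ignored; a sustained run (frames 3-5) fires.
scores = [0.85, 0.90, 0.40, 0.82, 0.88, 0.91, 0.95]
confirm = ConsecutiveConfirm(threshold=0.80, confirm_count=3)
fired = [i for i, s in enumerate(scores) if confirm.update(s)]
print(fired)
```

This also shows the latency cost: with 20 ms frames, `confirm_count=3` cannot fire until at least 60 ms of above-threshold scores have accumulated.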
Threshold Tuning
The threshold parameter controls the trade-off between sensitivity and false positives:
| Threshold | Behavior | Use Case |
|---|---|---|
| 0.70 | Sensitive -- more detections, more false positives | Quiet rooms, close-mic setups |
| 0.80 | Balanced (default) -- recommended starting point | General-purpose, most environments |
| 0.85 | Conservative -- fewer false positives, may miss some wake words | Living rooms with TV/music |
| 0.90+ | Very conservative -- lowest false positive rate | Noisy environments, always-on kiosks |
Start at 0.80 and adjust based on your false accept rate. Use violawake-streaming-eval to measure FAPH (false accepts per hour) on representative audio from your deployment environment, or violawake-eval for clip-by-clip EER/FAR/FRR/ROC AUC.
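`violawake-streaming-eval` computes FAPH for you. To show what the metric means, here is a minimal sketch that counts each upward threshold crossing in a stream of per-frame scores (from audio known to contain no wake word) as one false accept, then divides by hours of audio. `false_accepts_per_hour` is hypothetical; the real tool may apply cooldowns or confirmation before counting, and the 20 ms frame length is taken from the Quick Start comment.

```python
from typing import Sequence


def false_accepts_per_hour(
    scores: Sequence[float],
    threshold: float,
    frame_ms: float = 20.0,
) -> float:
    """Hypothetical FAPH estimate from per-frame scores on wake-word-free audio.
    Each below-to-above threshold crossing counts as one false accept."""
    accepts = 0
    above = False
    for s in scores:
        now_above = s >= threshold
        if now_above and not above:
            accepts += 1  # count onsets, not every above-threshold frame
        above = now_above
    hours = len(scores) * frame_ms / 1000.0 / 3600.0
    return accepts / hours if hours > 0 else 0.0


# One hour of 20 ms frames containing two spurious spikes.
scores = [0.1] * 180_000
scores[1000:1003] = [0.9, 0.9, 0.9]
scores[50_000] = 0.85
for t in (0.80, 0.85, 0.90):
    print(t, false_accepts_per_hour(scores, t))
```

Sweeping the threshold this way is the practical version of the table above: raise it until FAPH on your own background audio is acceptable, then confirm recall on positives.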
Text-to-Speech (Kokoro-82M)
```python
from violawake_sdk import TTSEngine
tts = TTSEngine()  # Downloads kokoro-v1.0.onnx + voices-v1.0.bin on first run (~354MB total)
audio = tts.synthesize("Hello from ViolaWake!")
tts.play(audio)
```
Voice Activity Detection
```python
from violawake_sdk import VADEngine
vad = VADEngine(backend="webrtc")  # or "silero", "rms"
prob = vad.process_frame(audio_bytes)  # returns 0.0–1.0 speech probability
```
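`process_frame` takes raw bytes. Assuming 16 kHz, 16-bit mono PCM and the 20 ms frames used elsewhere in this README, one frame is 320 samples, or 640 bytes. The sketch below splits a PCM buffer into such frames; the sample format and `iter_frames` are assumptions, so check what your chosen backend accepts (WebRTC VAD, for instance, only accepts 10, 20, or 30 ms frames).

```python
from typing import Iterator

SAMPLE_RATE = 16_000   # Hz; assumed to match the SDK's 16 kHz input
FRAME_MS = 20          # frame length used throughout this README
BYTES_PER_SAMPLE = 2   # 16-bit signed PCM, mono (assumption)

# 16000 * 20 // 1000 = 320 samples per frame, x 2 bytes = 640 bytes
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def iter_frames(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield fixed-size frames from a PCM buffer, dropping a trailing partial frame."""
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 1 s of silence
frames = list(iter_frames(one_second))
print(len(frames), len(frames[0]))
# With a VADEngine instance: probs = [vad.process_frame(f) for f in frames]
```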
Full Pipeline (Wake → STT → TTS)
Requires:
```bash
pip install "violawake[audio,stt,tts]"
```
```python
from violawake_sdk import VoicePipeline

pipeline = VoicePipeline(
    wake_word="viola",
    stt_model="base",  # faster-whisper model size
    tts_voice="af_heart",  # Kokoro voice
)


@pipeline.on_command
def handle_command(text: str) -> None:
    print(f"Command: {text}")
    pipeline.speak(f"You said: {text}")  # Or return a string to auto-speak


pipeline.run()  # Blocks; press Ctrl+C to stop
```
Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│ VoicePipeline                                               │
│                                                             │
│ Mic ──► [WakeDetector] ──► [VAD] ──► [STT] ──► callback     │
│                                                             │
│ text ──► [TTS] ──► Speaker                                  │
└─────────────────────────────────────────────────────────────┘
```
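The diagram places VAD between wake detection and STT. A common way to use VAD at that point is endpointing: after the wake word fires, keep buffering audio until the speaker has been silent for a while, then hand the buffer to STT. This README does not say that `VoicePipeline` works exactly this way; `capture_command`, its parameters, and the 500 ms silence window are hypothetical.

```python
from typing import Callable, Iterable, List


def capture_command(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    silence_frames: int = 25,   # 25 x 20 ms = 500 ms of trailing silence
    max_frames: int = 500,      # 10 s safety cap at 20 ms per frame
) -> bytes:
    """Hypothetical endpointing step between wake and STT: collect frames
    until `silence_frames` consecutive non-speech frames follow speech."""
    captured: List[bytes] = []
    heard_speech = False
    silent_run = 0
    for frame in frames:
        captured.append(frame)
        if is_speech(frame):
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1          # only count silence after speech started
            if silent_run >= silence_frames:
                break
        if len(captured) >= max_frames:
            break
    return b"".join(captured)


# Toy frames: b"s" = speech, b"q" = quiet.
stream = [b"s"] * 10 + [b"q"] * 30 + [b"s"] * 5
utterance = capture_command(stream, is_speech=lambda f: f == b"s")
print(len(utterance))
```

The trailing-silence window trades responsiveness against cutting off speakers who pause mid-command.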
Components:
| Module | Engine | Size | Latency |
|---|---|---|---|
| Wake word | Temporal CNN on OWW embeddings (ONNX) | ~100 KB head (+OWW backbone via openwakeword) | ~8ms/frame |
| VAD | WebRTC VAD / Silero / RMS heuristic | <1 MB | <1ms/frame |
| STT | faster-whisper base | 145 MB | 0.5–2s |
| TTS | Kokoro-82M (ONNX) | 326 MB | 0.3–0.8s/sentence |
Training Your Own Wake Word
The training CLI lets you train a custom wake word model with ~200 positive samples:
```bash
# Collect positive samples (read prompts aloud)
violawake-collect --word "jarvis" --output data/jarvis/positives/ --count 200

# Train (auto-generates TTS positives, confusable negatives, and speech negatives)
violawake-train \
  --word "jarvis" \
  --positives data/jarvis/positives/ \
  --output models/jarvis.onnx \
  --epochs 50
# To disable augmentation, add --no-augment
# To use legacy MLP architecture, add --architecture mlp

# Evaluate (EER, FAR/FRR, ROC AUC)
violawake-eval \
  --model models/jarvis.onnx \
  --test-dir data/jarvis/test/ \
  --report
```
The --test-dir must contain positives/ and negatives/ subdirectories.
Expected results: EER < 10% (against the bundled synthetic negative corpus) with 200+ quality positive samples. Your real-world performance will depend on your deployment environment and negative speech corpus.
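`violawake-eval` reports EER directly. To make the number concrete: EER is the error rate at the threshold where the false accept rate (negatives scoring at or above the threshold) equals the false reject rate (positives scoring below it). The dependency-free sketch below approximates it by trying every observed score as a threshold; `equal_error_rate` is hypothetical, and `violawake-eval` may interpolate the ROC curve differently.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    pos_scores: Sequence[float], neg_scores: Sequence[float]
) -> Tuple[float, float]:
    """Approximate EER by sweeping every observed score as a threshold.
    Both inputs must be non-empty. Returns (eer, threshold)."""
    best = (float("inf"), 0.0, 0.0)  # (|FAR - FRR|, eer, threshold)
    for t in sorted(set(pos_scores) | set(neg_scores)):
        far = sum(s >= t for s in neg_scores) / len(neg_scores)  # false accepts
        frr = sum(s < t for s in pos_scores) / len(pos_scores)   # false rejects
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.75, 0.4])
print(f"EER={eer:.2%} at threshold {thr}")
```

With small test sets FAR and FRR move in coarse steps, so they may never be exactly equal; averaging them at the closest point is a common approximation.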
Proof: "Operator" Custom Wake Word (89 seconds, EER 7.2%)
To prove the training pipeline generalizes beyond "Viola," we trained a custom "operator" model from scratch, with zero manual data collection:
| | ViolaWake "viola" | ViolaWake "operator" | OWW "alexa" (pre-trained) |
|---|---|---|---|
| EER | 5.49% | 7.2% | 8.24% |
| ROC AUC | 0.988 | 0.984 | 0.956 |
| Training time | ~48s | 89s | N/A (pre-trained) |
| Architecture | Temporal CNN | Temporal CNN | MLP on OWW embeddings |
The training CLI handled TTS sample generation (20 Edge TTS voices), confusable negative generation (16 phonetic variants), 10x augmentation, and Temporal CNN training end-to-end. OWW provides training notebooks but no pip-installable CLI tool.
Full methodology, corpus details, and reproducibility instructions: benchmark_v2/OPERATOR_BENCHMARK.md
Models
Models are versioned and published to GitHub Releases. Use registry names without file extensions when passing --model or WakeDetector(model=...). Download separately (too large for PyPI):
```bash
python -m violawake_sdk.tools.download_model --model temporal_cnn        # default, ~100 KB
python -m violawake_sdk.tools.download_model --model kokoro_v1_0         # TTS model, 326 MB
python -m violawake_sdk.tools.download_model --model kokoro_voices_v1_0  # TTS voices, 28 MB
```
| Model | Type | Size | EER* | Notes |
|---|---|---|---|---|
| temporal_cnn.onnx | Temporal CNN on OWW embeddings | ~100 KB | 5.49% | Production default: best live recall + lowest FP |
| temporal_convgru.onnx | Temporal Conv-GRU on OWW embeddings | ~81 KB | -- | Reserve model |
| r3_10x_s42.onnx | MLP on OWW embeddings | ~34 KB | -- | Deprecated: fails live mic test. Do not use. |
| kokoro-v1.0.onnx | Kokoro-82M TTS | ~326 MB | -- | Apache 2.0 (hosted by kokoro-onnx) |
*EER (Equal Error Rate) from benchmark v2: 700 shared negatives (incl. adversarial confusables), 180 TTS positives, streaming inference. Lower is better. See benchmark_v2/ for full methodology and scripts.
Platform Support
| Platform | Wake Word | TTS | STT | Status |
|---|---|---|---|---|
| Windows 10/11 (x64) | ✅ | ✅ | ✅ | Fully tested |
| Linux (x64) | ✅ | ✅ | ✅ | CI-tested |
| macOS (arm64/x64) | ✅ | ✅ | ✅ | CI-tested (Intel), community (ARM) |
| Raspberry Pi 4 (ARM64) | ✅ | ⚠️ slow | ✅ | Supported |
| Browser/WASM | 🚧 | 🚧 | ❌ | Phase 2 (Q3 2026) |
| Android | ❌ | ❌ | ❌ | Phase 3 (2027) |
| iOS | ❌ | ❌ | ❌ | Phase 3 (2027) |
Installation
Minimum install (wake word + VAD only):
```bash
pip install violawake
```
Note: Both `import violawake` and `import violawake_sdk` work. The canonical import is `violawake_sdk` (e.g., `from violawake_sdk import WakeDetector`), but `from violawake import WakeDetector` is also supported for convenience.
With microphone input and model downloading:
```bash
pip install "violawake[audio,download]"
```
With TTS:
```bash
pip install "violawake[tts]"
```
With STT:
```bash
pip install "violawake[stt]"
```
Full pipeline (all features):
```bash
pip install "violawake[all]"
```
Requirements:
- Python 3.10+
- `onnxruntime >= 1.17` (CPU) or `onnxruntime-gpu` for GPU acceleration
- `pyaudio` for microphone input
- `numpy`, `scipy`
- `openwakeword >= 0.6` (installed automatically as a dependency; provides the frozen mel/embedding backbone)
Performance Benchmarks
Measured on i7-12700H, Windows 11, RTX 3060 (CPU inference):
| Operation | Latency (p50) | Latency (p99) |
|---|---|---|
| Wake word inference (20ms frame) | 7.8 ms | 12.1 ms |
| VAD (WebRTC, 20ms frame) | 0.4 ms | 0.8 ms |
| STT (Whisper base, 3s audio) | 680 ms | 1.2s |
| TTS first audio (Kokoro, 1 sentence) | 310 ms | 580 ms |
Wake word accuracy (benchmark v2 — TTS corpus, 700 negatives incl. adversarial confusables):
- Temporal CNN model: EER 5.49%, ROC AUC 0.9877
- FAR @ FRR=5%: 5.43% (vs OWW's 8.86% on its own best word)
- Live mic tested: 100% recall on direct speech, 0 false positives on podcast/music
- Real-world metrics depend on your deployment environment. Run `violawake-eval` (clip-by-clip) or `violawake-streaming-eval` (continuous FAPH) on your own test data.
Debugging
Enable debug logging to see gate rejections, backbone output, score tracking, and detection decisions:
```python
import logging
logging.basicConfig(level=logging.DEBUG)

from violawake_sdk import WakeDetector
detector = WakeDetector(model="temporal_cnn", threshold=0.80)
```
This produces output like:
- `Gate 1 reject: RMS 0.0 below floor 1.0` -- silence/DC offset filtered
- `Gate 3 reject: cooldown active (1.2s remaining)` -- too soon after last detection
- `Gate 4 reject: playback active` -- suppressed during music
- `Wake word detected! score=0.872` -- successful detection
Set level=logging.INFO for detections only (less verbose).
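The log lines above name the gates in order. As a hedged mental model, not the SDK's code, they could chain like the sketch below: an energy floor, the score threshold, a cooldown, and playback suppression. `GatePolicy`, the mean-removed RMS (so a constant DC offset also reads as 0.0), and the 2 s cooldown default are all assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class GatePolicy:
    """Hypothetical 4-gate decision policy mirroring the debug log messages."""
    rms_floor: float = 1.0       # Gate 1: reject silence / DC offset
    threshold: float = 0.80      # Gate 2: score threshold
    cooldown_s: float = 2.0      # Gate 3: minimum gap between detections (assumed)
    is_playing: bool = False     # Gate 4: set True while your app plays audio
    _last_fire: Optional[float] = field(default=None, init=False, repr=False)

    def decide(self, samples: Sequence[int], score: float, now: float) -> str:
        if samples:
            mean = sum(samples) / len(samples)  # remove DC offset before RMS
            rms = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
        else:
            rms = 0.0
        if rms < self.rms_floor:
            return "gate1: rms below floor"
        if score < self.threshold:
            return "gate2: score below threshold"
        if self._last_fire is not None and now - self._last_fire < self.cooldown_s:
            return "gate3: cooldown active"
        if self.is_playing:
            return "gate4: playback active"
        self._last_fire = now  # only successful detections start a cooldown
        return "detected"


policy = GatePolicy()
frame = [1000, -1000] * 160  # toy 20 ms frame (320 samples)
print(policy.decide(frame, score=0.87, now=0.0))
```

Ordering the cheap energy check first means silent frames are rejected before any score or timing logic runs.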
Examples
The examples/ directory contains runnable scripts:
| File | Description |
|---|---|
| examples/basic_detection.py | Minimal microphone wake word detection loop |
| examples/async_detection.py | Async wake word detection with AsyncWakeDetector |
| examples/streaming_eval.py | Evaluate false accepts per hour on a WAV file |
Run any example with:
```bash
python examples/basic_detection.py
```
Comparison to openWakeWord
openWakeWord is the closest open-source alternative. ViolaWake differences:
- Open, reproducible evaluation: `violawake-eval` produces EER, FAR/FRR, ROC AUC on any model + test set. `violawake-streaming-eval` measures FAPH on continuous audio. Benchmark scripts are in `benchmark_v2/`; run them yourself.
- Production-hardened decision policy: 4-gate pipeline (zero-input guard, score threshold, cooldown, listening gate) plus optional multi-window confirmation; suppresses false positives during music playback when `is_playing` state is wired up
- Bundled pipeline: ViolaWake ships integrated VAD + STT + TTS, not just the wake word component
- Training infrastructure: FocalLoss + EMA + SWA + augmentation pipeline (gain, stretch, pitch, noise, time shift; RIR and SpecAugment available opt-in) vs basic training in openWakeWord
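FocalLoss is listed above without detail. The standard binary formulation (Lin et al., 2017) scales cross-entropy by `(1 - p_t) ** gamma`, so easy, well-classified clips contribute little and training concentrates on hard ones such as confusable negatives. A minimal pure-Python version follows; the `gamma=2.0` and `alpha=0.25` defaults are the commonly cited values from that paper, not necessarily ViolaWake's settings.

```python
import math
from typing import Sequence


def binary_focal_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    gamma: float = 2.0,
    alpha: float = 0.25,
    eps: float = 1e-7,
) -> float:
    """Mean binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# An easy positive (0.95) vs. a hard negative that the model scores 0.9.
print(binary_focal_loss([0.95], [1]), binary_focal_loss([0.9], [0]))
```

With `gamma=0` the expression reduces to alpha-weighted binary cross-entropy, which is a quick way to sanity-check an implementation.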
Migrating from openWakeWord
ViolaWake uses openWakeWord's mel-spectrogram embedding model as a frozen feature extractor backbone. If you have existing OWW training data, you can use it directly with ViolaWake's training CLI.
Key differences from OWW:
- Decision policy: ViolaWake adds a multi-gate pipeline (RMS floor, cooldown, playback suppression) on top of raw scores. OWW exposes raw sigmoid scores only.
- Temporal models: ViolaWake supports Temporal CNN and Conv-GRU heads that score across a sliding window of embeddings, not just a single frame. This reduces false positives on speech that partially matches the wake word.
- Augmentation pipeline: ViolaWake's training CLI applies gain, time stretch, pitch shift, noise mixing, and RIR convolution. SpecAugment is available for custom spectrogram-level pipelines via `AugmentationPipeline.augment_spectrogram()`. OWW's default training uses minimal augmentation.
- Confidence API: `detector.get_confidence()` and `detector.last_scores` provide structured confidence tracking that OWW does not offer.
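To make "scores across a sliding window of embeddings" concrete: instead of scoring one backbone embedding at a time, a temporal head sees the last N embeddings at once, so a fragment that matches only part of the wake word does not fill the window with wake-word-like context. The `EmbeddingWindow` class, the window length, and the toy averaging head below are hypothetical stand-ins for the ONNX Temporal CNN.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence

Embedding = Sequence[float]


class EmbeddingWindow:
    """Hypothetical rolling buffer that feeds a temporal head a fixed-length
    window of the most recent backbone embeddings."""

    def __init__(self, window: int, head: Callable[[List[Embedding]], float]) -> None:
        self._window = window
        self._buf: Deque[Embedding] = deque(maxlen=window)  # oldest entries drop off
        self._head = head

    def push(self, embedding: Embedding) -> Optional[float]:
        self._buf.append(embedding)
        if len(self._buf) < self._window:
            return None  # not enough temporal context yet
        return self._head(list(self._buf))


# Toy head: average of the first embedding dimension across the window.
toy_head = lambda w: sum(e[0] for e in w) / len(w)
win = EmbeddingWindow(3, toy_head)
outputs = [win.push([x]) for x in (1.0, 0.0, 0.5, 1.0)]
print(outputs)
```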
Using existing OWW training data:
```bash
# Your OWW positive samples work as-is (16kHz WAV/FLAC)
violawake-train \
  --word "my_wake_word" \
  --positives path/to/oww_positives/ \
  --negatives path/to/oww_negatives/ \
  --output models/my_wake_word.onnx \
  --epochs 50
```
No format conversion is needed -- ViolaWake reads the same 16kHz mono WAV/FLAC files that OWW uses.
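Before training, it can help to confirm that an existing folder really is 16 kHz mono. The sketch below uses only Python's standard `wave` module, so it checks WAV files only (FLAC needs a third-party reader). The 16-bit sample width check and the `find_incompatible_wavs` name are assumptions, not ViolaWake requirements stated above.

```python
import wave
from pathlib import Path
from typing import List


def find_incompatible_wavs(folder: str, rate: int = 16_000) -> List[str]:
    """List WAV files under `folder` that are not `rate` Hz, mono, 16-bit PCM."""
    bad = []
    for path in sorted(Path(folder).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wf:
                ok = (
                    wf.getframerate() == rate
                    and wf.getnchannels() == 1
                    and wf.getsampwidth() == 2  # 16-bit samples (assumption)
                )
        except (wave.Error, EOFError):
            ok = False  # e.g. float WAVs or truncated files the stdlib rejects
        if not ok:
            bad.append(str(path))
    return bad
```

Usage: `print(find_incompatible_wavs("path/to/oww_positives/"))` lists anything that would need resampling or downmixing first.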
Roadmap
v1.0 (Q2 2026) — Phase 1 MVP:
- Python SDK (Wake + VAD)
- Kokoro TTS integration
- faster-whisper STT integration
- Full VoicePipeline class
- Training CLI
- PyPI release
- Documentation site
v1.1 (Q3 2026) — Streaming + Web:
- Streaming STT (faster-whisper generator mode)
- WASM build for ViolaWake
- JavaScript/Node SDK wrapper
- Custom wake word web Console (alpha)
v2.0 (Q1 2027) — Multi-platform:
- Android SDK (ONNX Runtime Android)
- iOS SDK (ONNX Runtime iOS)
- DeepFilterNet noise suppression integration
- Speaker diarization (pyannote.audio)
- License/metering infrastructure
Contributing
```bash
git clone https://github.com/GeeIHadAGoodTime/ViolaWake
cd ViolaWake
pip install -e ".[dev]"
pre-commit install
pytest tests/
```
See CONTRIBUTING.md for guidelines.
License
Apache 2.0. Models trained on open datasets. See LICENSE for details.
ViolaWake uses OpenWakeWord as a frozen feature extractor backbone (also Apache 2.0). The classification heads (Temporal CNN, Conv-GRU) and training pipeline are original ViolaWake work.
Download files
File details
Details for the file violawake-0.1.0.tar.gz.
File metadata
- Download URL: violawake-0.1.0.tar.gz
- Upload date:
- Size: 173.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f49ae55969d7673d4ea23e3560477889babb1a164e48b58f82fc8b4db6b15bb8 |
| MD5 | ccbb7db5a5f8b9f61fe1aa32a2021ea5 |
| BLAKE2b-256 | dae856f05ec00e17e57ad577c990e66e209b327d0e34ac1852af46fae479332e |
File details
Details for the file violawake-0.1.0-py3-none-any.whl.
File metadata
- Download URL: violawake-0.1.0-py3-none-any.whl
- Upload date:
- Size: 175.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eee0f2b51fa8fa91d925b3473e733416d31f76d6d52c4e7eea3ea983db1511b2 |
| MD5 | ac561a03b2d88f1d5d45a1be1714c37f |
| BLAKE2b-256 | 9a7081ccb7a0e2b37678b97db732529910872d699459dbe742bf1419e8a28a6b |