
Open-source wake word detection SDK with training pipeline — privacy-first, on-device, Python-native


ViolaWake SDK

The open-source alternative to Porcupine. A production-tested wake word engine with accessible training, ONNX inference, and a Python-first SDK.

License: Apache 2.0 | Python 3.10+


Why ViolaWake?

Feature | ViolaWake | Porcupine (Picovoice) | openWakeWord
License | Apache 2.0 | Proprietary (metered) | Apache 2.0
Training code open | Yes | No (closed) | Yes
Custom wake words | Yes (training CLI) | Yes (paid Console) | Yes (fine-tune)
Evaluation tooling | violawake-eval (Cohen's d, EER, FAR/FRR, ROC AUC) | None published | Basic
On-device | Yes (ONNX) | Yes (proprietary C lib) | Yes (ONNX)
Integrated TTS | Yes (Kokoro-82M, optional extra) | No | No
Python SDK | First-class | C wrapper | First-class
Price at scale | Free | Paid (free tier available) | Free

Our moat: open training code, transparent evaluation with reproducible benchmarks, production-hardened data augmentation (gain, time stretch, pitch shift, noise mixing), and a 4-gate decision policy that suppresses false positives during music playback. In a head-to-head benchmark against openWakeWord (same corpus, same pipeline, adversarial negatives for both systems, each system tested on its own best wake word), ViolaWake achieves an EER of 5.49% versus OWW's 8.24%. ViolaWake runs in production; it is not a demo.

A note on accuracy claims: Our benchmark uses TTS-generated audio with adversarial confusables, not real-speaker recordings. Real-world accuracy depends on your deployment environment. We publish our benchmark scripts so you can reproduce and extend them. Run violawake-eval on your own test data.


Quick Start

pip install "violawake[audio,download]"
violawake-download --model temporal_cnn

Wake Word Detection (6 lines)

from violawake_sdk import WakeDetector

detector = WakeDetector(model="temporal_cnn", threshold=0.80, confirm_count=3)

for audio_chunk in detector.stream_mic():  # 20ms chunks at 16kHz
    if detector.detect(audio_chunk):
        print("Wake word detected!")
        break

confirm_count=3 requires 3 consecutive above-threshold frames before firing, reducing false accepts by ~82-87% depending on threshold. Use confirm_count=1 for lowest latency.
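
The confirmation rule can be pictured as a counter that resets on any frame that misses the threshold. The sketch below is illustrative only: ConsecutiveConfirmer is a hypothetical name, not part of violawake_sdk, and treating a score equal to the threshold as a hit is an assumption. The SDK's internal logic may differ.

```python
class ConsecutiveConfirmer:
    """Fire only after `confirm_count` consecutive scores reach the threshold.

    Hypothetical helper for illustration; not part of violawake_sdk.
    """

    def __init__(self, threshold: float = 0.80, confirm_count: int = 3) -> None:
        if confirm_count < 1:
            raise ValueError("confirm_count must be >= 1")
        self.threshold = threshold
        self.confirm_count = confirm_count
        self._streak = 0

    def update(self, score: float) -> bool:
        """Feed one per-frame score; return True when the streak is long enough."""
        if score >= self.threshold:  # assumption: equality counts as a hit
            self._streak += 1
        else:
            self._streak = 0  # any miss resets the run
        if self._streak >= self.confirm_count:
            self._streak = 0  # start over after firing
            return True
        return False


confirmer = ConsecutiveConfirmer(threshold=0.80, confirm_count=3)
scores = [0.91, 0.85, 0.40, 0.88, 0.90, 0.93]  # made-up per-frame scores
fired = [confirmer.update(s) for s in scores]
print(fired)
```

In this made-up sequence the dip to 0.40 breaks the first run, so only the final frame fires. With confirm_count=1 the very first frame would have fired.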

Threshold Tuning

The threshold parameter controls the trade-off between sensitivity and false positives:

Threshold | Behavior | Use Case
0.70 | Sensitive -- more detections, more false positives | Quiet rooms, close-mic setups
0.80 | Balanced (default) -- recommended starting point | General-purpose, most environments
0.85 | Conservative -- fewer false positives, may miss some wake words | Living rooms with TV/music
0.90+ | Very conservative -- lowest false positive rate | Noisy environments, always-on kiosks

Start at 0.80 and adjust based on your false accept rate. Use violawake-streaming-eval to measure FAPH (false accepts per hour) on representative audio from your deployment environment, or violawake-eval for clip-by-clip EER/FAR/FRR/ROC AUC.
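
FAPH is the number of false accept events divided by the hours of wake-word-free audio. The sketch below shows the arithmetic on a made-up score stream. It is not how violawake-streaming-eval is implemented: both function names are hypothetical, and counting each run of consecutive above-threshold scores as one event is an assumption.

```python
def count_accept_events(scores, threshold):
    """Count runs of consecutive scores at or above `threshold` (rising edges)."""
    events = 0
    above = False
    for score in scores:
        is_above = score >= threshold
        if is_above and not above:
            events += 1
        above = is_above
    return events


def false_accepts_per_hour(num_false_accepts, audio_seconds):
    """Normalize a false accept count to a per-hour rate."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return num_false_accepts * 3600.0 / audio_seconds


# Made-up scores from audio that contains no wake word.
negative_scores = [0.10, 0.72, 0.30, 0.86, 0.30, 0.75, 0.91, 0.20]
audio_seconds = 1800.0  # pretend the scores cover 30 minutes of audio

for threshold in (0.70, 0.80, 0.90):
    events = count_accept_events(negative_scores, threshold)
    print(threshold, events, false_accepts_per_hour(events, audio_seconds))
```

Raising the threshold from 0.70 to 0.90 cuts the event count from 3 to 1 on this toy data, which is the trade-off the table above describes.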

Text-to-Speech (Kokoro-82M)

from violawake_sdk import TTSEngine

tts = TTSEngine()  # Downloads kokoro-v1.0.onnx + voices-v1.0.bin on first run (~354MB total)
audio = tts.synthesize("Hello from ViolaWake!")
tts.play(audio)

Voice Activity Detection

from violawake_sdk import VADEngine

vad = VADEngine(backend="webrtc")  # or "silero", "rms"
prob = vad.process_frame(audio_bytes)  # returns 0.0–1.0 speech probability

Full Pipeline (Wake → STT → TTS)

Requires: pip install "violawake[audio,stt,tts]"

from violawake_sdk import VoicePipeline

pipeline = VoicePipeline(
    wake_word="viola",
    stt_model="base",        # faster-whisper model size
    tts_voice="af_heart",    # Kokoro voice
)

@pipeline.on_command
def handle_command(text: str) -> None:
    print(f"Command: {text}")
    pipeline.speak(f"You said: {text}")  # Or return a string to auto-speak

pipeline.run()  # Blocks — Ctrl+C to stop

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    VoicePipeline                            │
│                                                             │
│  Mic ──► [WakeDetector] ──► [VAD] ──► [STT] ──► callback  │
│                                                             │
│  text ──► [TTS] ──► Speaker                                │
└─────────────────────────────────────────────────────────────┘

Components:

Module | Engine | Size | Latency
Wake word | Temporal CNN on OWW embeddings (ONNX) | ~100 KB head (+OWW backbone via openwakeword) | ~8 ms/frame
VAD | WebRTC VAD / Silero / RMS heuristic | <1 MB | <1 ms/frame
STT | faster-whisper base | 145 MB | 0.5–2 s
TTS | Kokoro-82M (ONNX) | 326 MB | 0.3–0.8 s/sentence

Training Your Own Wake Word

The training CLI lets you train a custom wake word model with ~200 positive samples:

# Collect positive samples (read prompts aloud)
violawake-collect --word "jarvis" --output data/jarvis/positives/ --count 200

# Train (auto-generates TTS positives, confusable negatives, and speech negatives)
violawake-train \
  --word "jarvis" \
  --positives data/jarvis/positives/ \
  --output models/jarvis.onnx \
  --epochs 50

# To disable augmentation, add --no-augment
# To use legacy MLP architecture, add --architecture mlp

# Evaluate (EER, FAR/FRR, ROC AUC)
violawake-eval \
  --model models/jarvis.onnx \
  --test-dir data/jarvis/test/ \
  --report

The --test-dir must contain positives/ and negatives/ subdirectories.
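
A quick layout check before running violawake-eval can save a failed run. This is a minimal sketch using only the standard library. The function name missing_subdirs is hypothetical, and the check covers only the two subdirectory names stated above, not the audio files inside them.

```python
import tempfile
from pathlib import Path

REQUIRED_SUBDIRS = ("positives", "negatives")


def missing_subdirs(test_dir):
    """Return the required subdirectory names that are absent from `test_dir`."""
    root = Path(test_dir)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


# Demo on a throwaway directory that has positives/ but not negatives/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "positives").mkdir()
    print(missing_subdirs(tmp))
```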

Expected results: EER < 10% (against the bundled synthetic negative corpus) with 200+ quality positive samples. Your real-world performance will depend on your deployment environment and negative speech corpus.
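
For readers unfamiliar with the metric, EER is the operating point where the false accept rate (FAR) equals the false reject rate (FRR). The sketch below computes it from two lists of clip scores by sweeping every observed score as a threshold. It is a simplified illustration with made-up scores, not the violawake-eval implementation, and the function names are hypothetical.

```python
def far_frr(pos_scores, neg_scores, threshold):
    """FAR: share of negatives accepted. FRR: share of positives rejected."""
    far = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    frr = sum(s < threshold for s in pos_scores) / len(pos_scores)
    return far, frr


def equal_error_rate(pos_scores, neg_scores):
    """Return (eer, threshold) at the threshold where FAR and FRR are closest."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    best = None
    for threshold in sorted(set(pos_scores) | set(neg_scores)):
        far, frr = far_frr(pos_scores, neg_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]


# Made-up clip scores: one weak positive and one strong negative.
positives = [0.95, 0.90, 0.85, 0.60]
negatives = [0.10, 0.20, 0.30, 0.70]
eer, threshold = equal_error_rate(positives, negatives)
print(f"EER={eer:.2%} at threshold={threshold}")
```

With only a handful of clips FAR and FRR move in coarse steps, so the two rates may never meet exactly. Averaging them at the closest point is a common approximation.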

Proof: "Operator" Custom Wake Word (89 seconds, EER 7.2%)

To prove the training pipeline generalizes beyond "Viola," we trained a custom "operator" model from scratch — zero manual data collection:

Metric | ViolaWake "viola" | ViolaWake "operator" | OWW "alexa" (pre-trained)
EER | 5.49% | 7.2% | 8.24%
ROC AUC | 0.988 | 0.984 | 0.956
Training time | ~48 s | 89 s | N/A (pre-trained)
Architecture | Temporal CNN | Temporal CNN | MLP on OWW embeddings

The training CLI handled TTS sample generation (20 Edge TTS voices), confusable negative generation (16 phonetic variants), 10x augmentation, and Temporal CNN training end-to-end. OWW provides training notebooks but no pip-installable CLI tool.

Full methodology, corpus details, and reproducibility instructions: benchmark_v2/OPERATOR_BENCHMARK.md


Models

Models are versioned and published to GitHub Releases. Use registry names without file extensions when passing --model or WakeDetector(model=...), for example temporal_cnn rather than temporal_cnn.onnx. Models are too large to ship on PyPI, so download them separately:

python -m violawake_sdk.tools.download_model --model temporal_cnn   # default, ~100 KB
python -m violawake_sdk.tools.download_model --model kokoro_v1_0    # TTS model, 326 MB
python -m violawake_sdk.tools.download_model --model kokoro_voices_v1_0  # TTS voices, 28 MB

Model | Type | Size | EER* | Notes
temporal_cnn.onnx | Temporal CNN on OWW embeddings | ~100 KB | 5.49% | Production default: best live recall and lowest FP
temporal_convgru.onnx | Temporal Conv-GRU on OWW embeddings | ~81 KB | -- | Reserve model
r3_10x_s42.onnx | MLP on OWW embeddings | ~34 KB | -- | Deprecated: fails live mic test. Do not use.
kokoro-v1.0.onnx | Kokoro-82M TTS | ~326 MB | -- | Apache 2.0 (hosted by kokoro-onnx)

*EER (Equal Error Rate) from benchmark v2: 700 shared negatives (incl. adversarial confusables), 180 TTS positives, streaming inference. Lower is better. See benchmark_v2/ for full methodology and scripts.


Platform Support

Platform Wake Word TTS STT Status
Windows 10/11 (x64) Fully tested
Linux (x64) CI-tested
macOS (arm64/x64) CI-tested (Intel), community (ARM)
Raspberry Pi 4 (ARM64) ⚠️ slow Supported
Browser/WASM 🚧 🚧 Phase 2 (Q3 2026)
Android Phase 3 (2027)
iOS Phase 3 (2027)

Installation

Minimum install (wake word + VAD only):

pip install violawake

Note: Both import violawake and import violawake_sdk work. The canonical import is violawake_sdk (e.g., from violawake_sdk import WakeDetector), but from violawake import WakeDetector is also supported for convenience.

With microphone input and model downloading:

pip install "violawake[audio,download]"

With TTS:

pip install "violawake[tts]"

With STT:

pip install "violawake[stt]"

Full pipeline (all features):

pip install "violawake[all]"

Requirements:

  • Python 3.10+
  • onnxruntime >= 1.17 (CPU) or onnxruntime-gpu for GPU acceleration
  • pyaudio for microphone input
  • numpy, scipy
  • openwakeword >= 0.6 (installed automatically as a dependency — provides the frozen mel/embedding backbone)
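
To see which of these requirements are present in the current environment, a small standard-library check can help. This is a sketch, not an SDK tool: check_environment is a hypothetical name, and the import names are assumed to match the lower-cased package names listed above. It checks presence only, not version numbers.

```python
import importlib.util
import sys

# Assumed import names for the requirements listed above.
MODULES = ("onnxruntime", "pyaudio", "numpy", "scipy", "openwakeword")


def check_environment(modules=MODULES):
    """Report whether Python is new enough and whether each module is importable."""
    report = {"python>=3.10": sys.version_info >= (3, 10)}
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item:14} {'ok' if ok else 'missing'}")
```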

Performance Benchmarks

Measured on i7-12700H, Windows 11, RTX 3060 (CPU inference):

Operation | Latency (p50) | Latency (p99)
Wake word inference (20 ms frame) | 7.8 ms | 12.1 ms
VAD (WebRTC, 20 ms frame) | 0.4 ms | 0.8 ms
STT (Whisper base, 3 s audio) | 680 ms | 1.2 s
TTS first audio (Kokoro, 1 sentence) | 310 ms | 580 ms
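
One way to read the wake word row is as a share of the real-time budget. The arithmetic below assumes inference runs once per 20 ms frame, which the table implies but does not state outright. The helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 20


def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of audio samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def real_time_factor(latency_ms, frame_ms):
    """Processing time divided by audio duration; below 1.0 keeps up with the mic."""
    return latency_ms / frame_ms


print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS), "samples per frame")
for label, latency_ms in (("p50", 7.8), ("p99", 12.1)):
    print(label, round(real_time_factor(latency_ms, FRAME_MS), 3))
```

Under that assumption both percentiles stay below 1.0 on the measured machine, so inference keeps pace with incoming audio.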

Wake word accuracy (benchmark v2 — TTS corpus, 700 negatives incl. adversarial confusables):

  • Temporal CNN model: EER 5.49%, ROC AUC 0.9877
  • FAR @ FRR=5%: 5.43% (vs OWW's 8.86% on its own best word)
  • Live mic tested: 100% recall on direct speech, 0 false positives on podcast/music
  • Real-world metrics depend on your deployment environment. Run violawake-eval (clip-by-clip) or violawake-streaming-eval (continuous FAPH) on your own test data.

Debugging

Enable debug logging to see gate rejections, backbone output, score tracking, and detection decisions:

import logging
logging.basicConfig(level=logging.DEBUG)

from violawake_sdk import WakeDetector
detector = WakeDetector(model="temporal_cnn", threshold=0.80)

This produces output like:

  • Gate 1 reject: RMS 0.0 below floor 1.0 -- silence/DC offset filtered
  • Gate 3 reject: cooldown active (1.2s remaining) -- too soon after last detection
  • Gate 4 reject: playback active -- suppressed during music
  • Wake word detected! score=0.872 -- successful detection

Set level=logging.INFO for detections only (less verbose).
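
The gate order in these log lines suggests a chain of early exits. The sketch below mirrors that order under stated assumptions: GatePolicy is a hypothetical class, not the SDK's implementation, the 2.0 s cooldown is an invented value, and gate 2 is taken to be the score threshold.

```python
from dataclasses import dataclass


@dataclass
class GatePolicy:
    """Hypothetical 4-gate decision chain; all defaults are illustrative."""

    rms_floor: float = 1.0
    threshold: float = 0.80
    cooldown_s: float = 2.0  # assumed value, not taken from the SDK
    last_fire_s: float = float("-inf")

    def decide(self, rms, score, now_s, is_playing):
        """Return (fired, reason); the first failing gate wins."""
        if rms < self.rms_floor:
            return False, "gate 1: RMS below floor"
        if score < self.threshold:
            return False, "gate 2: score below threshold"
        if now_s - self.last_fire_s < self.cooldown_s:
            return False, "gate 3: cooldown active"
        if is_playing:
            return False, "gate 4: playback active"
        self.last_fire_s = now_s
        return True, "detected"


policy = GatePolicy()
frames = [
    # (rms, score, time in seconds, is_playing), all made up
    (0.0, 0.95, 0.0, False),
    (500.0, 0.50, 1.0, False),
    (500.0, 0.87, 2.0, False),
    (500.0, 0.90, 2.8, False),
    (500.0, 0.90, 5.0, True),
]
for rms, score, now_s, is_playing in frames:
    print(policy.decide(rms, score, now_s, is_playing))
```

Putting the cheap checks first means silent input is rejected before the score is even considered.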


Examples

The examples/ directory contains runnable scripts:

File | Description
examples/basic_detection.py | Minimal microphone wake word detection loop
examples/async_detection.py | Async wake word detection with AsyncWakeDetector
examples/streaming_eval.py | Evaluate false accepts per hour on a WAV file

Run any example with:

python examples/basic_detection.py

Comparison to openWakeWord

openWakeWord is the closest open-source alternative. ViolaWake differs in the following ways:

  • Open, reproducible evaluation: violawake-eval produces EER, FAR/FRR, ROC AUC on any model + test set. violawake-streaming-eval measures FAPH on continuous audio. Benchmark scripts in benchmark_v2/ — run them yourself.
  • Production-hardened decision policy: a 4-gate pipeline (zero-input guard, score threshold, cooldown, listening gate) plus optional multi-window confirmation. It suppresses false positives during music playback when the is_playing state is wired up.
  • Bundled pipeline: ViolaWake ships integrated VAD + STT + TTS, not just the wake word component
  • Training infrastructure: FocalLoss + EMA + SWA + augmentation pipeline (gain, stretch, pitch, noise, time shift; RIR and SpecAugment available opt-in) vs basic training in openWakeWord
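
Two of the augmentations named in the last bullet, gain and noise mixing, are simple enough to sketch in pure Python. This is an illustration of the general technique, not ViolaWake's augmentation code. The function names are hypothetical, and time stretch, pitch shift, and the other steps are omitted.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a non-empty sample list."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_gain(samples, gain_db):
    """Scale amplitude by a gain given in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def mix_noise(samples, noise, snr_db):
    """Add noise scaled so that signal RMS over noise RMS equals `snr_db`."""
    if len(noise) < len(samples):
        raise ValueError("noise must be at least as long as samples")
    noise = noise[: len(samples)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(samples)  # silent noise: nothing to mix
    scale = rms(samples) / (10.0 ** (snr_db / 20.0)) / noise_rms
    return [s + n * scale for s, n in zip(samples, noise)]


SAMPLE_RATE_HZ = 16_000
# Stand-in "speech": 0.1 s of a 440 Hz tone. Real use would load a recording.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(1600)]
rng = random.Random(0)  # seeded so the demo is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

louder = apply_gain(tone, gain_db=6.0)
noisy = mix_noise(tone, noise, snr_db=10.0)

residual = [a - b for a, b in zip(noisy, tone)]
measured_snr_db = 20 * math.log10(rms(tone) / rms(residual))
print(round(measured_snr_db, 2))
```

Specifying noise by target SNR rather than by a fixed scale keeps the augmentation consistent across recordings of different loudness.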

Migrating from openWakeWord

ViolaWake uses openWakeWord's mel-spectrogram embedding model as a frozen feature extractor backbone. If you have existing OWW training data, you can use it directly with ViolaWake's training CLI.

Key differences from OWW:

  • Decision policy: ViolaWake adds a multi-gate pipeline (RMS floor, cooldown, playback suppression) on top of raw scores. OWW exposes raw sigmoid scores only.
  • Temporal models: ViolaWake supports Temporal CNN and Conv-GRU heads that score across a sliding window of embeddings, not just a single frame. This reduces false positives on speech that partially matches the wake word.
  • Augmentation pipeline: ViolaWake's training CLI applies gain, time stretch, pitch shift, noise mixing, and RIR convolution. SpecAugment is available for custom spectrogram-level pipelines via AugmentationPipeline.augment_spectrogram(). OWW's default training uses minimal augmentation.
  • Confidence API: detector.get_confidence() and detector.last_scores provide structured confidence tracking that OWW does not offer.
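
The temporal-model bullet above describes scoring across a sliding window of embeddings. The buffering side of that idea can be sketched with a bounded deque. EmbeddingWindow is a hypothetical class with a made-up window size and made-up 2-dimensional embeddings. The learned head that would consume each window is not shown.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class EmbeddingWindow:
    """Keep the most recent `window_size` embeddings (hypothetical helper)."""

    def __init__(self, window_size: int) -> None:
        if window_size < 1:
            raise ValueError("window_size must be >= 1")
        self._frames: Deque[List[float]] = deque(maxlen=window_size)

    def push(self, embedding: Sequence[float]) -> Optional[List[List[float]]]:
        """Add one embedding; return the full window once enough frames exist."""
        self._frames.append(list(embedding))
        if len(self._frames) < self._frames.maxlen:
            return None  # still warming up
        return [list(frame) for frame in self._frames]


window = EmbeddingWindow(window_size=3)
stream = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # made-up embeddings
results = [window.push(embedding) for embedding in stream]
for step, ready in enumerate(results):
    print(step, ready)
```

The first two pushes return None because the window is not yet full. After that, each push drops the oldest embedding and returns the latest three in order.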

Using existing OWW training data:

# Your OWW positive samples work as-is (16kHz WAV/FLAC)
violawake-train \
  --word "my_wake_word" \
  --positives path/to/oww_positives/ \
  --negatives path/to/oww_negatives/ \
  --output models/my_wake_word.onnx \
  --epochs 50

No format conversion is needed -- ViolaWake reads the same 16kHz mono WAV/FLAC files that OWW uses.


Roadmap

v1.0 (Q2 2026) — Phase 1 MVP:

  • Python SDK (Wake + VAD)
  • Kokoro TTS integration
  • faster-whisper STT integration
  • Full VoicePipeline class
  • Training CLI
  • PyPI release
  • Documentation site

v1.1 (Q3 2026) — Streaming + Web:

  • Streaming STT (faster-whisper generator mode)
  • WASM build for ViolaWake
  • JavaScript/Node SDK wrapper
  • Custom wake word web Console (alpha)

v2.0 (Q1 2027) — Multi-platform:

  • Android SDK (ONNX Runtime Android)
  • iOS SDK (ONNX Runtime iOS)
  • DeepFilterNet noise suppression integration
  • Speaker diarization (pyannote.audio)
  • License/metering infrastructure

Contributing

git clone https://github.com/GeeIHadAGoodTime/ViolaWake
cd ViolaWake
pip install -e ".[dev]"
pre-commit install
pytest tests/

See CONTRIBUTING.md for guidelines.


License

Apache 2.0. Models trained on open datasets. See LICENSE for details.

ViolaWake uses openWakeWord as a frozen feature extractor backbone (also Apache 2.0). The classification heads (Temporal CNN, Conv-GRU) and training pipeline are original ViolaWake work.
