Open-source wake word detection SDK with training pipeline — privacy-first, on-device, Python-native


ViolaWake SDK

The open-source alternative to Porcupine. A production-tested wake word engine with accessible training, ONNX inference, and a Python-first SDK.

Why ViolaWake?

Feature	ViolaWake	Porcupine (Picovoice)	openWakeWord
License	Apache 2.0	Proprietary (metered)	Apache 2.0
Training code open	Yes	No (closed)	Yes
Custom wake words	Yes (training CLI)	Yes (paid Console)	Yes (fine-tune)
Evaluation tooling	`violawake-eval` (Cohen's d, EER, FAR/FRR, ROC AUC)	None published	Basic
On-device	Yes (ONNX)	Yes (proprietary C lib)	Yes (ONNX)
Integrated TTS	Yes (Kokoro-82M, optional extra)	No	No
Python SDK	First-class	C wrapper	First-class
Price at scale	Free	Paid (free tier available)	Free

Our moat: open training code, transparent evaluation with reproducible benchmarks, production-hardened data augmentation (gain, time stretch, pitch shift, noise mixing), and a 4-gate decision policy that suppresses false positives during music playback. In a head-to-head benchmark against openWakeWord (OWW) using the same corpus, the same pipeline, and adversarial negatives for both systems, ViolaWake achieves an EER of 5.49% vs. OWW's 8.24%, with each system tested on its own best wake word. ViolaWake runs in production; it is not a demo.

A note on accuracy claims: Our benchmark uses TTS-generated audio with adversarial confusables, not real-speaker recordings. Real-world accuracy depends on your deployment environment. We publish our benchmark scripts so you can reproduce and extend them. Run violawake-eval on your own test data.

Quick Start

pip install "violawake[audio,download]"
violawake-download --model temporal_cnn

Wake Word Detection (5 lines)

from violawake_sdk import WakeDetector

detector = WakeDetector(model="temporal_cnn", threshold=0.80, confirm_count=3)

for audio_chunk in detector.stream_mic():  # 20ms chunks at 16kHz
    if detector.detect(audio_chunk):
        print("Wake word detected!")
        break

confirm_count=3 requires 3 consecutive above-threshold frames before firing, reducing false accepts by ~82-87% depending on threshold. Use confirm_count=1 for lowest latency.
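The consecutive-frame rule described above can be sketched in plain Python. This is an illustration of the idea only, not the SDK's internal code: the class name `ConsecutiveConfirmer`, the use of `>=` for "above threshold", and the reset-after-firing behavior are all assumptions.

```python
class ConsecutiveConfirmer:
    """Fire only after `confirm_count` consecutive above-threshold scores.

    Hypothetical helper for illustration; not part of violawake_sdk.
    """

    def __init__(self, threshold: float = 0.80, confirm_count: int = 3) -> None:
        if confirm_count < 1:
            raise ValueError("confirm_count must be >= 1")
        self.threshold = threshold
        self.confirm_count = confirm_count
        self._streak = 0  # number of consecutive frames at or above threshold

    def update(self, score: float) -> bool:
        """Feed one per-frame score; return True when the streak is long enough."""
        if score >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any low frame breaks the streak
        if self._streak >= self.confirm_count:
            self._streak = 0  # assumed: reset so one utterance fires once
            return True
        return False


confirmer = ConsecutiveConfirmer(threshold=0.80, confirm_count=3)
scores = [0.91, 0.85, 0.40, 0.88, 0.90, 0.93, 0.95]
fired = [confirmer.update(s) for s in scores]
# The dip to 0.40 breaks the first streak; only the sixth frame fires.
```

With confirm_count=1 the same class fires on the first frame that reaches the threshold, which is the lowest-latency setting.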

Threshold Tuning

The threshold parameter controls the trade-off between sensitivity and false positives:

Threshold	Behavior	Use Case
0.70	Sensitive -- more detections, more false positives	Quiet rooms, close-mic setups
0.80	Balanced (default) -- recommended starting point	General-purpose, most environments
0.85	Conservative -- fewer false positives, may miss some wake words	Living rooms with TV/music
0.90+	Very conservative -- lowest false positive rate	Noisy environments, always-on kiosks

Start at 0.80 and adjust based on your false accept rate. Use violawake-streaming-eval to measure FAPH (false accepts per hour) on representative audio from your deployment environment, or violawake-eval for clip-by-clip EER/FAR/FRR/ROC AUC.
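To make the metrics concrete, here is a minimal sketch of how FAPH and clip-level FAR/FRR can be computed from raw counts and scores. The function names and the sample scores are made up for illustration; the CLI tools may compute these differently.

```python
def false_accepts_per_hour(num_false_accepts: int, audio_seconds: float) -> float:
    """FAPH: false accepts normalized to one hour of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return num_false_accepts * 3600.0 / audio_seconds


def far_frr(pos_scores, neg_scores, threshold):
    """Return (FAR, FRR) at a threshold.

    FAR = fraction of negative clips scoring at or above the threshold.
    FRR = fraction of positive clips scoring below it.
    """
    far = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    frr = sum(s < threshold for s in pos_scores) / len(pos_scores)
    return far, frr


# Illustrative scores only, not benchmark data.
positives = [0.95, 0.82, 0.75, 0.60]
negatives = [0.10, 0.30, 0.72, 0.86]

faph = false_accepts_per_hour(num_false_accepts=3, audio_seconds=2 * 3600)
sweep = {t: far_frr(positives, negatives, t) for t in (0.70, 0.80, 0.90)}
```

Raising the threshold moves errors from false accepts to false rejects, which is the trade-off the table above describes.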

Text-to-Speech (Kokoro-82M)

from violawake_sdk import TTSEngine

tts = TTSEngine()  # Downloads kokoro-v1.0.onnx + voices-v1.0.bin on first run (~354MB total)
audio = tts.synthesize("Hello from ViolaWake!")
tts.play(audio)

Voice Activity Detection

from violawake_sdk import VADEngine

vad = VADEngine(backend="webrtc")  # or "silero", "rms"
prob = vad.process_frame(audio_bytes)  # returns 0.0–1.0 speech probability
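For intuition about what an RMS heuristic does, the sketch below maps the energy of one frame to a 0.0-1.0 value. It is not the SDK's "rms" backend: the function name, the 16-bit little-endian mono PCM input format, and the `floor`/`ceiling` values are assumptions chosen for illustration.

```python
import array
import math
import sys


def rms_speech_probability(frame: bytes, floor: float = 100.0,
                           ceiling: float = 3000.0) -> float:
    """Map frame RMS energy linearly onto [0.0, 1.0].

    Assumes 16-bit little-endian mono PCM. `floor` and `ceiling` are
    illustrative values, not tuned constants.
    """
    samples = array.array("h")
    samples.frombytes(frame)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; fix up on big-endian hosts
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(1.0, max(0.0, (rms - floor) / (ceiling - floor)))


# 20 ms at 16 kHz is 320 samples, or 640 bytes of 16-bit audio.
silence_prob = rms_speech_probability(bytes(640))
```

An energy heuristic like this cannot tell speech from other loud sounds, which is why model-based backends such as WebRTC VAD or Silero are usually preferred when available.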

Full Pipeline (Wake → STT → TTS)

Requires: pip install "violawake[audio,stt,tts]"

from violawake_sdk import VoicePipeline

pipeline = VoicePipeline(
    wake_word="viola",
    stt_model="base",        # faster-whisper model size
    tts_voice="af_heart",    # Kokoro voice
)

@pipeline.on_command
def handle_command(text: str) -> None:
    print(f"Command: {text}")
    pipeline.speak(f"You said: {text}")  # Or return a string to auto-speak

pipeline.run()  # Blocks — Ctrl+C to stop

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    VoicePipeline                            │
│                                                             │
│  Mic ──► [WakeDetector] ──► [VAD] ──► [STT] ──► callback  │
│                                                             │
│  text ──► [TTS] ──► Speaker                                │
└─────────────────────────────────────────────────────────────┘

Components:

Module	Engine	Size	Latency
Wake word	Temporal CNN on OWW embeddings (ONNX)	~100 KB head (+OWW backbone via `openwakeword`)	~8ms/frame
VAD	WebRTC VAD / Silero / RMS heuristic	<1 MB	<1ms/frame
STT	faster-whisper `base`	145 MB	0.5–2s
TTS	Kokoro-82M (ONNX)	326 MB	0.3–0.8s/sentence

Training Your Own Wake Word

The training CLI lets you train a custom wake word model with ~200 positive samples:

# Collect positive samples (read prompts aloud)
violawake-collect --word "jarvis" --output data/jarvis/positives/ --count 200

# Train (auto-generates TTS positives, confusable negatives, and speech negatives)
violawake-train \
  --word "jarvis" \
  --positives data/jarvis/positives/ \
  --output models/jarvis.onnx \
  --epochs 50

# To disable augmentation, add --no-augment
# To use legacy MLP architecture, add --architecture mlp

# Evaluate (EER, FAR/FRR, ROC AUC)
violawake-eval \
  --model models/jarvis.onnx \
  --test-dir data/jarvis/test/ \
  --report

The --test-dir must contain positives/ and negatives/ subdirectories.
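A quick pre-flight check for that layout can be written with pathlib. The helper name `missing_test_subdirs` is hypothetical and not part of the CLI.

```python
from pathlib import Path


def missing_test_subdirs(test_dir) -> list[str]:
    """Return the required subdirectory names that are absent from test_dir."""
    root = Path(test_dir)
    return [name for name in ("positives", "negatives")
            if not (root / name).is_dir()]
```

Calling `missing_test_subdirs("data/jarvis/test")` before an evaluation run returns an empty list when both subdirectories exist.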

Expected results: EER < 10% (against the bundled synthetic negative corpus) with 200+ quality positive samples. Your real-world performance will depend on your deployment environment and negative speech corpus.
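For readers unfamiliar with the metric, EER is the error rate at the threshold where the false accept rate and false reject rate are equal (or closest). The sketch below shows one simple way to estimate it from clip scores; it is not the implementation used by violawake-eval, and the scores are made up.

```python
def equal_error_rate(pos_scores, neg_scores):
    """Estimate EER by sweeping every observed score as a threshold.

    Returns (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold that minimizes |FAR - FRR|.
    """
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        far = sum(s >= t for s in neg_scores) / len(neg_scores)
        frr = sum(s < t for s in pos_scores) / len(pos_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Illustrative scores only: one positive scores below one negative.
eer, eer_threshold = equal_error_rate(
    pos_scores=[0.9, 0.8, 0.7, 0.4],
    neg_scores=[0.1, 0.2, 0.3, 0.5],
)
```

Here one of four positives and one of four negatives land on the wrong side of the best threshold, so the estimate is 25%.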

Proof: "Operator" Custom Wake Word (89 seconds, EER 7.2%)

To prove the training pipeline generalizes beyond "Viola," we trained a custom "operator" model from scratch — zero manual data collection:

Metric	ViolaWake "viola"	ViolaWake "operator"	OWW "alexa" (pre-trained)
EER	5.49%	7.2%	8.24%
ROC AUC	0.988	0.984	0.956
Training time	~48s	89s	N/A (pre-trained)
Architecture	Temporal CNN	Temporal CNN	MLP on OWW embeddings

The training CLI handled TTS sample generation (20 Edge TTS voices), confusable negative generation (16 phonetic variants), 10x augmentation, and Temporal CNN training end-to-end. OWW provides training notebooks but no pip-installable CLI tool.

Full methodology, corpus details, and reproducibility instructions: benchmark_v2/OPERATOR_BENCHMARK.md

Models

Models are versioned and published to GitHub Releases. They are not bundled in the PyPI package (too large), so download them separately. Pass registry names without file extensions to --model or WakeDetector(model=...):

python -m violawake_sdk.tools.download_model --model temporal_cnn   # default, ~100 KB
python -m violawake_sdk.tools.download_model --model kokoro_v1_0    # TTS model, 326 MB
python -m violawake_sdk.tools.download_model --model kokoro_voices_v1_0  # TTS voices, 28 MB

Model	Type	Size	EER*	Notes
`temporal_cnn.onnx`	Temporal CNN on OWW embeddings	~100 KB	5.49%	Production default — best live recall + lowest FP
`temporal_convgru.onnx`	Temporal Conv-GRU on OWW embeddings	~81 KB	--	Reserve model
`r3_10x_s42.onnx` (deprecated)	MLP on OWW embeddings	~34 KB	--	Deprecated: fails live mic test. Do not use.
`kokoro-v1.0.onnx`	Kokoro-82M TTS	~326 MB	--	Apache 2.0 (hosted by kokoro-onnx)

*EER (Equal Error Rate) from benchmark v2: 700 shared negatives (incl. adversarial confusables), 180 TTS positives, streaming inference. Lower is better. See benchmark_v2/ for full methodology and scripts.

Platform Support

Platform	Wake Word	TTS	STT	Status
Windows 10/11 (x64)	✅	✅	✅	Fully tested
Linux (x64)	✅	✅	✅	CI-tested
macOS (arm64/x64)	✅	✅	✅	CI-tested (Intel), community (ARM)
Raspberry Pi 4 (ARM64)	✅	⚠️ slow	✅	Supported
Browser/WASM	🚧	🚧	❌	Phase 2 (Q3 2026)
Android	❌	❌	❌	Phase 3 (2027)
iOS	❌	❌	❌	Phase 3 (2027)

Installation

Minimum install (wake word + VAD only):

pip install violawake

Note: Both import violawake and import violawake_sdk work. The canonical import is violawake_sdk (e.g., from violawake_sdk import WakeDetector), but from violawake import WakeDetector is also supported for convenience.

With microphone input and model downloading:

pip install "violawake[audio,download]"

With TTS:

pip install "violawake[tts]"

With STT:

pip install "violawake[stt]"

Full pipeline (all features):

pip install "violawake[all]"
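After installing, it can help to confirm which optional features are importable. The sketch below uses only the standard library. The mapping from extras to module names is an assumption based on the packages named in this README (pyaudio, faster-whisper, kokoro-onnx), and the helper itself is hypothetical.

```python
from importlib.util import find_spec

# Assumed mapping of extras to top-level importable module names.
EXTRA_MODULES = {
    "audio": "pyaudio",
    "stt": "faster_whisper",
    "tts": "kokoro_onnx",
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Report, per extra, whether its module can be found without importing it."""
    return {extra: find_spec(module) is not None
            for extra, module in extra_modules.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```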

Requirements:

Python 3.10+
onnxruntime >= 1.17 (CPU) or onnxruntime-gpu for GPU acceleration
pyaudio for microphone input
numpy, scipy
openwakeword >= 0.6 (installed automatically as a dependency — provides the frozen mel/embedding backbone)

Performance Benchmarks

Measured on i7-12700H, Windows 11, RTX 3060 (CPU inference):

Operation	Latency (p50)	Latency (p99)
Wake word inference (20ms frame)	7.8 ms	12.1 ms
VAD (WebRTC, 20ms frame)	0.4 ms	0.8 ms
STT (Whisper base, 3s audio)	680 ms	1200 ms
TTS first audio (Kokoro, 1 sentence)	310 ms	580 ms
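One way to read the wake word row is as a real-time factor: inference time divided by the audio duration it covers. The arithmetic below uses the table's numbers; the 16-bit sample width is an assumption.

```python
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 20
BYTES_PER_SAMPLE = 2  # assumed: 16-bit PCM

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
bytes_per_frame = samples_per_frame * BYTES_PER_SAMPLE


def real_time_factor(latency_ms: float, frame_ms: float = FRAME_MS) -> float:
    """Values below 1.0 mean inference keeps up with incoming audio."""
    return latency_ms / frame_ms


rtf_p50 = real_time_factor(7.8)   # wake word inference, p50
rtf_p99 = real_time_factor(12.1)  # wake word inference, p99
```

Both values are below 1.0, so on the measured machine a single core has headroom left over even at the p99 latency.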

Wake word accuracy (benchmark v2 — TTS corpus, 700 negatives incl. adversarial confusables):

Temporal CNN model: EER 5.49%, ROC AUC 0.9877
FAR @ FRR=5%: 5.43% (vs OWW's 8.86% on its own best word)
Live mic tested: 100% recall on direct speech, 0 false positives on podcast/music
Real-world metrics depend on your deployment environment. Run violawake-eval (clip-by-clip) or violawake-streaming-eval (continuous FAPH) on your own test data.
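The "FAR @ FRR=5%" figure fixes the miss rate and reports the resulting false accept rate. A minimal sketch of that calculation follows; it is not the code behind violawake-eval, and the scores are synthetic.

```python
import math


def far_at_frr(pos_scores, neg_scores, target_frr=0.05):
    """Pick the highest threshold with FRR <= target_frr, then report FAR.

    Returns (far, threshold).
    """
    ranked = sorted(pos_scores)
    # Number of positives we may reject while staying within the target.
    allowed_misses = math.floor(target_frr * len(ranked) + 1e-9)
    threshold = ranked[min(allowed_misses, len(ranked) - 1)]
    far = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return far, threshold


# Synthetic scores: 20 positives from 0.50 to 0.88, 10 negatives.
positives = [i / 100 for i in range(50, 90, 2)]
negatives = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.51, 0.55, 0.60]
far, threshold = far_at_frr(positives, negatives)
```

With 20 positives, a 5% FRR budget allows exactly one miss, so the threshold sits at the second-lowest positive score.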

Debugging

Enable debug logging to see gate rejections, backbone output, score tracking, and detection decisions:

import logging
logging.basicConfig(level=logging.DEBUG)

from violawake_sdk import WakeDetector
detector = WakeDetector(model="temporal_cnn", threshold=0.80)

This produces output like:

Gate 1 reject: RMS 0.0 below floor 1.0 -- silence/DC offset filtered
Gate 3 reject: cooldown active (1.2s remaining) -- too soon after last detection
Gate 4 reject: playback active -- suppressed during music
Wake word detected! score=0.872 -- successful detection

Set level=logging.INFO for detections only (less verbose).
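If other libraries in your application are noisy at DEBUG level, standard Python logging lets you raise verbosity for one package only. The sketch below assumes the SDK logs under a logger named `violawake_sdk` (the usual `logging.getLogger(__name__)` convention); verify the logger name against the actual output before relying on it.

```python
import logging

# Keep the rest of the application quiet.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumption: the SDK's loggers live under the "violawake_sdk" namespace.
logging.getLogger("violawake_sdk").setLevel(logging.DEBUG)
```

Child loggers such as `violawake_sdk.detector` (a hypothetical name) inherit the DEBUG level unless they set their own.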

Examples

The examples/ directory contains runnable scripts:

File	Description
`examples/basic_detection.py`	Minimal microphone wake word detection loop
`examples/async_detection.py`	Async wake word detection with AsyncWakeDetector
`examples/streaming_eval.py`	Evaluate false accepts per hour on a WAV file

Run any example with:

python examples/basic_detection.py

Comparison to openWakeWord

openWakeWord is the closest open-source alternative. ViolaWake differs in four ways:

Open, reproducible evaluation: violawake-eval produces EER, FAR/FRR, and ROC AUC for any model and test set. violawake-streaming-eval measures FAPH on continuous audio. The benchmark scripts live in benchmark_v2/, so you can run them yourself.
Production-hardened decision policy: a 4-gate pipeline (zero-input guard, score threshold, cooldown, listening gate) plus optional multi-window confirmation. It suppresses false positives during music playback when the is_playing state is wired up.
Bundled pipeline: ViolaWake ships integrated VAD, STT, and TTS, not just the wake word component.
Training infrastructure: FocalLoss, EMA, SWA, and an augmentation pipeline (gain, stretch, pitch, noise, time shift; RIR and SpecAugment available opt-in), compared with basic training in openWakeWord.
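The four gates named above can be pictured as a short chain of early returns. The sketch below is an illustration under assumptions, not the SDK's implementation: the class name, the gate order (taken from the numbering in the debug log examples), the 2-second cooldown, and the return strings are all hypothetical. The RMS floor of 1.0 mirrors the sample log line.

```python
from dataclasses import dataclass


@dataclass
class GatePolicy:
    """Hypothetical 4-gate decision policy for illustration only."""

    threshold: float = 0.80
    rms_floor: float = 1.0      # from the sample debug log
    cooldown_s: float = 2.0     # assumed value
    _last_fire: float = float("-inf")

    def decide(self, score: float, rms: float, now: float,
               is_playing: bool = False) -> str:
        if rms < self.rms_floor:                      # gate 1: zero-input guard
            return "reject: zero-input guard"
        if score < self.threshold:                    # gate 2: score threshold
            return "reject: below threshold"
        if now - self._last_fire < self.cooldown_s:   # gate 3: cooldown
            return "reject: cooldown"
        if is_playing:                                # gate 4: listening gate
            return "reject: playback active"
        self._last_fire = now
        return "detect"


policy = GatePolicy()
decisions = [
    policy.decide(score=0.90, rms=0.0, now=0.0),
    policy.decide(score=0.50, rms=500.0, now=0.1),
    policy.decide(score=0.90, rms=500.0, now=0.2),
    policy.decide(score=0.90, rms=500.0, now=1.0),
    policy.decide(score=0.90, rms=500.0, now=5.0, is_playing=True),
    policy.decide(score=0.90, rms=500.0, now=5.1),
]
```

Putting the cheap checks first means silent frames never reach the more expensive stages, and the playback gate only has an effect if the caller reports its playback state.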

Migrating from openWakeWord

ViolaWake uses openWakeWord's mel-spectrogram embedding model as a frozen feature extractor backbone. If you have existing OWW training data, you can use it directly with ViolaWake's training CLI.

Key differences from OWW:

Decision policy: ViolaWake adds a multi-gate pipeline (RMS floor, cooldown, playback suppression) on top of raw scores. OWW exposes raw sigmoid scores only.
Temporal models: ViolaWake supports Temporal CNN and Conv-GRU heads that score across a sliding window of embeddings, not just a single frame. This reduces false positives on speech that partially matches the wake word.
Augmentation pipeline: ViolaWake's training CLI applies gain, time stretch, pitch shift, noise mixing, and RIR convolution. SpecAugment is available for custom spectrogram-level pipelines via AugmentationPipeline.augment_spectrogram(). OWW's default training uses minimal augmentation.
Confidence API: detector.get_confidence() and detector.last_scores provide structured confidence tracking that OWW does not offer.
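The sliding-window idea behind the temporal heads can be sketched with a fixed-length buffer: the head is only scored once enough embeddings have accumulated, and each new frame displaces the oldest one. The class name, the window length of 9, and the averaging "head" used in the demo are placeholders, not ViolaWake's actual model or parameters.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence


class SlidingWindowScorer:
    """Buffer the last `window` embeddings and score them together."""

    def __init__(self, head: Callable[[List[List[float]]], float],
                 window: int = 9) -> None:
        self.head = head
        self._buf: deque = deque(maxlen=window)

    def push(self, embedding: Sequence[float]) -> Optional[float]:
        """Add one frame embedding; return a score once the window is full."""
        self._buf.append(list(embedding))
        if len(self._buf) < self._buf.maxlen:
            return None  # not enough temporal context yet
        return self.head(list(self._buf))


# Placeholder head: mean of the first embedding dimension across the window.
def mean_head(window):
    return sum(vec[0] for vec in window) / len(window)


scorer = SlidingWindowScorer(mean_head, window=3)
outputs = [scorer.push([x]) for x in (1.0, 2.0, 3.0, 4.0)]
```

A single-frame classifier would score each of the four inputs independently; the windowed scorer only responds once it has seen the surrounding context.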

Using existing OWW training data:

# Your OWW positive samples work as-is (16kHz WAV/FLAC)
violawake-train \
  --word "my_wake_word" \
  --positives path/to/oww_positives/ \
  --negatives path/to/oww_negatives/ \
  --output models/my_wake_word.onnx \
  --epochs 50

No format conversion is needed -- ViolaWake reads the same 16kHz mono WAV/FLAC files that OWW uses.
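Before reusing an existing dataset, it may be worth confirming that the WAV files really are 16 kHz mono. The sketch below uses the standard library `wave` module; the helper name is hypothetical, and FLAC files are not covered because the standard library has no FLAC reader.

```python
import wave


def is_16k_mono_wav(path) -> bool:
    """Return True if the WAV file at `path` is 16 kHz, single-channel."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate() == 16_000 and wav.getnchannels() == 1
```

Running this over a positives directory with `pathlib.Path.glob("*.wav")` gives a quick list of files that need resampling or downmixing.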

Roadmap

v1.0 (Q2 2026) — Phase 1 MVP:

Python SDK (Wake + VAD)
Kokoro TTS integration
faster-whisper STT integration
Full VoicePipeline class
Training CLI
PyPI release
Documentation site

v1.1 (Q3 2026) — Streaming + Web:

Streaming STT (faster-whisper generator mode)
WASM build for ViolaWake
JavaScript/Node SDK wrapper
Custom wake word web Console (alpha)

v2.0 (Q1 2027) — Multi-platform:

Android SDK (ONNX Runtime Android)
iOS SDK (ONNX Runtime iOS)
DeepFilterNet noise suppression integration
Speaker diarization (pyannote.audio)
License/metering infrastructure

Contributing

git clone https://github.com/GeeIHadAGoodTime/ViolaWake
cd ViolaWake
pip install -e ".[dev]"
pre-commit install
pytest tests/

See CONTRIBUTING.md for guidelines.

License

Apache 2.0. Models trained on open datasets. See LICENSE for details.

ViolaWake uses openWakeWord as a frozen feature extractor backbone (also Apache 2.0). The classification heads (Temporal CNN, Conv-GRU) and training pipeline are original ViolaWake work.

Release history

0.2.6

May 8, 2026

0.2.5

May 8, 2026

0.2.4

May 7, 2026

0.2.2

Mar 28, 2026

0.2.1

Mar 28, 2026

0.2.0

Mar 28, 2026

This version

0.1.0

Mar 28, 2026

Download files

Source Distribution

violawake-0.1.0.tar.gz (173.9 kB)

Uploaded Mar 28, 2026 Source

Built Distribution

violawake-0.1.0-py3-none-any.whl (175.2 kB)

Uploaded Mar 28, 2026 Python 3

File details

Details for the file violawake-0.1.0.tar.gz.

File metadata

Download URL: violawake-0.1.0.tar.gz
Upload date: Mar 28, 2026
Size: 173.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for violawake-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f49ae55969d7673d4ea23e3560477889babb1a164e48b58f82fc8b4db6b15bb8`
MD5	`ccbb7db5a5f8b9f61fe1aa32a2021ea5`
BLAKE2b-256	`dae856f05ec00e17e57ad577c990e66e209b327d0e34ac1852af46fae479332e`


File details

Details for the file violawake-0.1.0-py3-none-any.whl.

File metadata

Download URL: violawake-0.1.0-py3-none-any.whl
Upload date: Mar 28, 2026
Size: 175.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for violawake-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eee0f2b51fa8fa91d925b3473e733416d31f76d6d52c4e7eea3ea983db1511b2`
MD5	`ac561a03b2d88f1d5d45a1be1714c37f`
BLAKE2b-256	`9a7081ccb7a0e2b37678b97db732529910872d699459dbe742bf1419e8a28a6b`
