Wake word detection for voice-enabled applications

Owner: LiveKit (verified by PyPI)

Project description

livekit-wakeword

An open-source wake word library for creating voice-enabled applications. Based on openWakeWord with streamlined training — generate synthetic data, augment, train, and export from a single YAML config.

Features:

Conv-Attention classifier — 1D temporal convolutions + multi-head self-attention replace openWakeWord's flat DNN head, preserving temporal structure across the 16-frame embedding window for better accuracy and fewer false positives (see comparison below)
Backward compatible with openWakeWord models and library
Train anywhere: on a local machine, in the cloud, or through SkyPilot jobs
Zero dependency headaches — uv handles everything

Quick Links:

Using Existing Models
Training New Models Using The CLI
Training New Models Using The Python API
openWakeWord vs livekit-wakeword

Quick Start

Using Existing Models and Library

System dependencies (for microphone listener):

# macOS
brew install portaudio

# Ubuntu/Debian
sudo apt install portaudio19-dev

Installation:

pip install git+https://github.com/livekit/livekit-wakeword.git
# or
uv add git+https://github.com/livekit/livekit-wakeword

Basic inference:

from livekit.wakeword import WakeWordModel

model = WakeWordModel(models=["hey_livekit.onnx"])

# Feed audio frames (16 kHz, int16 or float32).
# audio_frame is a placeholder for a frame supplied by your own audio source.
scores = model.predict(audio_frame)
if scores["hey_livekit"] > 0.5:
    print("Wake word detected!")

Async listener with microphone:

import asyncio
from livekit.wakeword import WakeWordModel, WakeWordListener

model = WakeWordModel(models=["hey_livekit.onnx"])

async def main():
    async with WakeWordListener(model, threshold=0.5, debounce=2.0) as listener:
        while True:
            detection = await listener.wait_for_detection()
            print(f"Detected {detection.name}! ({detection.confidence:.2f})")

asyncio.run(main())
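The threshold and debounce arguments above can be read as a gating rule over a stream of scores. The sketch below is a library-independent illustration of one plausible interpretation: fire when a score reaches the threshold, then stay quiet for debounce seconds. The Debouncer class is hypothetical and not part of livekit-wakeword, and the listener's actual behavior may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Debouncer:
    """Hypothetical helper: gate scores by a threshold and a quiet period."""
    threshold: float = 0.5
    debounce: float = 2.0  # seconds during which repeat detections are suppressed
    _last_fired: Optional[float] = None

    def update(self, timestamp: float, score: float) -> bool:
        """Return True if this score should count as a new detection."""
        if score < self.threshold:
            return False
        if self._last_fired is not None and timestamp - self._last_fired < self.debounce:
            return False  # still inside the quiet period
        self._last_fired = timestamp
        return True


# (timestamp in seconds, score) pairs; one utterance spans several frames.
stream = [(0.0, 0.1), (0.5, 0.8), (1.0, 0.9), (1.5, 0.7), (3.0, 0.95), (3.5, 0.2)]
gate = Debouncer(threshold=0.5, debounce=2.0)
fired_at = [t for t, s in stream if gate.update(t, s)]
print(fired_at)  # [0.5, 3.0]
```

Without the quiet period, the frames at 1.0 s and 1.5 s would each count as a separate detection of the same utterance.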

Training New Models Using The CLI

System dependencies:

# macOS
brew install espeak-ng ffmpeg portaudio

# Ubuntu/Debian
sudo apt install espeak-ng libsndfile1 ffmpeg sox portaudio19-dev

Installation:

# Install uv (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone https://github.com/livekit/livekit-wakeword
cd livekit-wakeword
uv sync --all-extras

Download models and data:

uv run livekit-wakeword setup

Train a wake word:

uv run livekit-wakeword run configs/hey_livekit.yaml

Or run stages individually:

uv run livekit-wakeword generate configs/hey_livekit.yaml  # TTS synthesis + adversarial negatives
uv run livekit-wakeword augment configs/hey_livekit.yaml   # Augment + extract features
uv run livekit-wakeword train configs/hey_livekit.yaml     # 3-phase adaptive training
uv run livekit-wakeword export configs/hey_livekit.yaml    # Export to ONNX
uv run livekit-wakeword eval configs/hey_livekit.yaml      # Evaluate model (DET curve, AUT, FPPH)

You can also evaluate any compatible ONNX model (e.g., one trained with openWakeWord):

uv run livekit-wakeword eval configs/hey_livekit.yaml -m /path/to/other_model.onnx

Eval produces a DET curve plot and metrics JSON in the output directory. See Evaluation for details.
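If you want to drive the individual stages from a script, the commands above can be assembled programmatically. This is a minimal sketch that only builds the argument lists shown in this section; stage_commands is a hypothetical helper, not part of the CLI, and nothing is executed unless you pass the lists to subprocess yourself.

```python
from typing import List, Optional

# Stage names exactly as they appear in the commands above.
STAGES = ["generate", "augment", "train", "export", "eval"]


def stage_commands(config_path: str, eval_model: Optional[str] = None) -> List[List[str]]:
    """Build one argv list per pipeline stage, in execution order."""
    commands = []
    for stage in STAGES:
        argv = ["uv", "run", "livekit-wakeword", stage, config_path]
        if stage == "eval" and eval_model is not None:
            argv += ["-m", eval_model]  # evaluate a different ONNX model
        commands.append(argv)
    return commands


commands = stage_commands("configs/hey_livekit.yaml")
for argv in commands:
    print(" ".join(argv))
    # To run for real: import subprocess; subprocess.run(argv, check=True)
```

Passing check=True to subprocess.run would stop the sequence at the first failing stage, which is usually what you want because each stage consumes the previous stage's output.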

Config:

See configs/hey_livekit.yaml for all options.

model_name: hey_livekit
target_phrases:
  - "hey livekit"

n_samples: 10000 # training samples per class
model:
  model_type: conv_attention # conv_attention, dnn, or rnn
  model_size: small # tiny, small, medium, large
steps: 50000
target_fp_per_hour: 0.2
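A config like the one above can be sanity-checked before a long training run. The sketch below mirrors only the fields shown here in a hypothetical SketchConfig dataclass; it is not the library's WakeWordConfig, and its defaults are copied from the example rather than from the library.

```python
from dataclasses import dataclass
from typing import List

# Allowed values as listed in the config comments above.
MODEL_TYPES = {"conv_attention", "dnn", "rnn"}
MODEL_SIZES = {"tiny", "small", "medium", "large"}


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the fields shown in the example config."""
    model_name: str
    target_phrases: List[str]
    n_samples: int = 10000
    model_type: str = "conv_attention"
    model_size: str = "small"
    steps: int = 50000
    target_fp_per_hour: float = 0.2

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the config looks sane."""
        found = []
        if not self.target_phrases:
            found.append("target_phrases must not be empty")
        if self.model_type not in MODEL_TYPES:
            found.append(f"unknown model_type: {self.model_type}")
        if self.model_size not in MODEL_SIZES:
            found.append(f"unknown model_size: {self.model_size}")
        if self.n_samples <= 0 or self.steps <= 0:
            found.append("n_samples and steps must be positive")
        if self.target_fp_per_hour < 0:
            found.append("target_fp_per_hour must be non-negative")
        return found


good = SketchConfig(model_name="hey_livekit", target_phrases=["hey livekit"])
bad = SketchConfig(model_name="x", target_phrases=[], model_type="cnn")
print(good.problems())  # []
print(bad.problems())
```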

Train on cloud GPUs with SkyPilot:

See skypilot/train.yaml for an example SkyPilot training job on Nebius.

sky launch skypilot/train.yaml

Training New Models Using The Python API

The full training pipeline is available as a Python API, so you can import and drive it from your own code instead of using the CLI:

from livekit.wakeword import (
    WakeWordConfig,
    load_config,
    run_generate,
    run_augment,
    run_extraction,
    run_train,
    run_export,
    run_eval,
)

# Load from YAML or construct directly
config = load_config("configs/hey_livekit.yaml")

# Or build a config programmatically
config = WakeWordConfig(
    model_name="hey_robot",
    target_phrases=["hey robot"],
    n_samples=5000,
    steps=30000,
)

# Run individual stages
run_generate(config)     # TTS synthesis + adversarial negatives
run_augment(config)      # Add noise, reverb, pitch shifts
run_extraction(config)   # Extract mel spectrograms + speech embeddings → .npy
run_train(config)        # 3-phase adaptive training
onnx_path = run_export(config)       # Export to ONNX

# Evaluate the exported model
results = run_eval(config, onnx_path)
print(f"AUT={results['aut']:.4f}  FPPH={results['fpph']:.2f}  Recall={results['recall']:.1%}")

This is useful for integrating wake word training into larger pipelines, automating model iteration, or building custom tooling on top of the data generation and training stages.
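As one way to embed the stages in a larger pipeline, the sketch below runs an ordered list of stage callables and records how long each takes. run_stages is a hypothetical helper, and stub functions stand in for the library's stage functions so the example runs without livekit-wakeword installed.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_stages(stages: List[Stage], config: Any) -> Dict[str, float]:
    """Run stages in order and return elapsed seconds per stage.

    An exception in any stage propagates, so later stages do not run.
    """
    timings: Dict[str, float] = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn(config)
        timings[name] = time.perf_counter() - start
    return timings


# Stubs for illustration only; in real use you would pass pairs such as
# ("generate", run_generate) using the functions imported above.
log: List[str] = []
stub_stages: List[Stage] = [
    ("generate", lambda cfg: log.append(f"generate:{cfg['model_name']}")),
    ("augment", lambda cfg: log.append(f"augment:{cfg['model_name']}")),
    ("train", lambda cfg: log.append(f"train:{cfg['model_name']}")),
]

timings = run_stages(stub_stages, {"model_name": "hey_robot"})
print(log)
print(sorted(timings))
```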

openWakeWord vs livekit-wakeword

Both libraries share the same audio front-end: mel spectrograms are fed through frozen Google speech embedding and openWakeWord embedding models to produce a (16, 96) feature matrix (16 timesteps × 96-dim embeddings). The difference is the classification head that sits on top.

Architecture

openWakeWord flattens the (16, 96) matrix into a 1536-d vector and feeds it through a small fully-connected DNN:

Flatten(16×96=1536) → Dense → Dense → Sigmoid

While the positional information is technically still present in the flattened vector, the dense layer has no inductive bias for temporal structure and must learn any sequential patterns from scratch.
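To make the flattening concrete, this small sketch labels every cell of a 16 × 96 matrix with its (timestep, dimension) pair and flattens it row by row. The row-major order is an assumption about the layout; the point is only that position survives as an index offset, not as structure the dense layer is built to exploit.

```python
TIMESTEPS, EMBED_DIM = 16, 96

# Label each cell with its (timestep, dimension) coordinates.
features = [[(t, d) for d in range(EMBED_DIM)] for t in range(TIMESTEPS)]

# Row-major flatten: all 96 dims of timestep 0, then timestep 1, and so on.
flat = [value for row in features for value in row]


def flat_index(t: int, d: int) -> int:
    """Position of cell (t, d) inside the flattened vector."""
    return t * EMBED_DIM + d


print(len(flat))                # 1536
print(flat[flat_index(5, 10)])  # (5, 10)
```

Neighbouring timesteps for the same embedding dimension end up 96 positions apart, so a dense layer sees them as unrelated inputs unless training teaches it otherwise.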

livekit-wakeword introduces a Conv-Attention (conv_attention) classifier:

Conv1D blocks → MultiheadAttention → Mean pool → Linear(1) → Sigmoid

1D Convolutions (kernel size 3) slide across the 16 timesteps, capturing local temporal patterns (e.g., syllable transitions).
Multi-Head Self-Attention models long-range dependencies across the full temporal window, letting the model learn which timestep relationships matter.
Mean pooling aggregates attended features into a fixed-size vector for the final sigmoid output.
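The data flow described above can be traced with a shape-level NumPy sketch. It uses random weights, a single conv block, and illustrative sizes (hidden width 32, 4 heads), all of which are assumptions; it is not the library's implementation, which may add normalization, dropout, or more blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
T, E = 16, 96        # timesteps x embedding dim, as described in the text
D, HEADS = 32, 4     # hidden width and head count: illustrative assumptions


def conv1d_relu(x, w, b):
    """x: (T, C_in), w: (K, C_in, C_out). Zero-padded conv over time, then ReLU."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.einsum("kc,kcd->d", xp[t:t + k], w) for t in range(x.shape[0])])
    return np.maximum(out + b, 0.0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def self_attention(x, wq, wk, wv, wo, heads):
    """x: (T, D). Multi-head scaled dot-product self-attention."""
    t, d = x.shape
    hd = d // heads

    def split(m):  # project, then reshape to (heads, T, head_dim)
        return (x @ m).reshape(t, heads, hd).transpose(1, 0, 2)

    q, k, v = split(wq), split(wk), split(wv)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))  # (heads, T, T)
    merged = (weights @ v).transpose(1, 0, 2).reshape(t, d)
    return merged @ wo, weights


features = rng.standard_normal((T, E))           # stand-in for the (16, 96) matrix
w_conv = rng.standard_normal((3, E, D)) * 0.05   # kernel size 3
wq, wk, wv, wo = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))
w_out = rng.standard_normal(D) * 0.1

hidden = conv1d_relu(features, w_conv, np.zeros(D))                      # (16, 32)
attended, attn_weights = self_attention(hidden, wq, wk, wv, wo, HEADS)   # (16, 32)
pooled = attended.mean(axis=0)                                           # (32,)
score = 1.0 / (1.0 + np.exp(-(pooled @ w_out)))                          # sigmoid
print(hidden.shape, attn_weights.shape, pooled.shape, float(score))
```

The time axis stays at length 16 until the mean pool, which is what lets both the convolution and the attention weights operate on ordered timesteps.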

Results

To compare, we evaluated an openWakeWord DNN, a livekit-wakeword DNN (same architecture, better training pipeline), and a livekit-wakeword conv-attention model on the same "hey livekit" validation set (15,000 positive clips, 45,084 negative clips, 25 hours of audio). The livekit-wakeword models were trained with the prod config.

Metric	openWakeWord (DNN)	livekit-wakeword (DNN)	livekit-wakeword (conv-attention)
AUT*	0.0720	0.0423	0.0012
FPPH*	8.50	3.07	0.08
Recall*	68.6%	85.3%	86.1%
Optimal Threshold*	0.01	0.01	0.68

[Figure panels: openWakeWord (DNN), livekit-wakeword (DNN), livekit-wakeword (conv-attention)]

The livekit-wakeword DNN already outperforms openWakeWord's DNN thanks to the improved training pipeline (focal loss, embedding mixup, 3-phase training, checkpoint averaging). However, both DNN models fail to meet the FPPH target — their optimal thresholds fall to 0.01, meaning no operating point can keep false positives low enough.
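Two of the techniques named above, focal loss and checkpoint averaging, are easy to show in isolation. The sketch below uses the standard binary focal loss formula with the commonly cited defaults gamma=2 and alpha=0.25, and a plain arithmetic mean over checkpoints; the hyperparameters and the exact variants used by livekit-wakeword are not stated here, so treat these as assumptions.

```python
import math
from typing import Dict, List


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def average_checkpoints(checkpoints: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each named parameter across checkpoints (scalars for brevity)."""
    n = len(checkpoints)
    return {key: sum(c[key] for c in checkpoints) / n for key in checkpoints[0]}


easy = focal_loss(0.9, 1)   # confident and correct: heavily down-weighted
hard = focal_loss(0.6, 1)   # less confident: contributes more to the loss
print(easy < hard)          # True

averaged = average_checkpoints([{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 1.0}])
print(averaged)             # {'w': 2.0, 'b': 0.5}
```

With gamma set to 0 and alpha set to 1 for the positive class, the focal loss reduces to ordinary cross-entropy.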

The conv-attention head is what unlocks the low false positive rate: 60x lower AUT and roughly 100x fewer false positives per hour than openWakeWord, with recall 17.5 percentage points higher (86.1% vs 68.6%).
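The headline ratios can be recomputed directly from the results table. This sketch only does arithmetic on the published numbers; the variable names are illustrative.

```python
# Values copied from the results table above.
base = {"aut": 0.0720, "fpph": 8.50, "recall": 68.6}   # openWakeWord (DNN)
best = {"aut": 0.0012, "fpph": 0.08, "recall": 86.1}   # livekit-wakeword (conv-attention)

aut_ratio = round(base["aut"] / best["aut"], 1)               # how many times lower AUT is
fpph_ratio = round(base["fpph"] / best["fpph"])               # how many times fewer false positives
recall_points = round(best["recall"] - base["recall"], 1)     # absolute gain, percentage points
recall_relative = round(100 * (best["recall"] / base["recall"] - 1), 1)  # relative gain, percent

print(aut_ratio, fpph_ratio, recall_points, recall_relative)
```

The recall gain is 17.5 percentage points in absolute terms, which corresponds to about 25.5% more wake words detected in relative terms.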

*AUT (Area Under the DET curve) — summarizes the full DET (Detection Error Tradeoff) curve, which plots false positive rate vs false negative rate across all thresholds. Lower is better (0 = perfect). A DET curve that hugs the bottom-left corner indicates strong separation between wake words and non-wake-words.

*FPPH (False Positives Per Hour) — how many times the model falsely triggers per hour of non-wake-word audio. Lower is better. For production use, < 0.5 FPPH is typical.

*Recall — the percentage of actual wake words correctly detected. Higher is better.

*Optimal Threshold — the detection threshold that maximizes recall while keeping FPPH at or below the target (configurable, default 0.1). A threshold of 0.01 indicates that no threshold could meet the FPPH target, so the evaluator fell back to the threshold with the highest balanced accuracy.
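The recall, FPPH, and threshold definitions above can be turned into a short selection routine. This is a sketch under stated assumptions, not the library's evaluator: it treats each negative clip scoring at or above the threshold as one false positive, sweeps a fixed grid from 0.01 to 0.99, and breaks ties in favor of the lowest threshold. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def evaluate(pos: Sequence[float], neg: Sequence[float], thr: float) -> Tuple[float, int, float]:
    """Return (recall, false_positive_count, false_positive_rate) at a threshold."""
    hits = sum(s >= thr for s in pos)
    false_pos = sum(s >= thr for s in neg)
    return hits / len(pos), false_pos, false_pos / len(neg)


def pick_threshold(pos, neg, neg_hours, target_fpph, grid):
    """Maximize recall subject to FPPH <= target; else fall back to balanced accuracy."""
    best = None  # (threshold, recall, fpph)
    for thr in grid:
        recall, false_pos, _ = evaluate(pos, neg, thr)
        fpph = false_pos / neg_hours
        if fpph <= target_fpph and (best is None or recall > best[1]):
            best = (thr, recall, fpph)
    if best is not None:
        return best

    def balanced_accuracy(thr):
        recall, _, fpr = evaluate(pos, neg, thr)
        return (recall + (1.0 - fpr)) / 2.0

    thr = max(grid, key=balanced_accuracy)  # first maximum wins on ties
    recall, false_pos, _ = evaluate(pos, neg, thr)
    return thr, recall, false_pos / neg_hours


grid = [i / 100 for i in range(1, 100)]
pos_scores = [0.9, 0.8, 0.75, 0.4]        # toy wake-word clips
neg_scores = [0.1, 0.2, 0.3, 0.7, 0.05]   # toy non-wake-word clips, 2 hours total
chosen = pick_threshold(pos_scores, neg_scores, neg_hours=2.0, target_fpph=0.5, grid=grid)
print(chosen)  # (0.31, 1.0, 0.5)

# When no threshold meets the target, the fallback can land on the grid minimum.
fallback = pick_threshold([0.5], [1.0], neg_hours=1.0, target_fpph=0.0, grid=grid)
print(fallback[0])  # 0.01
```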

Why conv-attention wins

Temporal awareness — the conv-attention model sees the order of speech events, not just their presence, reducing false triggers from phonetically similar but differently ordered phrases.
Better accuracy at the same model size — attention lets a small model selectively focus on discriminative time regions rather than learning dense connections over the full flattened input.
Lower false-positive rates — temporal structure helps reject partial or reordered matches that a flat DNN would accept.

The conv-attention head is the default. You can switch to the original DNN or an RNN head via model_type in your config:

model:
  model_type: conv_attention  # conv_attention (default) | dnn | rnn
  model_size: small           # tiny, small, medium, large

Detailed Documentation

If you want to understand more about how this library works:

Architecture Overview — system design and data flow
Data Generation — TTS synthesis and adversarial negatives
Augmentation — audio transforms and alignment
Feature Extraction — mel spectrograms and embeddings
Training — 3-phase training and checkpoint averaging
Export & Inference — ONNX export and Python API
Evaluation — DET curves, AUT, and model comparison

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

Project details

Release history

0.2.0 (Apr 17, 2026)
0.1.3 (Apr 6, 2026)
0.1.2 (Apr 4, 2026)
0.1.1 (Mar 16, 2026)
0.1.0 (Mar 16, 2026, this version)

Download files

Source Distribution

livekit_wakeword-0.1.0.tar.gz (1.9 MB)

Uploaded Mar 16, 2026 Source

Built Distribution

livekit_wakeword-0.1.0-py3-none-any.whl (1.9 MB)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file livekit_wakeword-0.1.0.tar.gz.

File metadata

Download URL: livekit_wakeword-0.1.0.tar.gz
Upload date: Mar 16, 2026
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for livekit_wakeword-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`59ce907f08d0867a7ced8e5f6a2154d3a77aa47974effd0bff72d3a9f3923f94`
MD5	`01f9ed97acd3f61f8a4f916d44eadd6b`
BLAKE2b-256	`ab0262eea9cfd47a8a500d73a45e4975e60ae5a226310a32028e4a1f10c9d2dd`
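A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. In this sketch the helper names are hypothetical, and the file path in the commented usage line assumes the archive sits in the current directory.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Digest copied from the table above for livekit_wakeword-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "59ce907f08d0867a7ced8e5f6a2154d3a77aa47974effd0bff72d3a9f3923f94"

# Usage, assuming the archive was downloaded to the current directory:
# print(matches("livekit_wakeword-0.1.0.tar.gz", EXPECTED_SDIST_SHA256))
```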

Provenance

The following attestation bundles were made for livekit_wakeword-0.1.0.tar.gz:

Publisher: publish.yml on livekit/livekit-wakeword

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: livekit_wakeword-0.1.0.tar.gz
- Subject digest: 59ce907f08d0867a7ced8e5f6a2154d3a77aa47974effd0bff72d3a9f3923f94
- Sigstore transparency entry: 1109295465
- Sigstore integration time: Mar 16, 2026
Source repository:
- Permalink: livekit/livekit-wakeword@f0699df0d6216be7939ca3dcb0ed209c1218cffe
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/livekit
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f0699df0d6216be7939ca3dcb0ed209c1218cffe
- Trigger Event: push

File details

Details for the file livekit_wakeword-0.1.0-py3-none-any.whl.

File metadata

Download URL: livekit_wakeword-0.1.0-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 1.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for livekit_wakeword-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e5968e8b65d941155d3f922ec15fded3f38628b21199b0c9d6ea56b8a947db62`
MD5	`7966c7f1f68d6546c3e006217f161526`
BLAKE2b-256	`919b786232b095a035ba9f287838908345cd924e13725eb4e7c5e2c70fe86c48`

Provenance

The following attestation bundles were made for livekit_wakeword-0.1.0-py3-none-any.whl:

Publisher: publish.yml on livekit/livekit-wakeword

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: livekit_wakeword-0.1.0-py3-none-any.whl
- Subject digest: e5968e8b65d941155d3f922ec15fded3f38628b21199b0c9d6ea56b8a947db62
- Sigstore transparency entry: 1109295470
- Sigstore integration time: Mar 16, 2026
Source repository:
- Permalink: livekit/livekit-wakeword@f0699df0d6216be7939ca3dcb0ed209c1218cffe
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/livekit
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f0699df0d6216be7939ca3dcb0ed209c1218cffe
- Trigger Event: push
