Load arbitrary Voice Activity Detection (VAD) models behind a unified ONNX API
vadonnx
Load arbitrary Voice Activity Detection models behind a single, unified API — every model runs through ONNX Runtime. One streaming/batch interface, one audio-format story, pluggable models.
```python
from vadonnx import load_vad

vad = load_vad("silero")                  # bundled, works fully offline
prob = vad.process_chunk(pcm_bytes)       # streaming → float in [0, 1]
segments = vad.get_speech_segments(audio, sample_rate=16000)
# -> [SpeechSegment(start=0.32, end=2.27), SpeechSegment(start=3.27, end=4.45), ...]
```
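The streaming call takes one chunk of PCM at a time. The following is a minimal sketch of slicing a raw int16 mono buffer into fixed-size chunks before feeding them to a streaming VAD. The helper name `iter_pcm_chunks` and the 512-sample chunk size are assumptions made for illustration; they are not part of vadonnx, and the chunk size a given model expects is not stated here.

```python
from typing import Iterator


def iter_pcm_chunks(pcm: bytes, samples_per_chunk: int = 512,
                    bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield consecutive fixed-size chunks of raw PCM.

    A trailing partial chunk is dropped (hypothetical policy; padding it
    with silence would be an equally reasonable choice).
    """
    chunk_bytes = samples_per_chunk * bytes_per_sample
    for start in range(0, len(pcm) - chunk_bytes + 1, chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of 16 kHz int16 silence: 16000 samples * 2 bytes each.
silence = bytes(16000 * 2)
chunks = list(iter_pcm_chunks(silence))
print(len(chunks), len(chunks[0]))
```

Each yielded chunk could then be passed to `vad.process_chunk(chunk)` in a loop.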
Why
Every VAD ships its own loader, audio format, feature pipeline and state handling.
vadonnx hides that behind one VADModel interface: feed it audio (raw int16
bytes, numpy arrays, any sample rate) and get back per-frame speech probabilities or
ready-made speech segments. Models are described declaratively by an
IOSignature, so a single generic engine drives most of them
and you can point the same API at any custom .onnx file.
- Lightweight runtime: only numpy, onnxruntime, huggingface_hub.
- Offline by default: a small Silero model is bundled in the wheel.
- Streaming and batch: process_chunk() for live audio, get_speech_segments() / probabilities() for whole buffers.
- Bring your own model: load any ONNX VAD by path/URL with a signature.
- Extensible: third parties register backends/models via entry points.
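To illustrate the relationship between per-frame speech probabilities and speech segments, here is a minimal sketch that thresholds a probability sequence and groups consecutive speech frames. It is not necessarily how get_speech_segments() works internally: the `Segment` class, the `probs_to_segments` helper, the 0.5 threshold and the frame duration are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Segment:
    """Hypothetical stand-in for a speech segment, in seconds."""
    start: float
    end: float


def probs_to_segments(probs: Sequence[float], frame_s: float,
                      threshold: float = 0.5) -> List[Segment]:
    """Group consecutive frames with probability >= threshold into segments."""
    segments: List[Segment] = []
    start_idx: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start_idx is None:
            start_idx = i                      # speech begins at this frame
        elif p < threshold and start_idx is not None:
            segments.append(Segment(start_idx * frame_s, i * frame_s))
            start_idx = None                   # speech ended before this frame
    if start_idx is not None:                  # audio ended while still in speech
        segments.append(Segment(start_idx * frame_s, len(probs) * frame_s))
    return segments


result = probs_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_s=0.5)
print(result)
```

A production VAD would typically also apply hysteresis, minimum durations and padding; those are left out to keep the sketch short.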
Install
```shell
uv pip install vadonnx          # runtime (numpy + onnxruntime + huggingface_hub)
uv pip install "vadonnx[mic]"   # + microphone examples
```
Models
| name | sample rate (Hz) | parity vs upstream | notes |
|---|---|---|---|
| silero / silero-8k / silero-op15 | 16k / 8k / 16k | MAE 0 | bundled default, raw PCM |
| marblenet / marblenet-int8 | 16k | MAE 4e-4 | NVIDIA NeMo Frame-VAD, multilingual (license) |
| pyannote / pyannote-int8 | 16k | MAE 0 | pyannote segmentation-3.0, windowed |
| fsmn / fsmn-quant | 16k | tracks upstream | FunASR FSMN-VAD; needs vadonnx[fsmn] |
| speechbrain | 16k | MAE 0 | SpeechBrain CRDNN, LibriParty-trained |
| ten | 16k | - | feature extractor provided by TEN's native library |
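The parity column is expressed as MAE. As a rough sketch, assuming MAE here means the mean absolute error between per-frame probabilities from the upstream implementation and from the ONNX model on the same audio, the metric could be computed as follows. The function name and the sample values are illustrative only and are not measurements from this project.

```python
from typing import Sequence


def mean_absolute_error(reference: Sequence[float],
                        candidate: Sequence[float]) -> float:
    """Average of |reference[i] - candidate[i]| over all frames."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same number of frames")
    if not reference:
        raise ValueError("sequences must not be empty")
    return sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)


# Made-up probabilities, purely to exercise the function.
upstream = [0.0, 0.5, 1.0, 0.25]
identical = [0.0, 0.5, 1.0, 0.25]
shifted = [0.25, 0.5, 1.0, 0.25]

print(mean_absolute_error(upstream, identical))
print(mean_absolute_error(upstream, shifted))
```

Under this reading, an MAE of 0 means the two implementations produced identical probabilities on the compared frames.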
See docs/backends.md for per-model detail and parity notes, and the benchmark for measured comparisons across datasets, including WebRTC and energy baselines.
Models other than the bundled Silero are downloaded on first use from the
TigreGotico HuggingFace org and cached under
$XDG_DATA_HOME/vadonnx.
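To find where downloaded models are likely to land, the cache location can be resolved from the environment. This is a sketch under assumptions: the fallback to ~/.local/share when XDG_DATA_HOME is unset or empty comes from the XDG Base Directory convention, and whether vadonnx applies the same fallback is not confirmed here. `vad_cache_dir` is a hypothetical helper, not a vadonnx function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def vad_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve $XDG_DATA_HOME/vadonnx, falling back to ~/.local/share/vadonnx."""
    env = os.environ if env is None else env
    # An unset or empty XDG_DATA_HOME falls back to the conventional default.
    base = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(base) / "vadonnx"


print(vad_cache_dir())
```

Passing an explicit mapping instead of relying on `os.environ` makes the helper easy to check without touching the real environment.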
CLI
```shell
vadonnx list                 # list available models
vadonnx probe silero         # print a model's ONNX input/output signature
vadonnx segment speech.wav   # print detected speech segments of a WAV
```
Documentation
- Quickstart
- Streaming
- Custom models & IOSignature
- Backends & parity notes
- Plugins
- Model conversion
- Licensing
- API reference
License
Apache-2.0. Bundled/downloaded model weights retain their upstream licenses — see docs/licensing.md.