
Load arbitrary Voice Activity Detection (VAD) models behind a unified ONNX API

Project description

vadonnx

Load arbitrary Voice Activity Detection models behind a single, unified API: every model runs through ONNX Runtime. One streaming/batch interface, one way of handling audio formats, pluggable models.

from vadonnx import load_vad

vad = load_vad("silero")                              # bundled, works fully offline

# pcm_bytes: one chunk of raw int16 PCM from a live stream (e.g. a microphone)
prob = vad.process_chunk(pcm_bytes)                   # streaming → float in [0, 1]

# audio: a whole recording as int16 bytes or a numpy array
segments = vad.get_speech_segments(audio, sample_rate=16000)
# -> [SpeechSegment(start=0.32, end=2.27), SpeechSegment(start=3.27, end=4.45), ...]
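The snippet leaves pcm_bytes and audio undefined. As a minimal sketch of producing them from a file with only the standard library, the helper below (iter_pcm_chunks, a hypothetical name that is not part of vadonnx) reads a mono 16-bit WAV and yields fixed-size raw int16 chunks. The 512-sample chunk size (32 ms at 16 kHz) is an assumption that matches the window commonly used with Silero VAD. This README does not state what chunk length process_chunk expects, so check docs/backends.md for the model you use.

```python
import wave
from typing import BinaryIO, Iterator, Union


def iter_pcm_chunks(
    wav_file: Union[str, BinaryIO], chunk_samples: int = 512
) -> Iterator[bytes]:
    """Yield raw int16 PCM chunks of exactly `chunk_samples` frames.

    chunk_samples=512 is an ASSUMED default, not a documented vadonnx value.
    """
    with wave.open(wav_file, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM")
        while True:
            frames = wf.readframes(chunk_samples)
            if len(frames) < chunk_samples * 2:  # 2 bytes per int16 sample
                break  # stop at EOF and drop any trailing partial chunk
            yield frames


# Hypothetical usage with the API shown above:
#   vad = load_vad("silero")
#   for chunk in iter_pcm_chunks("speech.wav"):
#       prob = vad.process_chunk(chunk)
```

Dropping the trailing partial chunk keeps every call the same size, which streaming models with internal state usually expect; zero-padding the last chunk is an alternative if the tail matters.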

Why

Every VAD ships its own loader, audio format, feature pipeline and state handling. vadonnx hides that behind one VADModel interface: feed it audio (raw int16 bytes, numpy arrays, any sample rate) and get back per-frame speech probabilities or ready-made speech segments. Models are described declaratively by an IOSignature, so a single generic engine drives most of them and you can point the same API at any custom .onnx file.
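This README does not describe how vadonnx turns per-frame probabilities into SpeechSegment objects. One common technique is hysteresis thresholding. Speech starts when a frame's probability reaches an onset threshold and ends only when it drops below a lower offset threshold, so brief dips do not split a segment. The sketch below illustrates that idea only. Segment, probs_to_segments, the 0.5/0.35 thresholds and the fixed frame duration are all hypothetical. Real implementations typically also add minimum-duration and padding rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Segment:  # hypothetical stand-in for vadonnx's SpeechSegment
    start: float  # seconds
    end: float    # seconds


def probs_to_segments(
    probs: Sequence[float],
    frame_s: float,
    onset: float = 0.5,
    offset: float = 0.35,
) -> List[Segment]:
    """Hysteresis thresholding: enter speech when p >= onset,
    leave it when p < offset (requires offset <= onset)."""
    segments: List[Segment] = []
    start: Optional[int] = None  # index of the frame where speech began
    for i, p in enumerate(probs):
        if start is None and p >= onset:
            start = i
        elif start is not None and p < offset:
            segments.append(Segment(start * frame_s, i * frame_s))
            start = None
    if start is not None:  # speech runs to the end of the buffer
        segments.append(Segment(start * frame_s, len(probs) * frame_s))
    return segments
```

With a single threshold, a frame hovering around 0.5 would toggle speech on and off. The gap between onset and offset is what suppresses that chatter.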

  • Lightweight runtime: only numpy, onnxruntime, huggingface_hub.
  • Offline by default: a small Silero model is bundled in the wheel.
  • Streaming and batch: process_chunk() for live audio, get_speech_segments() / probabilities() for whole buffers.
  • Bring your own model: load any ONNX VAD by path/URL with a signature.
  • Extensible: third parties register backends/models via entry points.

Install

uv pip install vadonnx          # runtime (numpy + onnxruntime + huggingface_hub)
uv pip install "vadonnx[mic]"   # + microphone examples

Models

| name | sample rate | parity vs upstream | notes |
|---|---|---|---|
| silero / silero-8k / silero-op15 | 16 kHz / 8 kHz / 16 kHz | MAE 0 | bundled default, raw PCM |
| marblenet / marblenet-int8 | 16 kHz | MAE 4e-4 | NVIDIA NeMo Frame-VAD, multilingual (license) |
| pyannote / pyannote-int8 | 16 kHz | MAE 0 | pyannote segmentation-3.0, windowed |
| fsmn / fsmn-quant | 16 kHz | tracks upstream | FunASR FSMN-VAD; needs vadonnx[fsmn] |
| speechbrain | 16 kHz | MAE 0 | SpeechBrain CRDNN, LibriParty-trained |
| ten | 16 kHz | | feature extractor provided by TEN's native library |
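The "parity vs upstream" column reports a mean absolute error (MAE). Presumably it is computed between vadonnx's per-frame speech probabilities and those of the original upstream implementation on the same audio. The exact protocol (test audio, frame alignment) belongs to docs/backends.md, so treat this sketch as an illustration of the metric only. mean_abs_error is a hypothetical helper name.

```python
from typing import Sequence


def mean_abs_error(ours: Sequence[float], upstream: Sequence[float]) -> float:
    """MAE = (1/N) * sum_i |ours[i] - upstream[i]| over aligned frames."""
    if len(ours) != len(upstream) or not ours:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - b) for a, b in zip(ours, upstream)) / len(ours)
```

On a probability scale of [0, 1], an MAE of 4e-4 means the two implementations differ by 0.0004 per frame on average. That is far below the gap between typical speech thresholds and the scores of clear speech or silence.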

See docs/backends.md for per-model details and parity notes, and the benchmark for measured comparisons across datasets, including WebRTC and energy baselines.

Models other than the bundled Silero are downloaded from the TigreGotico Hugging Face organization on first use and cached under $XDG_DATA_HOME/vadonnx.
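The cache location follows the XDG Base Directory convention. Under that specification, applications fall back to ~/.local/share when XDG_DATA_HOME is unset, empty or not an absolute path. The sketch below resolves the directory that way. It assumes vadonnx applies the same fallback, because the README only names $XDG_DATA_HOME/vadonnx. vadonnx_cache_dir is a hypothetical helper.

```python
import os
from pathlib import Path


def vadonnx_cache_dir() -> Path:
    """Resolve $XDG_DATA_HOME/vadonnx with the XDG spec's default.

    ASSUMPTION: vadonnx uses ~/.local/share when XDG_DATA_HOME is unusable.
    """
    base = os.environ.get("XDG_DATA_HOME", "")
    if not os.path.isabs(base):  # unset, empty, or relative -> spec default
        base = str(Path.home() / ".local" / "share")
    return Path(base) / "vadonnx"
```

Under those assumptions, this is the directory to inspect when checking which models have already been downloaded.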

CLI

vadonnx list                       # list available models
vadonnx probe silero               # print a model's ONNX input/output signature
vadonnx segment speech.wav         # print detected speech segments of a WAV

Documentation

License

Apache-2.0. Bundled/downloaded model weights retain their upstream licenses — see docs/licensing.md.



Download files


Source Distribution

vadonnx-0.1.0a2.tar.gz (2.3 MB)

Uploaded Source

Built Distribution


vadonnx-0.1.0a2-py3-none-any.whl (2.0 MB)

Uploaded Python 3

File details

Details for the file vadonnx-0.1.0a2.tar.gz.

File metadata

  • Download URL: vadonnx-0.1.0a2.tar.gz
  • Upload date:
  • Size: 2.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vadonnx-0.1.0a2.tar.gz
Algorithm Hash digest
SHA256 4bd68b7354ddc21d3eaaae20f5bb888db675e2d39d3e620d50a06eac292f93a9
MD5 d74fa1d8b2288cada3c22419e2e648c5
BLAKE2b-256 23a44bd2a53abe6a386d4e384b423454d3b21292f49d0be6651af1ed7988a67f
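To check a manually downloaded sdist against the SHA256 digest listed above, hash the file in chunks with hashlib and compare the results. This is a minimal sketch, and sha256_of and verify are illustrative helper names.

```python
import hashlib
from typing import BinaryIO

# SHA256 digest of vadonnx-0.1.0a2.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = (
    "4bd68b7354ddc21d3eaaae20f5bb888db675e2d39d3e620d50a06eac292f93a9"
)


def sha256_of(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a binary stream, read in fixed-size chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file at `path` matches the expected digest."""
    with open(path, "rb") as f:
        return sha256_of(f) == expected.lower()
```

For an untampered download, verify("vadonnx-0.1.0a2.tar.gz") returns True. At install time, pip's hash-checking mode (--hash=sha256:<digest> entries in a requirements file) enforces the same check automatically.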


File details

Details for the file vadonnx-0.1.0a2-py3-none-any.whl.

File metadata

  • Download URL: vadonnx-0.1.0a2-py3-none-any.whl
  • Upload date:
  • Size: 2.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vadonnx-0.1.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 efba50933fec70d7c6c4ebdf0779d97ce7dbba34e69c3e382ee22d1879f6cd44
MD5 f88bff5cd25967d15baed28d604a4507
BLAKE2b-256 9e8be4f50671c4f0717e504e15cf4f1d47b5218190b9f4c84fd8814261ed2f58

