Load arbitrary Voice Activity Detection (VAD) models behind a unified ONNX API

vadonnx

Load arbitrary Voice Activity Detection models behind a single, unified API: every model runs through ONNX Runtime. One streaming/batch interface, one audio-format story, pluggable models.

from vadonnx import load_vad

vad = load_vad("silero")                              # bundled, works fully offline
prob = vad.process_chunk(pcm_bytes)                   # streaming → float in [0, 1]
segments = vad.get_speech_segments(audio, sample_rate=16000)
# -> [SpeechSegment(start=0.32, end=2.27), SpeechSegment(start=3.27, end=4.45), ...]
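The snippet above assumes that pcm_bytes and audio already exist. As a rough sketch of the streaming side, the following shows one way to turn float samples into int16 PCM bytes and feed them chunk by chunk to a process_chunk-style callable. The helper names, the 512-sample chunk size, and the fake_vad stand-in are assumptions made for illustration and are not part of vadonnx; in real use you would pass vad.process_chunk and use whatever chunk size the chosen model expects.

```python
import math
import struct
from typing import Callable, Iterator, List

SAMPLE_RATE = 16000   # Hz, matching the quick-start example
CHUNK_SAMPLES = 512   # assumed chunk size, not taken from the vadonnx docs


def float_to_pcm16(samples: List[float]) -> bytes:
    """Convert floats in [-1, 1] to little-endian int16 PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


def iter_pcm_chunks(pcm: bytes, chunk_samples: int = CHUNK_SAMPLES) -> Iterator[bytes]:
    """Yield consecutive full chunks of int16 PCM; a trailing partial chunk is dropped."""
    step = chunk_samples * 2  # 2 bytes per int16 sample
    for start in range(0, len(pcm) - step + 1, step):
        yield pcm[start:start + step]


def stream_probabilities(pcm: bytes, process_chunk: Callable[[bytes], float]) -> List[float]:
    """Feed each chunk to a process_chunk-style callable and collect the results."""
    return [process_chunk(chunk) for chunk in iter_pcm_chunks(pcm)]


# One second of a 440 Hz tone as stand-in audio.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
pcm_bytes = float_to_pcm16(tone)


def fake_vad(chunk: bytes) -> float:
    """Hypothetical stand-in for vad.process_chunk so the sketch runs without a model."""
    return 1.0 if any(chunk) else 0.0


probs = stream_probabilities(pcm_bytes, fake_vad)
```

Dropping the trailing partial chunk keeps the sketch short; a real application might pad it with zeros or buffer it until more audio arrives.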

Why

Every VAD ships its own loader, audio format, feature pipeline and state handling. vadonnx hides that behind one VADModel interface: feed it audio (raw int16 bytes, numpy arrays, any sample rate) and get back per-frame speech probabilities or ready-made speech segments. Models are described declaratively by an IOSignature, so a single generic engine drives most of them and you can point the same API at any custom .onnx file.
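To make the relationship between per-frame probabilities and speech segments concrete, here is a minimal thresholding sketch. It is not vadonnx's segmentation logic, which may add smoothing, hysteresis, or minimum-duration rules; the Segment class, the probs_to_segments helper, the 0.5 threshold, and the 32 ms frame length are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Segment:
    """Hypothetical stand-in for a speech segment, with times in seconds."""
    start: float
    end: float


def probs_to_segments(
    probs: Sequence[float], frame_s: float, threshold: float = 0.5
) -> List[Segment]:
    """Group consecutive frames at or above the threshold into segments."""
    segments: List[Segment] = []
    start: Optional[int] = None  # index of the first frame of the open segment
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # speech begins
        elif p < threshold and start is not None:
            segments.append(Segment(start * frame_s, i * frame_s))  # speech ends
            start = None
    if start is not None:
        # Audio ended while speech was still active.
        segments.append(Segment(start * frame_s, len(probs) * frame_s))
    return segments


frame_probs = [0.1, 0.2, 0.9, 0.8, 0.95, 0.3, 0.1, 0.7, 0.6]
segments = probs_to_segments(frame_probs, frame_s=0.032)
```

With these inputs the helper finds two segments: frames 2 to 4 and frames 7 to 8, the second one closed by the end of the buffer.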

  • Lightweight runtime: only numpy, onnxruntime, huggingface_hub.
  • Offline by default: a small Silero model is bundled in the wheel.
  • Streaming and batch: process_chunk() for live audio, get_speech_segments() / probabilities() for whole buffers.
  • Bring your own model: load any ONNX VAD by path/URL with a signature.
  • Extensible: third parties register backends/models via entry points.

Install

uv pip install vadonnx          # runtime (numpy + onnxruntime + huggingface_hub)
uv pip install "vadonnx[mic]"   # + microphone examples

Models

name                              sample rate     parity vs upstream  notes
silero / silero-8k / silero-op15  16k / 8k / 16k  MAE 0               bundled default, raw PCM
marblenet / marblenet-int8        16k             MAE 4e-4            NVIDIA NeMo Frame-VAD, multilingual (license)
pyannote / pyannote-int8          16k             MAE 0               pyannote segmentation-3.0, windowed
fsmn / fsmn-quant                 16k             tracks upstream     FunASR FSMN-VAD; needs vadonnx[fsmn]
speechbrain                       16k             MAE 0               SpeechBrain CRDNN, LibriParty-trained
ten                               16k                                 feature extractor provided by TEN's native library
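The parity column appears to report the mean absolute error (MAE) between the ONNX port's speech probabilities and the upstream model's output on the same audio. How the project measures it (which audio, which frame alignment) is not described here, so the following is only a sketch of the metric itself, with made-up numbers.

```python
from typing import Sequence


def mean_absolute_error(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Average of |reference - candidate| over frames of equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    return sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)


# Illustrative values only, not measurements from any model.
upstream = [0.02, 0.91, 0.88, 0.10]
exact_port = [0.02, 0.91, 0.88, 0.10]
quantized_port = [0.0204, 0.9096, 0.8804, 0.0996]

mae_exact = mean_absolute_error(upstream, exact_port)      # identical outputs
mae_quant = mean_absolute_error(upstream, quantized_port)  # small per-frame drift
```

An MAE of 0 means the port reproduces upstream frame for frame, while a value such as 4e-4 indicates small numerical drift of the kind quantization can introduce.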

See docs/backends.md for per-model detail and parity notes, and the benchmark for measured comparisons across datasets, including WebRTC and energy baselines.

Models other than the bundled Silero are downloaded on first use from the TigreGotico Hugging Face org and cached under $XDG_DATA_HOME/vadonnx.
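To check where downloaded models would land on a given machine, the cache path can be resolved as sketched below. The fallback to ~/.local/share when XDG_DATA_HOME is unset or empty follows the XDG Base Directory convention; whether vadonnx applies exactly the same fallback is an assumption, and model_cache_dir is a hypothetical helper, not a vadonnx function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def model_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve $XDG_DATA_HOME/vadonnx, falling back to ~/.local/share/vadonnx.

    The fallback mirrors the XDG default; vadonnx's own behaviour is assumed.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_DATA_HOME")
    if base:
        return Path(base) / "vadonnx"
    return Path.home() / ".local" / "share" / "vadonnx"


print(model_cache_dir())
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.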

CLI

vadonnx list                       # list available models
vadonnx probe silero               # print a model's ONNX input/output signature
vadonnx segment speech.wav         # print detected speech segments of a WAV
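For doing from Python what the segment command does for a WAV file, the audio first has to be read into PCM. How the CLI decodes files is not documented here, so the sketch below simply uses the standard-library wave module and assumes 16-bit mono input; read_wav_pcm16 and write_silence are hypothetical helpers. The returned bytes and sample rate are the kind of input the quick-start example passes to the model.

```python
import os
import tempfile
import wave
from typing import Tuple


def read_wav_pcm16(path: str) -> Tuple[bytes, int]:
    """Return (raw int16 PCM bytes, sample rate) for a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        return wav.readframes(wav.getnframes()), wav.getframerate()


def write_silence(path: str, n_frames: int = 1600, sample_rate: int = 16000) -> None:
    """Write a short silent 16-bit mono WAV, used here only as demo input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


# Round-trip a temporary file so the sketch runs without any external audio.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "silence.wav")
    write_silence(demo_path)
    pcm, rate = read_wav_pcm16(demo_path)
```

Stereo or non-16-bit files are rejected rather than silently converted, which keeps mistakes in the input format visible.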

License

Apache-2.0. Bundled/downloaded model weights retain their upstream licenses — see docs/licensing.md.
