fast-vad

Extremely fast voice activity detection in Rust with Python bindings and streaming mode support. Significantly faster than WebRTC VAD and orders of magnitude faster than Silero ONNX. See benchmark comparisons.

Supports 16 kHz and 8 kHz sample rates.

Architecture

Audio is split into non-overlapping 32 ms frames (512 samples at 16 kHz, 256 at 8 kHz), Hann-windowed, FFT'd, and collapsed into 8 log-energy bands covering roughly 94-4000 Hz.
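The frame sizes and band range above follow from simple arithmetic, sketched below in plain Python. The periodic Hann variant, the dropping of a trailing partial frame, and the helper names are assumptions for illustration, not details taken from the crate.

```python
import math

FRAME_MS = 32  # frame duration stated above


def frame_size(sample_rate: int) -> int:
    """Samples per 32 ms frame."""
    return sample_rate * FRAME_MS // 1000


def hann_window(n: int) -> list:
    """Periodic Hann window of length n (the variant is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_frames(samples, size):
    """Non-overlapping frames; a trailing partial frame is dropped in this sketch."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


for sr in (16000, 8000):
    n = frame_size(sr)
    bin_hz = sr / n  # FFT bin spacing in Hz
    # sample rate, frame size, bin spacing, first and last bin of the 94-4000 Hz range
    print(sr, n, bin_hz, round(94 / bin_hz), round(4000 / bin_hz))
# prints:
# 16000 512 31.25 3 128
# 8000 256 31.25 3 128
```

Both sample rates give the same 31.25 Hz bin spacing, so 94 Hz lands on bin 3 (93.75 Hz) and 4000 Hz on bin 128 in either case. This suggests the same band layout can serve both rates, though how the crate actually groups bins into its 8 bands is not stated here.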

Per frame, the detector builds 32 features: 8 raw log-energies, 8 noise-normalised values (raw minus a running noise floor), 8 first-order deltas, and 8 second-order deltas. A logistic regression model with weights compiled into the crate scores these features, and the resulting probability is compared to a mode-specific threshold. The noise floor is a per-band exponential moving average that only updates on silence frames, so it adapts to background noise without being contaminated by speech.
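A minimal sketch of the gated noise floor and the logistic scoring step, in plain Python. The smoothing factor, weights, bias, and threshold are placeholders; the real weights are compiled into the crate and are not reproduced here. Deltas are left out to keep the example short.

```python
import math

NUM_BANDS = 8
ALPHA = 0.05  # hypothetical smoothing factor, not taken from the crate


def update_noise_floor(floor, log_energy, is_speech, alpha=ALPHA):
    """Per-band exponential moving average, frozen on speech frames."""
    if is_speech:
        return list(floor)
    return [(1.0 - alpha) * f + alpha * e for f, e in zip(floor, log_energy)]


def speech_probability(features, weights, bias):
    """Logistic regression score: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


floor = [0.0] * NUM_BANDS
# A silence frame pulls the floor toward the observed energy.
floor = update_noise_floor(floor, [1.0] * NUM_BANDS, is_speech=False)
# A speech frame leaves the floor untouched, however loud it is.
frozen = update_noise_floor(floor, [10.0] * NUM_BANDS, is_speech=True)

raw = [2.0] * NUM_BANDS
normalised = [r - f for r, f in zip(raw, floor)]
features = raw + normalised          # deltas omitted in this sketch
weights = [0.1] * len(features)      # placeholder weights
is_speech = speech_probability(features, weights, bias=-1.0) >= 0.7
```

Gating the update on the silence decision is what keeps a long utterance from dragging the floor upward and masking the speech that follows it.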

Raw frame labels are then post-processed: short speech bursts below min_speech_ms are dropped, short silence gaps below min_silence_ms are filled, and voiced regions are extended by hangover_ms to avoid clipping word endings.
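The three post-processing rules can be sketched over a list of per-frame booleans as below. The order of the passes, the rounding of milliseconds up to whole 32 ms frames, and the choice to fill only gaps that lie between two speech regions are assumptions of this sketch, not confirmed behaviour of the crate.

```python
FRAME_MS = 32


def to_frames(ms):
    """Round a duration in ms up to a whole number of frames (assumed rounding)."""
    return -(-ms // FRAME_MS)


def runs(labels):
    """Yield (value, start, end) for each run of equal labels; end is exclusive."""
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            yield labels[start], start, i
            start = i


def smooth(labels, min_speech_ms, min_silence_ms, hangover_ms):
    out = list(labels)
    # 1. Drop speech bursts shorter than min_speech_ms.
    for value, start, end in list(runs(out)):
        if value and end - start < to_frames(min_speech_ms):
            out[start:end] = [False] * (end - start)
    # 2. Fill silence gaps shorter than min_silence_ms that sit between speech.
    for value, start, end in list(runs(out)):
        interior = start > 0 and end < len(out)
        if not value and interior and end - start < to_frames(min_silence_ms):
            out[start:end] = [True] * (end - start)
    # 3. Extend each speech region by hangover_ms.
    hang = to_frames(hangover_ms)
    for value, start, end in list(runs(out)):
        if value:
            stop = min(len(out), end + hang)
            out[end:stop] = [True] * (stop - end)
    return out


raw = [False, True, False, False] + [True] * 4 + [False] + [True] * 4 + [False] * 5
clean = smooth(raw, min_speech_ms=64, min_silence_ms=64, hangover_ms=32)
# clean == [False] * 4 + [True] * 10 + [False] * 4
```

In the example the isolated one-frame burst is removed, the one-frame gap is bridged, and the merged region gains one trailing frame of hangover.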

VAD processes all frames in parallel with rayon. VadStateful processes one frame at a time with reused FFT scratch buffers for low-latency streaming. Hot loops are SIMD-accelerated via the wide crate.

Install

Python

pip install fast-vad

Or with uv:

uv add fast-vad

Rust

cargo add fast-vad

Build from source

Python

Requires a Rust toolchain and maturin.

git clone https://github.com/AtharvBhat/fast-vad
cd fast-vad
maturin develop --release

Rust

cargo build --release

Python usage

fast-vad ships with a few built-in modes. VAD() and VadStateful() both default to fast_vad.mode.normal, for offline and streaming use respectively. To pick a different mode use with_mode; for finer control over individual parameters use with_config.

import numpy as np
import soundfile as sf
import fast_vad

audio, sr = sf.read("audio.wav", dtype="float32")
assert sr in (8000, 16000)

# Default (Normal mode)
vad = fast_vad.VAD(sr)

# Explicit mode
vad = fast_vad.VAD.with_mode(sr, fast_vad.mode.aggressive) # choose permissive, normal or aggressive 

# Custom parameters
vad = fast_vad.VAD.with_config(
    sr,
    threshold_probability=0.7,
    min_speech_ms=100,
    min_silence_ms=300,
    hangover_ms=100,
)

# Per-sample labels
labels = vad.detect(audio)

# Per-frame labels
frame_labels = vad.detect_frames(audio)

# Speech segments as a (N, 2) uint64 numpy array of [start, end] sample indices
segments = vad.detect_segments(audio)
for start, end in segments:
    print(f"speech: {start/sr:.2f}s – {end/sr:.2f}s")

Streaming

# Default (Normal mode)
vad = fast_vad.VadStateful(sr)

# Explicit mode
vad = fast_vad.VadStateful.with_mode(sr, fast_vad.mode.normal)

# Custom parameters
vad = fast_vad.VadStateful.with_config(sr, 0.7, 100, 300, 100)

frame_size = vad.frame_size  # 512 at 16 kHz, 256 at 8 kHz

for i in range(0, len(audio) - frame_size + 1, frame_size):
    is_speech = vad.detect_frame(audio[i : i + frame_size])
    print(f"frame {i // frame_size}: {'speech' if is_speech else 'silence'}")

vad.reset_state()  # reuse for another stream
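Live audio rarely arrives in chunks that are exact multiples of frame_size. One way to bridge that gap is a small buffer that accumulates incoming samples and hands back complete frames. FrameBuffer below is a hypothetical helper, not part of fast_vad, and it uses plain lists to stay dependency-free.

```python
class FrameBuffer:
    """Accumulates arbitrary-size chunks and returns fixed-size frames."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self._pending = []

    def push(self, chunk):
        """Add samples and return every complete frame now available."""
        self._pending.extend(chunk)
        frames = []
        while len(self._pending) >= self.frame_size:
            frames.append(self._pending[:self.frame_size])
            del self._pending[:self.frame_size]
        return frames

    def clear(self):
        """Discard leftover samples, e.g. when switching to another stream."""
        self._pending = []


buf = FrameBuffer(4)
first = buf.push([1, 2, 3])            # not enough for a frame yet
second = buf.push([4, 5, 6, 7, 8, 9])  # two frames, one sample left over
```

Each returned frame would then be converted to a float32 array and passed to vad.detect_frame. Calling clear() alongside vad.reset_state() keeps leftover samples from one stream out of the next.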

Feature extraction

You can also use fast-vad as a standalone feature extractor.

fe = fast_vad.FeatureExtractor(sr)

# 8 log-energy band features per frame
features = fe.extract_features(audio)  # shape: (num_frames, 8)

# 24-dimensional features per frame: raw bands + first- and second-order deltas
features = fe.feature_engineer(audio)  # shape: (num_frames, 24)
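To make the 24-dimensional layout concrete, here is a sketch that builds raw bands plus first- and second-order deltas from a list of 8-band frames. It assumes a delta is a simple difference from the previous frame, with zero for the first frame; the crate may use a different delta formula or column order.

```python
def deltas(frames):
    """Difference from the previous frame; the first frame's delta is zero."""
    out = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else frame
        out.append([cur - p for cur, p in zip(frame, prev)])
    return out


def engineer(frames):
    """Concatenate raw bands, first-order deltas, and second-order deltas."""
    d1 = deltas(frames)
    d2 = deltas(d1)
    return [raw + a + b for raw, a, b in zip(frames, d1, d2)]


bands = [[float(t)] * 8 for t in range(3)]  # 3 frames of 8 synthetic band values
rows = engineer(bands)                      # 3 rows of 24 values each
```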

Modes

Constant                  Description
fast_vad.mode.permissive  Low false-negative rate; more speech accepted
fast_vad.mode.normal      Balanced, general-purpose
fast_vad.mode.aggressive  Low false-positive rate; stricter

The built-in modes were tuned against LibriVAD, so they work best on read speech. For other domains (phone calls, meetings, noisy environments, etc.) you'll likely get better results tuning with_config() against your own data.
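Tuning against your own data needs a score to compare candidate settings. A frame-level F1 against hand-labelled frames is one simple choice, sketched below. The helper names and the stand-in run function are hypothetical; in practice run would build a detector with with_config and return the labels from detect_frames.

```python
def frame_f1(predicted, reference):
    """F1 score of predicted speech frames against reference labels."""
    pairs = [(bool(p), bool(r)) for p, r in zip(predicted, reference)]
    tp = sum(p and r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    fn = sum(r and not p for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


def pick_best(candidates, run, reference):
    """Return the candidate whose run(candidate) labels score highest."""
    return max(candidates, key=lambda c: frame_f1(run(c), reference))


# Stand-in for a real detector: threshold some made-up per-frame scores.
scores = [0.9, 0.5, 0.4, 0.1]
reference = [True, True, False, False]
best = pick_best([0.3, 0.7], lambda th: [s >= th for s in scores], reference)
```

F1 weighs missed speech and false alarms equally. If one kind of error matters more for your application, compare precision and recall separately instead.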

Rust usage

Config is set at construction. VAD::new and VadStateful::new default to Normal mode; use with_mode or with_config to customise.

use fast_vad::vad::detector::{VAD, VADModes, VadConfig};

fn main() -> Result<(), fast_vad::VadError> {
    let audio = vec![0.0f32; 16000]; // 1 second of silence

    // Default (Normal mode)
    let vad = VAD::new(16000)?;

    // Explicit mode
    let vad = VAD::with_mode(16000, VADModes::Aggressive)?;

    // Custom parameters
    let vad = VAD::with_config(16000, VadConfig {
        threshold_probability: 0.7,
        min_speech_ms: 100,
        min_silence_ms: 300,
        hangover_ms: 100,
    })?;

    let labels = vad.detect(&audio);           // one bool per sample
    let frame_labels = vad.detect_frames(&audio); // one bool per frame
    let segments = vad.detect_segments(&audio);   // Vec<[start, end]>

    Ok(())
}

Streaming

use fast_vad::vad::detector::{VadStateful, VADModes, VadConfig};

fn main() -> Result<(), fast_vad::VadError> {
    let audio = vec![0.0f32; 16000];

    // Default (Normal mode)
    let mut vad = VadStateful::new(16000)?;

    // Explicit mode
    let mut vad = VadStateful::with_mode(16000, VADModes::Normal)?;

    // Custom parameters
    let mut vad = VadStateful::with_config(16000, VadConfig {
        threshold_probability: 0.7,
        min_speech_ms: 100,
        min_silence_ms: 300,
        hangover_ms: 100,
    })?;

    let frame_size = vad.frame_size();
    for frame in audio.chunks_exact(frame_size) {
        let is_speech = vad.detect_frame(frame)?;
        println!("{is_speech}");
    }

    vad.reset_state(); // reuse for another stream
    Ok(())
}

Benchmarking

cargo bench --manifest-path bench_rs/Cargo.toml

License

Licensed under either of

at your option.
