fast-vad

Extremely fast voice activity detection in Rust with Python bindings and streaming mode support.

Supports 16 kHz and 8 kHz audio. The frame width is fixed at 32 ms, which is 512 samples at 16 kHz and 256 samples at 8 kHz.
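The frame sizes above follow directly from the 32 ms frame width. A minimal sketch of the arithmetic; the helper name samples_per_frame is made up for illustration and is not part of the fast_vad API:

```python
FRAME_MS = 32  # fixed frame width stated in this README


def samples_per_frame(sample_rate: int) -> int:
    """Samples in one 32 ms frame (hypothetical helper, not a fast_vad function)."""
    if sample_rate not in (8000, 16000):
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sample_rate * FRAME_MS // 1000


print(samples_per_frame(16000))  # 512
print(samples_per_frame(8000))   # 256
```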

If you are interested in benchmark comparisons, see docs/README.md.

Benchmarking

Python benchmarks live in bench_py/ and Rust benchmarks live in bench_rs/.

uv run pytest bench_py/bench_vad.py bench_py/bench_feature_extractor.py --benchmark-sort=mean --benchmark-group-by=group
cargo bench --manifest-path bench_rs/Cargo.toml

Install

Python

pip install fast-vad

Or with uv:

uv add fast-vad

Rust

cargo add fast-vad

Architecture

fast_vad is a small fixed-frame DSP pipeline with a hardcoded lightweight classifier.

audio
  -> 32 ms frames
     - 16 kHz: 512 samples
     - 8 kHz: 256 samples
  -> Hann window
  -> real FFT
  -> 8 log-energy bands
  -> feature engineering
     - raw bands          (8)
     - noise-normalized   (8)
     - first deltas       (8)
     - second deltas      (8)
     = 32 total features
  -> hardcoded logistic regression
  -> threshold + smoothing
  -> speech / silence labels
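
The spectral front end of the diagram (Hann window, real FFT, 8 log-energy bands) can be sketched in NumPy. This is an illustration, not the crate's implementation: the equal-width band layout, the skipped DC bin, and the 1e-10 log floor are assumptions, since this README does not state the actual band edges.

```python
import numpy as np

N_BANDS = 8


def band_log_energies(frame: np.ndarray, n_bands: int = N_BANDS) -> np.ndarray:
    """Hann window -> real FFT -> power spectrum -> n_bands log energies."""
    frame = np.asarray(frame, dtype=np.float64)
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed)
    power = spectrum.real ** 2 + spectrum.imag ** 2
    # ASSUMPTION: equal-width bands over the non-DC bins; the real band
    # edges used by fast_vad are not documented here.
    bands = np.array_split(power[1:], n_bands)
    energies = np.array([band.sum() for band in bands])
    # ASSUMPTION: small floor so that silent frames do not produce -inf.
    return np.log(energies + 1e-10)


sr = 16000
t = np.arange(512) / sr              # one 32 ms frame at 16 kHz
tone = np.sin(2 * np.pi * 2500 * t)  # 2.5 kHz test tone
feats = band_log_energies(tone)
print(feats.shape, int(np.argmax(feats)))
```

With 512 samples the real FFT yields 257 bins, so each assumed band covers 32 bins (16 bins for a 256-sample frame at 8 kHz).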

At a glance:

  • VAD (offline / batch) splits audio into 32 ms frames and uses rayon to process complete frames in parallel while extracting the 8-band features.
  • VadStateful (streaming) runs the same per-frame pipeline one frame at a time and reuses scratch buffers instead of paying thread-pool overhead.
  • The detector keeps a running 8-band noise floor, then derives 32 total features from each frame: raw band energies, noise-normalized energies, first-order deltas, and second-order deltas.
  • Classification is a tiny hardcoded logistic-regression-style model with fixed weights and bias compiled into the crate.
  • The final decision is shaped by simple temporal rules: thresholding, minimum speech length, minimum silence length, and hangover.
  • Hot loops are SIMD-accelerated with the wide crate for windowing, spectral power computation, band-energy math, and detector feature math.

frame features (8 bands)
    | raw
    | raw - noise_floor
    | delta
    | delta2
    v
32 engineered features
    v
linear score + bias
    v
speech / silence
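
The feature path in the diagram above can be sketched as follows. This is a rough illustration under stated assumptions: the noise-floor update rule (follow minima immediately, rise slowly with rate alpha) is a guess, and the zero weights are placeholders, because the real weights and bias are compiled into the crate and are not reproduced in this README.

```python
import numpy as np

N_BANDS = 8


class FeatureState:
    """Per-stream state needed to turn 8 band energies into 32 features."""

    def __init__(self) -> None:
        self.noise_floor = None              # running 8-band noise floor
        self.prev_raw = None                 # previous frame's band energies
        self.prev_delta = np.zeros(N_BANDS)  # previous frame's first deltas


def engineer_features(raw, state: FeatureState, alpha: float = 0.05) -> np.ndarray:
    """Return [raw, raw - noise_floor, delta, delta2] as one 32-vector."""
    raw = np.asarray(raw, dtype=np.float64)
    if state.noise_floor is None:
        # First frame: seed the state so both deltas start at zero.
        state.noise_floor = raw.copy()
        state.prev_raw = raw.copy()
    else:
        # ASSUMPTION: drop to new minima at once, otherwise rise slowly.
        state.noise_floor = np.where(
            raw < state.noise_floor,
            raw,
            (1 - alpha) * state.noise_floor + alpha * raw,
        )
    normalized = raw - state.noise_floor
    delta = raw - state.prev_raw
    delta2 = delta - state.prev_delta
    state.prev_raw = raw.copy()
    state.prev_delta = delta
    return np.concatenate([raw, normalized, delta, delta2])


def speech_probability(features: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Logistic regression: sigmoid of the linear score plus bias."""
    return float(1.0 / (1.0 + np.exp(-(features @ weights + bias))))


state = FeatureState()
f1 = engineer_features(np.ones(N_BANDS), state)
f2 = engineer_features(np.full(N_BANDS, 3.0), state)
placeholder_weights = np.zeros(4 * N_BANDS)  # NOT the crate's weights
print(f2.shape, speech_probability(f2, placeholder_weights, 0.0))
```

The resulting probability is what threshold_probability is compared against before the temporal rules are applied.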

Build from source

Python (with uv)

Requires uv and a Rust toolchain.

git clone https://github.com/AtharvBhat/fast-vad
cd fast-vad
uv venv
uv pip install maturin
uv run maturin develop --release

The package is then importable inside the virtual environment.

Rust

cargo build --release

Add as a dependency in another crate:

[dependencies]
fast-vad = { path = "/path/to/fast-vad" }

Python usage

Configuration is fixed at construction time. VAD(sr) and VadStateful(sr) default to Normal mode (fast_vad.mode.normal); use with_mode or with_config to customise.

import numpy as np
import soundfile as sf
import fast_vad

audio, sr = sf.read("audio.wav", dtype="float32")
assert sr in (8000, 16000)

# Default (Normal mode)
vad = fast_vad.VAD(sr)

# Explicit mode
vad = fast_vad.VAD.with_mode(sr, fast_vad.mode.aggressive)

# Custom parameters
vad = fast_vad.VAD.with_config(
    sr,
    threshold_probability=0.7,
    min_speech_ms=100,
    min_silence_ms=300,
    hangover_ms=100,
)

# Per-sample labels
labels = vad.detect(audio)

# Per-frame labels
frame_labels = vad.detect_frames(audio)

# Speech segments as a (N, 2) uint64 numpy array of [start, end] sample indices
segments = vad.detect_segments(audio)
for start, end in segments:
    print(f"speech: {start/sr:.2f}s – {end/sr:.2f}s")
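
To give a feel for what the four with_config parameters control, here is a rough offline sketch of threshold-plus-temporal-rule smoothing over per-frame probabilities. The function names, the rounding of milliseconds up to whole 32 ms frames, and the order of the three rules are all assumptions made for illustration; the library's actual rules may differ.

```python
import math

import numpy as np

FRAME_MS = 32


def ms_to_frames(ms: int) -> int:
    """ASSUMPTION: durations are rounded up to whole 32 ms frames."""
    return math.ceil(ms / FRAME_MS)


def runs(labels):
    """Yield (value, start, end) for each run of equal labels; end is exclusive."""
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            yield bool(labels[start]), start, i
            start = i


def smooth(probs, threshold_probability=0.7, min_speech_ms=100,
           min_silence_ms=300, hangover_ms=100) -> np.ndarray:
    """Hypothetical post-processing; not the fast_vad implementation."""
    out = np.asarray(probs, dtype=np.float64) >= threshold_probability
    # 1. Drop speech runs shorter than min_speech_ms.
    for value, start, end in list(runs(out)):
        if value and end - start < ms_to_frames(min_speech_ms):
            out[start:end] = False
    # 2. Hangover: keep reporting speech for a while after each run ends.
    for value, start, end in list(runs(out)):
        if value:
            out[end:end + ms_to_frames(hangover_ms)] = True
    # 3. Fill interior silences shorter than min_silence_ms.
    for value, start, end in list(runs(out)):
        interior = start > 0 and end < len(out)
        if not value and interior and end - start < ms_to_frames(min_silence_ms):
            out[start:end] = True
    return out


probs = np.full(20, 0.1)
probs[2] = 0.9      # isolated one-frame blip
probs[8:13] = 0.9   # five consecutive speech frames
print(np.flatnonzero(smooth(probs)))
```

In this sketch the one-frame blip is discarded because it is shorter than min_speech_ms, while the five-frame run is kept and extended by the hangover.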

Streaming

# Default (Normal mode)
vad = fast_vad.VadStateful(sr)

# Explicit mode
vad = fast_vad.VadStateful.with_mode(sr, fast_vad.mode.normal)

# Custom parameters (positional here; same values as the keyword form shown for VAD.with_config)
vad = fast_vad.VadStateful.with_config(sr, 0.7, 100, 300, 100)

frame_size = vad.frame_size  # 512 at 16 kHz, 256 at 8 kHz

for i in range(0, len(audio) - frame_size + 1, frame_size):
    is_speech = vad.detect_frame(audio[i : i + frame_size])
    print(f"frame {i // frame_size}: {'speech' if is_speech else 'silence'}")

vad.reset_state()  # reuse for another stream
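
The loop above assumes the whole signal is already in memory. Live sources such as sockets or microphone callbacks usually deliver chunks whose length is not a multiple of frame_size. A minimal sketch of a re-framing buffer follows; FrameBuffer is a hypothetical helper written for this example, not part of fast_vad.

```python
import numpy as np


class FrameBuffer:
    """Collects arbitrary-sized chunks and hands back fixed-size frames."""

    def __init__(self, frame_size: int) -> None:
        self.frame_size = frame_size
        self._pending = np.zeros(0, dtype=np.float32)

    @property
    def pending(self) -> int:
        """Number of buffered samples that do not yet fill a frame."""
        return len(self._pending)

    def push(self, chunk) -> list:
        """Append a chunk and return every complete frame now available."""
        chunk = np.asarray(chunk, dtype=np.float32)
        self._pending = np.concatenate([self._pending, chunk])
        n_full = len(self._pending) // self.frame_size
        frames = [
            self._pending[k * self.frame_size:(k + 1) * self.frame_size]
            for k in range(n_full)
        ]
        # Keep the leftover samples for the next call.
        self._pending = self._pending[n_full * self.frame_size:]
        return frames


buf = FrameBuffer(512)            # vad.frame_size at 16 kHz
first = buf.push(np.zeros(300))   # not enough for a frame yet
second = buf.push(np.zeros(800))  # 1100 samples buffered in total
print(len(first), len(second), buf.pending)
```

Each returned frame can then be passed to vad.detect_frame(frame) exactly as in the streaming loop above.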

Feature extraction

fe = fast_vad.FeatureExtractor(sr)
features = fe.extract_features(audio)  # shape: (num_frames, 8)
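
Each row of the feature matrix corresponds to one 32 ms frame. The sketch below computes the expected shape and the start time of each row, assuming that only complete frames are processed and a trailing partial frame is dropped; that assumption is consistent with the description of batch mode above but is not confirmed here. Both helper names are made up for illustration.

```python
import numpy as np

FRAME_MS = 32
N_BANDS = 8


def expected_feature_shape(num_samples: int, sample_rate: int) -> tuple:
    """(num_frames, 8), ASSUMING a trailing partial frame is dropped."""
    frame_size = sample_rate * FRAME_MS // 1000
    return (num_samples // frame_size, N_BANDS)


def frame_start_times(num_frames: int) -> np.ndarray:
    """Start time in seconds of each feature row."""
    return np.arange(num_frames) * FRAME_MS / 1000.0


shape = expected_feature_shape(16000, 16000)  # one second at 16 kHz
times = frame_start_times(shape[0])
print(shape, times[:3])
```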

Modes

Constant                    Description
fast_vad.mode.permissive    Low false-negative rate; more speech accepted
fast_vad.mode.normal        Balanced, general-purpose
fast_vad.mode.aggressive    Low false-positive rate; stricter

Rust usage

Config is set at construction. VAD::new and VadStateful::new default to Normal mode; use with_mode or with_config to customise.

use fast_vad::vad::detector::{VAD, VADModes, VadConfig};

fn main() -> Result<(), fast_vad::VadError> {
    let audio = vec![0.0f32; 16000]; // 1 second of silence

    // Default (Normal mode)
    let vad = VAD::new(16000)?;

    // Explicit mode
    let vad = VAD::with_mode(16000, VADModes::Aggressive)?;

    // Custom parameters
    let vad = VAD::with_config(16000, VadConfig {
        threshold_probability: 0.7,
        min_speech_ms: 100,
        min_silence_ms: 300,
        hangover_ms: 100,
    })?;

    let labels = vad.detect(&audio);           // one bool per sample
    let frame_labels = vad.detect_frames(&audio); // one bool per frame
    let segments = vad.detect_segments(&audio);   // Vec<[start, end]>

    Ok(())
}

Streaming

use fast_vad::vad::detector::{VadStateful, VADModes, VadConfig};

fn main() -> Result<(), fast_vad::VadError> {
    let audio = vec![0.0f32; 16000];

    // Default (Normal mode)
    let mut vad = VadStateful::new(16000)?;

    // Explicit mode
    let mut vad = VadStateful::with_mode(16000, VADModes::Normal)?;

    // Custom parameters
    let mut vad = VadStateful::with_config(16000, VadConfig {
        threshold_probability: 0.7,
        min_speech_ms: 100,
        min_silence_ms: 300,
        hangover_ms: 100,
    })?;

    let frame_size = vad.frame_size();
    for frame in audio.chunks_exact(frame_size) {
        let is_speech = vad.detect_frame(frame)?;
        println!("{is_speech}");
    }

    vad.reset_state(); // reuse for another stream
    Ok(())
}

License

Licensed under either of

at your option.
