fast-vad

Extremely fast voice activity detection in Rust with Python bindings and streaming mode support.

Supports 16 kHz and 8 kHz audio. The frame width is fixed at 32 ms (512 samples at 16 kHz, 256 samples at 8 kHz).
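
The frame sizes follow directly from the 32 ms frame width. A minimal sketch of the arithmetic is below; the helper names `frame_size` and `num_complete_frames` are illustrative and not part of the fast_vad API, and how the library treats a trailing partial frame is not stated here, so this sketch simply counts whole frames.

```python
FRAME_MS = 32  # fixed frame width stated above


def frame_size(sample_rate: int) -> int:
    """Samples per 32 ms frame at the given sample rate."""
    if sample_rate not in (8000, 16000):
        raise ValueError("only 8 kHz and 16 kHz are supported")
    return sample_rate * FRAME_MS // 1000


def num_complete_frames(num_samples: int, sample_rate: int) -> int:
    """Whole frames that fit in the signal (assumption: partial frames are ignored)."""
    return num_samples // frame_size(sample_rate)


print(frame_size(16000))                  # 512
print(frame_size(8000))                   # 256
print(num_complete_frames(16000, 16000))  # 31
```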

For benchmark comparisons, see docs/README.md.

Benchmarking

Python benchmarks live in bench_py/ and Rust benchmarks live in bench_rs/.

uv run pytest bench_py/bench_vad.py bench_py/bench_feature_extractor.py --benchmark-sort=mean --benchmark-group-by=group
cargo bench --manifest-path bench_rs/Cargo.toml

Architecture

fast_vad is a small fixed-frame DSP pipeline with a hardcoded lightweight classifier.

audio
  -> 32 ms frames
     - 16 kHz: 512 samples
     - 8 kHz: 256 samples
  -> Hann window
  -> real FFT
  -> 8 log-energy bands
  -> feature engineering
     - raw bands          (8)
     - noise-normalized   (8)
     - first deltas       (8)
     - second deltas      (8)
     = 32 total features
  -> hardcoded logistic regression
  -> threshold + smoothing
  -> speech / silence labels
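
The front half of the pipeline above (Hann window, real FFT, 8 log-energy bands) can be sketched in plain Python. This is an illustration only: it uses a naive DFT instead of an FFT, a symmetric Hann window, and equal-width bands, all of which are assumptions. The crate's actual window convention and band edges are not described here, and the function names are hypothetical.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Symmetric Hann window of length n (assumed convention)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame: list[float]) -> list[float]:
    """Power of the n // 2 + 1 non-negative-frequency bins via a naive DFT."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    power = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    return power


def log_band_energies(power: list[float], num_bands: int = 8, eps: float = 1e-10) -> list[float]:
    """Sum power over equal-width bands (assumed layout) and take the natural log."""
    edges = [i * len(power) // num_bands for i in range(num_bands + 1)]
    return [math.log(sum(power[lo:hi]) + eps) for lo, hi in zip(edges, edges[1:])]


# One 256-sample frame (32 ms at 8 kHz) holding a 1250 Hz tone, i.e. DFT bin 40.
frame = [math.sin(2 * math.pi * 40 * t / 256) for t in range(256)]
bands = log_band_energies(power_spectrum(frame))
print(len(bands))                # 8
print(bands.index(max(bands)))   # 2 (bins 32..47 contain bin 40)
```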

At a glance:

  • VAD (offline / batch) splits audio into 32 ms frames and uses rayon to process complete frames in parallel while extracting the 8-band features.
  • VadStateful (streaming) runs the same per-frame pipeline one frame at a time and reuses scratch buffers instead of paying thread-pool overhead.
  • The detector keeps a running 8-band noise floor, then derives 32 total features from each frame: raw band energies, noise-normalized energies, first-order deltas, and second-order deltas.
  • Classification is a tiny hardcoded logistic-regression-style model with fixed weights and bias compiled into the crate.
  • The final decision is shaped by simple temporal rules: thresholding, minimum speech length, minimum silence length, and hangover.
  • Hot loops are SIMD-accelerated with the wide crate for windowing, spectral power computation, band-energy math, and detector feature math.
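
The temporal rules in the list above (thresholding, minimum speech length, minimum silence length, hangover) can be illustrated with an offline sketch over per-frame probabilities. The exact semantics and order of these rules in fast_vad are not documented here, so everything below is an assumption: short speech runs are dropped first, short interior silences are then filled, and hangover finally extends each speech run. Durations are given in frames rather than milliseconds, and `runs` and `smooth` are hypothetical names.

```python
def runs(labels):
    """Yield (value, start, end_exclusive) for each maximal run of equal labels."""
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            yield labels[start], start, i
            start = i


def smooth(probs, threshold, min_speech, min_silence, hangover):
    """Apply threshold, min-speech, min-silence and hangover rules (assumed order)."""
    labels = [p >= threshold for p in probs]

    # 1. Drop speech runs shorter than min_speech frames.
    for value, s, e in list(runs(labels)):
        if value and e - s < min_speech:
            labels[s:e] = [False] * (e - s)

    # 2. Fill silence gaps shorter than min_silence frames that sit between speech.
    for value, s, e in list(runs(labels)):
        if not value and s > 0 and e < len(labels) and e - s < min_silence:
            labels[s:e] = [True] * (e - s)

    # 3. Hangover: keep reporting speech for a few frames after each speech run.
    out = labels[:]
    for value, s, e in runs(labels):
        if value:
            for i in range(e, min(e + hangover, len(labels))):
                out[i] = True
    return out


probs = [0.1, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
result = smooth(probs, threshold=0.5, min_speech=2, min_silence=2, hangover=1)
print("".join("#" if x else "." for x in result))  # ....#######...
```

The isolated one-frame blip at index 1 is discarded, the one-frame dip at index 7 is bridged, and the segment is extended by one hangover frame.
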

frame features (8 bands)
    | raw
    | raw - noise_floor
    | delta
    | delta2
    v
32 engineered features
    v
linear score + bias
    v
speech / silence
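
A rough sketch of the feature path in the diagram above: 8 band energies become 32 features, which feed a linear score plus bias. The noise-floor update rule, the delta definitions (difference from the previous frame), and the weights are all placeholders chosen for illustration. The real model's weights and noise tracking are compiled into the crate and are not reproduced here.

```python
import math

NUM_BANDS = 8


def update_noise_floor(floor, bands, rise=0.05):
    """Hypothetical tracker: follow decreases immediately, rise slowly."""
    return [b if b < f else f + rise * (b - f) for f, b in zip(floor, bands)]


def engineer_features(bands, noise_floor, prev_bands, prev_delta):
    """Return (32 features, current first deltas) for one frame."""
    normalized = [b - f for b, f in zip(bands, noise_floor)]   # raw - noise_floor
    delta = [b - p for b, p in zip(bands, prev_bands)]         # first deltas
    delta2 = [d - p for d, p in zip(delta, prev_delta)]        # second deltas
    return bands + normalized + delta + delta2, delta


def speech_probability(features, weights, bias):
    """Logistic regression: sigmoid of the linear score plus bias."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


bands = [1.0] * NUM_BANDS
noise_floor = [0.5] * NUM_BANDS
features, delta = engineer_features(bands, noise_floor, [0.0] * NUM_BANDS, [0.0] * NUM_BANDS)
print(len(features))  # 32

# Placeholder weights: only the noise-normalized block contributes.
weights = [0.0] * 8 + [0.5] * 8 + [0.0] * 16
print(round(speech_probability(features, weights, bias=0.0), 3))  # 0.881
```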

Build from source

Python (with uv)

Requires uv and a Rust toolchain.

git clone https://github.com/AtharvBhat/fast-vad
cd fast-vad
uv venv
uv pip install maturin
uv run maturin develop --release

The package is then importable inside the virtual environment.

Rust

cargo build --release

Add as a dependency in another crate:

[dependencies]
fast-vad = { path = "/path/to/fast-vad" }

Python usage

Config is set at construction time. VAD() and VadStateful() default to Normal mode; use with_mode or with_config to customise.

import numpy as np
import soundfile as sf
import fast_vad

audio, sr = sf.read("audio.wav", dtype="float32")
assert sr in (8000, 16000)

# Default (Normal mode)
vad = fast_vad.VAD(sr)

# Explicit mode
vad = fast_vad.VAD.with_mode(sr, fast_vad.mode.aggressive)

# Custom parameters
vad = fast_vad.VAD.with_config(
    sr,
    threshold_probability=0.7,
    min_speech_ms=100,
    min_silence_ms=300,
    hangover_ms=100,
)

# Per-sample labels
labels = vad.detect(audio)

# Per-frame labels
frame_labels = vad.detect_frames(audio)

# Speech segments as a (N, 2) uint64 numpy array of [start, end] sample indices
segments = vad.detect_segments(audio)
for start, end in segments:
    print(f"speech: {start/sr:.2f}s – {end/sr:.2f}s")
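
To make the relationship between per-frame labels and sample-index segments concrete, here is a small standalone sketch. `frames_to_segments` is a hypothetical helper, not a fast_vad function, and it assumes an exclusive end index; whether `detect_segments` uses an inclusive or exclusive end is not stated above.

```python
def frames_to_segments(frame_labels, frame_size):
    """Collapse per-frame booleans into (start, end) sample ranges, end exclusive."""
    segments = []
    start = None
    for i, is_speech in enumerate(frame_labels):
        if is_speech and start is None:
            start = i * frame_size                      # speech run begins
        elif not is_speech and start is not None:
            segments.append((start, i * frame_size))    # speech run ends
            start = None
    if start is not None:                               # speech runs to the last frame
        segments.append((start, len(frame_labels) * frame_size))
    return segments


labels = [False, True, True, False, False, True]
print(frames_to_segments(labels, 512))  # [(512, 1536), (2560, 3072)]
```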

Streaming

# Reuses sr and audio from the example above.

# Default (Normal mode)
vad = fast_vad.VadStateful(sr)

# Explicit mode
vad = fast_vad.VadStateful.with_mode(sr, fast_vad.mode.normal)

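# Custom parameters, passed positionally in the same order as the keyword example:
# threshold_probability, min_speech_ms, min_silence_ms, hangover_ms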
vad = fast_vad.VadStateful.with_config(sr, 0.7, 100, 300, 100)

frame_size = vad.frame_size  # 512 at 16 kHz, 256 at 8 kHz

for i in range(0, len(audio) - frame_size + 1, frame_size):
    is_speech = vad.detect_frame(audio[i : i + frame_size])
    print(f"frame {i // frame_size}: {'speech' if is_speech else 'silence'}")

vad.reset_state()  # reuse for another stream
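
Live audio rarely arrives in exact 512- or 256-sample chunks, while the streaming loop above feeds the detector one frame of `frame_size` samples at a time. One way to bridge the two is a small re-framing buffer. The `FrameBuffer` class below is a hypothetical helper, not part of fast_vad, and `fake_detect_frame` stands in for `vad.detect_frame` so the sketch runs without the package installed.

```python
class FrameBuffer:
    """Accumulate arbitrary-length chunks and emit fixed-size frames."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self._pending = []

    def push(self, chunk):
        """Add samples; return the complete frames that are now available."""
        self._pending.extend(chunk)
        frames = []
        while len(self._pending) >= self.frame_size:
            frames.append(self._pending[: self.frame_size])
            del self._pending[: self.frame_size]
        return frames

    def reset(self):
        """Discard leftover samples, e.g. alongside vad.reset_state()."""
        self._pending.clear()


def fake_detect_frame(frame):
    """Stand-in for vad.detect_frame: a crude peak-amplitude check."""
    return max(abs(s) for s in frame) > 0.5


buf = FrameBuffer(frame_size=4)  # tiny frame size to keep the demo readable
decisions = []
for chunk in ([0.0] * 5, [0.9] * 5, [0.0] * 6):
    for frame in buf.push(chunk):
        decisions.append(fake_detect_frame(frame))
print(decisions)  # [False, True, True, False]
```

With the real detector, each emitted frame would presumably need converting to a float32 NumPy array before being passed to `vad.detect_frame`, matching the dtype used in the examples above.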

Feature extraction

fe = fast_vad.FeatureExtractor(sr)
features = fe.extract_features(audio)  # shape: (num_frames, 8)

Modes

Constant                  Description
fast_vad.mode.permissive  Low false-negative rate; more speech accepted
fast_vad.mode.normal      Balanced, general-purpose
fast_vad.mode.aggressive  Low false-positive rate; stricter

Rust usage

Config is set at construction. VAD::new and VadStateful::new default to Normal mode; use with_mode or with_config to customise.

use fast_vad::vad::detector::{VAD, VADModes, VadConfig};

fn main() -> Result<(), fast_vad::VadError> {
    let audio = vec![0.0f32; 16000]; // 1 second of silence

    // Default (Normal mode)
    let vad = VAD::new(16000)?;

    // Explicit mode
    let vad = VAD::with_mode(16000, VADModes::Aggressive)?;

    // Custom parameters
    let vad = VAD::with_config(16000, VadConfig {
        threshold_probability: 0.7,
        min_speech_ms: 100,
        min_silence_ms: 300,
        hangover_ms: 100,
    })?;

    let labels = vad.detect(&audio);           // one bool per sample
    let frame_labels = vad.detect_frames(&audio); // one bool per frame
    let segments = vad.detect_segments(&audio);   // Vec<[start, end]>

    Ok(())
}

Streaming

use fast_vad::vad::detector::{VadStateful, VADModes, VadConfig};

fn main() -> Result<(), fast_vad::VadError> {
    let audio = vec![0.0f32; 16000];

    // Default (Normal mode)
    let mut vad = VadStateful::new(16000)?;

    // Explicit mode
    let mut vad = VadStateful::with_mode(16000, VADModes::Normal)?;

    // Custom parameters
    let mut vad = VadStateful::with_config(16000, VadConfig {
        threshold_probability: 0.7,
        min_speech_ms: 100,
        min_silence_ms: 300,
        hangover_ms: 100,
    })?;

    let frame_size = vad.frame_size();
    for frame in audio.chunks_exact(frame_size) {
        let is_speech = vad.detect_frame(frame)?;
        println!("{is_speech}");
    }

    vad.reset_state(); // reuse for another stream
    Ok(())
}

License

Licensed under either of

at your option.

Download files

Source Distribution

fast_vad-0.1.0.tar.gz (29.1 kB)

Built Distributions

File                                                 Size      Platform
fast_vad-0.1.0-cp311-abi3-win_amd64.whl              562.7 kB  CPython 3.11+, Windows x86-64
fast_vad-0.1.0-cp311-abi3-manylinux_2_28_x86_64.whl  794.2 kB  CPython 3.11+, manylinux: glibc 2.28+ x86-64
fast_vad-0.1.0-cp311-abi3-manylinux_2_28_aarch64.whl 665.1 kB  CPython 3.11+, manylinux: glibc 2.28+ ARM64
fast_vad-0.1.0-cp311-abi3-macosx_11_0_arm64.whl      586.9 kB  CPython 3.11+, macOS 11.0+ ARM64
fast_vad-0.1.0-cp311-abi3-macosx_10_12_x86_64.whl    719.9 kB  CPython 3.11+, macOS 10.12+ x86-64

File hashes

fast_vad-0.1.0.tar.gz (Source; uploaded using Trusted Publishing via twine/6.1.0 CPython/3.13.7)
  SHA256       fe76cb9a039c8a9e7d6b97e4ffd22779c98770811dcdee1e1b1a1032b54b5092
  MD5          730fcc6092040b710b775be481203241
  BLAKE2b-256  0370fa0b26b4e1ca115cbd591a72eef5439c57ca76fbdcf0b5a27b938588e541

fast_vad-0.1.0-cp311-abi3-win_amd64.whl (CPython 3.11+, Windows x86-64; uploaded using Trusted Publishing via twine/6.1.0 CPython/3.13.7)
  SHA256       7bc87691435d7ba24d930dd3288c128e2956cd7f33ea22d54f3f5c11a342d8e2
  MD5          d824a9d96ab6ee86b4c83ed75cc22707
  BLAKE2b-256  00b7e8cf0e05f548b2ba170f3390f4b70208eb1d4db48f940b1f2273e9da4d89

fast_vad-0.1.0-cp311-abi3-manylinux_2_28_x86_64.whl
  SHA256       b4812e17008a6f8a6b3903777d73822a947a9c8df8874157f60a13cd62675ef7
  MD5          c4420c5fca0425324ead54f167c467ff
  BLAKE2b-256  e79a656d75e7bfb558e72b901ff29ef90ea2f5f40153fa1c8efd0b4b683f5400

fast_vad-0.1.0-cp311-abi3-manylinux_2_28_aarch64.whl
  SHA256       219d098c0a09f0d5a2c2c70344646b84b32edaaac94e467d2101db47b092b435
  MD5          74cdf956f1b2ccfb080febc0754ef85a
  BLAKE2b-256  53ee9d91841cc35a659773aeb31fe4573f4b2f99021b6e069ffd6bddd046d58c

fast_vad-0.1.0-cp311-abi3-macosx_11_0_arm64.whl
  SHA256       a26217eb7b104f733a5cfeed41c81d64812e7de8c5624999ab29f6af24aac54b
  MD5          d456a5e288a141604b664711bba2fd5b
  BLAKE2b-256  2f2baccaac7195fc9852a8cb194a09b1e031698ce5dfd3b770c716ebe0dbcc64

fast_vad-0.1.0-cp311-abi3-macosx_10_12_x86_64.whl
  SHA256       db565d50acce794f6b9f7af9052621ffc1fc57084932607bad2b14bb65f33b6e
  MD5          e5e2f9dfa37368df8658d38567cebf5e
  BLAKE2b-256  ea0da0e4bda6e4a192cf46ea0f27cd2bdae7d3c92d330019d1e1ed57a1f55443

Provenance

Attestation bundles were made for each of the files listed above.

Publisher: ci.yml on AtharvBhat/fast-vad

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
