rVADfast - a fast and robust unsupervised VAD

rVADfast

rVADfast is the Python library for a fast, unsupervised method for robust voice activity detection (rVAD), as presented in "rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method", Computer Speech & Language, 2020 (also available as an arXiv version). More information is available on the rVAD GitHub page.

The rVAD paper, published in Computer Speech & Language, won the International Speech Communication Association (ISCA) 2022 Best Research Paper Award.

The rVAD method consists of two passes of denoising followed by a VAD stage. It has been applied as a preprocessor for a wide range of applications, including speech recognition, speaker identification, language identification, age and gender identification, self-supervised learning, human-robot interaction, and audio archive segmentation (see the citing works listed on Google Scholar).

The method is unsupervised to make it applicable to a broad range of acoustic environments, and it is optimized considering both noisy and clean conditions.

Used out of the box, rVAD ranked 4th (out of 27 supervised and unsupervised systems) in a Fearless Steps Speech Activity Detection Challenge.

As of 2023, the rVAD paper ranked 6th among the most cited articles published in Computer Speech and Language since 2018.

Usage

The rVADfast library is available as a Python package, installable via:

pip install rVADfast

After installation, import the rVADfast class and instantiate it to obtain a VAD object, which can then be called to generate VAD labels:

import audiofile
from rVADfast import rVADfast

vad = rVADfast()

path_to_audiofile = "some_audio_file.wav"

waveform, sampling_rate = audiofile.read(path_to_audiofile)
vad_labels, vad_timestamps = vad(waveform, sampling_rate)
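Frame-level labels are often easier to work with once they are grouped into speech segments. The following is a minimal sketch of that grouping, not part of the rVADfast API. It assumes that vad_labels holds one 0/1 value per frame and that vad_timestamps holds the time in seconds of each frame, with both arrays of equal length; check these assumptions against the actual output before relying on it. The helper name labels_to_segments and the synthetic input are hypothetical.

```python
import numpy as np


def labels_to_segments(vad_labels, vad_timestamps):
    """Merge runs of speech frames into (start, end) time pairs.

    Assumption: one 0/1 label per frame and one timestamp (in seconds)
    per frame, both of the same length.
    """
    labels = (np.asarray(vad_labels).ravel() > 0).astype(int)
    times = np.asarray(vad_timestamps, dtype=float).ravel()
    if labels.shape != times.shape:
        raise ValueError("labels and timestamps must have the same length")

    # Pad with zeros so that runs touching either end are still detected.
    padded = np.concatenate(([0], labels, [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)      # first speech frame of each run
    ends = np.flatnonzero(changes == -1) - 1   # last speech frame of each run

    # Each segment ends at the timestamp of its last speech frame.
    return [(float(times[s]), float(times[e])) for s, e in zip(starts, ends)]


# Synthetic input standing in for the output of vad(waveform, sampling_rate).
example_labels = np.array([0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
example_times = np.arange(10) * 0.5  # hypothetical frame times in seconds
segments = labels_to_segments(example_labels, example_times)
print(segments)
```

With real data, the same helper would be called as labels_to_segments(vad_labels, vad_timestamps), provided the assumptions above hold.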

The package can also process folders of audio files, either to generate VAD labels or to trim non-speech segments from the audio. This is done through the rVADfast.process module, which provides two functions for processing audio files: process.rVADfast_single_process and process.rVADfast_multi_process, the latter using multiple CPUs. Additionally, a processing script can be run from the command line:

rVADfast_process --root <audio_file_root> --save_folder <path_to_save_files> --ext <audio_file_extension> --n_workers <number_of_multiprocessing_workers>

For an explanation of the additional arguments available for the command-line tool, run:

rVADfast_process --help
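When the command-line tool needs to be driven from a Python script, the call can be assembled as an argument list. This is only a sketch: the helper name build_rvadfast_command and the example paths and values are hypothetical, and it uses only the four options shown above. The expected format of each value (for instance, whether the extension includes a leading dot) should be checked with rVADfast_process --help.

```python
import shlex


def build_rvadfast_command(root, save_folder, ext, n_workers):
    """Return the rVADfast_process call as an argument list."""
    return [
        "rVADfast_process",
        "--root", str(root),
        "--save_folder", str(save_folder),
        "--ext", str(ext),
        "--n_workers", str(n_workers),
    ]


# Hypothetical paths and values, used only for illustration.
command = build_rvadfast_command("data/audio", "data/vad", "wav", 4)
print(shlex.join(command))

# To actually run it (requires rVADfast to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.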

A concrete example of how to use the rVADfast package can be found in /notebooks.

Note that the package is still in development. Therefore, we welcome any feedback or suggestions for changes and/or additional features.

References

  1. Z.-H. Tan, A. K. Sarkar and N. Dehak, "rVAD: an unsupervised segment-based robust voice activity detection method," Computer Speech and Language, vol. 59, pp. 1-21, 2020.
  2. Z.-H. Tan and B. Lindberg, "Low-complexity variable frame rate analysis for speech recognition and voice activity detection," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 798-807, 2010.
