# Voice Activity Detection with Python

### Installing

```bash
pip install vader
```


### Basic usage

```python
import vader

# use your own mono, preferably 16kHz .wav file
filename = "audio.wav"

# returns segments of vocal activity (unit: seconds)
# note: it uses a pre-trained NN by default

# where to dump audio files
out_folder = "segments"
# write segments into .wav files
```
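
The input requirements (mono, preferably 16 kHz) and the idea of writing each voiced segment to its own .wav file can be sketched with Python's standard `wave` module alone. This is a minimal illustration and not part of the `vader` API: the helper names `check_wav` and `write_segments` are hypothetical, and it assumes segments are given as `(start, end)` pairs in seconds.

```python
import os
import wave


def check_wav(path, expected_rate=16000):
    """Return (is_mono, has_expected_rate) for a .wav file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 1, wav.getframerate() == expected_rate


def write_segments(path, segments, out_folder):
    """Cut (start, end) spans, given in seconds, out of a .wav file.

    Assumption: `segments` is an iterable of (start, end) pairs in seconds.
    Returns the list of files written.
    """
    os.makedirs(out_folder, exist_ok=True)
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for index, (start, end) in enumerate(segments):
            # Convert seconds to frame positions, clamped to the file length.
            first = min(max(0, int(start * rate)), total)
            count = max(0, int(end * rate) - first)
            src.setpos(first)
            frames = src.readframes(count)
            out_path = os.path.join(out_folder, f"segment_{index}.wav")
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is fixed up on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Calling `check_wav("audio.wav")` before running detection is a cheap way to catch stereo or resampled files early.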


You can also use other pre-trained models by specifying the `method` parameter:

- logistic regression
- multi-layer perceptron
- Naive Bayes
- Random Forest


The `threshold` parameter is the ratio of voiced frames above which a window of frames is counted as a voiced sample. The `window` parameter sets the number of frames in each window, and therefore the length of the voiced samples.
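
To make the interaction between the two parameters concrete, here is a small sketch of the windowing rule described above. It is not the library's implementation: the function name `voiced_windows`, the default values, and the choice of non-overlapping windows (with any trailing partial window dropped) are all assumptions made for illustration.

```python
def voiced_windows(frame_labels, window=10, threshold=0.5):
    """Label non-overlapping windows of frames as voiced (True) or not.

    frame_labels: sequence of per-frame decisions, 1 for voiced, 0 otherwise.
    Assumption: windows do not overlap and a trailing partial window is dropped.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of frames")
    decisions = []
    for start in range(0, len(frame_labels) - window + 1, window):
        chunk = frame_labels[start:start + window]
        ratio = sum(chunk) / window  # fraction of voiced frames in this window
        decisions.append(ratio > threshold)
    return decisions


labels = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1]
print(voiced_windows(labels, window=4, threshold=0.5))
# → [True, False, True]
```

A larger `window` smooths over short pauses at the cost of coarser segment boundaries, while a higher `threshold` makes the detector more conservative.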

You can also train your own models:

```python
import vader
```


The variable `mfccs` is a list of MFCC feature arrays of varying length, with overall shape (num_samples, varying_lengths, 13). The variable `activities` is a list of binary vectors whose lengths match those of the MFCC features, with shape (num_samples, varying_lengths); each entry is 1 when the corresponding frame is voiced and 0 otherwise.
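
The expected layout of the two training inputs can be illustrated with plain Python lists. The sketch below builds random placeholder data in that layout and checks that features and labels line up frame by frame. The helpers `make_sample` and `check_training_data` are hypothetical and only demonstrate the shapes; real features would come from an MFCC extractor, and nothing here calls the library's training function.

```python
import random

NUM_COEFFS = 13  # MFCC coefficients per frame, as described above


def make_sample(num_frames):
    """Build one placeholder sample: (num_frames x 13) features and 0/1 labels."""
    features = [[random.gauss(0.0, 1.0) for _ in range(NUM_COEFFS)]
                for _ in range(num_frames)]
    labels = [random.randint(0, 1) for _ in range(num_frames)]
    return features, labels


def check_training_data(mfccs, activities):
    """Raise ValueError if the two lists do not line up frame by frame."""
    if len(mfccs) != len(activities):
        raise ValueError("mfccs and activities must hold the same number of samples")
    for i, (features, labels) in enumerate(zip(mfccs, activities)):
        if len(features) != len(labels):
            raise ValueError(
                f"sample {i}: {len(features)} frames but {len(labels)} labels")
        if any(len(frame) != NUM_COEFFS for frame in features):
            raise ValueError(f"sample {i}: every frame needs {NUM_COEFFS} coefficients")
        if any(label not in (0, 1) for label in labels):
            raise ValueError(f"sample {i}: labels must be 0 or 1")


# Three samples of different lengths (120, 87 and 203 frames).
samples = [make_sample(n) for n in (120, 87, 203)]
mfccs = [features for features, _ in samples]
activities = [labels for _, labels in samples]
check_training_data(mfccs, activities)  # passes silently when the data is consistent
```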

## Authors

Maixent Chenebaux

