Fast voice activity detection with Python

Voice Activity Detection with Python

Installing

pip install vader

Basic usage

import vader

# path to your own mono .wav file, preferably sampled at 16 kHz
filename = "audio.wav"

# returns segments of vocal activity (unit: seconds)
# note: a pre-trained logistic regression model is used by default
segments = vader.vad(filename)

# where to dump audio files
out_folder = "segments"
# write segments into .wav files
vader.vad_to_files(segments, filename, out_folder)
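Since segments are expressed in seconds, it is often handy to convert them into sample offsets or to total the voiced time. The sketch below assumes each segment is a (start, end) pair in seconds; that layout is an assumption made for illustration and is not confirmed by this README, so inspect the actual return value of vader.vad first. The helper to_sample_ranges and the example values are hypothetical.

```python
# Hypothetical result of vader.vad(): (start, end) pairs in seconds.
# The real return format may differ; check it before relying on this.
segments = [(0.50, 1.75), (2.10, 3.40)]

sample_rate = 16000  # Hz, the rate suggested for input files


def to_sample_ranges(segments, sample_rate):
    """Convert (start, end) pairs in seconds into integer sample offsets."""
    return [
        (round(start * sample_rate), round(end * sample_rate))
        for start, end in segments
    ]


# Total amount of detected speech, in seconds.
total_voiced = sum(end - start for start, end in segments)

print(to_sample_ranges(segments, sample_rate))  # [(8000, 28000), (33600, 54400)]
```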

You can also use different pre-trained models by specifying the method parameter:

# logistic method
segments = vader.vad(filename, threshold=.1, window=20, method="logistic")

# multi-layer perceptron method
segments = vader.vad(filename, threshold=.1, window=20, method="nn")

# Naive Bayes method
segments = vader.vad(filename, threshold=.5, window=10, method="nb")

The threshold parameter is the ratio of voiced frames that a window must exceed to be counted as a voiced sample. The window parameter sets the number of frames in each window, and thus the length of the voiced samples.
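To make the interaction between threshold and window concrete, here is a minimal sketch of windowed thresholding over per-frame predictions. It is not vader's actual implementation: the function name voiced_windows, the use of non-overlapping windows, the dropped trailing partial window, and the strict "greater than" comparison are all assumptions made for illustration.

```python
def voiced_windows(frame_labels, window=20, threshold=0.1):
    """Group per-frame 0/1 labels into windows and flag the voiced ones.

    Assumptions (not taken from vader's source): windows do not overlap,
    a trailing partial window is dropped, and a window is voiced when its
    ratio of voiced frames is strictly greater than `threshold`.
    """
    flags = []
    for start in range(0, len(frame_labels) - window + 1, window):
        chunk = frame_labels[start:start + window]
        ratio = sum(chunk) / window  # fraction of voiced frames in this window
        flags.append(ratio > threshold)
    return flags


# 40 frames: 3 voiced frames in the first window, 1 in the second.
labels = [1, 1, 1] + [0] * 17 + [1] + [0] * 19
print(voiced_windows(labels, window=20, threshold=0.1))  # [True, False]
```

With these assumptions, a lower threshold accepts windows containing less speech, while a larger window yields fewer but longer voiced samples.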

You can also train your own models:

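import vader

# mfccs and activities are your own training data (format described below)
# each call trains and returns a different type of model
model = vader.train.logistic_regression(mfccs, activities)
model = vader.train.random_forest_classifier(mfccs, activities)
model = vader.train.NN(mfccs, activities)
model = vader.train.NB(mfccs, activities)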

The variable mfccs is a list of MFCC feature arrays of varying length, with overall shape (num_samples, varying_lengths, 13). The variable activities is a list of binary vectors, one per sample, whose lengths match those of the corresponding MFCC features, giving an overall shape of (num_samples, varying_lengths). Each entry equals 1 when the frame is voiced and 0 otherwise.
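The expected layout can be illustrated with placeholder data. The sketch below builds random arrays of the right shapes using NumPy; the values are meaningless and only demonstrate the structure. Whether vader expects NumPy arrays or plain lists for the inner elements is an assumption here, and the frame counts are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical recordings with 120, 85 and 200 frames respectively.
frame_counts = [120, 85, 200]

# One (n_frames, 13) array of MFCC features per recording.
# Random placeholders, not real features.
mfccs = [rng.normal(size=(n, 13)) for n in frame_counts]

# One binary vector per recording: 1 = voiced frame, 0 = unvoiced frame.
# Random placeholders, not real annotations.
activities = [rng.integers(0, 2, size=n) for n in frame_counts]

# Sanity check before training: per-recording lengths must match.
for features, labels in zip(mfccs, activities):
    assert features.shape == (len(labels), 13)
```

In practice the features would come from an MFCC extractor and the labels from frame-level annotations of the same recordings.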

Authors

Maixent Chenebaux

Files for vader, version 0.0.2

Filename                        Size      File type   Python version
vader-0.0.2-py3-none-any.whl    45.9 kB   Wheel       py3
vader-0.0.2.tar.gz              46.7 kB   Source      None
