Fast voice activity detection with Python
Voice Activity Detection with Python
Installing
pip install vader
Basic usage
import vader
# use your own mono, preferably 16kHz .wav file
filename = "audio.wav"
# returns segments of vocal activity (unit: seconds)
# note: it uses a pre-trained NN by default
segments = vader.vad(filename)
# where to dump audio files
out_folder = "segments"
# write segments into .wav files
vader.vad_to_files(segments, filename, out_folder)
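The usage above asks for a mono, preferably 16 kHz .wav file. As a minimal sketch, the input can be checked beforehand with Python's standard-library wave module. The helper names describe_wav and is_mono_16khz are hypothetical and are not part of vader.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        # number of audio frames divided by frames per second gives seconds
        duration = wav.getnframes() / sample_rate
    return channels, sample_rate, duration


def is_mono_16khz(path):
    """True when the file has exactly one channel sampled at 16 000 Hz."""
    channels, sample_rate, _ = describe_wav(path)
    return channels == 1 and sample_rate == 16000
```

A file that fails this check can still be tried, since 16 kHz is stated only as a preference, but it may be worth resampling or downmixing it first.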
You can also use a different pre-trained model by specifying the method parameter:
# logistic regression method
segments = vader.vad(filename, threshold=.1, window=20, method="logistic")
# multi-layer perceptron method
segments = vader.vad(filename, threshold=.1, window=20, method="nn")
# Naive Bayes method
segments = vader.vad(filename, threshold=.5, window=10, method="nb")
# Random Forest method
segments = vader.vad(filename, threshold=.5, window=10, method="rf")
The threshold parameter is the ratio of voiced frames above which a window of frames is counted as a voiced sample. The window parameter controls the number of frames considered, and thus the length of the voiced samples.
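To make the interaction between threshold and window concrete, here is a minimal sketch of the idea, not vader's actual implementation. It assumes per-frame predictions are already available as 0/1 values, that windows do not overlap, and that the comparison is strict; the function name voiced_windows is hypothetical.

```python
def voiced_windows(frame_is_voiced, window=20, threshold=0.1):
    """Flag each window of frames as voiced when its voiced-frame ratio exceeds threshold.

    frame_is_voiced: sequence of 0/1 per-frame predictions.
    window: number of frames per window (assumed non-overlapping here).
    threshold: ratio of voiced frames above which the window counts as voiced.
    """
    flags = []
    for start in range(0, len(frame_is_voiced), window):
        chunk = frame_is_voiced[start:start + window]
        ratio = sum(chunk) / len(chunk)  # share of voiced frames in this window
        flags.append(ratio > threshold)
    return flags


# 40 frames: the first window is silent, the second has 3 voiced frames out of 20
frames = [0] * 20 + [1] * 3 + [0] * 17
print(voiced_windows(frames, window=20, threshold=0.1))  # [False, True]
print(voiced_windows(frames, window=20, threshold=0.5))  # [False, False]
```

Under these assumptions, a low threshold keeps windows that contain only a little speech, while a larger window produces longer and fewer samples.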
You can also train your own models:
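import vader
# mfccs and activities are your training features and labels (format described below)
# each call trains a different type of model
model = vader.train.logistic_regression(mfccs, activities)
model = vader.train.random_forest_classifier(mfccs, activities)
model = vader.train.NN(mfccs, activities)
model = vader.train.NB(mfccs, activities)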
The variable mfccs is a list of variable-length MFCC feature arrays, with overall shape (num_samples, varying_lengths, 13), while activities is a list of binary vectors whose lengths match those of the MFCC features, with shape (num_samples, varying_lengths). Each entry equals 1 when the frame is voiced, and 0 otherwise.
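The expected layout of the training data can be checked before calling a training function. The following is a minimal sketch based only on the description above; check_training_data is a hypothetical helper, not part of vader, and the toy values are placeholders rather than real MFCCs.

```python
def check_training_data(mfccs, activities, n_coefficients=13):
    """Validate the (mfccs, activities) layout and return the total frame count.

    mfccs: list with one entry per recording, each a sequence of frames,
           each frame holding n_coefficients values.
    activities: list with one entry per recording, each a sequence of 0/1 labels.
    """
    if len(mfccs) != len(activities):
        raise ValueError("mfccs and activities must contain the same number of recordings")
    for i, (features, labels) in enumerate(zip(mfccs, activities)):
        if len(features) != len(labels):
            raise ValueError(f"recording {i}: {len(features)} frames but {len(labels)} labels")
        for frame in features:
            if len(frame) != n_coefficients:
                raise ValueError(f"recording {i}: expected {n_coefficients} coefficients per frame")
        for label in labels:
            if label not in (0, 1):
                raise ValueError(f"recording {i}: labels must be 0 or 1")
    return sum(len(labels) for labels in activities)


# toy data: two recordings with 3 and 5 frames (placeholder zeros, not real MFCCs)
mfccs = [[[0.0] * 13 for _ in range(3)], [[0.0] * 13 for _ in range(5)]]
activities = [[0, 1, 1], [0, 0, 1, 1, 0]]
print(check_training_data(mfccs, activities))  # 8
```

The helper relies only on len() and iteration, so it should also accept NumPy arrays in place of nested lists.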
Authors
Maixent Chenebaux