
Fast voice activity detection with Python

Project description

Voice Activity Detection with Python

Installing

pip install vader

Basic usage

import vader

# path to your own mono .wav file, preferably sampled at 16 kHz
filename = "audio.wav"

# returns the segments of voice activity (unit: seconds)
# note: it uses a pre-trained NN by default
segments = vader.vad(filename)

# folder where the segment .wav files will be written
out_folder = "segments"
# write the segments into .wav files in out_folder
vader.vad_to_files(segments, filename, out_folder)
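The text above only says that vad returns segments with times in seconds. Assuming each segment is a (start, end) pair of floats (an assumption, not confirmed by this page), a minimal sketch for summarizing the result could look like this; total_voiced_seconds and format_segments are hypothetical helpers, and the segments list is made-up example data.

```python
from typing import Iterable, List, Tuple

# Assumption: each segment is a (start_seconds, end_seconds) pair.
Segment = Tuple[float, float]


def total_voiced_seconds(segments: Iterable[Segment]) -> float:
    """Sum the durations of all voiced segments."""
    return sum(end - start for start, end in segments)


def format_segments(segments: Iterable[Segment]) -> List[str]:
    """Render each segment as 'start-end' with millisecond precision."""
    return [f"{start:.3f}-{end:.3f}" for start, end in segments]


# Hypothetical stand-in for the output of vader.vad(filename)
segments = [(0.5, 1.25), (2.0, 3.5)]
print(total_voiced_seconds(segments))  # 2.25
print(format_segments(segments))  # ['0.500-1.250', '2.000-3.500']
```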

You can also select a different pre-trained model with the method parameter:

# logistic method
segments = vader.vad(filename, threshold=.1, window=20, method="logistic")

# multi-layer perceptron method
segments = vader.vad(filename, threshold=.1, window=20, method="nn")

# Naive Bayes method
segments = vader.vad(filename, threshold=.5, window=10, method="nb")

# Random Forest method
segments = vader.vad(filename, threshold=.5, window=10, method="rf")

The threshold parameter is the fraction of voiced frames above which a window of frames is counted as a voiced sample. The window parameter sets the number of frames in each window, and therefore the length of the voiced samples.
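To make the roles of threshold and window concrete, here is a minimal sketch of windowed thresholding over frame-level decisions. It is not vader's own implementation: it assumes consecutive, non-overlapping windows, 0/1 labels per frame, a 10 ms frame step (frame_duration=0.01), and reads "above" as strictly greater. windows_to_segments is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def windows_to_segments(
    frame_labels: Sequence[int],
    threshold: float,
    window: int,
    frame_duration: float = 0.01,  # assumed 10 ms frame step
) -> List[Tuple[float, float]]:
    """Turn per-frame 0/1 voice labels into (start, end) segments in seconds.

    Frames are grouped into consecutive, non-overlapping windows of `window`
    frames. A window counts as voiced when its fraction of voiced frames is
    strictly above `threshold`; adjacent voiced windows are merged.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of frames")
    spans: List[List[int]] = []  # [start_frame, end_frame) pairs
    for first in range(0, len(frame_labels), window):
        chunk = frame_labels[first:first + window]
        if sum(chunk) / len(chunk) <= threshold:
            continue  # too few voiced frames in this window
        last = first + len(chunk)
        if spans and spans[-1][1] == first:
            spans[-1][1] = last  # extend the previous voiced span
        else:
            spans.append([first, last])
    return [(start * frame_duration, end * frame_duration) for start, end in spans]
```

For example, with window=4 and threshold=0.5, a window with 3 voiced frames out of 4 (ratio 0.75) is kept, while one with 2 out of 4 (ratio 0.5) is dropped. Raising threshold makes detection stricter; enlarging window makes the resulting segments coarser.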

You can also train your own models:

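import vader

# mfccs and activities are your own training data (format described below).
# Pick one of the following trainers:
model = vader.train.logistic_regression(mfccs, activities)
model = vader.train.random_forest_classifier(mfccs, activities)
model = vader.train.NN(mfccs, activities)
model = vader.train.NB(mfccs, activities)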

The variable mfccs is a list of MFCC feature arrays, one per sample, each with a varying number of frames and 13 coefficients per frame, i.e. shape (num_samples, varying_lengths, 13). The variable activities is a list of binary vectors whose lengths match those of the MFCC arrays, i.e. shape (num_samples, varying_lengths); each entry is 1 when the corresponding frame is voiced and 0 otherwise.
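Whether vader.train expects Python lists or NumPy arrays is not stated here, so the sketch below only illustrates the described layout with plain lists and checks it before training. flatten_dataset is a hypothetical helper: it verifies that every frame has 13 coefficients and one 0/1 label, then stacks all frames into a single frame-level (X, y) pair, the form a per-frame classifier such as logistic regression typically consumes. The feature values are dummies, not real MFCCs.

```python
from typing import List, Sequence, Tuple

N_MFCC = 13  # coefficients per frame, as described above


def flatten_dataset(
    mfccs: Sequence[Sequence[Sequence[float]]],
    activities: Sequence[Sequence[int]],
) -> Tuple[List[List[float]], List[int]]:
    """Check shapes, then stack all samples into frame-level (X, y) lists."""
    if len(mfccs) != len(activities):
        raise ValueError("mfccs and activities must have the same number of samples")
    X: List[List[float]] = []
    y: List[int] = []
    for i, (features, labels) in enumerate(zip(mfccs, activities)):
        if len(features) != len(labels):
            raise ValueError(f"sample {i}: {len(features)} frames but {len(labels)} labels")
        for frame, label in zip(features, labels):
            if len(frame) != N_MFCC:
                raise ValueError(f"sample {i}: expected {N_MFCC} coefficients per frame")
            if label not in (0, 1):
                raise ValueError(f"sample {i}: labels must be 0 or 1")
            X.append(list(frame))
            y.append(label)
    return X, y


# Two placeholder samples with 3 and 2 frames
mfccs = [[[0.0] * N_MFCC for _ in range(3)], [[1.0] * N_MFCC for _ in range(2)]]
activities = [[0, 1, 1], [1, 0]]
X, y = flatten_dataset(mfccs, activities)
print(len(X), y)  # 5 [0, 1, 1, 1, 0]
```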

Authors

Maixent Chenebaux



Download files


Source Distribution

vader-0.0.3.tar.gz (4.7 MB)


Built Distribution

vader-0.0.3-py3-none-any.whl (4.8 MB)


File details

Details for the file vader-0.0.3.tar.gz.

File metadata

  • Download URL: vader-0.0.3.tar.gz
  • Size: 4.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for vader-0.0.3.tar.gz
  • SHA256: 821c5cf42771a1d0c26670f0d3410357bb255f7094d777f453439ceb098e5489
  • MD5: 81ac9610ce45e440b565216969862919
  • BLAKE2b-256: 6827f729d755abc0632c1618cb2d00b691a7cc8eda9afe5be3ac5ee0e39e94e9


File details

Details for the file vader-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: vader-0.0.3-py3-none-any.whl
  • Size: 4.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for vader-0.0.3-py3-none-any.whl
  • SHA256: 50e685da8284e9252aa3721ce2135f9e17f8165f7acfe880e24260a0caa00fb5
  • MD5: 45875df7ef4abb76be493e3415ae52a2
  • BLAKE2b-256: c93256236e6aab0065fd4ac30a7b0f7ecbde4582cb3744c21e3f3844647f230b
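To check a downloaded file against the SHA256 digests listed above, a minimal sketch using Python's standard hashlib module could look like this. sha256_of_file and verify are hypothetical helper names; the default expected value is the source-distribution digest copied from this page.

```python
import hashlib

# SHA256 digest published above for vader-0.0.3.tar.gz
EXPECTED_SDIST_SHA256 = "821c5cf42771a1d0c26670f0d3410357bb255f7094d777f453439ceb098e5489"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True when the file's SHA256 digest matches the expected hex string."""
    return sha256_of_file(path) == expected.strip().lower()


# Example, after downloading the archive: verify("vader-0.0.3.tar.gz")
```

Reading in chunks keeps memory use flat for large archives. pip's hash-checking mode (a requirements line such as vader==0.0.3 --hash=sha256:<digest>) enforces the same check at install time.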

