Python interface to the Google WebRTC Voice Activity Detector (VAD)
py-webrtcvad
This is a Python interface to the WebRTC Voice Activity Detector (VAD). It is compatible with Python 2 and Python 3.
A VAD <https://en.wikipedia.org/wiki/Voice_activity_detection> classifies a piece of audio data as being voiced or unvoiced. It can be useful for telephony and speech recognition.
The VAD that Google developed for the WebRTC <https://webrtc.org/> project is reportedly one of the best available, being fast, modern and free.
How to use it
Create a Vad object:
import webrtcvad
vad = webrtcvad.Vad()
Optionally, set its aggressiveness mode, which is an integer between 0 and 3. 0 is the least aggressive about filtering out non-speech, 3 is the most aggressive. (You can also set the mode when you create the VAD, e.g. vad = webrtcvad.Vad(3).) To set it on an existing object:
vad.set_mode(1)
Give it a short segment (“frame”) of audio. The WebRTC VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, or 32000 Hz. A frame must be either 10, 20, or 30 ms in duration:
# Run the VAD on 10 ms of silence. The result should be False.
sample_rate = 16000
frame_duration = 10  # ms
# Two bytes per 16-bit sample; integer division keeps the repeat count an int.
frame = b'\x00\x00' * (sample_rate * frame_duration // 1000)
print('Contains speech: %s' % vad.is_speech(frame, sample_rate))
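Real recordings are longer than one frame, so they have to be cut into frames of an accepted size first. The following is a minimal sketch of that step, assuming the audio is already raw 16-bit mono PCM held in a bytes object. The helper names frame_size_bytes and split_frames are hypothetical and not part of webrtcvad; the accepted rates and durations are the ones listed above.

```python
# Hypothetical helpers (not part of webrtcvad) for cutting raw PCM into frames.
SUPPORTED_RATES = (8000, 16000, 32000)   # Hz, as listed in the text above
SUPPORTED_DURATIONS = (10, 20, 30)       # ms, as listed in the text above
BYTES_PER_SAMPLE = 2                     # 16-bit mono PCM


def frame_size_bytes(sample_rate, frame_duration_ms):
    """Return the byte length of one frame, validating both parameters."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError('unsupported sample rate: %r' % (sample_rate,))
    if frame_duration_ms not in SUPPORTED_DURATIONS:
        raise ValueError('unsupported frame duration: %r' % (frame_duration_ms,))
    samples = sample_rate * frame_duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def split_frames(pcm, sample_rate, frame_duration_ms):
    """Yield consecutive full-length frames; a trailing partial frame is dropped."""
    size = frame_size_bytes(sample_rate, frame_duration_ms)
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]
```

Each yielded frame can then be passed to the VAD, for example [vad.is_speech(f, 16000) for f in split_frames(pcm, 16000, 30)]. Dropping the trailing partial frame is one simple choice; padding it with zero bytes would be another.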