Python implementation of WORLD vocoder.
PYTHON WORLD VOCODER:
This is a line-by-line implementation of the WORLD vocoder (MATLAB, C++) in Python. It supports Python 3.0 and later.
INSTALLATION
pip install worldvocoder
EXAMPLE
import worldvocoder as wv
import soundfile as sf
import librosa
# read audio
audio, sample_rate = sf.read("some_file.wav")
# soundfile returns (frames, channels); librosa.to_mono expects (channels, frames)
audio = librosa.to_mono(audio.T)
# initialize vocoder
vocoder = wv.World()
# encode audio
dat = vocoder.encode(sample_rate, audio, f0_method='harvest')
Here, sample_rate is the sampling frequency and audio is the speech or singing signal.
dat is a dictionary that contains the pitch, magnitude spectrum, and aperiodicity.
We can scale the pitch:
dat = vocoder.scale_pitch(dat, 1.5)
Be careful when you scale the pitch, because it has both an upper limit and a lower limit.
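To illustrate what the limits mean for a scaled pitch contour, here is a minimal sketch in plain Python. The helper scale_f0_with_limits is hypothetical and not part of worldvocoder. The defaults of 71 Hz and 800 Hz are assumed from the reference WORLD implementation and may differ in this package, as may the way the library treats out-of-range values. Frames with F0 equal to 0 are assumed to be unvoiced and are left untouched.

```python
def scale_f0_with_limits(f0, factor, f0_floor=71.0, f0_ceil=800.0):
    """Scale an F0 contour (Hz) and clamp voiced frames to [f0_floor, f0_ceil].

    Hypothetical helper for illustration only; the limit values are assumptions.
    """
    scaled = []
    for value in f0:
        if value <= 0.0:
            # Assumed convention: 0 marks an unvoiced frame, so keep it as is.
            scaled.append(0.0)
        else:
            # Clamp the scaled pitch so it stays inside the allowed range.
            scaled.append(min(f0_ceil, max(f0_floor, value * factor)))
    return scaled


# A 600 Hz frame scaled by 1.5 would reach 900 Hz, so it is clamped to 800 Hz.
print(scale_f0_with_limits([0.0, 100.0, 600.0], 1.5))
```

A factor that pushes many frames against either limit flattens the contour, which is one reason to keep the scale factor moderate.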
We can make speech faster or slower:
dat = vocoder.scale_duration(dat, 2)
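The text does not say whether a factor of 2 doubles or halves the duration, so it is worth checking the length of the resynthesized signal. As a rough picture of what duration scaling does to frame-wise parameters, the sketch below repeats or drops frames. The helper stretch_frames is hypothetical, it assumes the factor multiplies the number of frames, and the library's scale_duration may work differently (for example by interpolating).

```python
def stretch_frames(frames, factor):
    """Resample a list of per-frame values to round(len(frames) * factor) frames.

    Hypothetical helper: uses nearest-earlier-frame repetition, which avoids
    blending voiced and unvoiced frames. Not the library's implementation.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    new_len = max(1, round(len(frames) * factor))
    last = len(frames) - 1
    # Map each output frame back to the source frame that covers it.
    return [frames[min(last, int(j / factor))] for j in range(new_len)]


print(stretch_frames([1, 2, 3], 2))       # twice as many frames
print(stretch_frames([1, 2, 3, 4], 0.5))  # half as many frames
```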
To resynthesize the audio:
dat = vocoder.decode(dat)
output = dat["out"]
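To listen to the result, the signal has to be written back to a file. The sketch below does this with the standard library only. The helper write_wav_16bit is hypothetical, and it assumes that output is a mono sequence of floats roughly in the range [-1, 1], which the text above does not confirm.

```python
import struct
import wave


def write_wav_16bit(path, samples, sample_rate):
    """Write a mono float signal (assumed to lie in [-1, 1]) as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to avoid integer overflow
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)                 # mono
        wf.setsampwidth(2)                 # 2 bytes per sample = 16 bit
        wf.setframerate(int(sample_rate))
        wf.writeframes(bytes(frames))
```

With the names from the example above, this would be called as write_wav_16bit("resynth.wav", output, sample_rate). Since soundfile is already imported there, sf.write("resynth.wav", output, sample_rate) is a shorter alternative.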
To use the d4c_requiem analysis and requiem_synthesis from WORLD version 0.2.2, pass is_requiem=True:
# requiem analysis
dat = vocoder.encode(sample_rate, audio, f0_method='harvest', is_requiem=True)
To extract the log-filterbank, MCEP-40, and VAE-12 features described in the paper "Using a Manifold Vocoder for Spectral Voice and Style Conversion", see test/spectralFeatures.py. You need Keras 2.2.4 and TensorFlow 1.14.0 to extract VAE-12.
Check out speech samples
NOTE:
- The vocoder uses pitch-synchronous analysis: the size of each window is determined by the fundamental frequency F0. The centers of the windows are equally spaced, frame_period ms apart.
- The Fourier transform size (fft_size) is determined automatically from the sampling frequency and the lowest F0 value, f0_floor. If you want to specify your own fft_size, you have to set f0_floor = 3.0 * fs / fft_size. Decreasing fft_size increases f0_floor, and a high f0_floor may be unsuitable for analyzing male voices.
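The two notes above can be checked numerically with a short sketch. Both helper names are hypothetical and not part of worldvocoder. The 16 kHz sampling frequency and the 5 ms frame period are illustrative values only, and the first window center is assumed to sit at 0 ms.

```python
def f0_floor_for_fft_size(fs, fft_size):
    """F0 floor (Hz) required by a chosen FFT size: 3.0 * fs / fft_size."""
    return 3.0 * fs / fft_size


def frame_centers_ms(num_frames, frame_period):
    """Window centers in ms, equally spaced by frame_period (first at 0 ms, assumed)."""
    return [i * frame_period for i in range(num_frames)]


fs = 16000  # illustrative sampling frequency in Hz
for fft_size in (2048, 1024, 512):
    # Halving fft_size doubles the F0 floor.
    print(fft_size, f0_floor_for_fft_size(fs, fft_size))

print(frame_centers_ms(4, 5.0))
```

At 16 kHz, an fft_size of 512 gives an f0_floor of 93.75 Hz, which is close to the pitch range of many male speakers. This is the trade-off the second note warns about.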
CITATION:
Dinh, T., Kain, A., & Tjaden, K. (2019). Using a manifold vocoder for spectral voice and style conversion. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September, 1388-1392.