Python implementation of WORLD vocoder.
PYTHON WORLD VOCODER:
This is a line-by-line implementation of the WORLD vocoder (Matlab, C++) in Python. It supports Python 3.0 and later.
INSTALLATION
pip install worldvocoder
EXAMPLE
import worldvocoder as wv
import soundfile as sf
import librosa
# read audio
audio, sample_rate = sf.read("some_file.wav")
# soundfile returns (frames, channels); librosa expects (channels, frames)
audio = librosa.to_mono(audio.T)
# initialize vocoder
vocoder = wv.World()
# encode audio
dat = vocoder.encode(sample_rate, audio, f0_method='harvest')
Here, sample_rate is the sampling frequency and audio is the speech or singing signal.
dat is a dictionary that contains the pitch, magnitude spectrum, and aperiodicity.
We can scale the pitch:
dat = vocoder.scale_pitch(dat, 1.5)
Be careful when you scale the pitch, because it has an upper limit and a lower limit.
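To see why the limits matter, here is a minimal standalone sketch that scales an F0 contour and clamps it to a range. It does not show how scale_pitch works internally. The function name scale_f0 is hypothetical, the 71 Hz and 800 Hz bounds are placeholder assumptions, and treating F0 = 0 as an unvoiced frame is also an assumption.

```python
def scale_f0(f0, factor, f0_floor=71.0, f0_ceil=800.0):
    """Scale voiced F0 values by `factor`, clamping them to [f0_floor, f0_ceil].

    Frames with F0 <= 0 are treated as unvoiced and kept at 0.
    The default limits are illustrative assumptions, not library values.
    """
    scaled = []
    for value in f0:
        if value <= 0.0:
            scaled.append(0.0)  # unvoiced frame: nothing to scale
        else:
            scaled.append(min(max(value * factor, f0_floor), f0_ceil))
    return scaled


contour = [0.0, 110.0, 220.0, 600.0]
print(scale_f0(contour, 1.5))  # the 600 Hz frame would reach 900 Hz and is clamped
```

With a large factor, high-pitched frames all hit the same ceiling and the contour flattens, so a moderate factor is usually the safer choice.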
We can make speech faster or slower:
dat = vocoder.scale_duration(dat, 2)
To resynthesize the audio:
dat = vocoder.decode(dat)
output = dat["out"]
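The steps above can be combined into one function that also saves the result. This is only a sketch: shift_pitch_file is a hypothetical helper, it averages channels with NumPy instead of calling librosa, and it assumes that the decoded signal keeps the sampling rate of the input.

```python
def shift_pitch_file(in_path, out_path, pitch_factor=1.5):
    """Read a WAV file, scale its pitch, and write the resynthesized audio."""
    # Imported here so the sketch can be loaded without the audio packages.
    import soundfile as sf
    import worldvocoder as wv

    audio, sample_rate = sf.read(in_path)
    if audio.ndim > 1:
        # soundfile returns (frames, channels); average the channels to mono.
        audio = audio.mean(axis=1)

    vocoder = wv.World()
    dat = vocoder.encode(sample_rate, audio, f0_method='harvest')
    dat = vocoder.scale_pitch(dat, pitch_factor)
    dat = vocoder.decode(dat)

    # Assumption: the decoded signal keeps the input sampling rate.
    sf.write(out_path, dat["out"], sample_rate)
```

For example, shift_pitch_file("some_file.wav", "some_file_shifted.wav") would write a copy of the file with the pitch raised by a factor of 1.5.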
To use d4c_requiem analysis and requiem_synthesis from WORLD version 0.2.2, pass the argument is_requiem=True:
# requiem analysis
dat = vocoder.encode(sample_rate, audio, f0_method='harvest', is_requiem=True)
To extract log-filterbanks, MCEP-40, and VAE-12 features as described in the paper "Using a Manifold Vocoder for Spectral Voice and Style Conversion", see test/spectralFeatures.py. Extracting VAE-12 requires Keras 2.2.4 and TensorFlow 1.14.0.
Check out speech samples
NOTE:
- The vocoder uses pitch-synchronous analysis: the size of each window is determined by the fundamental frequency F0. The centers of the windows are equally spaced, frame_period ms apart.
- The Fourier transform size (fft_size) is determined automatically from the sampling frequency and the lowest F0 value, f0_floor. If you want to specify your own fft_size, you have to use f0_floor = 3.0 * fs / fft_size. If you decrease fft_size, f0_floor increases. However, a high f0_floor might not be good for the analysis of male voices.
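The relation between fft_size and f0_floor in the note above can be checked with a few lines of arithmetic. The helper name f0_floor_for_fft_size is hypothetical, and the 16 kHz sampling rate is only an example value.

```python
def f0_floor_for_fft_size(fs, fft_size):
    """Return the f0_floor (in Hz) implied by a chosen fft_size.

    Uses the relation from the note: f0_floor = 3.0 * fs / fft_size.
    """
    return 3.0 * fs / fft_size


fs = 16000  # example sampling frequency in Hz
for fft_size in (2048, 1024, 512, 256):
    print(fft_size, f0_floor_for_fft_size(fs, fft_size))
```

At 16 kHz, halving fft_size from 1024 to 512 doubles f0_floor from 46.875 Hz to 93.75 Hz. A floor that high is already close to the F0 range of many male speakers, which is why small fft_size values can hurt the analysis of low voices.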
CITATION:
Dinh, T., Kain, A., & Tjaden, K. (2019). Using a manifold vocoder for spectral voice and style conversion. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September, 1388-1392.