Python implementation of WORLD vocoder.

PYTHON WORLD VOCODER:


This is a line-by-line implementation of the WORLD vocoder (Matlab, C++) in Python. It supports Python 3.0 and later.

INSTALLATION


pip install worldvocoder

EXAMPLE


import worldvocoder as wv
import soundfile as sf
import librosa

# read audio
audio, sample_rate = sf.read("some_file.wav")
# sf.read returns (frames, channels); librosa.to_mono expects (channels, frames)
audio = librosa.to_mono(audio.T)

# initialize vocoder
vocoder = wv.World()

# encode audio
dat = vocoder.encode(sample_rate, audio, f0_method='harvest')

Here, sample_rate is the sampling frequency and audio is the speech or singing signal.

dat is a dictionary that contains the pitch, magnitude spectrum, and aperiodicity.

We can scale the pitch:

dat = vocoder.scale_pitch(dat, 1.5)

Be careful when scaling the pitch, because it has both an upper and a lower limit.
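
As an illustration of why these limits matter, here is a minimal sketch of scaling an F0 contour while keeping voiced frames inside a valid range. It does not call worldvocoder: the helper name scale_f0_clamped, the limits of 71 Hz and 800 Hz, and the convention that unvoiced frames have F0 = 0 are all assumptions made for this example, not values taken from the library.

```python
def scale_f0_clamped(f0, factor, f0_floor=71.0, f0_ceil=800.0):
    """Scale voiced F0 values by `factor`, clamped to [f0_floor, f0_ceil].

    Hypothetical helper: the default limits and the "0 means unvoiced"
    convention are assumptions for illustration only.
    """
    scaled = []
    for value in f0:
        if value <= 0.0:
            # Unvoiced frame: leave it untouched.
            scaled.append(0.0)
        else:
            # Voiced frame: scale, then clamp into the allowed range.
            scaled.append(min(max(value * factor, f0_floor), f0_ceil))
    return scaled


contour = [0.0, 100.0, 220.0, 600.0]
print(scale_f0_clamped(contour, 1.5))  # [0.0, 150.0, 330.0, 800.0]
```

In this sketch the last frame would reach 900 Hz after scaling, so it is clipped to the assumed ceiling, which flattens the contour at that point.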

We can make speech faster or slower:

dat = vocoder.scale_duration(dat, 2)
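
Conceptually, changing duration in a frame-based vocoder amounts to resampling the sequence of analysis frames. The sketch below shows one simple way to do that with nearest-frame lookup. It is not the library's implementation: stretch_frames is a hypothetical helper, and it assumes that a factor greater than 1 produces more frames (slower speech), which may or may not match how scale_duration interprets its argument.

```python
def stretch_frames(frames, factor):
    """Resample per-frame values to about len(frames) * factor frames.

    Hypothetical helper using nearest-frame lookup; assumes factor > 1
    means longer (slower) output.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    n_out = max(1, round(len(frames) * factor))
    out = []
    for i in range(n_out):
        # Map each output frame back to a source frame index.
        src = min(int(i / factor), len(frames) - 1)
        out.append(frames[src])
    return out


f0 = [100.0, 110.0, 120.0, 130.0]
print(stretch_frames(f0, 2))    # [100.0, 100.0, 110.0, 110.0, 120.0, 120.0, 130.0, 130.0]
print(stretch_frames(f0, 0.5))  # [100.0, 120.0]
```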

To resynthesize the audio:

dat = vocoder.decode(dat)
output = dat["out"]
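
To listen to the result, the samples need to be written to a file. The sketch below does this with only the Python standard library. It assumes that the decoded output is a one-dimensional sequence of floats in the range [-1.0, 1.0]; write_wav_mono16 is a hypothetical helper, not part of worldvocoder.

```python
import struct
import wave


def write_wav_mono16(path, samples, sample_rate):
    """Write float samples (assumed in [-1.0, 1.0]) as 16-bit mono PCM."""
    # Clip first so out-of-range values cannot overflow int16.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    pcm = struct.pack("<%dh" % len(ints), *ints)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav_file.setframerate(int(sample_rate))
        wav_file.writeframes(pcm)


# Usage, with `output` and `sample_rate` from the steps above:
# write_wav_mono16("resynthesized.wav", output, sample_rate)
```

If soundfile is already installed for reading, its sf.write function is a shorter alternative.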

To use d4c_requiem analysis and requiem_synthesis from WORLD version 0.2.2, pass is_requiem=True:

# requiem analysis
dat = vocoder.encode(sample_rate, audio, f0_method='harvest', is_requiem=True)

To extract log-filterbanks, MCEP-40, and VAE-12 features as described in the paper "Using a Manifold Vocoder for Spectral Voice and Style Conversion", see test/spectralFeatures.py. Extracting VAE-12 requires Keras 2.2.4 and TensorFlow 1.14.0. Check out the speech samples.

NOTE:


  • The vocoder uses pitch-synchronous analysis: the size of each window is determined by the fundamental frequency F0. The window centers are equally spaced, frame_period ms apart.

  • The Fourier transform size (fft_size) is determined automatically from the sampling frequency and the lowest F0 value, f0_floor. If you want to specify your own fft_size, you have to use f0_floor = 3.0 * fs / fft_size. Decreasing fft_size increases f0_floor, and a high f0_floor may not be suitable for analyzing male voices.
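
The two notes above can be made concrete with a little arithmetic. The sketch below computes window center times from frame_period and the f0_floor implied by a chosen fft_size, using the relation f0_floor = 3.0 * fs / fft_size stated above. The helper names are hypothetical, and placing the first window center at t = 0 is an assumption of this example.

```python
def frame_center_times(n_frames, frame_period_ms):
    """Center time in seconds of each analysis window.

    Assumes the first window is centered at t = 0 (an assumption of
    this sketch, not a documented property of the library).
    """
    return [(k * frame_period_ms) / 1000.0 for k in range(n_frames)]


def f0_floor_for_fft_size(fs, fft_size):
    """F0 floor in Hz implied by f0_floor = 3.0 * fs / fft_size."""
    return 3.0 * fs / fft_size


print(frame_center_times(4, 5.0))  # [0.0, 0.005, 0.01, 0.015]

for fft_size in (2048, 1024, 512):
    print(fft_size, f0_floor_for_fft_size(16000, fft_size))
# 2048 23.4375
# 1024 46.875
# 512 93.75
```

At a 16 kHz sampling frequency, halving fft_size from 1024 to 512 doubles f0_floor from about 47 Hz to about 94 Hz, which is already within the range where low male voices can fall.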

CITATION:

Dinh, T., Kain, A., & Tjaden, K. (2019). Using a manifold vocoder for spectral voice and style conversion. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September, 1388-1392.
