An interface for neural speech synthesis with PyTorch
SpeechInterface
A Speech Interface Toolkit for Neural Speech Synthesis with PyTorch
This repository is meant to help you deploy neural speech synthesis experiments efficiently. Its main feature is:
Pairing audio feature parameters with the source code needed to run the major neural vocoders.
Each pairing is called an interface, and it provides an encode function and a decode function.
Encode: Converts a raw waveform into audio features (e.g. mel-spectrogram, MFCC).
Decode: Reconstructs a raw waveform from audio features (i.e. runs the neural vocoder).
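The encode/decode contract described above can be sketched in a few lines. This is only an illustration: the `Interface` protocol, `ToyMuLawInterface`, and `roundtrip` are hypothetical names, plain Python lists stand in for tensors, and mu-law companding stands in for a neural vocoder so that the example runs without any model weights.

```python
import math
from typing import List, Protocol


class Interface(Protocol):
    """Hypothetical contract: the two methods the description above names."""

    def encode(self, wav: List[float]) -> List[float]: ...

    def decode(self, features: List[float]) -> List[float]: ...


class ToyMuLawInterface:
    """Stand-in only: mu-law companding, not a neural vocoder."""

    def __init__(self, mu: float = 255.0) -> None:
        self.mu = mu

    def encode(self, wav: List[float]) -> List[float]:
        # Compress each sample in [-1, 1] with the mu-law curve.
        scale = math.log1p(self.mu)
        return [math.copysign(math.log1p(self.mu * abs(x)) / scale, x) for x in wav]

    def decode(self, features: List[float]) -> List[float]:
        # Invert the companding to recover the waveform samples.
        scale = math.log1p(self.mu)
        return [math.copysign(math.expm1(abs(y) * scale) / self.mu, y) for y in features]


def roundtrip(interface: Interface, wav: List[float]) -> List[float]:
    """Encode, then decode, as a synthesis pipeline would around its model."""
    return interface.decode(interface.encode(wav))


wav = [0.0, 0.25, -0.5, 1.0]
restored = roundtrip(ToyMuLawInterface(), wav)
```

Because the toy transform is exactly invertible, `restored` matches `wav` up to floating-point error. A real vocoder is lossy, so its reconstruction would only approximate the input.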
- Usage examples
- Compare the results of one neural vocoder against others
- Use the audio features and neural vocoders directly in neural speech synthesis models
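One possible way to set up the vocoder comparison mentioned above is to score each interface by how far its features drift after a decode and re-encode pass. The sketch below is an assumption-laden illustration, not the package's own evaluation code: `rank_by_feature_error`, `ExactToy`, and `CoarseToy` are hypothetical, lists stand in for tensors, and the toy classes replace real vocoders so the example runs on its own.

```python
from typing import Dict, List, Sequence, Tuple


def mean_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference over the overlapping length."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("cannot compare empty sequences")
    return sum(abs(x - y) for x, y in zip(a, b)) / n


class ExactToy:
    """Toy stand-in whose decode reproduces its input exactly."""

    def encode(self, wav: List[float]) -> List[float]:
        return list(wav)

    def decode(self, features: List[float]) -> List[float]:
        return list(features)


class CoarseToy:
    """Toy stand-in that loses detail by rounding on decode."""

    def encode(self, wav: List[float]) -> List[float]:
        return list(wav)

    def decode(self, features: List[float]) -> List[float]:
        return [round(x, 1) for x in features]


def rank_by_feature_error(interfaces: Dict[str, object],
                          wav: List[float]) -> List[Tuple[str, float]]:
    """Score each interface by feature drift after decode + re-encode."""
    scores = {}
    for name, itf in interfaces.items():
        features = itf.encode(wav)
        rebuilt = itf.decode(features)
        scores[name] = mean_abs_error(features, itf.encode(rebuilt))
    # Lowest drift first.
    return sorted(scores.items(), key=lambda kv: kv[1])


ranking = rank_by_feature_error(
    {"coarse": CoarseToy(), "exact": ExactToy()},
    [0.12, -0.34, 0.56],
)
```

Interfaces that use different feature settings produce scores on different scales, so a ranking like this is most meaningful between vocoders that share the same audio parameters.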
Install
$ pip install speech_interface
Available neural vocoders
- HiFi-GAN (Universal v1, VCTK, LJSpeech): speech_interface.interfaces.hifi_gan.InterfaceHifiGAN
- MelGAN (Multi Speaker and LJSpeech, from the official repository): speech_interface.interfaces.mel_gan.InterfaceMelGAN
- WaveGlow (LJSpeech; the Universal model will be added once an import error is resolved): speech_interface.interfaces.waveglow.InterfaceWaveGlow
- Multi-band MelGAN (VCTK, LJSpeech): speech_interface.interfaces.multiband_mel_gan.InterfaceMultibandMelGAN
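If you want to choose a vocoder from a config value rather than a hard-coded import, the class paths listed above can be resolved at runtime. The following is a minimal sketch: the short keys in `INTERFACE_PATHS` and the helper names are hypothetical labels chosen for illustration, and only the dotted class paths come from the list.

```python
import importlib
from typing import Dict

# Dotted class paths copied from the list above. The short keys are
# hypothetical labels for this sketch, not names the package defines.
INTERFACE_PATHS: Dict[str, str] = {
    "hifi_gan": "speech_interface.interfaces.hifi_gan.InterfaceHifiGAN",
    "mel_gan": "speech_interface.interfaces.mel_gan.InterfaceMelGAN",
    "waveglow": "speech_interface.interfaces.waveglow.InterfaceWaveGlow",
    "multiband_mel_gan": "speech_interface.interfaces.multiband_mel_gan.InterfaceMultibandMelGAN",
}


def load_class(dotted_path: str) -> type:
    """Import the module part of a dotted path and return the named class."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_interface_class(key: str) -> type:
    """Resolve a short key to its class (requires speech_interface installed)."""
    return load_class(INTERFACE_PATHS[key])
```

Importing lazily this way means only the selected vocoder's module is loaded, which may help while one of the interfaces has unresolved import problems.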
Example
- Use an interface
import librosa
import torch
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN
# Make an interface
model_name = 'hifi_gan_v1_universal'
device = 'cuda'
interface = InterfaceHifiGAN(model_name=model_name, device=device)
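# load the waveform (librosa resamples to 22050 Hz by default)
wav, sr = librosa.load('/your/wav/form/file/path')
# convert to a PyTorch tensor with a batch dimension
wav_tensor = torch.from_numpy(wav).unsqueeze(0) # (1, Tw)
# encode the waveform tensor into audio features
features = interface.encode(wav_tensor)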
# your speech synthesis process ...
# ...
# reconstruct waveform
pred_wav_tensor = interface.decode(features)
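To listen to the reconstructed waveform, it can be written to disk. The helper below is a hedged sketch that uses only the Python standard library. `write_wav` is a hypothetical name, and the 22050 Hz default simply mirrors the default sample rate of `librosa.load`; it is an assumption that this matches the model you selected.

```python
import struct
import wave
from typing import Sequence


def write_wav(path: str, samples: Sequence[float], sample_rate: int = 22050) -> int:
    """Write mono float samples in [-1, 1] as 16-bit PCM; return the frame count."""
    frames = bytearray()
    for x in samples:
        clipped = max(-1.0, min(1.0, float(x)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
    return len(frames) // 2
```

Assuming `decode` returns a tensor of shape (1, Tw), a call such as `write_wav('out.wav', pred_wav_tensor.squeeze(0).cpu().tolist())` would save it.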
- Check out the available models and audio parameters
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN
# available models
print(InterfaceHifiGAN.available_models())
# audio parameters
print(InterfaceHifiGAN.audio_params())
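The two queries above can be combined into a small report that works for any interface class. This sketch rests on assumptions that the source does not confirm: that `available_models()` returns an iterable of model names and that `audio_params()` returns a dict. `summarize_interface` and `StubInterface` are hypothetical, and the stub's values are placeholders rather than real settings.

```python
from typing import Any, List


def summarize_interface(interface_cls: Any) -> List[str]:
    """Build printable lines from the two class-level queries shown above."""
    lines = [interface_cls.__name__]
    # Assumption: available_models() returns an iterable of model names.
    for name in interface_cls.available_models():
        lines.append(f"  model: {name}")
    params = interface_cls.audio_params()
    # Assumption: audio_params() returns a dict; anything else is shown via repr.
    if isinstance(params, dict):
        for key in sorted(params):
            lines.append(f"  {key} = {params[key]!r}")
    else:
        lines.append(f"  params: {params!r}")
    return lines


class StubInterface:
    """Stand-in with placeholder values so the sketch runs without the package."""

    @staticmethod
    def available_models():
        return ["stub_model_a", "stub_model_b"]

    @staticmethod
    def audio_params():
        return {"sampling_rate": 22050, "n_mels": 80}


print("\n".join(summarize_interface(StubInterface)))
```

With the package installed, passing `InterfaceHifiGAN` instead of the stub should work as long as the two assumptions hold.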
Reference
- HiFi-GAN : https://github.com/jik876/hifi-gan
- MelGAN : https://github.com/descriptinc/melgan-neurips
- WaveGlow : https://github.com/NVIDIA/waveglow
- Multi-band MelGAN : https://github.com/kan-bayashi/ParallelWaveGAN
License
This repository is released under the MIT license.
Download files
File details
Details for the file speech_interface-0.0.2.1.tar.gz.
File metadata
- Download URL: speech_interface-0.0.2.1.tar.gz
- Upload date:
- Size: 14.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | d0d1fb4217af3c072015a14658f010dd428ce93a4cedc49d62466d78a1ff1e13
MD5 | a655cbb9884c66c5214d669c86025109
BLAKE2b-256 | bfb37f09a63e0e3bdb6297a10f4ec9db80a58fde9807c7fd05acbd872cedd8f1
File details
Details for the file speech_interface-0.0.2.1-py3-none-any.whl.
File metadata
- Download URL: speech_interface-0.0.2.1-py3-none-any.whl
- Upload date:
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 451558d6aebbbfe765194ed42e9a1b1a5510056dadb869bf1423301c86f40a06
MD5 | 23e272f229f5acea23ebafa6ce66f3b3
BLAKE2b-256 | 250db5723c7550d05e6cd48e5b3fb100efbb47cc5b78ec81d774e30cdf4dcc1c