An interface for neural speech synthesis with PyTorch


SpeechInterface

A Speech Interface Toolkit for Neural Speech Synthesis with PyTorch

This repository is designed to help you deploy neural speech synthesis experiments efficiently. Its main feature is:

Audio feature parameters, and the feature-extraction code behind them, that match those used by major neural vocoders

Each such pairing is called an interface, and every interface provides an encode and a decode function.

Encode: converts a raw waveform into audio features (e.g. mel-spectrogram, MFCC, ...).

Decode: reconstructs a raw waveform from audio features (i.e. a neural vocoder).
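The encode/decode contract can be illustrated without the package itself. The sketch below is hypothetical: `AudioParams`, `SpeechInterfaceSketch` and `FramingInterface` are illustrative names, not speech_interface's internals, and the toy framing stands in for a real feature extractor and vocoder. It only shows why one object should own the parameters used in both directions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class AudioParams:
    """Hypothetical bundle of parameters shared by encode and decode."""
    sample_rate: int
    hop_length: int


class SpeechInterfaceSketch(ABC):
    """Hypothetical base class: one object owns the parameters for both directions."""

    def __init__(self, params: AudioParams):
        self.params = params

    @abstractmethod
    def encode(self, wav: np.ndarray) -> np.ndarray:
        """Raw waveform (T,) -> features (frames, dim)."""

    @abstractmethod
    def decode(self, features: np.ndarray) -> np.ndarray:
        """Features (frames, dim) -> raw waveform (frames * hop_length,)."""


class FramingInterface(SpeechInterfaceSketch):
    """Toy stand-in for a feature extractor + vocoder pair.

    encode() splits the waveform into non-overlapping frames of hop_length
    samples (zero-padding the tail); decode() concatenates them back.
    """

    def encode(self, wav):
        hop = self.params.hop_length
        n_frames = -(-len(wav) // hop)  # ceiling division
        padded = np.zeros(n_frames * hop, dtype=wav.dtype)
        padded[: len(wav)] = wav
        return padded.reshape(n_frames, hop)

    def decode(self, features):
        return features.reshape(-1)


iface = FramingInterface(AudioParams(sample_rate=22050, hop_length=256))
wav = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
feats = iface.encode(wav)
recon = iface.decode(feats)
print(feats.shape)  # (4, 256)
```

Because `decode` relies on the same `hop_length` that `encode` used, swapping in features computed with different parameters would silently produce a wrong waveform; bundling both directions in one object avoids that mismatch.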

Usage Examples
- Compare the results of your neural vocoder experiments against other vocoders
- Use the audio features and neural vocoders directly in neural speech synthesis models

Install

$ pip install speech_interface

Available neural vocoders

- HiFi-GAN (Universal v1, VCTK, LJSpeech): speech_interface.interfaces.hifi_gan.InterfaceHifiGAN
- MelGAN (multi-speaker and LJSpeech models from the official repository): speech_interface.interfaces.mel_gan.InterfaceMelGAN
- WaveGlow (LJSpeech; a Universal model will be added once an import error is resolved): speech_interface.interfaces.waveglow.InterfaceWaveGlow
- Multi-band MelGAN (VCTK, LJSpeech): speech_interface.interfaces.multiband_mel_gan.InterfaceMultibandMelGAN
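Since every vocoder is exposed as a fully qualified class path, a script that compares several vocoders can pick one by name at run time. The `VOCODER_CLASSES` registry and `load_class` helper below are hypothetical (the package is not known to ship them); the sketch only uses the standard `importlib` machinery on the class paths listed above.

```python
import importlib

# Hypothetical registry built from the class paths listed above.
VOCODER_CLASSES = {
    "hifi_gan": "speech_interface.interfaces.hifi_gan.InterfaceHifiGAN",
    "mel_gan": "speech_interface.interfaces.mel_gan.InterfaceMelGAN",
    "waveglow": "speech_interface.interfaces.waveglow.InterfaceWaveGlow",
    "multiband_mel_gan": "speech_interface.interfaces.multiband_mel_gan.InterfaceMultibandMelGAN",
}


def load_class(dotted_path: str):
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# With speech_interface installed, e.g.:
# InterfaceCls = load_class(VOCODER_CLASSES["hifi_gan"])
```

The import is deferred until a vocoder is actually requested, so a broken optional dependency of one vocoder (such as the WaveGlow import error noted above) does not prevent using the others.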

Example

Use an interface

import librosa
import torch
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

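# Make an interface
model_name = 'hifi_gan_v1_universal'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
interface = InterfaceHifiGAN(model_name=model_name, device=device)

# librosa resamples to 22050 Hz by default; this should match the
# interface's sampling rate (see InterfaceHifiGAN.audio_params())
wav, sr = librosa.load('/path/to/your/audio.wav')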

# to pytorch tensor
wav_tensor = torch.from_numpy(wav).unsqueeze(0)  # (1, Tw)

# encode waveform tensor
features = interface.encode(wav_tensor)

# your speech synthesis process ...
# ...

# reconstruct waveform
pred_wav_tensor = interface.decode(features)
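The reconstructed waveform is not guaranteed to have exactly the input length. A vocoder that upsamples each feature frame by the hop length H produces n_frames × H samples, and with a centered STFT using an even n_fft (the default in `torch.stft(..., center=True)` and `librosa.stft`) n_frames = 1 + T // H. The sketch below assumes that framing and a hop length of 256 (the value in HiFi-GAN's V1 config); the actual padding used inside each interface may differ, so check its audio parameters. `centered_frame_count` and `match_length` are hypothetical helpers.

```python
import numpy as np


def centered_frame_count(num_samples: int, hop_length: int) -> int:
    """Frames produced by a centered STFT with an even n_fft: 1 + T // hop."""
    return 1 + num_samples // hop_length


def match_length(pred: np.ndarray, target_len: int) -> np.ndarray:
    """Trim or zero-pad a 1-D waveform to exactly target_len samples."""
    if len(pred) >= target_len:
        return pred[:target_len]
    return np.pad(pred, (0, target_len - len(pred)))


hop_length = 256     # assumed; read the real value from the interface's audio params
num_samples = 22050  # one second of audio at 22.05 kHz
n_frames = centered_frame_count(num_samples, hop_length)
decoded_len = n_frames * hop_length
print(n_frames, decoded_len)  # 87 22272

pred = np.zeros(decoded_len, dtype=np.float32)  # stand-in for a decoded waveform
aligned = match_length(pred, num_samples)
print(aligned.shape)  # (22050,)
```

To apply this to the example above, assuming `decode` returns a PyTorch tensor, convert it first with `pred_wav_tensor.squeeze().detach().cpu().numpy()` and align it to `len(wav)` before computing sample-level metrics.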

Check out available models and audio parameters

from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

# available models
print(InterfaceHifiGAN.available_models())

# audio parameters
print(InterfaceHifiGAN.audio_params())
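When comparing vocoders, features produced by one interface should only be decoded by another if their audio parameters agree. The helper below is a hypothetical sketch that assumes `audio_params()` returns a dict-like mapping; the key names and the second parameter set are made-up illustrative values, not taken from any real model.

```python
from typing import Any, Dict, Iterable, Mapping, Optional, Tuple


def param_mismatches(
    a: Mapping[str, Any],
    b: Mapping[str, Any],
    keys: Optional[Iterable[str]] = None,
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (a_value, b_value)} for every key whose values differ.

    If keys is None, all keys present in either mapping are compared;
    a key missing from one side is reported with None on that side.
    """
    if keys is None:
        keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Illustrative parameter dicts (hypothetical keys and values).
hifi = {"sampling_rate": 22050, "hop_length": 256, "n_mels": 80}
other = {"sampling_rate": 24000, "hop_length": 300, "n_mels": 80}
print(param_mismatches(hifi, other))
# {'hop_length': (256, 300), 'sampling_rate': (22050, 24000)}
```

An empty result means the two parameter sets agree on every compared key; passing an explicit `keys` list lets you ignore fields that do not affect the features.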

Reference

HiFi-GAN : https://github.com/jik876/hifi-gan
MelGAN : https://github.com/descriptinc/melgan-neurips
WaveGlow : https://github.com/NVIDIA/waveglow
Multi-band MelGAN : https://github.com/kan-bayashi/ParallelWaveGAN

License

This repository is released under the MIT license.

Project details


Release history

- 0.0.2.1 (this version): Mar 20, 2021
- 0.0.2: Mar 20, 2021
- 0.0.1: Mar 19, 2021
- 0.0.0: Mar 19, 2021

Download files


Source Distribution

speech_interface-0.0.2.1.tar.gz (14.6 kB)

Uploaded Mar 20, 2021 Source

Built Distribution


speech_interface-0.0.2.1-py3-none-any.whl (18.8 kB)

Uploaded Mar 20, 2021 Python 3

File details

Details for the file speech_interface-0.0.2.1.tar.gz.

File metadata

Download URL: speech_interface-0.0.2.1.tar.gz
Upload date: Mar 20, 2021
Size: 14.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for speech_interface-0.0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`d0d1fb4217af3c072015a14658f010dd428ce93a4cedc49d62466d78a1ff1e13`
MD5	`a655cbb9884c66c5214d669c86025109`
BLAKE2b-256	`bfb37f09a63e0e3bdb6297a10f4ec9db80a58fde9807c7fd05acbd872cedd8f1`
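A downloaded archive can be checked against the SHA256 digest listed above with the standard `hashlib` module. The `file_sha256` helper and the local file path are hypothetical; the sketch reads the file in chunks so large files do not need to fit in memory, and ends with a self-check on a throwaway file.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 listed for speech_interface-0.0.2.1.tar.gz
EXPECTED = "d0d1fb4217af3c072015a14658f010dd428ce93a4cedc49d62466d78a1ff1e13"
# Hypothetical local path to the downloaded source distribution:
# assert file_sha256("speech_interface-0.0.2.1.tar.gz") == EXPECTED

# Quick self-check on a throwaway file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
check = file_sha256(tmp.name)
os.remove(tmp.name)
print(check == hashlib.sha256(b"abc").hexdigest())  # True
```

pip can enforce the same check during installation through its hash-checking mode (`--require-hashes`).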


File details

Details for the file speech_interface-0.0.2.1-py3-none-any.whl.

File metadata

Download URL: speech_interface-0.0.2.1-py3-none-any.whl
Upload date: Mar 20, 2021
Size: 18.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for speech_interface-0.0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`451558d6aebbbfe765194ed42e9a1b1a5510056dadb869bf1423301c86f40a06`
MD5	`23e272f229f5acea23ebafa6ce66f3b3`
BLAKE2b-256	`250db5723c7550d05e6cd48e5b3fb100efbb47cc5b78ec81d774e30cdf4dcc1c`

