
An interface for neural speech synthesis with PyTorch


SpeechInterface

Python 3.6

A Speech Interface Toolkit for Neural Speech Synthesis with PyTorch

This repository helps you deploy your neural speech synthesis experiments efficiently. Its main features are:

  • Audio feature extraction whose parameters and source code match those of major neural vocoders

  • Each vocoder is wrapped in an interface that provides encode and decode functions:

    • Encode: converts a raw waveform into audio features (e.g. mel-spectrogram, MFCC, ...).

    • Decode: reconstructs a raw waveform from audio features (i.e. the neural vocoder).

  • Usage examples
    • Compare the results of your neural vocoder with other vocoders
    • Use the audio features and neural vocoders directly in neural speech synthesis models
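The encode/decode contract described above can be sketched as an abstract base class. This is a minimal illustration, not the package's own class hierarchy: SpeechInterfaceSketch and FramingInterface are hypothetical names, and the toy "features" are plain frames of samples rather than mel-spectrograms, so the round trip is exactly invertible and easy to check.

```python
from abc import ABC, abstractmethod
from typing import List


class SpeechInterfaceSketch(ABC):
    """Hypothetical base class illustrating the encode/decode contract (a sketch only)."""

    @abstractmethod
    def encode(self, wav: List[float]) -> List[List[float]]:
        """Raw waveform -> audio features (e.g. mel-spectrogram frames)."""

    @abstractmethod
    def decode(self, features: List[List[float]]) -> List[float]:
        """Audio features -> raw waveform (the neural vocoder's job)."""


class FramingInterface(SpeechInterfaceSketch):
    """Toy stand-in: 'features' are non-overlapping frames of hop_length samples.

    Real interfaces compute spectral features and decode them with a neural
    vocoder; this one is exactly invertible so the round trip is easy to check.
    """

    def __init__(self, hop_length: int = 256):
        self.hop_length = hop_length

    def encode(self, wav: List[float]) -> List[List[float]]:
        # Zero-pad to a multiple of hop_length, then split into frames.
        pad = (-len(wav)) % self.hop_length
        padded = list(wav) + [0.0] * pad
        return [padded[i:i + self.hop_length]
                for i in range(0, len(padded), self.hop_length)]

    def decode(self, features: List[List[float]]) -> List[float]:
        # Concatenate frames: output length is num_frames * hop_length.
        return [x for frame in features for x in frame]
```

As in this toy, a frame-based vocoder produces a fixed number of samples per feature frame, so a reconstructed waveform may differ slightly in length from the input.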

Install

$ pip install speech_interface

Available neural vocoders

  1. HiFi-GAN (Universal v1, VCTK, LJSpeech) : speech_interface.interfaces.hifi_gan.InterfaceHifiGAN
  2. MelGAN (Multi Speaker and LJSpeech from official repository) : speech_interface.interfaces.mel_gan.InterfaceMelGAN
  3. WaveGlow (LJSpeech; a Universal model will be added once an import error is resolved) : speech_interface.interfaces.waveglow.InterfaceWaveGlow
  4. Multi-band MelGAN (VCTK, LJSpeech) : speech_interface.interfaces.multiband_mel_gan.InterfaceMultibandMelGAN
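Because each vocoder lives at a fixed class path, a script can select one by name at run time. The sketch below relies only on the class paths listed above; the short registry keys and the helper functions are hypothetical and not part of speech_interface.

```python
import importlib
from typing import Tuple

# Class paths copied from the list above; the short keys are hypothetical labels.
INTERFACE_PATHS = {
    'hifi_gan': 'speech_interface.interfaces.hifi_gan.InterfaceHifiGAN',
    'mel_gan': 'speech_interface.interfaces.mel_gan.InterfaceMelGAN',
    'waveglow': 'speech_interface.interfaces.waveglow.InterfaceWaveGlow',
    'multiband_mel_gan': 'speech_interface.interfaces.multiband_mel_gan.InterfaceMultibandMelGAN',
}


def split_class_path(path: str) -> Tuple[str, str]:
    """Split 'package.module.ClassName' into ('package.module', 'ClassName')."""
    module_name, _, class_name = path.rpartition('.')
    return module_name, class_name


def load_interface_class(key: str):
    """Import and return the interface class registered under `key`.

    Requires speech_interface to be installed; raises KeyError for unknown keys.
    """
    module_name, class_name = split_class_path(INTERFACE_PATHS[key])
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

For example, load_interface_class('hifi_gan').available_models() should print the same list as the example further down, assuming the package is installed. Importing by path defers loading a vocoder module until it is actually requested.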

Example

  • Use an interface
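import librosa
import torch
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

# Make an interface (fall back to CPU when no GPU is available)
model_name = 'hifi_gan_v1_universal'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
interface = InterfaceHifiGAN(model_name=model_name, device=device)

# Load a waveform. librosa resamples to 22050 Hz by default; make sure this
# matches the sample rate listed in InterfaceHifiGAN.audio_params().
wav, sr = librosa.load('/path/to/your/file.wav')

# Convert to a PyTorch tensor with a batch dimension
wav_tensor = torch.from_numpy(wav).unsqueeze(0)  # (1, Tw)

# Encode the waveform tensor into audio features
features = interface.encode(wav_tensor)

# Your speech synthesis process ...
# ...

# Reconstruct the waveform from the features
pred_wav_tensor = interface.decode(features)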

  • Check out the available models and audio parameters
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

# available models
print(InterfaceHifiGAN.available_models())

# audio parameters
print(InterfaceHifiGAN.audio_params())
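To compare vocoders (the first usage example listed earlier), one option is copy synthesis: encode the same recording with each interface and decode it again. The helpers below are a hypothetical sketch that assumes only the encode and decode methods shown in the example; ScaleInterface is a stand-in for testing a pipeline, not a real vocoder.

```python
from typing import Any, Dict, List, Mapping


def copy_synthesis(interface: Any, wav: Any) -> Any:
    """Encode a waveform with `interface`, then decode the features back to audio."""
    features = interface.encode(wav)
    return interface.decode(features)


def compare_vocoders(interfaces: Mapping[str, Any], wav: Any) -> Dict[str, Any]:
    """Run copy synthesis with every interface; returns {name: reconstructed waveform}."""
    # Each interface encodes the input itself, so every vocoder receives
    # features computed with its own audio parameters.
    return {name: copy_synthesis(iface, wav) for name, iface in interfaces.items()}


class ScaleInterface:
    """Hypothetical stand-in with the same encode/decode methods, for testing pipelines."""

    def __init__(self, gain: float):
        self.gain = gain

    def encode(self, wav: List[float]) -> List[float]:
        return [x * self.gain for x in wav]

    def decode(self, features: List[float]) -> List[float]:
        return [x / self.gain for x in features]


# With real interfaces (assumes speech_interface and pretrained weights are available):
#   from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN
#   interfaces = {'hifi_gan': InterfaceHifiGAN(model_name='hifi_gan_v1_universal', device='cpu')}
#   outputs = compare_vocoders(interfaces, wav_tensor)
```

Encoding separately with each interface matters: every vocoder expects features computed with its own audio parameters, so features from one interface should not be passed to another vocoder's decode. The vocoders may also expect different sample rates; audio_params() is the place to check.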

License

This repository is released under the MIT License.



Download files


Source Distribution

speech_interface-0.0.2.1.tar.gz (14.6 kB)


Built Distribution

speech_interface-0.0.2.1-py3-none-any.whl (18.8 kB)


File details

Details for the file speech_interface-0.0.2.1.tar.gz.

File metadata

  • Download URL: speech_interface-0.0.2.1.tar.gz
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for speech_interface-0.0.2.1.tar.gz

  • SHA256: d0d1fb4217af3c072015a14658f010dd428ce93a4cedc49d62466d78a1ff1e13
  • MD5: a655cbb9884c66c5214d669c86025109
  • BLAKE2b-256: bfb37f09a63e0e3bdb6297a10f4ec9db80a58fde9807c7fd05acbd872cedd8f1


File details

Details for the file speech_interface-0.0.2.1-py3-none-any.whl.

File metadata

  • Download URL: speech_interface-0.0.2.1-py3-none-any.whl
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for speech_interface-0.0.2.1-py3-none-any.whl

  • SHA256: 451558d6aebbbfe765194ed42e9a1b1a5510056dadb869bf1423301c86f40a06
  • MD5: 23e272f229f5acea23ebafa6ce66f3b3
  • BLAKE2b-256: 250db5723c7550d05e6cd48e5b3fb100efbb47cc5b78ec81d774e30cdf4dcc1c

