PyTorch implementation of the neural homomorphic vocoder

neural-homomorphic-vocoder

A neural vocoder based on the source-filter model, called the neural homomorphic vocoder (NHV).

Install

pip install neural-homomorphic-vocoder

Usage

Usage for NeuralHomomorphicVocoder class

  • Input
    • z: Gaussian noise
    • x: mel-filterbank
    • cf0: continuous f0
    • uv: u/v symbol
import torch
from nhv import NeuralHomomorphicVocoder

hop_size = 256     # hop size in each mel-filterbank frame
in_channels = 80   # input channels (i.e., dimension of mel-filterbank)

net = NeuralHomomorphicVocoder(
        fs=24000,             # sampling frequency
        fft_size=1024,        # size for impulse response of LTV
        hop_size=hop_size,
        in_channels=in_channels,
        conv_channels=256,    # channel size of LTV filter
        ccep_size=222,        # output ccep size of LTV filter
        out_channels=1,       # output size of network
        kernel_size=3,        # kernel size of LTV filter
        dilation_size=1,      # dilation size of LTV filter
        group_size=8,         # group size of LTV filter
        fmin=80,              # min freq. for melspc
        fmax=7600,            # max freq. for melspc (recommend to use full-band)
        roll_size=24,         # frame size to get median to estimate logspc from melspc
        n_ltv_layers=3,       # number of layers for LTV ccep generator
        n_postfilter_layers=4,     # number of layers for output postfilter
        n_ltv_postfilter_layers=1, # number of layers for LTV postfilter (if ddsconv)
        harmonic_amp=0.1,     # amplitude of sinusoids
        noise_std=0.03,       # standard deviation of Gaussian noise
        use_causal=False,     # use causal conv LTV filter
        use_reference_mag=False,   # use reference logspc calculated from melspc
        use_tanh=False,       # apply tanh to output else linear
        use_uvmask=False,     # apply uv-based mask to harmonic
        use_weight_norm=True, # apply weight norm to conv1d layer
        conv_type="original", # LTV generator network type ["original", "ddsconv"]
        postfilter_type=None, # postfilter network type ["None", "normal", "ddsconv"]
        ltv_postfilter_type=None,  # LTV postfilter network type
                                   # ["None", "normal", "ddsconv"]
        ltv_postfilter_kernel_size=128,  # kernel_size for LTV postfilter
        scaler_file=None      # path to .pkl for internal scaling of melspc
                              # (dict["mlfb"] = sklearn.preprocessing.StandardScaler)
)

B, T, D = 3, 100, in_channels   # batch_size, n_frames, n_mels
z = torch.randn(B, 1, T * hop_size)
x = torch.randn(B, T, D)
cf0 = torch.randn(B, T, 1)
uv = torch.randn(B, T, 1)
c = torch.cat([x, cf0, uv], dim=-1)  # conditioning features: x, cf0, and uv concatenated
y = net(z, c)  # z: (B, 1, T * hop_size), c: (B, D+2, T)
y = net._forward(z, x, cf0, uv)
y = net.inference(c)  # for evaluation
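
The cf0 and uv inputs are typically derived from a frame-level F0 contour. The following is a minimal sketch of one common way to do that, not the procedure used by this package or by ParallelWaveGAN. It assumes that unvoiced frames are marked with 0.0 in the F0 contour and that uv is 1.0 for voiced frames; the helper name f0_to_cf0_uv is hypothetical.

```python
def f0_to_cf0_uv(f0):
    """Split a frame-level F0 contour into continuous F0 and a u/v flag.

    f0: list of floats in Hz, where 0.0 marks an unvoiced frame
        (an assumed convention, not taken from the package).
    """
    uv = [1.0 if v > 0 else 0.0 for v in f0]
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        # No voiced frame to interpolate from; return the contour unchanged.
        return list(f0), uv

    cf0 = list(f0)
    first, last = voiced[0], voiced[-1]
    # Hold the nearest voiced value at the leading and trailing edges.
    for i in range(first):
        cf0[i] = f0[first]
    for i in range(last + 1, len(f0)):
        cf0[i] = f0[last]
    # Linearly interpolate across unvoiced gaps between voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            cf0[i] = (1.0 - w) * f0[left] + w * f0[right]
    return cf0, uv


f0 = [0.0, 100.0, 0.0, 0.0, 130.0, 0.0]
cf0, uv = f0_to_cf0_uv(f0)
print(cf0)
print(uv)
```

Each resulting list can then be converted to a tensor and reshaped to (B, T, 1) to match the cf0 and uv inputs above.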

Features

  • Train using kan-bayashi/ParallelWaveGAN with continuous F0 and uv symbols
  • Support depth-wise separable convolution
  • Support incremental inference
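
To give a rough sense of why depth-wise separable convolution is attractive, the sketch below counts the learnable parameters of a standard 1-D convolution and of a depth-wise plus point-wise pair. It is plain arithmetic under the usual definition of a separable convolution and makes no claim about how the "ddsconv" layers in this package are built. The function names are hypothetical; the channel and kernel sizes are the conv_channels and kernel_size values from the usage example.

```python
def conv1d_params(in_ch, out_ch, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in a 1-D convolution layer."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    weights = (in_ch // groups) * out_ch * kernel_size
    return weights + (out_ch if bias else 0)


def separable_conv1d_params(in_ch, out_ch, kernel_size, bias=True):
    """Depth-wise conv (one filter per channel) followed by a 1x1 point-wise conv."""
    depthwise = conv1d_params(in_ch, in_ch, kernel_size, groups=in_ch, bias=bias)
    pointwise = conv1d_params(in_ch, out_ch, 1, bias=bias)
    return depthwise + pointwise


channels, kernel_size = 256, 3  # conv_channels and kernel_size from the usage example
standard = conv1d_params(channels, channels, kernel_size)
separable = separable_conv1d_params(channels, channels, kernel_size)
print(standard, separable)  # 196864 66816
```

With these sizes the separable pair needs roughly a third of the parameters of the standard layer.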

References

@article{liu20,
  title={Neural Homomorphic Vocoder},
  author={Z.~Liu and K.~Chen and K.~Yu},
  journal={Proc. Interspeech 2020},
  pages={240--244},
  year={2020}
}
