PyTorch implementation of the neural homomorphic vocoder

neural-homomorphic-vocoder

A neural vocoder based on the source-filter model, called the neural homomorphic vocoder (NHV).
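
As a rough illustration of the source-filter idea, the sketch below builds an impulse-train excitation at a constant F0 and passes it through a short, fixed impulse response. It uses plain NumPy rather than this package; `impulse_train` and the decaying toy filter are hypothetical names and values chosen only for illustration. The actual model predicts time-varying filters with a network and also has a noise branch, neither of which is shown here.

```python
import numpy as np

def impulse_train(f0_hz, fs, n_samples):
    """Return a unit impulse train with one pulse per F0 period (constant F0)."""
    phase = np.arange(n_samples) * f0_hz / fs    # cycles elapsed at each sample
    cycle = np.floor(phase)                      # index of the current cycle
    starts = np.diff(cycle, prepend=-1.0) > 0    # True where a new cycle begins
    return starts.astype(np.float64)

fs = 24000
excitation = impulse_train(100.0, fs, fs)        # source: 1 second at 100 Hz
h = 0.9 ** np.arange(64)                         # toy filter: decaying impulse response
speech = np.convolve(excitation, h)[: len(excitation)]  # source passed through the filter
```

In the vocoder, the fixed `h` would be replaced by one impulse response per frame, which is what makes the filter linear time-varying (LTV).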

Install

$ cd tools
$ make

Usage

Usage of the NeuralHomomorphicVocoder class

  • Input
    • x: mel-filterbank features
    • cf0: continuous F0
    • uv: voiced/unvoiced (U/V) flag

import torch
from nhv import NeuralHomomorphicVocoder

net = NeuralHomomorphicVocoder(
        fs=24000,             # sampling frequency
        fft_size=1024,        # size of impulse response of LTV filter
        hop_size=256,         # hop size in each mel-filterbank frame
        in_channels=80,       # input channels (i.e., dimension of mel-filterbank)
        conv_channels=256,    # channel size of LTV filter
        ltv_out_channels=222, # output size of LTV filter
        kernel_size=3,        # kernel size of LTV filter
        group_size=8,         # group size of LTV filter
        dilation_size=1,      # dilation size of LTV filter
        fmin=80,              # min freq. of melspc calculation
        fmax=7600,            # max freq. of melspc calculation
        roll_size=24,         # roll size to calculate logspc from melspc 
        use_causal=False,     # use causal conv LTV filter
        use_conv_postfilter=False,     # use causal conv postfilter for NHV output
        use_ltv_conv_postfilter=False, # use causal conv postfilter for LTV output 
        use_reference_mag=False,       # use reference logspc calculated from melspc
        use_quefrency_norm=True,       # enable ccep normalized by quefrency index
        scaler_file=None      # internal scaling of melspc 
                              # (Dict -> key="mlfb" = StandardScaler)
)

hop_size, in_channels = 256, 80   # must match the values passed to the constructor above
B, T, D = 3, 100, in_channels   # batch_size, frame_size, n_mels
z = torch.randn(B, 1, T * hop_size)
x = torch.randn(B, T, D)
cf0 = torch.randn(B, T, 1)
uv = torch.randn(B, T, 1)
y = net(z, torch.cat([x, cf0, uv], dim=-1))   # z: (B, 1, T * hop_size), c: (B, T, D+2)
y = net._forward(z, x, cf0, uv)
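
The example above feeds random tensors only to show the expected shapes. With real data, `cf0` and `uv` are usually derived from a raw F0 contour in which unvoiced frames are marked as 0. The sketch below shows one common way to do that with NumPy, by linearly interpolating F0 across unvoiced gaps. `to_continuous_f0` is a hypothetical helper, not part of this package, and whether the model expects linear or log F0, or this particular interpolation, is an assumption not confirmed by this README.

```python
import numpy as np

def to_continuous_f0(f0):
    """Split a raw F0 contour (0 = unvoiced) into (cf0, uv), both of shape (T,)."""
    f0 = np.asarray(f0, dtype=np.float64)
    uv = (f0 > 0).astype(np.float64)             # 1.0 for voiced frames, 0.0 otherwise
    if not uv.any():                             # nothing to interpolate from
        return np.zeros_like(f0), uv
    voiced_idx = np.flatnonzero(uv)
    # Linear interpolation inside gaps; edge values are held at both ends.
    cf0 = np.interp(np.arange(len(f0)), voiced_idx, f0[voiced_idx])
    return cf0, uv

cf0, uv = to_continuous_f0([0.0, 100.0, 0.0, 0.0, 130.0, 0.0])
```

Each array can then be converted with `torch.from_numpy(...).float()` and reshaped to `(B, T, 1)`, for example with `[None, :, None]` for a single utterance.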

Features

  • (2021/05/21): Works well, including in training
  • (2021/05/21): Follows the same input format as ParallelWaveGANGenerator in kan-bayashi/ParallelWaveGAN, but with continuous F0 and U/V symbols
  • (2021/05/24): The final FIR filter is implemented as a 1D causal conv layer
  • (2021/05/24): GAN training is not stable
  • (2021/05/25): Implement reference log magnitude from melspc
  • (2021/05/27): Implement internal scaler and ltv conv postfilter
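
To make the "1D causal conv layer" item concrete, here is a minimal sketch of causal FIR filtering with `torch.nn.functional.conv1d`. It is not the layer used in this package; `causal_fir` is a hypothetical helper and the taps are fixed by hand, whereas a learned layer would hold them as trainable weights. The point is the left-only padding, which keeps every output sample from depending on future input.

```python
import torch
import torch.nn.functional as F

def causal_fir(x, h):
    """Filter x of shape (B, 1, T) with FIR taps h of shape (K,), causally."""
    k = h.numel()
    x = F.pad(x, (k - 1, 0))                 # pad on the left only: no look-ahead
    weight = h.flip(0).reshape(1, 1, k)      # conv1d is cross-correlation, so flip the taps
    return F.conv1d(x, weight)               # output length equals the input length T

x = torch.zeros(1, 1, 6)
x[0, 0, 2] = 1.0                             # unit impulse at t = 2
h = torch.tensor([1.0, 0.5, 0.25])
y = causal_fir(x, h)                         # the taps appear starting at t = 2
```

Because the input is a unit impulse, the output is simply the taps shifted to start at the impulse position, with nothing before it.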

References

@article{liu20,
  title={Neural Homomorphic Vocoder},
  author={Z.~Liu and K.~Chen and K.~Yu},
  journal={Proc. Interspeech 2020},
  pages={240--244},
  year={2020}
}

Download files

Source Distribution

neural-homomorphic-vocoder-0.0.5.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

neural_homomorphic_vocoder-0.0.5-py3-none-any.whl (8.8 kB)

Uploaded Python 3

File details

Details for the file neural-homomorphic-vocoder-0.0.5.tar.gz.

File metadata

  • Download URL: neural-homomorphic-vocoder-0.0.5.tar.gz
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.2.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5

File hashes

Hashes for neural-homomorphic-vocoder-0.0.5.tar.gz
Algorithm Hash digest
SHA256 2380899d5e262ff2c71a482187c85ffa8646739e15807b82382321b9e8319a18
MD5 b05ffe20763d08222f9611f4b5c28d82
BLAKE2b-256 7db90da0b815309bb0b3344a1d121714cb37541f67de7b9ee085607ee174310b

File details

Details for the file neural_homomorphic_vocoder-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: neural_homomorphic_vocoder-0.0.5-py3-none-any.whl
  • Size: 8.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.2.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5

File hashes

Hashes for neural_homomorphic_vocoder-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 9210bc141fd43955ce4b48ef4146c00e5955fbb3b0265306a7a993e5a72c3bcb
MD5 fbb772c76a498f18adc7a831c3570b55
BLAKE2b-256 204b816c87d5580c592eb8ef1a0cb47c25b06c908e7f8a36422f446fe547cba3
