Project description

neural-homomorphic-vocoder

A neural vocoder based on the source-filter model, called the neural homomorphic vocoder (NHV).
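
As a rough illustration of the source-filter idea, the sketch below builds an impulse-train excitation from a per-sample F0 contour and passes it through a short FIR filter. It is a plain-Python toy and not the code in this package: `impulse_train`, `fir_filter`, and the filter taps are hypothetical names and values chosen only for illustration.

```python
def impulse_train(f0_per_sample, fs):
    """Emit a unit impulse each time the accumulated phase completes a period.

    f0_per_sample: F0 in Hz for every output sample (0 or less means silence).
    fs: sampling frequency in Hz.
    """
    phase = 0.0
    out = []
    for f0 in f0_per_sample:
        if f0 <= 0:
            phase = 0.0           # reset during unvoiced/silent samples
            out.append(0.0)
            continue
        phase += f0 / fs          # fraction of a pitch period per sample
        if phase >= 1.0:
            phase -= 1.0
            out.append(1.0)
        else:
            out.append(0.0)
    return out


def fir_filter(x, h):
    """Causal FIR filtering: y[n] = sum_k h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, tap in enumerate(h):
            if n - k >= 0:
                acc += tap * x[n - k]
        y.append(acc)
    return y


fs = 24000
source = impulse_train([100.0] * fs, fs)                # one second at 100 Hz
filtered = fir_filter(source[:1000], [0.5, 0.3, 0.2])   # hypothetical filter taps
```

In a neural source-filter vocoder the filter taps are predicted per frame by a network instead of being fixed, which is what makes the filter time-varying.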

Install

$ cd tools
$ make

Usage

Usage for NeuralHomomorphicVocoder class

Input
- x: mel-filterbank features
- cf0: continuous F0
- uv: unvoiced/voiced (u/v) symbol
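
The sketch below shows one common way to derive `cf0` and `uv` from a frame-level F0 contour in which unvoiced frames are marked with 0. It is an assumption-laden illustration, not this package's preprocessing: the helper name `to_continuous_f0` is hypothetical, `uv = 1` for voiced frames is an assumed convention, and whether the model expects F0 in Hz or on a log scale is not stated here.

```python
def to_continuous_f0(f0):
    """Return (cf0, uv) from a frame-level F0 contour where 0 marks unvoiced frames."""
    uv = [1.0 if v > 0 else 0.0 for v in f0]   # assumed convention: 1 = voiced
    cf0 = [float(v) for v in f0]
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        return cf0, uv                          # nothing to interpolate from

    # Hold the first/last voiced value across leading/trailing unvoiced frames.
    for i in range(voiced[0]):
        cf0[i] = float(f0[voiced[0]])
    for i in range(voiced[-1] + 1, len(f0)):
        cf0[i] = float(f0[voiced[-1]])

    # Linearly interpolate across interior unvoiced gaps.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            w = (i - left) / gap
            cf0[i] = (1.0 - w) * f0[left] + w * f0[right]
    return cf0, uv


f0 = [0, 0, 100, 0, 0, 0, 120, 0]
cf0, uv = to_continuous_f0(f0)
# cf0 -> [100.0, 100.0, 100.0, 105.0, 110.0, 115.0, 120.0, 120.0]
# uv  -> [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```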

import torch
from nhv import NeuralHomomorphicVocoder

net = NeuralHomomorphicVocoder(
        fs=24000,             # sampling frequency
        fft_size=1024,        # size for impulse response of LTV
        hop_size=256,         # hop size in each mel-filterbank frame
        in_channels=80,       # input channels (i.e., dimension of mel-filterbank)
        conv_channels=256,    # channel size of LTV filter
        ltv_out_channels=222, # output size of LTV filter
        kernel_size=3,        # kernel size of LTV filter
        group_size=8,         # group size of LTV filter
        dilation_size=1,      # dilation size of LTV filter
        fmin=80,              # min freq. of melspc calculation
        fmax=7600,            # max freq. of melspc calculation
        roll_size=24,         # roll size to calculate logspc from melspc 
        use_causal=False,     # use causal conv LTV filter
        use_conv_postfilter=False,     # use causal conv postfilter for NHV output
        use_ltv_conv_postfilter=False, # use causal conv postfilter for LTV output 
        use_reference_mag=False,       # use reference logspc calculated from melspc
        use_quefrency_norm=True,       # enable ccep normalized by quefrency index
        scaler_file=None      # internal scaling of melspc 
                              # (Dict -> key="mlfb" = StandardScaler)
)

in_channels, hop_size = 80, 256  # same values as passed to the constructor above
B, T, D = 3, 100, in_channels   # batch_size, frame_size, n_mels
z = torch.randn(B, 1, T * hop_size)
x = torch.randn(B, T, D)
cf0 = torch.randn(B, T, 1)
uv = torch.randn(B, T, 1)
y = net(z, torch.cat([x, cf0, uv], dim=-1))   # z: (B, 1, T * hop_size), c: (B, D+2, T)
y = net._forward(z, cf0, uv)
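
The bookkeeping behind the tensors in the example above can be sketched without torch. The helper below is hypothetical (`expected_shapes` is not part of this package); it only restates what the example constructs: one noise sample per output waveform sample, and `x`, `cf0`, and `uv` concatenated along the last axis.

```python
def expected_shapes(batch, frames, n_mels, hop_size, fs):
    """Shapes of the tensors built in the usage example, plus the implied duration."""
    n_samples = frames * hop_size              # each frame covers hop_size samples
    return {
        "z": (batch, 1, n_samples),            # noise input
        "c": (batch, frames, n_mels + 2),      # result of cat([x, cf0, uv], dim=-1)
        "duration_sec": n_samples / fs,
    }


shapes = expected_shapes(batch=3, frames=100, n_mels=80, hop_size=256, fs=24000)
# shapes["z"] -> (3, 1, 25600)
# shapes["c"] -> (3, 100, 82)
```

With the default `fs=24000` and `hop_size=256`, 100 frames correspond to 25600 samples, or roughly 1.07 seconds of audio.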

Features

(2021/05/21): Works well; training in progress
(2021/05/21): Takes the same input as ParallelWaveGANGenerator in kan-bayashi/ParallelWaveGAN, plus continuous F0 and u/v symbols
(2021/05/24): The final FIR filter is implemented as a 1D causal conv layer
(2021/05/24): GAN training is not stable
(2021/05/25): Implemented reference log magnitude from melspc
(2021/05/27): Implemented internal scaler and LTV conv postfilter

References

@article{liu20,
  title={Neural Homomorphic Vocoder},
  author={Z.~Liu and K.~Chen and K.~Yu},
  journal={Proc. Interspeech 2020},
  pages={240--244},
  year={2020}
}

Release history

0.0.13 (Jul 27, 2021)
0.0.12 (Jul 26, 2021)
0.0.11 (Jul 2, 2021)
0.0.10 (Jul 1, 2021)
0.0.8 (Jun 21, 2021)
0.0.7 (Jun 17, 2021)
0.0.5 (May 27, 2021, this version)

Download files

Source Distribution

neural-homomorphic-vocoder-0.0.5.tar.gz (9.0 kB)

Uploaded May 27, 2021 Source

Built Distribution

neural_homomorphic_vocoder-0.0.5-py3-none-any.whl (8.8 kB)

Uploaded May 27, 2021 Python 3

Hashes for neural-homomorphic-vocoder-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`2380899d5e262ff2c71a482187c85ffa8646739e15807b82382321b9e8319a18`
MD5	`b05ffe20763d08222f9611f4b5c28d82`
BLAKE2b-256	`7db90da0b815309bb0b3344a1d121714cb37541f67de7b9ee085607ee174310b`

Hashes for neural_homomorphic_vocoder-0.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9210bc141fd43955ce4b48ef4146c00e5955fbb3b0265306a7a993e5a72c3bcb`
MD5	`fbb772c76a498f18adc7a831c3570b55`
BLAKE2b-256	`204b816c87d5580c592eb8ef1a0cb47c25b06c908e7f8a36422f446fe547cba3`
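
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of_file` and `matches_sha256` are hypothetical, and the file is assumed to have been downloaded to a local path of your choosing.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, with the source distribution saved in the current directory, `matches_sha256("neural-homomorphic-vocoder-0.0.5.tar.gz", "2380899d5e262ff2c71a482187c85ffa8646739e15807b82382321b9e8319a18")` should return `True` for an intact download.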