PyTorch implementation of the neural homomorphic vocoder
neural-homomorphic-vocoder
A neural vocoder based on the source-filter model, called the neural homomorphic vocoder (NHV).
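"Homomorphic" refers to filtering in the cepstral domain: in the NHV paper, the linear time-varying (LTV) filters are described by complex cepstra (cf. the `ccep_size` and `fft_size` parameters below) that are converted into impulse responses. The following is a minimal NumPy sketch of that conversion for a single frame. It assumes the cepstrum is given as separate non-negative and negative quefrency parts; `ccep_to_impulse_response` is a hypothetical helper, not this package's API.

```python
import numpy as np


def ccep_to_impulse_response(ccep_pos, ccep_neg, fft_size):
    """Turn a complex cepstrum into an impulse response of length fft_size.

    ccep_pos: c[0], c[1], ..., c[P-1]   (non-negative quefrencies)
    ccep_neg: c[-1], c[-2], ..., c[-M]  (negative quefrencies, may be empty)
    """
    ccep_pos = np.asarray(ccep_pos, dtype=float)
    ccep_neg = np.asarray(ccep_neg, dtype=float)
    buf = np.zeros(fft_size)
    buf[: len(ccep_pos)] = ccep_pos
    if len(ccep_neg) > 0:
        # Negative quefrencies wrap to the end of the buffer (DFT ordering).
        buf[-len(ccep_neg):] = ccep_neg[::-1]
    log_spec = np.fft.fft(buf)   # complex log spectrum: log|H| + j * phase
    spec = np.exp(log_spec)      # frequency response H
    # A real cepstrum gives a Hermitian-symmetric H, so the IFFT is real up to rounding.
    return np.real(np.fft.ifft(spec))


h = ccep_to_impulse_response([0.0, 0.5], [], 1024)
print(h[:4])  # approx. [1, 0.5, 0.125, 0.0208]: the series of exp(0.5 * z^-1)
```

A purely causal cepstrum (empty `ccep_neg`) yields a minimum-phase response. Nonzero negative quefrencies add an anticausal part, which wraps to the end of the buffer, so `fft_size` must be large enough to keep time aliasing negligible.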
Install
```sh
pip install neural-homomorphic-vocoder
```
Usage
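Usage of the NeuralHomomorphicVocoder class
- Input
  - z: Gaussian noise, shape (B, 1, T * hop_size)
  - x: mel-filterbank, shape (B, T, D)
  - cf0: continuous F0, shape (B, T, 1)
  - uv: voiced/unvoiced (u/v) symbol, shape (B, T, 1)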
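The `cf0` and `uv` inputs are typically derived from an ordinary frame-level F0 contour. One common approach, assumed here rather than taken from this package, treats frames with F0 = 0 as unvoiced and linearly interpolates F0 across them. `continuous_f0` is a hypothetical helper for illustration.

```python
import numpy as np


def continuous_f0(f0):
    """Return (cf0, uv) from a frame-level F0 contour whose unvoiced frames are 0."""
    f0 = np.asarray(f0, dtype=float)
    uv = (f0 > 0).astype(float)        # 1 = voiced, 0 = unvoiced
    voiced = np.flatnonzero(uv)
    if voiced.size == 0:
        return np.zeros_like(f0), uv   # nothing to interpolate from
    frames = np.arange(len(f0))
    # Linear interpolation across unvoiced gaps; the edges hold the nearest voiced value.
    cf0 = np.interp(frames, voiced, f0[voiced])
    return cf0, uv


cf0, uv = continuous_f0([0, 100, 0, 0, 200, 0])
print(uv)   # [0. 1. 0. 0. 1. 0.]
print(cf0)  # approx. [100, 100, 133.3, 166.7, 200, 200]
```

Both arrays have shape (T,). `cf0[None, :, None]` reshapes them to (1, T, 1), and `torch.from_numpy` then converts them to match the input shapes listed above.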
```python
import torch
from nhv import NeuralHomomorphicVocoder

fs = 24000        # sampling frequency
hop_size = 256    # hop size in each mel-filterbank frame
in_channels = 80  # input channels (i.e., dimension of mel-filterbank)

net = NeuralHomomorphicVocoder(
    fs=fs,                           # sampling frequency
    fft_size=1024,                   # size of the impulse response of the LTV filter
    hop_size=hop_size,               # hop size in each mel-filterbank frame
    in_channels=in_channels,         # input channels (i.e., dimension of mel-filterbank)
    conv_channels=256,               # channel size of LTV filter
    ccep_size=222,                   # output ccep size of LTV filter
    out_channels=1,                  # output size of network
    kernel_size=3,                   # kernel size of LTV filter
    dilation_size=1,                 # dilation size of LTV filter
    group_size=8,                    # group size of LTV filter
    fmin=80,                         # min freq. for melspc
    fmax=7600,                       # max freq. for melspc (full-band recommended)
    roll_size=24,                    # number of frames used for the median when estimating logspc from melspc
    n_ltv_layers=3,                  # number of layers for LTV ccep generator
    n_postfilter_layers=4,           # number of layers for output postfilter
    n_ltv_postfilter_layers=1,       # number of layers for LTV postfilter (if ddsconv)
    harmonic_amp=0.1,                # amplitude of sinusoids
    noise_std=0.03,                  # standard deviation of Gaussian noise
    use_causal=False,                # use causal conv LTV filter
    use_reference_mag=False,         # use reference logspc calculated from melspc
    use_tanh=False,                  # apply tanh to output, else linear
    use_uvmask=False,                # apply uv-based mask to harmonic component
    use_weight_norm=True,            # apply weight norm to Conv1d layers
    conv_type="original",            # LTV generator network type ["original", "ddsconv"]
    postfilter_type=None,            # postfilter network type [None, "normal", "ddsconv"]
    ltv_postfilter_type=None,        # LTV postfilter network type [None, "normal", "ddsconv"]
    ltv_postfilter_kernel_size=128,  # kernel size for LTV postfilter
    scaler_file=None,                # path to .pkl for internal scaling of melspc
                                     # (dict with key "mlfb" holding a sklearn.preprocessing.StandardScaler)
)

B, T, D = 3, 100, in_channels  # batch_size, n_frames, n_mels
z = torch.randn(B, 1, T * hop_size)               # Gaussian noise: (B, 1, T * hop_size)
x = torch.randn(B, T, D)                          # dummy mel-filterbank: (B, T, D)
cf0 = torch.randn(B, T, 1)                        # dummy continuous f0: (B, T, 1)
uv = torch.randint(0, 2, (B, T, 1)).float()       # dummy u/v symbol (0 or 1): (B, T, 1)

c = torch.cat([x, cf0, uv], dim=-1).transpose(1, 2)  # c: (B, D+2, T)
y = net(z, c)
y = net._forward(z, x, cf0, uv)
y = net.inference(c)  # for evaluation
```
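The `harmonic_amp` and `use_uvmask` options concern the harmonic (sinusoidal) excitation that NHV filters alongside the noise excitation `z`. Below is a rough NumPy sketch of a sum-of-harmonics excitation built from frame-level `cf0` (in Hz) and `uv`. It assumes nearest-neighbour upsampling by `hop_size` and cosine harmonics kept below Nyquist. `harmonic_source` is hypothetical, and the package's own source generation may differ. The final u/v multiplication plays the role that `use_uvmask` describes.

```python
import numpy as np


def harmonic_source(cf0, uv, fs=24000, hop_size=256, harmonic_amp=0.1):
    """Sum-of-cosines excitation at the sample rate from frame-level cf0 (Hz) and uv (0/1)."""
    f0 = np.repeat(np.asarray(cf0, dtype=float), hop_size)   # frame rate -> sample rate
    mask = np.repeat(np.asarray(uv, dtype=float), hop_size)
    # Instantaneous phase of the fundamental, starting at 0 on the first sample.
    phase = 2 * np.pi * np.concatenate(([0.0], np.cumsum(f0)[:-1])) / fs
    f0_max = f0.max() if f0.size else 0.0
    n_harm = int((fs / 2) // f0_max) if f0_max > 0 else 0
    out = np.zeros_like(f0)
    for k in range(1, n_harm + 1):
        # Keep harmonic k only on samples where it stays below Nyquist.
        out += np.where(k * f0 < fs / 2, np.cos(k * phase), 0.0)
    return harmonic_amp * out * mask   # silence the harmonic part in unvoiced frames


e = harmonic_source(np.full(4, 100.0), np.array([1, 1, 0, 0]))
print(e.shape)  # (1024,)
```

Summing many equal-amplitude cosines produces a band-limited pulse train, so the peak value grows with the number of harmonics below Nyquist.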
Features
- Trainable with kan-bayashi/ParallelWaveGAN using continuous F0 and u/v symbols as additional conditioning
- Support for depth-wise separable convolution
- Support for incremental inference
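Incremental inference generally means synthesising audio chunk by chunk while carrying filter state across chunk boundaries. Here is a generic NumPy illustration of that idea for a fixed FIR filter, using overlap-add with a carried tail. The package's LTV filters change every frame, and its incremental mode may be implemented differently; `StreamingFIR` is a hypothetical helper.

```python
import numpy as np


class StreamingFIR:
    """Filter a signal block by block, carrying the convolution tail between calls."""

    def __init__(self, h):
        self.h = np.asarray(h, dtype=float)
        self.tail = np.zeros(len(self.h) - 1)   # spill-over from previous blocks

    def process(self, block):
        block = np.asarray(block, dtype=float)
        y = np.convolve(block, self.h)          # full convolution: len(block) + len(h) - 1
        y[: len(self.tail)] += self.tail        # add what earlier blocks spilled over
        self.tail = y[len(block):].copy()       # keep the new spill-over for the next call
        return y[: len(block)]                  # samples that are final after this block


rng = np.random.default_rng(0)
x, h = rng.standard_normal(1000), rng.standard_normal(64)
fir = StreamingFIR(h)
streamed = np.concatenate([fir.process(x[i:i + 256]) for i in range(0, len(x), 256)])
print(np.allclose(streamed, np.convolve(x, h)[: len(x)]))  # True
```

The streamed output matches offline filtering exactly because each block only emits samples that no later block can still modify.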
References
@inproceedings{liu20,
  title={Neural Homomorphic Vocoder},
  author={Z.~Liu and K.~Chen and K.~Yu},
  booktitle={Proc. Interspeech 2020},
  pages={240--244},
  year={2020}
}
File details
Details for the file neural-homomorphic-vocoder-0.0.13.tar.gz.
File metadata
- Download URL: neural-homomorphic-vocoder-0.0.13.tar.gz
- Upload date:
- Size: 17.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | d84ff4e9756b580eb88ae94259d141960550f10d278a8843cab805889e042754
MD5 | d2888d5393f94af9acb85953827d6e26
BLAKE2b-256 | 8564708c47694412ba2330da902574aa1c3f0178389b7d28c8a6c4dc4dde7100
File details
Details for the file neural_homomorphic_vocoder-0.0.13-py3-none-any.whl.
File metadata
- Download URL: neural_homomorphic_vocoder-0.0.13-py3-none-any.whl
- Upload date:
- Size: 14.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | efca0827a6ed9498326400e2d303a7b7480e07f5c168ea843b4746c12ec150d5
MD5 | 69a67a9fd9764bdea04a72713b6c54c6
BLAKE2b-256 | 7454c54011c4d484a6ff4e0885106b1b24e5b70b711609cd89bc257bdec92bfb
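To check a downloaded file against the SHA256 digests listed for version 0.0.13, here is a minimal sketch using Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_listed_digest` are introduced here for illustration only.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details for version 0.0.13.
EXPECTED_SHA256 = {
    "neural-homomorphic-vocoder-0.0.13.tar.gz":
        "d84ff4e9756b580eb88ae94259d141960550f10d278a8843cab805889e042754",
    "neural_homomorphic_vocoder-0.0.13-py3-none-any.whl":
        "efca0827a6ed9498326400e2d303a7b7480e07f5c168ea843b4746c12ec150d5",
}


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path):
    """True if the file's name is listed above and its SHA256 digest matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of_file(path) == expected
```

Alternatively, `pip hash <file>` prints the SHA256 digest of a downloaded archive.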