PyTorch implementation of the neural homomorphic vocoder
neural-homomorphic-vocoder
A neural vocoder based on the source-filter model, called the neural homomorphic vocoder (NHV).
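To illustrate the "source" half of a source-filter model, the sketch below turns a frame-level F0 contour into a sample-level impulse train by phase accumulation. It is a simplified, pure-Python illustration and not the code used inside this package; the function name `impulse_train` and its handling of unvoiced frames (F0 of 0 produces silence) are assumptions made for the example.

```python
def impulse_train(f0_frames, fs=24000, hop_size=256):
    """Hypothetical helper: frame-level F0 (Hz, 0 = unvoiced) -> impulse train.

    Each frame is held for `hop_size` samples. A pulse is emitted every time
    the accumulated phase (in pitch periods) crosses 1.0.
    """
    excitation = []
    phase = 0.0  # fraction of the current pitch period already elapsed
    for f0 in f0_frames:
        for _ in range(hop_size):
            if f0 <= 0.0:
                # Unvoiced frame (assumption): no pulses, restart the period.
                phase = 0.0
                excitation.append(0.0)
                continue
            phase += f0 / fs
            if phase >= 1.0:
                phase -= 1.0
                excitation.append(1.0)
            else:
                excitation.append(0.0)
    return excitation


# 10 frames at a constant 100 Hz: one pulse roughly every 240 samples.
source = impulse_train([100.0] * 10)
print(len(source), int(sum(source)))
```

In a source-filter vocoder this excitation, together with a noise signal, would then be shaped by time-varying filters; in this package those are the LTV filters configured in the usage section.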
Install
pip install neural-homomorphic-vocoder
Usage
Usage of the NeuralHomomorphicVocoder class
- Input
  - x: mel-filterbank features, shape (B, T, D)
  - cf0: continuous F0, shape (B, T, 1)
  - uv: voiced/unvoiced (U/V) symbol, shape (B, T, 1)
import torch
from nhv import NeuralHomomorphicVocoder
net = NeuralHomomorphicVocoder(
    fs=24000,  # sampling frequency
    fft_size=1024,  # size of the impulse response of the linear time-varying (LTV) filter
    hop_size=256,  # hop size of each mel-filterbank frame
    in_channels=80,  # input channels (i.e., dimension of mel-filterbank)
    conv_channels=256,  # channel size of LTV filter
    ltv_out_channels=222,  # output size of LTV filter
    out_channels=1,  # output size of network
    kernel_size=3,  # kernel size of LTV filter
    group_size=8,  # group size of LTV filter
    dilation_size=1,  # dilation size of LTV filter
    fmin=80,  # min freq. of melspc calculation
    fmax=7600,  # max freq. of melspc calculation (full-band is recommended)
    roll_size=24,  # frame size used to take the median when estimating logspc from melspc
    look_ahead=32,  # number of look-ahead samples (if use_causal=True)
    use_causal=False,  # use causal conv LTV filter
    use_ddsconv=False,  # use ddsconv instead of normal conv for LTV network
    use_tanh=False,  # apply tanh to the output (otherwise linear)
    use_conv_postfilter=False,  # use causal conv postfilter for NHV output
    use_ddsconv_pf=True,  # use ddsconv postfilter instead of conv1d
    use_ltv_conv_postfilter=False,  # use causal conv postfilter for LTV output
    use_reference_mag=False,  # use reference logspc calculated from melspc
    use_quefrency_norm=True,  # enable ccep normalized by quefrency index
    use_weight_norm=False,  # apply weight norm to conv1d layer
    use_clip_grad_norm=False,  # use clip_grad_norm (norm_value=3)
    scaler_file=None,  # path to .pkl for internal scaling of melspc
    # (dict["mlfb"] = sklearn.preprocessing.StandardScaler)
)
in_channels, hop_size = 80, 256  # must match the values passed to the constructor
B, T, D = 3, 100, in_channels  # batch_size, frame_size, n_mels
z = torch.randn(B, 1, T * hop_size)
x = torch.randn(B, T, D)
cf0 = torch.randn(B, T, 1)
uv = torch.randn(B, T, 1)
y = net(z, torch.cat([x, cf0, uv], dim=-1)) # z: (B, 1, T * hop_size), c: (B, D+2, T)
y = net._forward(z, cf0, uv)
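The usage example feeds random tensors as `cf0` and `uv`. For real speech these would come from an F0 tracker. The sketch below shows one common way to derive them from a raw F0 contour in which unvoiced frames are marked with 0: the U/V symbol is taken from the sign of F0, and the continuous F0 is filled in by linear interpolation between neighbouring voiced frames. The helper name `to_continuous_f0`, the convention that 1.0 means voiced, and the edge handling are all assumptions for illustration; the preprocessing this package was actually trained with may differ (for example in scale or U/V convention).

```python
def to_continuous_f0(f0):
    """Hypothetical helper: raw F0 (0 = unvoiced) -> (continuous F0, U/V symbol)."""
    # Assumed convention: 1.0 = voiced, 0.0 = unvoiced.
    uv = [1.0 if v > 0.0 else 0.0 for v in f0]
    voiced = [i for i, v in enumerate(f0) if v > 0.0]
    if not voiced:
        # Nothing to interpolate from; return the all-zero contour as is.
        return [0.0] * len(f0), uv

    cf0 = [float(v) for v in f0]
    # Extend the first and last voiced values to the edges of the contour.
    for i in range(voiced[0]):
        cf0[i] = cf0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        cf0[i] = cf0[voiced[-1]]
    # Linearly interpolate across each unvoiced gap between voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            w = (i - left) / gap
            cf0[i] = (1.0 - w) * f0[left] + w * f0[right]
    return cf0, uv


cf0, uv = to_continuous_f0([0.0, 100.0, 0.0, 0.0, 130.0, 0.0])
print(cf0)
print(uv)
```

The resulting lists can be converted with `torch.tensor(...)` and reshaped to `(B, T, 1)` to match the shapes used in the usage example.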
Features
- (2021/05/21): Train using kan-bayashi/ParallelWaveGAN with continuous F0 and U/V symbols
- (2021/05/24): Final FIR filter is implemented by 1D causal conv
- (2021/06/17): Implement depth-wise separable convolution
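As background for the causal-convolution entry above, the following pure-Python sketch shows why a 1D causal convolution is the same operation as an FIR filter: the input is padded on the left only, so each output sample depends on the current and past input samples and never on future ones. The function name `causal_fir` is hypothetical and this is not the package's implementation.

```python
def causal_fir(signal, taps):
    """Hypothetical helper: y[n] = sum_k taps[k] * signal[n - k], signal[m] = 0 for m < 0."""
    num_taps = len(taps)
    # Left padding only: this is what makes the convolution causal.
    padded = [0.0] * (num_taps - 1) + list(signal)
    flipped = list(reversed(taps))
    out = []
    for n in range(len(signal)):
        window = padded[n : n + num_taps]  # signal[n - K + 1] ... signal[n]
        out.append(sum(t * x for t, x in zip(flipped, window)))
    return out


# Filtering a unit impulse returns the filter taps themselves.
print(causal_fir([1.0, 0.0, 0.0, 0.0], [0.5, 0.3, 0.2]))
```

In PyTorch the same effect is usually obtained by left-padding the input and applying `torch.nn.Conv1d` without further padding; note that `Conv1d` computes a cross-correlation, so its kernel corresponds to the time-reversed FIR taps.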
References
@article{liu20,
  title={Neural Homomorphic Vocoder},
  author={Z.~Liu and K.~Chen and K.~Yu},
  journal={Proc. Interspeech 2020},
  pages={240--244},
  year={2020}
}