PyTorch implementation of the neural homomorphic vocoder
neural-homomorphic-vocoder
A neural vocoder based on the source-filter model, called the neural homomorphic vocoder (NHV).
Install
$ cd tools
$ make
Usage
Usage for NeuralHomomorphicVocoder class
- Input
  - x: mel-filterbank features
  - cf0: continuous F0
  - uv: voiced/unvoiced (u/v) symbol
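This README does not describe how `cf0` and `uv` are prepared. The sketch below shows one common way to derive them, assuming an F0 tracker that reports 0 for unvoiced frames: `uv` is the voiced mask, and `cf0` fills the unvoiced gaps by linear interpolation. The helper name `to_continuous_f0` is hypothetical and not part of the `nhv` package, and the package may expect a different scaling (for example log F0).

```python
import numpy as np


def to_continuous_f0(f0):
    """Split a frame-level F0 contour into (cf0, uv).

    Assumption: unvoiced frames are marked with f0 == 0.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    uv = (f0 > 0).astype(np.float64)  # 1.0 = voiced, 0.0 = unvoiced
    voiced = np.flatnonzero(uv)
    if voiced.size == 0:
        # No voiced frame at all: nothing to interpolate from.
        return np.zeros_like(f0), uv
    # np.interp fills gaps linearly and holds the edge values before the
    # first and after the last voiced frame.
    cf0 = np.interp(np.arange(f0.size), voiced, f0[voiced])
    return cf0, uv


f0 = [0.0, 100.0, 0.0, 0.0, 130.0, 0.0]
cf0, uv = to_continuous_f0(f0)
```

Both arrays have shape `(T,)`; they would still need to be converted to tensors and given batch and channel dimensions before being passed to the model.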
import torch
from nhv import NeuralHomomorphicVocoder
net = NeuralHomomorphicVocoder(
    fs=24000,                       # sampling frequency
    fft_size=1024,                  # size for impulse response of LTV
    hop_size=256,                   # hop size in each mel-filterbank frame
    in_channels=80,                 # input channels (i.e., dimension of mel-filterbank)
    conv_channels=256,              # channel size of LTV filter
    ltv_out_channels=222,           # output size of LTV filter
    kernel_size=3,                  # kernel size of LTV filter
    group_size=8,                   # group size of LTV filter
    dilation_size=1,                # dilation size of LTV filter
    fmin=80,                        # min freq. of melspc calculation
    fmax=7600,                      # max freq. of melspc calculation
    roll_size=24,                   # roll size to calculate logspc from melspc
    use_causal=False,               # use causal conv LTV filter
    use_conv_postfilter=False,      # use causal conv postfilter for NHV output
    use_ltv_conv_postfilter=False,  # use causal conv postfilter for LTV output
    use_reference_mag=False,        # use reference logspc calculated from melspc
    use_quefrency_norm=True,        # enable ccep normalized by quefrency index
    scaler_file=None                # internal scaling of melspc
                                    # (Dict -> key="mlfb" = StandardScaler)
)
in_channels, hop_size = 80, 256 # must match the values passed to the constructor above
B, T, D = 3, 100, in_channels # batch_size, frame_size, n_mels
z = torch.randn(B, 1, T * hop_size)
x = torch.randn(B, T, D)
cf0 = torch.randn(B, T, 1)
uv = torch.randn(B, T, 1)
y = net(z, torch.cat([x, cf0, uv], dim=-1)) # z: (B, 1, T * hop_size), c: (B, D+2, T)
y = net._forward(z, cf0, uv)
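The `fft_size` comment above refers to the impulse response of the LTV (linear time-varying) filter. In the NHV paper listed under References, each frame's impulse response is obtained from a predicted complex cepstrum as IDFT(exp(DFT(cepstrum))). The NumPy sketch below illustrates only that homomorphic mapping for a single frame. It is not the package's code: `cepstrum_to_impulse_response` is a hypothetical name, and the sketch treats the input as non-negative quefrencies only, without the quefrency normalization controlled by `use_quefrency_norm`.

```python
import numpy as np


def cepstrum_to_impulse_response(ccep, fft_size=1024):
    """Map one frame of cepstrum to an impulse response of length fft_size."""
    ccep = np.asarray(ccep, dtype=np.float64)
    # n=fft_size zero-pads (or truncates) the cepstrum to the FFT length.
    log_spectrum = np.fft.fft(ccep, n=fft_size)
    spectrum = np.exp(log_spectrum)
    # For a real cepstrum the result is real up to rounding error.
    return np.fft.ifft(spectrum).real


# An all-zero cepstrum means a flat log spectrum, i.e. a unit impulse.
h = cepstrum_to_impulse_response(np.zeros(16), fft_size=64)
```

A cepstrum whose only non-zero coefficient is the zeroth one scales that impulse by `exp(ccep[0])`, which is a quick way to sanity-check the mapping.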
Features
- (2021/05/21): Works well; training is in progress
- (2021/05/21): Follows the same input format as ParallelWaveGANGenerator in kan-bayashi/ParallelWaveGAN, but with continuous F0 and uv symbols
- (2021/05/24): Final FIR filter is implemented by a 1D causal conv layer
- (2021/05/24): GAN training is not stable
- (2021/05/25): Implement reference log magnitude from melspc
- (2021/05/27): Implement internal scaler and ltv conv postfilter
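The 2021/05/24 entry says the final FIR filter is a 1D causal convolution. As an illustration of what "causal" means there (not the package's layer), the NumPy sketch below filters a signal so that each output sample depends only on the current and past input samples. The name `causal_fir` is hypothetical.

```python
import numpy as np


def causal_fir(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    x = np.asarray(x, dtype=np.float64)
    taps = np.asarray(taps, dtype=np.float64)
    # The full convolution has len(x) + len(taps) - 1 samples; keeping the
    # first len(x) is equivalent to left-padding x with len(taps) - 1 zeros.
    return np.convolve(x, taps)[: x.size]


impulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = causal_fir(impulse, [0.5, 0.3, 0.2])
```

In PyTorch the same effect is usually obtained by left-padding the input with `kernel_size - 1` zeros before a `torch.nn.Conv1d`. Note that `Conv1d` computes a cross-correlation, so its kernel corresponds to the taps in reversed order.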
References
@article{liu20,
title={Neural Homomorphic Vocoder},
author={Z.~Liu and K.~Chen and K.~Yu},
journal={Proc. Interspeech 2020},
pages={240--244},
year={2020}
}