PyTorch implementation of the neural homomorphic vocoder
neural-homomorphic-vocoder
A neural vocoder based on the source-filter model, called the neural homomorphic vocoder (NHV).
Install
pip install neural-homomorphic-vocoder
Usage
Usage for NeuralHomomorphicVocoder class
- Input
- x: mel-filterbank
- cf0: continuous f0
- uv: voiced/unvoiced (u/v) symbol
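The cf0 and uv inputs listed above are typically derived from a raw F0 track. The following is a minimal, framework-free sketch of one common way to do that. It assumes that unvoiced frames are marked with 0.0 in the raw track, that uv uses 1.0 for voiced frames, and that gaps are filled by linear interpolation. The helper name `f0_to_cf0_uv` is hypothetical and is not part of this package, whose own preprocessing may differ.

```python
def f0_to_cf0_uv(f0):
    """Split a raw F0 track (0.0 = unvoiced) into (cf0, uv).

    Assumptions (not taken from the package): unvoiced frames are 0.0,
    uv is 1.0 for voiced frames, and gaps are linearly interpolated.
    """
    uv = [1.0 if v > 0.0 else 0.0 for v in f0]
    voiced = [i for i, v in enumerate(f0) if v > 0.0]
    if not voiced:
        # No voiced frame at all: nothing to interpolate from.
        return [0.0] * len(f0), uv

    cf0 = list(f0)
    # Extend the first and last voiced values to the edges of the track.
    for i in range(voiced[0]):
        cf0[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        cf0[i] = f0[voiced[-1]]
    # Linearly interpolate across interior unvoiced gaps.
    for left, right in zip(voiced, voiced[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            cf0[i] = (1.0 - w) * f0[left] + w * f0[right]
    return cf0, uv


cf0, uv = f0_to_cf0_uv([0.0, 100.0, 0.0, 0.0, 130.0, 0.0])
print(cf0)
print(uv)
```

Each returned list has one value per frame, so it can be reshaped to (B, T, 1) to match the cf0 and uv tensors used with the class.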
import torch
from nhv import NeuralHomomorphicVocoder

in_channels = 80  # dimension of mel-filterbank
hop_size = 256  # hop size in each mel-filterbank frame

net = NeuralHomomorphicVocoder(
    fs=24000,  # sampling frequency
    fft_size=1024,  # size of the impulse response of the LTV filter
    hop_size=hop_size,  # hop size in each mel-filterbank frame
    in_channels=in_channels,  # input channels (i.e., dimension of mel-filterbank)
    conv_channels=256,  # channel size of LTV filter
    ccep_size=222,  # output ccep size of LTV filter
    out_channels=1,  # output size of network
    kernel_size=3,  # kernel size of LTV filter
    dilation_size=1,  # dilation size of LTV filter
    group_size=8,  # group size of LTV filter
    fmin=80,  # min freq. for melspc
    fmax=7600,  # max freq. for melspc (recommend to use full-band)
    roll_size=24,  # frame size to get median to estimate logspc from melspc
    look_ahead=32,  # number of look-ahead samples (if use_causal=True)
    n_ltv_layers=3,  # number of layers for LTV ccep generator
    n_postfilter_layers=4,  # number of layers for output postfilter
    use_causal=False,  # use causal conv LTV filter
    use_reference_mag=False,  # use reference logspc calculated from melspc
    use_tanh=False,  # apply tanh to output else linear
    use_uvmask=False,  # apply uv-based mask to harmonic
    use_weight_norm=True,  # apply weight norm to conv1d layer
    conv_type="original",  # ltv generator network type ["original", "ddsconv"]
    postfilter_type=None,  # postfilter network type ["None", "normal", "ddsconv"]
    ltv_postfilter_type="conv",  # ltv postfilter network type
                                 # ["None", "normal", "ddsconv"]
    scaler_file=None,  # path to .pkl for internal scaling of melspc
                       # (dict["mlfb"] = sklearn.preprocessing.StandardScaler)
)
B, T, D = 3, 100, in_channels  # batch_size, frame_size, n_mels
z = torch.randn(B, 1, T * hop_size)
x = torch.randn(B, T, D)
cf0 = torch.randn(B, T, 1)
uv = torch.randn(B, T, 1)
y = net(z, torch.cat([x, cf0, uv], dim=-1)) # z: (B, 1, T * hop_size), c: (B, D+2, T)
y = net._forward(z, cf0, uv)
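The noise input z in the call above has T * hop_size samples, one hop per mel-filterbank frame. As a small arithmetic sketch of that relationship (the helper names `excitation_shape` and `duration_seconds` are hypothetical and not part of the package), the shape of z and the corresponding audio duration can be worked out as follows:

```python
def excitation_shape(batch_size, n_frames, hop_size):
    """Shape of the noise input z for n_frames conditioning frames."""
    return (batch_size, 1, n_frames * hop_size)


def duration_seconds(n_frames, hop_size, fs):
    """Audio duration covered by n_frames frames at sampling frequency fs."""
    return n_frames * hop_size / fs


# Values from the usage example: B=3, T=100, hop_size=256, fs=24000
print(excitation_shape(3, 100, 256))
print(round(duration_seconds(100, 256, 24000), 4))
```

With the example settings, 100 frames correspond to 25600 samples, a little over one second of audio at 24 kHz.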
Features
- (2021/05/21): Train using kan-bayashi/ParallelWaveGAN with continuous F0 and uv symbols
- (2021/05/24): Final FIR filter is implemented by 1D causal conv
- (2021/06/17): Implement depth-wise separable convolution
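Two of the features above, the causal FIR filter and the depth-wise separable convolution, can be illustrated without any framework. The sketch below is not the package's implementation. `causal_fir` shows what "causal" means for a 1D convolution (left padding only, so each output sample depends on current and past inputs), and the two parameter-count helpers show why a depth-wise separable layer is cheaper than a standard one. All function names are hypothetical, and the counts assume an ungrouped standard convolution with bias ignored.

```python
def causal_fir(x, h):
    """Filter x with FIR taps h so that y[n] depends only on x[n], x[n-1], ...

    Equivalent to left-padding x with len(h) - 1 zeros and then running a
    convolution without any further padding.
    """
    pad = [0.0] * (len(h) - 1)
    xp = pad + list(x)
    y = []
    for n in range(len(x)):
        # xp[n + len(h) - 1 - k] is x[n - k] (zero when n - k < 0).
        y.append(sum(h[k] * xp[n + len(h) - 1 - k] for k in range(len(h))))
    return y


def conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a standard (ungrouped) 1D convolution, bias ignored."""
    return c_in * c_out * kernel_size


def depthwise_separable_params(c_in, c_out, kernel_size):
    """Depth-wise conv (one kernel per input channel) plus a 1x1 point-wise conv."""
    return c_in * kernel_size + c_in * c_out


# Feeding a unit impulse returns the taps themselves, with no output before n=0.
print(causal_fir([1.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.125]))
# Weight counts for 256 channels and kernel size 3, as in the usage example.
print(conv1d_params(256, 256, 3), depthwise_separable_params(256, 256, 3))
```

For 256 input and output channels with kernel size 3, the standard layer has 196608 weights and the depth-wise separable one has 66304, roughly a threefold reduction.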
References
@article{liu20,
title={Neural Homomorphic Vocoder},
author={Z.~Liu and K.~Chen and K.~Yu},
journal={Proc. Interspeech 2020},
pages={240--244},
year={2020}
}