A librosa's STFT/FBANK/MFCC implement based on Torch
Project description
Librosa STFT/Fbank/MFCC in PyTorch
Author: Shimin Zhang
A librosa STFT/Fbank/mfcc feature extration written up in PyTorch using 1D Convolutions.
Installation
Install easily with pip:pip install torch_mfcc
or download this repo, python setup.py install
.
Usage
If you want the same timesteps as kaldi, make sure that:
- the window length, window hop length and fft length are same.
- set enframed_mode(str)='break', which defaults to 'continue'.
- set center(bool)=False. which defaults to True.
import torch as th
import torch.nn.functional as tf
import scipy
from torch_mfcc import STFT, FBANK, MFCC
import numpy as np
import librosa
def CalMfcc(signal, sr, fft_len, win_hop, win_len,
n_mfcc=13, center=True, dct_type=2, norm='ortho'):
spec_amp, n_fft = librosa.spectrum._spectrogram(signal, n_fft=fft_len, hop_length=win_hop, win_length=win_len,
center=center, power=2.0, window='hann')
mel_basis = librosa.filters.mel(sr, n_fft)
S = np.dot(mel_basis, spec_amp)
fbank = librosa.core.power_to_db(S, top_db=None)
mfcc_spec = scipy.fftpack.dct(
fbank, axis=0, type=dct_type, norm=norm)[:n_mfcc]
return spec_amp, fbank, mfcc_spec
sig = librosa.load(
librosa.util.example_audio_file(), duration=10.0, offset=30)[0]
device='cpu'
sig_th = th.from_numpy(sig)[None, :].float().to(device)
fft_len = 1024
win_hop = 256
win_len = 1024
window = 'hann'
n_mel = 128
n_mfcc = 13
dct_type = 4
norm = 'ortho'
center = False
sr = 22050
spec_amp, fbank_spec, mfcc_spec = CalMfcc(
signal=sig,
sr=sr,
fft_len=fft_len,
win_hop=win_hop,
win_len=win_len,
n_mfcc=n_mfcc,
center=center,
dct_type=dct_type,
norm=norm)
# librosa STFT VS conv_stft
# see more test at :https://github.com/echocatzh/conv-stft/blob/master/tests/test_stft.py
stft = STFT(
win_len=win_len,
win_hop=win_hop,
fft_len=fft_len,
pad_center=center,
win_type=window).to(device)
real, imag = stft.transform(sig_th, return_type='realimag')
spectrum_th = th.square(real) + th.square(imag)
print(tf.mse_loss(th.FloatTensor(spec_amp).to(device),
spectrum_th.squeeze(0))) # 8.4275e-10
# librosa fbank VS torch_fbank
fbank = FBANK(win_len, win_hop, fft_len, sr, win_type=window,
top_db=None,
center=center, n_mel=n_mel).to(device)
fbank_spec_th = fbank(sig_th)
print(tf.mse_loss(th.FloatTensor(fbank_spec).to(device),
fbank_spec_th.transpose(1, 2).squeeze(0))) # 1.4462e-09
# librosa mfcc VS torch_mfcc
mfcc = MFCC(win_len, win_hop, fft_len, sr, win_type=window,
top_db=None, dct_type=dct_type, norm=norm,
center=center, n_mfcc=n_mfcc, n_mel=n_mel).to(device)
mfcc_spec_th = mfcc(sig_th)
print(tf.mse_loss(th.FloatTensor(mfcc_spec).to(device),
mfcc_spec_th.transpose(1, 2).squeeze(0))) # 7.2581e-10
Contact
If you have any questions, welcome to contact me at shmzhang@npu-aslp.org
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
torch_mfcc-0.2.1.tar.gz
(6.3 kB
view details)
File details
Details for the file torch_mfcc-0.2.1.tar.gz
.
File metadata
- Download URL: torch_mfcc-0.2.1.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8f0b9908e4ae4b6d8be91876dbb0ff7294bf0c070329476321956328befba8b4 |
|
MD5 | 7d2569ead5c14c92dedcf0528cfc8ee8 |
|
BLAKE2b-256 | e1fab3a5a03d6f73bc5f7ab5d31c616e421e9fdd07ffcc7360ce0dceb20d3dce |