PyTorch implementation of a subset of librosa functions.
TorchLibrosa: PyTorch implementation of Librosa
This codebase provides PyTorch implementations of some librosa functions. If you previously trained on features extracted on the CPU with librosa and now want GPU acceleration during training and evaluation, TorchLibrosa provides almost identical features to the standard librosa functions (numerical difference less than 1e-5).
Install
$ pip install torchlibrosa
Example 1
Extract a mel spectrogram with TorchLibrosa. Set is_log=True (the default) to get a log mel spectrogram instead.
import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

# Random test audio, one second per clip: (batch_size, samples_num)
batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)

# TorchLibrosa feature extractor equivalent to librosa.feature.melspectrogram()
feature_extractor = torch.nn.Sequential(
    tl.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ),
    tl.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False,  # Default is True; False keeps the mel spectrogram on a linear scale
    ),
)

batch_feature = feature_extractor(batch_audio)  # (batch_size, 1, time_steps, mel_bins)
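The shape in the last comment of the example can be worked out by hand. The following is a minimal sketch, assuming the librosa framing convention with center=True (the signal is padded by n_fft // 2 on both sides, so one frame starts every hop_length samples). The helper name expected_feature_shape is hypothetical and is not part of TorchLibrosa.

```python
def expected_feature_shape(batch_size, num_samples, hop_length, n_mels):
    """Expected mel feature shape under center=True framing (hypothetical helper)."""
    # With centered padding, frames start at samples 0, hop_length, 2 * hop_length, ...
    time_steps = 1 + num_samples // hop_length
    # The singleton second dimension is the channel axis shown in the example's comment.
    return (batch_size, 1, time_steps, n_mels)


# Values from the example: 16 one-second clips at 22050 Hz, hop 512, 128 mel bins.
print(expected_feature_shape(16, 22050, 512, 128))  # (16, 1, 44, 128)
```

Comparing this tuple with batch_feature.shape is a quick way to catch a mismatched hop_length or n_mels before training.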
Example 2
Extract a spectrogram, then a log mel spectrogram, and run STFT and ISTFT with TorchLibrosa.
import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

# Random test audio, one second per clip: (batch_size, samples_num)
batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)

# Spectrogram
spectrogram_extractor = tl.Spectrogram(n_fft=win_length, hop_length=hop_length)
sp = spectrogram_extractor(batch_audio)  # (batch_size, 1, time_steps, freq_bins)

# Log mel spectrogram, computed from the spectrogram above
logmel_extractor = tl.LogmelFilterBank(sr=sample_rate, n_fft=win_length, n_mels=n_mels)
logmel = logmel_extractor(sp)  # (batch_size, 1, time_steps, mel_bins)

# STFT: returns the real and imaginary parts as separate tensors
stft_extractor = tl.STFT(n_fft=win_length, hop_length=hop_length)
(real, imag) = stft_extractor(batch_audio)
# real: (batch_size, 1, time_steps, freq_bins), imag: (batch_size, 1, time_steps, freq_bins)

# ISTFT: length restores the original number of samples
istft_extractor = tl.ISTFT(n_fft=win_length, hop_length=hop_length)
y = istft_extractor(real, imag, length=batch_audio.shape[-1])  # (batch_size, samples_num)
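The length argument passed to the ISTFT matters because the frames alone do not cover every input sample. The sketch below works through the arithmetic, assuming librosa's conventions (center=True padding, one-sided spectrum). That TorchLibrosa behaves the same way when length is omitted is an assumption, not something verified here. The helper name stft_dimensions is hypothetical.

```python
def stft_dimensions(num_samples, n_fft, hop_length):
    """Return (freq_bins, time_steps, default_length) under librosa conventions."""
    # A one-sided spectrum of a real signal keeps n_fft // 2 + 1 frequency bins.
    freq_bins = n_fft // 2 + 1
    # center=True pads n_fft // 2 on each side, giving one frame per hop.
    time_steps = 1 + num_samples // hop_length
    # Samples recovered by overlap-add once the padding is trimmed again.
    default_length = hop_length * (time_steps - 1)
    return freq_bins, time_steps, default_length


# Values from the example: 22050 samples, n_fft 2048, hop 512.
print(stft_dimensions(22050, 2048, 512))  # (1025, 44, 22016)
```

Under these assumptions the inverse transform would return 22016 samples by default, 34 fewer than the input, which is why the example passes length=batch_audio.shape[-1].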
Example 3
Check the compatibility of TorchLibrosa with librosa. The numerical difference should be less than 1e-5.
python3 torchlibrosa/stft.py --device='cuda' # --device='cpu' | 'cuda'
Contact
Qiuqiang Kong, qiuqiangkong@gmail.com
Cite
[1] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. "PANNs: Large-scale pretrained audio neural networks for audio pattern recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2880-2894.
External links
Other related repos include:
torchaudio: https://github.com/pytorch/audio
Asteroid-filterbanks: https://github.com/asteroid-team/asteroid-filterbanks