PyTorch implementation of some librosa functions.

Project description

PyTorch implementation of librosa

This codebase provides PyTorch implementations of some librosa functions. Because the functions are implemented in PyTorch, they can run on a GPU; for example, users can extract log mel spectrograms on the GPU. The numerical difference between this codebase and librosa is less than 1e-6.

Install

$ pip install torchlibrosa

Examples

Here are examples of extracting a spectrogram and a log mel spectrogram, and of computing the STFT and ISTFT, using torchlibrosa.

import torch
import torchlibrosa as tl

# Data
x = torch.zeros(1, 22050)  # (batch_size, samples_num)

# Spectrogram
spectrogram_extractor = tl.stft.Spectrogram(n_fft=2048, hop_length=512)
sp = spectrogram_extractor(x)  # (batch_size, 1, time_steps, freq_bins)

# Log mel spectrogram
logmel_extractor = tl.stft.LogmelFilterBank(sr=22050, n_fft=2048, n_mels=128)
logmel = logmel_extractor(sp)  # (batch_size, 1, time_steps, mel_bins)

# STFT
stft_extractor = tl.stft.STFT(n_fft=2048, hop_length=512)
(real, imag) = stft_extractor(x)
# real: (batch_size, 1, time_steps, freq_bins), imag: same shape as real

# ISTFT
istft_extractor = tl.stft.ISTFT(n_fft=2048, hop_length=512)
y = istft_extractor(real, imag, x.shape[-1])  # (batch_size, samples_num)

More examples

python3 torchlibrosa/stft.py

Compatibility with librosa functions

If you previously trained on CPU-extracted features from librosa but want GPU acceleration during, e.g., evaluation, note that the following code produces features that match a standard (non-log) mel spectrogram from librosa:

# torchlibrosa implementation
import torch
import torchlibrosa as tl

sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

# Float32 input, normalized to the range (-1, 1)
raw_audio = torch.empty(sample_rate).uniform_(-1, 1)

# torchlibrosa feature extractor similar to librosa.feature.melspectrogram()
feature_extractor = torch.nn.Sequential(
    tl.stft.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ),
    tl.stft.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False,  # Default is True; False returns the mel power spectrogram
    ),
)
feature = feature_extractor(raw_audio.unsqueeze(0))  # (batch, 1, time_steps, n_mels)

Cite

[1] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. "PANNs: Large-scale pretrained audio neural networks for audio pattern recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2880-2894.

Project details

Release history

0.1.0	Feb 21, 2023
0.0.10	Feb 21, 2023
0.0.9	Mar 10, 2021
0.0.8	Feb 27, 2021
0.0.7	Jan 28, 2021
0.0.6	Jan 24, 2021 (this version)
0.0.5	Nov 28, 2020
0.0.4	Apr 5, 2020
0.0.3	Mar 3, 2020
0.0.2	Mar 1, 2020
0.0.1	Mar 1, 2020

Download files

Source Distribution

torchlibrosa-0.0.6.tar.gz (9.5 kB)

Uploaded Jan 24, 2021 Source

Built Distribution

torchlibrosa-0.0.6-py3-none-any.whl (9.6 kB)

Uploaded Jan 24, 2021 Python 3

Hashes for torchlibrosa-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`d376cdbce08d9bc845cabdb015774144fad533f18764dbe4ed335096f0c0d02c`
MD5	`fc5f4694694e24a33797139002e42ab3`
BLAKE2b-256	`bb24992d7eca16cb0e8af4d57356b0bec51de6b641411eedcc286846b46a89a1`

Hashes for torchlibrosa-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cee6089036b91d0303e0866a03f6ebd78a98639ee71695b307baf7f9eb7ddfcd`
MD5	`a53ef6bbca619bb8f5784233f80dec81`
BLAKE2b-256	`af635433aabf248d3f044c8242092b974e66e227f3060f486e31314c470f06f6`