
Several simple auditory models in JAX, NumPy and PyTorch


Python Auditory Toolbox

This is a Python port of (portions of) the Matlab Auditory Toolbox. The package provides code built on the NumPy, PyTorch, and JAX numerical libraries.

The Python Auditory Toolbox includes these functions from the original Matlab toolbox:

  • Patterson-Holdsworth ERB (Gammatone) Filter Bank
    • MakeErbFilters
    • ErbFilterBank
  • Correlogram Processing
    • CorrelogramFrame
    • CorrelogramArray
    • CorrelogramPitch
  • Demonstrations
    • MakeVowel
    • FMPoints

This toolbox does not include Lyon's Passive Long-wave Cochlear model, as this model has been superseded by CARFAC.

All functions are available on top of any of these three computational libraries: JAX, NumPy or PyTorch.

This colab provides examples of calling (and testing) this library using the NumPy functionality.

This toolbox can be used to build biophysically inspired models of the auditory periphery using JAX, PyTorch and NumPy. We hope it helps in developing realistic models, with better explanations of what changes as a model is optimized to match different psychoacoustic tests. It may also be useful for developing auditory models such as those from Sarah Verhulst's lab (Hearing Technology Lab on GitHub) and Josh McDermott's lab (Model Metamers Paper).

You can include the python_auditory_toolbox in your work in several ways. Via the Python package installer:

pip install python_auditory_toolbox

From GitHub at https://github.com/MalcolmSlaney/python_auditory_toolbox

Or see the toolbox in action (with pretty pictures) via Colab at https://colab.research.google.com/drive/1JGm24f1kOBl-EmtscJck58LGgWkfWGO8?usp=sharing

Note

This package includes three different implementations of the auditory toolbox and thus the union of the three different import requirements. Most users will probably use only one of the three libraries (NumPy, JAX, or PyTorch), will need to import only one of the auditory_toolbox files, and will not need all of the prerequisite libraries.
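Since most users need only one backend, a small check like the following can report which toolbox modules are usable in a given environment. This is a sketch, not part of the toolbox: the helper name usable_toolbox_modules is hypothetical, and the module names auditory_toolbox and auditory_toolbox_jax are assumed by analogy with auditory_toolbox_torch, the only module name that appears in this document.

```python
import importlib.util

# Numerical library -> toolbox module built on it.
# Only 'auditory_toolbox_torch' appears in this README; the other two
# module names are assumptions made for illustration.
BACKENDS = {
    "numpy": "auditory_toolbox",
    "jax": "auditory_toolbox_jax",
    "torch": "auditory_toolbox_torch",
}


def usable_toolbox_modules():
    """Return the toolbox modules whose numerical library is installed."""
    return [
        toolbox_module
        for library, toolbox_module in BACKENDS.items()
        if importlib.util.find_spec(library) is not None
    ]


if __name__ == "__main__":
    print(usable_toolbox_modules())
```

The check only looks for the numerical library itself, so it does not confirm that the toolbox package has been installed.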

Please cite this work as Malcolm Slaney and Søren Fuglsang, Python Auditory Toolbox, 2023. https://github.com/MalcolmSlaney/python_auditory_toolbox.

Examples

Here are the frequency responses for a 10-channel ERB gammatone filterbank.

Gammatone (ERB) Filter Response
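To give a sense of where the channels of such a filterbank sit, here is a minimal NumPy sketch of ERB-spaced center frequencies. It follows the spacing formula and Glasberg and Moore constants used by the original Matlab toolbox, with the top of the range at half the sampling rate. The function name erb_space_sketch is hypothetical, and this is an independent illustration rather than the toolbox's own code.

```python
import numpy as np

EAR_Q = 9.26449  # Glasberg and Moore asymptotic filter quality
MIN_BW = 24.7    # minimum filter bandwidth in Hz


def erb_space_sketch(low_freq, high_freq, num_channels):
    """Center frequencies (Hz), uniformly spaced on an ERB scale.

    Frequencies are returned in descending order and end exactly at low_freq.
    """
    offset = EAR_Q * MIN_BW
    index = np.arange(1, num_channels + 1)
    # Equal steps in log(frequency + offset) between high_freq and low_freq.
    step = (np.log(low_freq + offset) - np.log(high_freq + offset)) / num_channels
    return -offset + np.exp(index * step) * (high_freq + offset)


if __name__ == "__main__":
    sampling_rate = 16000
    print(erb_space_sketch(100.0, sampling_rate / 2, 10))
```

The channels are packed more densely at low frequencies, which is why the filter responses look roughly evenly spaced on a logarithmic frequency axis.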

Here is an example of a correlogram, with a number of harmonic examples that demonstrate the correlogram representation. It is also available via YouTube.
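A correlogram frame is, in essence, a short-time autocorrelation of each filterbank channel. The following NumPy sketch shows that idea under simple assumptions: a Hann window and FFT-based autocorrelation. The name correlogram_frame_sketch and its parameters are hypothetical and are not the signature of the toolbox's CorrelogramFrame.

```python
import numpy as np


def correlogram_frame_sketch(channels, start, win_len, num_lags):
    """Autocorrelation of one windowed frame per channel.

    channels: array of shape (num_channels, num_samples).
    Returns an array of shape (num_channels, num_lags).
    """
    frame = channels[:, start:start + win_len] * np.hanning(win_len)
    # Zero-pad to twice the window length so the circular correlation
    # computed through the FFT equals the linear autocorrelation.
    n_fft = 2 * win_len
    spectrum = np.fft.rfft(frame, n=n_fft, axis=1)
    autocorr = np.fft.irfft(np.abs(spectrum) ** 2, n=n_fft, axis=1)
    return autocorr[:, :num_lags]


if __name__ == "__main__":
    fs = 16000
    t = np.arange(2048) / fs
    # Two stand-in "channels" that share a 200 Hz fundamental.
    low = np.sin(2 * np.pi * 200 * t)
    mixed = low + 0.5 * np.sin(2 * np.pi * 400 * t)
    frame = correlogram_frame_sketch(np.stack([low, mixed]), 0, 1024, 200)
    print(frame.shape)
```

Channels that share a fundamental show a common peak at the lag of one pitch period, which is the vertical structure visible in a correlogram.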

MFCC (mel-frequency cepstral coefficients) is a classic speech representation that was often used in (pre-DNN) speech recognizers. It converts the original spectrogram, shown here,

Original tapestry spectrogram

into a 40-channel filterbank, and finally into a 13-dimensional cepstral representation.

We can invert these steps to reconstruct the original filterbank representation

Reconstruction of filterbank representation

And then reconstruct the original spectrogram.

Reconstruction of spectrogram

Note, in particular, the pitch harmonics (the horizontal banding) have been filtered out by the cepstral processing.
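The smoothing comes from truncation: keeping only the first 13 cosine-transform coefficients discards fine structure that the inverse transform cannot restore. Here is a minimal NumPy sketch of the filterbank-to-cepstrum step and its approximate inverse, assuming an orthonormal DCT-II across channels. The function names are hypothetical and this is not the toolbox's MFCC code.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; its transpose is its inverse."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)
    return basis


def to_cepstrum(log_filterbank, num_ceps=13):
    """Keep the first num_ceps DCT coefficients; input is (channels, frames)."""
    return dct_matrix(log_filterbank.shape[0])[:num_ceps] @ log_filterbank


def from_cepstrum(cepstrum, num_channels=40):
    """Approximate inverse that treats the discarded coefficients as zero."""
    return dct_matrix(num_channels)[:cepstrum.shape[0]].T @ cepstrum


if __name__ == "__main__":
    basis = dct_matrix(40)
    # One frame made of a smooth envelope plus a fast ripple across channels.
    frame = (basis[2] + basis[30])[:, None]
    recon = from_cepstrum(to_cepstrum(frame))
    print(np.allclose(recon[:, 0], basis[2]))
```

In this toy frame the slowly varying component survives the round trip while the rapid ripple is removed, which is the same kind of loss seen in the reconstructions above.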

Examples: PyTorch

The following code block demonstrates a feature extraction scheme that involves a 64-channel ERB gammatone filterbank. While the NumPy and JAX versions mimic the original Matlab API, the PyTorch version defines a class. The output features are shown below.

import torch
import torchaudio
import matplotlib.pyplot as plt
import auditory_toolbox_torch as pat

class CustomPipeline(torch.nn.Module):
  def __init__(self, sampling_rate: int = 16000) -> None:
    super().__init__()
    self.erbbank = pat.ErbFilterBank(sampling_rate,64,100)
    self.relu1 = torch.nn.ReLU()
    self.avgpool1 = torch.nn.AvgPool1d(80, stride=20)

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    x = self.erbbank(x)
    x = self.relu1(x)
    x = self.avgpool1(x)
    x = torch.pow(x,0.3)
    return x
  
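# Load the example audio: a (channels, samples) tensor plus its sample rate.
wav, fs = torchaudio.load('./examples/tapestry.wav')

pipeline = CustomPipeline(fs)
pipeline.to(dtype=torch.float32)

# Call the module directly rather than .forward, without tracking gradients.
with torch.no_grad():
  features = pipeline(wav).squeeze()

fig = plt.figure()
plt.imshow(features, aspect='auto', cmap='Blues')
plt.show()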

Gammatone features
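The steps after the filterbank (rectify, average over 80-sample windows with a stride of 20, then compress with a 0.3 power) can be sketched in plain NumPy to make the output shape explicit. The function names here are hypothetical, and the pooling is written to match unpadded average pooling as configured in the pipeline above.

```python
import numpy as np


def pooled_length(num_samples, kernel=80, stride=20):
    """Number of frames produced by unpadded average pooling."""
    return (num_samples - kernel) // stride + 1


def rectify_pool_compress(x, kernel=80, stride=20, exponent=0.3):
    """Half-wave rectify, average-pool along the last axis, then compress."""
    x = np.maximum(x, 0.0)
    n_frames = pooled_length(x.shape[-1], kernel, stride)
    starts = np.arange(n_frames) * stride
    # Index matrix of shape (n_frames, kernel): one row of sample indices per frame.
    idx = starts[:, None] + np.arange(kernel)[None, :]
    pooled = x[..., idx].mean(axis=-1)
    return pooled ** exponent


if __name__ == "__main__":
    channels = np.random.randn(64, 16000)  # stand-in for 64 filterbank outputs
    print(rectify_pool_compress(channels).shape)
```

At a 16 kHz sampling rate the 80-sample window spans 5 ms and the 20-sample stride is 1.25 ms, so one second of audio yields 797 frames per channel.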

Authors

Malcolm Slaney (malcolm@ieee.org) and Søren A. Fuglsang (sorenaf@drcmr.dk)
