
PyTorch implementation of auditory models from the MATLAB Auditory Modeling Toolbox

Project description

🔊 torch_amt - PyTorch Auditory Modeling Toolbox

License: GPL v3 · Python 3.14+ · PyTorch 2.0+


Differentiable, GPU-accelerated PyTorch implementations of computational auditory models from the Auditory Modeling Toolbox (AMT).

Built for researchers in psychoacoustics, computational neuroscience, and audio deep learning who need:

  • 🔥 Hardware acceleration for fast batch processing
  • 📊 Differentiable models for gradient-based optimization
  • 🧩 Modular components for custom auditory pipelines
  • 🎓 Scientific adherence matching MATLAB AMT implementations

📦 Installation

From PyPI

pip install torch-amt

From Source

git clone https://github.com/StefanoGiacomelli/torch_amt.git
cd torch_amt
pip install -e .

Requirements

  • Python ≥ 3.14 (tested)
  • PyTorch ≥ 2.0

🚀 Quick Start

Complete Auditory Model

import torch
import torch_amt

# Load Dau et al. (1997) model
model = torch_amt.Dau1997(fs=48000)

# Process 1 second of audio
audio = torch.randn(1, 48000)  # (batch, time)
output = model(audio)

print(f"Input: {audio.shape}")
# Input: torch.Size([1, 48000])
print(f"Output: List of {len(output)} frequency channels")
# Output: List of 31 frequency channels
print(f"Each channel shape: {output[0].shape}")
# Each channel shape: torch.Size([1, 8, 48000]) - (batch, modulation_channels, time)

Custom Processing Pipeline

import torch
import torch_amt

# Build custom auditory processing chain
filterbank = torch_amt.GammatoneFilterbank(fs=48000, fc=(80, 8000))
ihc = torch_amt.IHCEnvelope(fs=48000)
adaptation = torch_amt.AdaptLoop(fs=48000)

# Process signal
audio = torch.randn(2, 48000)     # Batch of 2 signals
filtered = filterbank(audio)      # (2, 31, 48000) - 31 frequency channels
envelope = ihc(filtered)          # (2, 31, 48000) - Envelope extraction
adapted = adaptation(envelope)    # (2, 31, 48000) - Temporal adaptation

print(f"Input: {audio.shape}")
# Input: torch.Size([2, 48000])
print(f"After Gammatone filterbank: {filtered.shape}")
# After Gammatone filterbank: torch.Size([2, 31, 48000])
print(f"After IHC envelope: {envelope.shape}")
# After IHC envelope: torch.Size([2, 31, 48000])
print(f"After adaptation: {adapted.shape}")
# After adaptation: torch.Size([2, 31, 48000])

Hardware Acceleration

import torch
import torch_amt

# Check available hardware
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")

# Move model to GPU (CUDA or MPS)
model = torch_amt.Dau1997(fs=48000)

if torch.backends.mps.is_available():
    model = model.to('mps')  # Apple Silicon
    print("Using device: mps")
elif torch.cuda.is_available():
    model = model.cuda()  # NVIDIA GPU
    print("Using device: cuda")
else:
    print("Using device: cpu")

# Process on accelerated hardware
audio = torch.randn(8, 48000).to(model.gammatone_fb.fc.device)
output = model(audio)

Learnable Models for Neural Networks

import torch
import torch.nn as nn
import torch_amt

class AudioClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable auditory front-end
        self.auditory = torch_amt.King2019(fs=48000, learnable=True)
        self.classifier = nn.Linear(155, 10)  # 31 freqs × 5 mods = 155 → 10 classes
    
    def forward(self, audio):
        features = self.auditory(audio)     # (B, T, F, M) e.g., (4, 24000, 31, 5)
        pooled = features.mean(dim=1)       # (B, F, M) e.g., (4, 31, 5) - Pool over time
        flattened = pooled.flatten(1)       # (B, F×M) e.g., (4, 155)
        return self.classifier(flattened)   # (B, 10)

# Train end-to-end with backpropagation
model = AudioClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# Example forward pass
audio = torch.randn(4, 24000)  # Batch of 4 signals, 0.5 seconds @ 48kHz
logits = model(audio)  # (4, 10)
print(f"Input: {audio.shape} → Output: {logits.shape}")
# Input: torch.Size([4, 24000]) → Output: torch.Size([4, 10])

📚 Available Models

| Model | Year | Key Features | Use Cases |
|-------|------|--------------|-----------|
| Dau1997 | 1997 | Adaptation loops, modulation filterbank | AM detection, temporal processing |
| Glasberg2002 | 2002 | Specific loudness, temporal integration | Loudness perception, hearing aids |
| Moore2016 | 2016 | Binaural processing, spatial smoothing | Binaural loudness, spatial hearing |
| King2019 | 2019 | Broken-stick compression, FM/AM analysis | FM masking, modulation interactions |
| Osses2021 | 2021 | Extended temporal integration | Speech perception, temporal resolution |
| Paulick2024 | 2024 | Physiological IHC, CASP framework | Physiological modeling, cochlear implants |

📖 Documentation

  • API Reference: see docstrings (comprehensive documentation with equations and examples)
  • Documentation: coming soon on Read the Docs
  • 🤝 Contributing: see the DEV templates in the repository

📊 Performance

TODO: benchmark table with per-component runtime analysis (forward pass averaged over 10 runs, plus a final forward+backward pass) on CPU, CUDA, and MPS.
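Until that table lands, the intended methodology can be sketched with a minimal stdlib timing harness. The `benchmark` helper and the stand-in workload below are illustrative assumptions, not part of torch_amt; on real hardware the workload would be something like `lambda: model(audio)` with the model and audio moved to each device.

```python
import statistics
import time

def benchmark(fn, runs=10, warmup=2):
    """Time a callable over several runs; return (mean, stdev) in milliseconds."""
    for _ in range(warmup):              # warm-up runs (caches, lazy init)
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(timings), statistics.stdev(timings)

# Stand-in workload; replace with a model forward (or forward+backward) pass
workload = lambda: sum(i * i for i in range(10_000))
mean_ms, std_ms = benchmark(workload)
print(f"forward: {mean_ms:.2f} ± {std_ms:.2f} ms over 10 runs")
```

Averaging over multiple runs after a warm-up is what makes the numbers comparable across CPU, CUDA, and MPS backends.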


📄 License

To stay aligned with the original AMT license, this project is licensed under the GNU General Public License v3.0 or later (GPLv3+).

See LICENSE for full details.


🙏 Acknowledgments

This work is based on the Auditory Modeling Toolbox (AMT) developed by:

  • Piotr Majdak
  • Clara Hollomey
  • Robert Baumgartner
  • ...and many contributors from the auditory research community

Primary Reference:

Majdak, P., Hollomey, C., & Baumgartner, R. (2022). "AMT 1.x: A toolbox for reproducible research in auditory modeling." Acta Acustica, 6, 19. https://doi.org/10.1051/aacus/2022011

Individual model implementations are based on their respective publications (see model docstrings for specific citations).

Official Site: https://amtoolbox.org/


Contacts

Author: Stefano Giacomelli
Affiliation: Ph.D. Candidate @ DISIM Department, University of L'Aquila
Email: stefano.giacomelli@graduate.univaq.it
ORCID: https://orcid.org/0009-0009-0438-1748


📝 Citations

If you use torch_amt in your research, please cite:

@software{giacomelli2026torch_amt,
  author = {Giacomelli, Stefano},
  title = {torch\_amt: PyTorch Auditory Modeling Toolbox},
  year = {2026},
  url = {https://github.com/StefanoGiacomelli/torch_amt},
  version = {0.1.0}
}

Also consider citing the original AMT papers.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torch_amt-0.1.0.tar.gz (218.9 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

torch_amt-0.1.0-py3-none-any.whl (233.5 kB)

Uploaded Python 3

File details

Details for the file torch_amt-0.1.0.tar.gz.

File metadata

  • Download URL: torch_amt-0.1.0.tar.gz
  • Size: 218.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for torch_amt-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5a2ccb813b2732bd9d0977173df80c7e4b1bfe92c188489944caca83fa0a3130
MD5 d2441e425498dcb7a5066c4dac125b2e
BLAKE2b-256 3ba357cc84c2702d3996c10ded25ba864fcfcc070c6d3e23e22809560ad4c638

See more details on using hashes here.
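As a quick way to check these digests locally, here is a small stdlib sketch. The filename and expected SHA-256 value are taken from the table above; the `sha256_of` helper is illustrative.

```python
import hashlib

# Expected digest for torch_amt-0.1.0.tar.gz, from the hash table above
EXPECTED_SHA256 = "5a2ccb813b2732bd9d0977173df80c7e4b1bfe92c188489944caca83fa0a3130"

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives aren't read into RAM at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# After downloading:
# assert sha256_of("torch_amt-0.1.0.tar.gz") == EXPECTED_SHA256
```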

File details

Details for the file torch_amt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: torch_amt-0.1.0-py3-none-any.whl
  • Size: 233.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for torch_amt-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 664c314dc1fc643eec9c438e4315a41ab144c4b7e9188e9d74c87361932582be
MD5 e0a3671789ea942b2bc541a8da9e3b5a
BLAKE2b-256 30ad34e6fbad93a4e250a3bbede29d6d2aa5303a0d693bd3a62420eb5767149a

See more details on using hashes here.
