
PyTorch implementation of auditory models from the MATLAB Auditory Modeling Toolbox


torch_amt - PyTorch Auditory Modeling Toolbox

License: GPL v3 · Python 3.14+ · PyTorch 2.0+


Differentiable, GPU-accelerated PyTorch implementations of computational auditory models from the MATLAB Auditory Modeling Toolbox (AMT).

Built for researchers in psychoacoustics, computational neuroscience, and audio deep learning who need:

  • 🔥 Hardware acceleration for fast batch processing
  • 📊 Differentiable models for gradient-based optimization
  • 🧩 Modular components for custom auditory pipelines
  • 🎓 Scientific fidelity to the reference MATLAB AMT implementations
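The differentiability bullet means gradients flow from any loss on a model's output back to the input waveform (or to learnable parameters). A minimal sketch of the pattern, using a plain Conv1d filterbank as a stand-in for a torch_amt front-end so the snippet is self-contained:

```python
import torch
import torch.nn as nn

# Stand-in differentiable front-end (any torch_amt model is used the same way)
frontend = nn.Conv1d(1, 31, kernel_size=512, padding=256, bias=False)

# Input waveform with gradient tracking enabled
audio = torch.randn(1, 1, 48000, requires_grad=True)
features = frontend(audio)

# Backpropagate a scalar loss through the front-end to the input signal
loss = features.pow(2).mean()
loss.backward()

print(audio.grad.shape)  # gradient has the same shape as the input
```

The same pattern enables gradient-based stimulus optimization or end-to-end training with an auditory front-end.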

📦 Installation

From PyPI

pip install torch-amt

From Source

git clone https://github.com/StefanoGiacomelli/torch_amt.git
cd torch_amt
pip install -e .

Requirements

  • Tested with Python ≥ 3.14

🚀 Quick Start

Complete Auditory Model

import torch
import torch_amt

# Load Dau et al. (1997) model
model = torch_amt.Dau1997(fs=48000)

# Process 1 second of audio
audio = torch.randn(1, 48000)  # (batch, time)
output = model(audio)

print(f"Input: {audio.shape}")
# Input: torch.Size([1, 48000])
print(f"Output: List of {len(output)} frequency channels")
# Output: List of 31 frequency channels
print(f"Each channel shape: {output[0].shape}")
# Each channel shape: torch.Size([1, 8, 48000]) - (batch, modulation_channels, time)

Custom Processing Pipeline

import torch
import torch_amt

# Build custom auditory processing chain
filterbank = torch_amt.GammatoneFilterbank(fs=48000, fc=(80, 8000))
ihc = torch_amt.IHCEnvelope(fs=48000)
adaptation = torch_amt.AdaptLoop(fs=48000)

# Process signal
audio = torch.randn(2, 48000)     # Batch of 2 signals
filtered = filterbank(audio)      # (2, 31, 48000) - 31 frequency channels
envelope = ihc(filtered)          # (2, 31, 48000) - Envelope extraction
adapted = adaptation(envelope)    # (2, 31, 48000) - Temporal adaptation

print(f"Input: {audio.shape}")
# Input: torch.Size([2, 48000])
print(f"After Gammatone filterbank: {filtered.shape}")
# After Gammatone filterbank: torch.Size([2, 31, 48000])
print(f"After IHC envelope: {envelope.shape}")
# After IHC envelope: torch.Size([2, 31, 48000])
print(f"After adaptation: {adapted.shape}")
# After adaptation: torch.Size([2, 31, 48000])

Hardware Acceleration

import torch
import torch_amt

# Check available hardware
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")

# Move model to GPU (CUDA or MPS)
model = torch_amt.Dau1997(fs=48000)

if torch.backends.mps.is_available():
    device = 'mps'   # Apple Silicon
elif torch.cuda.is_available():
    device = 'cuda'  # NVIDIA GPU
else:
    device = 'cpu'

model = model.to(device)
print(f"Using device: {device}")

# Process on accelerated hardware
audio = torch.randn(8, 48000).to(device)
output = model(audio)

Learnable Models for Neural Networks

import torch
import torch.nn as nn
import torch_amt

class AudioClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable auditory front-end
        self.auditory = torch_amt.King2019(fs=48000, learnable=True)
        self.classifier = nn.Linear(155, 10)  # 31 freqs × 5 mods = 155 → 10 classes
    
    def forward(self, audio):
        features = self.auditory(audio)     # (B, T, F, M) e.g., (4, 24000, 31, 5)
        pooled = features.mean(dim=1)       # (B, F, M) e.g., (4, 31, 5) - Pool over time
        flattened = pooled.flatten(1)       # (B, F×M) e.g., (4, 155)
        return self.classifier(flattened)   # (B, 10)

# Train end-to-end with backpropagation
model = AudioClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# Example forward pass
audio = torch.randn(4, 24000)  # Batch of 4 signals, 0.5 seconds @ 48kHz
logits = model(audio)  # (4, 10)
print(f"Input: {audio.shape} → Output: {logits.shape}")
# Input: torch.Size([4, 24000]) → Output: torch.Size([4, 10])
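The classifier above trains with any standard PyTorch loop. A minimal single training step, with the torch_amt.King2019 front-end replaced by a plain Conv1d stand-in so the sketch is self-contained (the pooling/flatten/linear stages mirror the class above):

```python
import torch
import torch.nn as nn

# Stand-in for AudioClassifier: Conv1d replaces the learnable auditory
# front-end; the rest follows the pool → flatten → linear head above
model = nn.Sequential(
    nn.Conv1d(1, 31, kernel_size=256, stride=128),  # stand-in front-end
    nn.AdaptiveAvgPool1d(5),                        # pool time axis to 5 bins
    nn.Flatten(1),                                  # (B, 31 × 5) = (B, 155)
    nn.Linear(155, 10),                             # 10 classes
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

audio = torch.randn(4, 1, 24000)          # batch of 4 signals
labels = torch.randint(0, 10, (4,))       # dummy class labels

# One end-to-end step: forward, loss, backward, update
optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(audio), labels)
loss.backward()
optimizer.step()
```

With `learnable=True`, the torch_amt front-end's filter parameters receive gradients and are updated by the optimizer exactly like the stand-in's convolution weights here.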

📚 Available Models

Model        | Year | Key Features                             | Use Cases
Dau1997      | 1997 | Adaptation loops, modulation filterbank  | AM detection, temporal processing
Glasberg2002 | 2002 | Specific loudness, temporal integration  | Loudness perception, hearing aids
Moore2016    | 2016 | Binaural processing, spatial smoothing   | Binaural loudness, spatial hearing
King2019     | 2019 | Broken-stick compression, FM/AM analysis | FM masking, modulation interactions
Osses2021    | 2021 | Extended temporal integration            | Speech perception, temporal resolution
Paulick2024  | 2024 | Physiological IHC, CASP framework        | Physiological modeling, cochlear implants

📖 Documentation


📊 Performance

TODO: benchmark table with per-component runtime (forward pass averaged over 10 runs, plus a full forward+backward pass) on CPU, CUDA, and MPS.


📄 License

This project follows the licensing of the original AMT and is therefore licensed under the GNU General Public License v3.0 or later (GPLv3+). See LICENSE for full details.


🙏 Acknowledgments

This work is based on the Auditory Modeling Toolbox (AMT) developed by:

  • Piotr Majdak
  • Clara Hollomey
  • Robert Baumgartner
  • ...and many contributors from the auditory research community

Reference:

Majdak, P., Hollomey, C., & Baumgartner, R. (2022). "AMT 1.x: A toolbox for reproducible research in auditory modeling." Acta Acustica, 6, 19. https://doi.org/10.1051/aacus/2022011

Official Site: https://amtoolbox.org/

Individual model implementations are based on their respective publications (see docstrings for specific citations).


Contacts

Stefano Giacomelli
Ph.D. Candidate in ICT
Department of Engineering, Information Science & Mathematics (DISIM), University of L'Aquila, Italy


📧 Email: stefano.giacomelli@graduate.univaq.it
🔗 GitHub: https://github.com/StefanoGiacomelli
🆔 ORCID: https://orcid.org/0009-0009-0438-1748
🎓 Scholar: https://scholar.google.com/citations?user=l-n0hl4AAAAJ&hl=it
💼 LinkedIn: https://www.linkedin.com/in/stefano-giacomelli-811654135

This project is funded by the Italian Ministry of University and Research under the Italian National Recovery and Resilience Plan (NRRP), project "Methods of Computational Auditory Scene Analysis and Synthesis supporting eXtended and Immersive Reality Services".


📝 Citations

If you use torch_amt in your research, please cite:

@software{giacomelli2026torch_amt,
  author = {Giacomelli, Stefano},
  title = {torch\_amt: PyTorch Auditory Modeling Toolbox},
  year = {2026},
  url = {https://github.com/StefanoGiacomelli/torch_amt},
  version = {0.1.0}
}

Also consider citing the original AMT papers.
