Audio ML Spec Tools
Convenience functions for generating ML features from audio data. They remove the dependency on torchaudio from audio ML pipelines. Unlike PyTorch features, these functions can be exported to ExecuTorch and ONNX with no issues.
Motivation
Except in specific cases such as wav2vec, raw audio has proven to be a much worse input for ML models than spectrogram-based features across a wide variety of problem domains, including environmental sound classification (Guzhov et al. (2021)), singing technique classification (Yamamoto et al. (2021)), and ship classification (Xie, Ren, and Xu (2024)).
There is no scientific consensus on the relative benefits of mel-scale spectrograms, linear spectrograms, and MFCCs. Different researchers have shown good results with each feature type; see respectively Raponi, Oligeri, and Ali (2021), Jung et al. (2021), and Razani et al. (2017).
With this library, you can easily try as many feature extraction methods as you want to see what works for your use case.
Prerequisites
- Python 3.12 runtime
- pip for package installation
- Note that torchcodec depends on a system installation of FFmpeg
Installation
Install the dependencies into the environment with pip:
pip install -r requirements.txt
Then install the package itself locally:
pip install .
Usage
See examples/features.py.
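As a rough illustration of what separates the feature types named under Motivation, the sketch below computes the center frequencies of a mel-scale filterbank using only the standard library. It does not use this package's API (examples/features.py covers that). The function names `hz_to_mel`, `mel_to_hz`, and `mel_center_frequencies` are hypothetical, and the HTK-style mel formula is an assumption, since other conventions exist.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Return n_mels center frequencies in Hz, evenly spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 evenly spaced mel points span the range; the two outer
    # points are band edges, so only the inner n_mels points are centers.
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    centers = mel_center_frequencies(n_mels=8, f_min=0.0, f_max=8000.0)
    print([round(c, 1) for c in centers])
```

The spacing between consecutive centers widens as frequency rises. In a linear spectrogram, by contrast, the bins are evenly spaced in Hz.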
Testing
python -m coverage run -m unittest discover -s test -p "*_test.py" && python -m coverage report --skip-covered
python -m coverage html
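The discovery options in the command above pick up files in the test directory whose names end in _test.py. A minimal sketch of such a file is shown here. The file name test/example_test.py and the function `power_to_db` are hypothetical stand-ins, not part of this package.

```python
# Hypothetical file: test/example_test.py
import math
import unittest


def power_to_db(power: float, floor: float = 1e-10) -> float:
    """Stand-in for a function under test: convert power to decibels."""
    # Clamp to a small floor so that zero power does not hit log10(0).
    return 10.0 * math.log10(max(power, floor))


class PowerToDbTest(unittest.TestCase):
    def test_unit_power_is_zero_db(self):
        self.assertAlmostEqual(power_to_db(1.0), 0.0)

    def test_floor_prevents_log_of_zero(self):
        self.assertAlmostEqual(power_to_db(0.0), -100.0)
```

Because unittest discovery imports the module and collects `TestCase` subclasses itself, the file needs no `unittest.main()` call.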
Versioning
We use SemVer for versioning. For the versions available, see the tags on this repository.
Authors
- Ryan Quinn - Initial work
License
MIT.
File details
Details for the file audiomlspectools-0.5.0.tar.gz.
File metadata
- Download URL: audiomlspectools-0.5.0.tar.gz
- Upload date:
- Size: 12.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 120207a318ab776ea26026d47c29d918530a361c6fe8ab875c6b23e3fe5c5f95 |
| MD5 | 0f4731c5430c5898ce363e05b8aafe77 |
| BLAKE2b-256 | cac660e05797b7ba018d77d0c49294ecf35ddbe48d7acfd36f8dc65f41299823 |
File details
Details for the file audiomlspectools-0.5.0-py3-none-any.whl.
File metadata
- Download URL: audiomlspectools-0.5.0-py3-none-any.whl
- Upload date:
- Size: 16.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fed63cf7f30102107605fd8ca05d91d02751d98b4ecd61c6e1aee6cce811f766 |
| MD5 | 42ea0efa62168ffbd6be1543cae6f086 |
| BLAKE2b-256 | 80f2df406925782bd90272ddedb05d1136bc4ff5ccff40e35dcf349acd0e4e01 |