audio ML for Jax

Project description

audax

Sponsors
About
Installation
Data pipeline
What's available
What's coming up
On contributing
References

About

A home for audio ML in JAX. It provides common features, popular learnable frontends, and pretrained supervised and self-supervised models. Unlike popular frameworks, the objective is not to become an end-to-end, end-all-be-all DL framework, but to act as a starting point for doing things the JAX way, through reference implementations and recipes built on the JAX / Flax / Optax stack.

PS: I'm quite new to JAX and its functional-at-heart design, so admittedly the code can be a bit untidy in places. Expect changes, restructuring, and, as the official JAX repository itself says, sharp edges!

Installation

pip install audax

To install from the latest source, use the following commands:

git clone https://github.com/SarthakYadav/audax.git
cd audax
pip install -r requirements.txt
pip install .

Data pipeline

All training is done on custom TFRecords. I initially tried tensorflow-datasets, but decided against it.
Each TFRecord example stores the audio file as an encoded PCM_16 FLAC buffer, along with label info and duration. This results in smaller TFRecord files and faster I/O than storing audio as a sequence of floats.
A step-by-step guide to setting up data can be found in recipes/data_prep, including a sample script to convert data into TFRecords.
More info can be found in audax.training_utils.data_v2.
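A rough illustration of why PCM_16 storage is compact: quantizing float32 samples to 16-bit integers halves the raw size before any lossless FLAC compression is applied. This NumPy sketch assumes audio normalized to [-1, 1]; `float_to_pcm16` and `pcm16_to_float` are hypothetical helpers, not part of audax, and the FLAC encoding and TFRecord serialization steps are not shown.

```python
import numpy as np

def float_to_pcm16(audio: np.ndarray) -> np.ndarray:
    """Quantize float audio in [-1, 1] to 16-bit signed integers (hypothetical helper)."""
    clipped = np.clip(audio, -1.0, 1.0)  # guard against int16 overflow
    return np.round(clipped * 32767.0).astype(np.int16)

def pcm16_to_float(pcm: np.ndarray) -> np.ndarray:
    """Map 16-bit samples back to float32 in [-1, 1] (hypothetical helper)."""
    return pcm.astype(np.float32) / 32767.0

# One second of a 440 Hz tone at 16 kHz, stored as float32.
sr = 16000
t = np.arange(sr) / sr
audio = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

pcm = float_to_pcm16(audio)
print(audio.nbytes, pcm.nbytes)  # 64000 32000
restored = pcm16_to_float(pcm)
# Round-trip error is bounded by half a quantization step.
print(np.abs(restored - audio).max() <= 0.5 / 32767 + 1e-6)  # True
```

The quantization error stays below half a quantization step, which is inaudible for typical training data; FLAC then compresses the integer samples losslessly.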

What's available

Audio feature extraction

At the time of writing, jax.scipy.signal does not have a native Short-Time Fourier Transform (STFT) implementation.

Instead of emulating the scipy.signal implementation, which has many more bells and whistles, the STFT implementation in audax.core is designed to be built upon to extract spectrogram and melspectrogram features like those found in torchaudio, which are quite popular. The result is a simple implementation of stft, spectrogram and melspectrogram that is compatible with its torchaudio counterparts, as shown in the figure below.

[Figure: audax vs. torchaudio feature comparison (audax_vs_torchaudio)]
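To make the idea concrete, here is a minimal sketch of a torchaudio-style STFT and power spectrogram in `jax.numpy`. It assumes what we understand to be torchaudio's conventions (periodic Hann window, `center=True` with reflect padding, `power=2.0`); the function names and the 25 ms / 10 ms parameters for 16 kHz audio are illustrative choices, not audax.core's actual API, and the sketch has not been checked against audax.core.

```python
import jax
import jax.numpy as jnp

def hann_window(length: int) -> jnp.ndarray:
    # Periodic Hann window (the torch.hann_window default).
    n = jnp.arange(length)
    return 0.5 - 0.5 * jnp.cos(2.0 * jnp.pi * n / length)

def stft(x, n_fft=400, hop_length=160, win_length=None):
    """Complex STFT of a 1-D signal, shape (n_frames, n_fft // 2 + 1)."""
    win_length = win_length or n_fft
    window = hann_window(win_length)
    # Zero-pad the window to n_fft, centred, as torch.stft does.
    left = (n_fft - win_length) // 2
    window = jnp.pad(window, (left, n_fft - win_length - left))
    # center=True: reflect-pad so frame t is centred on sample t * hop_length.
    x = jnp.pad(x, (n_fft // 2, n_fft // 2), mode="reflect")
    n_frames = 1 + (x.shape[-1] - n_fft) // hop_length
    # Gather overlapping frames with an (n_frames, n_fft) index matrix.
    idx = hop_length * jnp.arange(n_frames)[:, None] + jnp.arange(n_fft)[None, :]
    frames = x[idx] * window
    return jnp.fft.rfft(frames, axis=-1)

def spectrogram(x, power=2.0, **stft_kwargs):
    """Magnitude spectrogram raised to `power` (2.0 gives a power spectrogram)."""
    return jnp.abs(stft(x, **stft_kwargs)) ** power

sr = 16000
t = jnp.arange(sr) / sr
tone = jnp.sin(2 * jnp.pi * 1000.0 * t)  # 1 kHz tone
spec = jax.jit(spectrogram)(tone)
print(spec.shape)  # (101, 201)
print(int(jnp.argmax(spec.mean(axis=0))))  # 25, i.e. 1000 Hz * 400 / 16000
```

Framing with a precomputed index matrix keeps every shape static, so the whole function can be wrapped in `jax.jit` or `jax.vmap` over a batch of equal-length clips.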

Currently, spectrogram and melspectrogram features are supported. Visit audax.core.readme for more info.
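A melspectrogram is a power spectrogram projected onto a bank of triangular filters spaced evenly on the mel scale. The sketch below assumes the HTK mel formula and unnormalized triangles (which we believe match torchaudio's defaults); `mel_filterbank` and `melspectrogram` are illustrative names, not audax.core functions.

```python
import jax.numpy as jnp

def hz_to_mel(f):
    # HTK mel scale.
    return 2595.0 * jnp.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_freqs=201, n_mels=64, sample_rate=16000, f_min=0.0, f_max=None):
    """Triangular mel filters, shape (n_freqs, n_mels)."""
    f_max = sample_rate / 2 if f_max is None else f_max
    all_freqs = jnp.linspace(0.0, sample_rate / 2, n_freqs)  # STFT bin centres in Hz
    m_pts = jnp.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    f_pts = mel_to_hz(m_pts)  # filter edges and centres in Hz
    f_diff = f_pts[1:] - f_pts[:-1]  # (n_mels + 1,)
    slopes = f_pts[None, :] - all_freqs[:, None]  # (n_freqs, n_mels + 2)
    down = -slopes[:, :-2] / f_diff[:-1]  # rising edge of each triangle
    up = slopes[:, 2:] / f_diff[1:]  # falling edge of each triangle
    return jnp.maximum(0.0, jnp.minimum(down, up))

def melspectrogram(power_spec, fb):
    """(n_frames, n_freqs) @ (n_freqs, n_mels) -> (n_frames, n_mels)."""
    return power_spec @ fb

fb = mel_filterbank()  # 201 bins matches n_fft=400
mel = melspectrogram(jnp.ones((101, 201)), fb)
print(fb.shape, mel.shape)  # (201, 64) (101, 64)
```

Since the filterbank depends only on static parameters, it can be computed once and reused, so the per-example cost is a single matrix multiply.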

Apart from features, jax.vmap-compatible mixup and SpecAugment (no TimeStretch as of now, unfortunately) implementations are also provided.
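As an illustration of what "vmap-compatible" augmentation can look like, here is a minimal sketch of batch-level mixup and SpecAugment-style time and frequency masking. It assumes one-hot labels and (time, freq) inputs; the function names, `alpha`, and mask widths are illustrative choices, not audax's actual API. Masks are built from comparisons against `jnp.arange` so that every shape stays static under `jax.vmap` and `jax.jit`.

```python
import jax
import jax.numpy as jnp

def mixup(key, x, y, alpha=0.2):
    """Mix a batch with a shuffled copy of itself; y is one-hot, (batch, n_classes)."""
    k_lam, k_perm = jax.random.split(key)
    lam = jax.random.beta(k_lam, alpha, alpha)
    perm = jax.random.permutation(k_perm, x.shape[0])
    return lam * x + (1.0 - lam) * x[perm], lam * y + (1.0 - lam) * y[perm]

def mask_along_axis(key, spec, max_width, axis):
    """Zero one random contiguous band (width in [0, max_width]) along `axis` of a 2-D spec."""
    k_width, k_start = jax.random.split(key)
    size = spec.shape[axis]
    width = jax.random.randint(k_width, (), 0, max_width + 1)
    start = jax.random.randint(k_start, (), 0, size - width + 1)
    pos = jnp.arange(size)
    band = (pos >= start) & (pos < start + width)  # boolean mask with a static shape
    band = band[:, None] if axis == 0 else band[None, :]
    return jnp.where(band, 0.0, spec)

def spec_augment(key, spec, max_time=40, max_freq=15):
    """One time mask and one frequency mask on a single (time, freq) example."""
    k_t, k_f = jax.random.split(key)
    spec = mask_along_axis(k_t, spec, max_time, axis=0)
    return mask_along_axis(k_f, spec, max_freq, axis=1)

key = jax.random.PRNGKey(0)
specs = jnp.ones((8, 101, 64))  # batch of (time, freq) features
keys = jax.random.split(key, specs.shape[0])
augmented = jax.vmap(spec_augment)(keys, specs)  # independent masks per example

labels = jax.nn.one_hot(jnp.arange(8) % 3, 3)
x_mix, y_mix = mixup(jax.random.PRNGKey(1), specs, labels)
print(augmented.shape, x_mix.shape)  # (8, 101, 64) (8, 101, 64)
```

Mixup is written at the batch level because it needs pairs of examples, while the masking function operates on one example and is lifted to the batch with `jax.vmap`.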

Network architectures

Several prominent neural network architectures are provided as reference implementations, with more to come. The current release has:

ResNet [1]
EfficientNet [2]
ConvNeXt [3]

Pretrained models can be found in the respective recipes, and more are expected to be added soon.

Learnable frontends

Two popular learnable feature extraction frontends are available in audax.frontends: LEAF [4] and SincNet [5]. Sample recipes, as well as pretrained models (AudioSet for now), can be found in recipes/leaf.
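For intuition about what a learnable frontend parameterizes, here is a simplified sketch of SincNet-style [5] filters: each band-pass kernel is the difference of two windowed-sinc low-pass filters, so only a lower cutoff and a bandwidth per filter would need to be learned. The names `sinc_bandpass_filters` and `sinc_conv`, the kernel size, and the omission of SincNet's minimum-frequency constraints are our simplifications, not audax.frontends' implementation.

```python
import jax
import jax.numpy as jnp

def sinc_bandpass_filters(low_hz, band_hz, kernel_size=251, sample_rate=16000):
    """Windowed-sinc band-pass kernels, shape (n_filters, kernel_size).

    low_hz and band_hz, each of shape (n_filters,), play the role of learnable parameters.
    """
    f1 = jnp.abs(low_hz) / sample_rate  # normalised lower cutoff
    f2 = f1 + jnp.abs(band_hz) / sample_rate  # normalised upper cutoff
    n = jnp.arange(kernel_size) - (kernel_size - 1) / 2  # symmetric time axis

    def lowpass(fc):
        # Ideal low-pass: 2*fc*sinc(2*fc*n), where jnp.sinc(x) = sin(pi x) / (pi x).
        return 2.0 * fc[:, None] * jnp.sinc(2.0 * fc[:, None] * n[None, :])

    # Band-pass = difference of two low-passes, tapered by a Hamming window.
    return (lowpass(f2) - lowpass(f1)) * jnp.hamming(kernel_size)[None, :]

def sinc_conv(x, filters):
    """Filter a 1-D waveform with each kernel -> (n_filters, len(x))."""
    return jax.vmap(lambda h: jnp.convolve(x, h, mode="same"))(filters)

filters = sinc_bandpass_filters(jnp.array([1000.0]), jnp.array([1000.0]))  # 1-2 kHz band
t = jnp.arange(16000) / 16000
in_band = sinc_conv(jnp.sin(2 * jnp.pi * 1500.0 * t), filters)
out_band = sinc_conv(jnp.sin(2 * jnp.pi * 5000.0 * t), filters)
print(float(jnp.std(in_band)) > 10 * float(jnp.std(out_band)))  # True
```

Because the kernels are a differentiable function of `low_hz` and `band_hz`, gradients flow into the cutoff frequencies, which is what makes this style of frontend learnable with only two parameters per filter.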

Self-supervised models

COLA [6] models on AudioSet for the architectures above can be found in recipes/cola.
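The COLA objective [6] scores an anchor segment against a positive segment from the same clip, with segments from the other clips in the batch acting as negatives, using a bilinear similarity followed by a softmax cross-entropy. Below is a minimal `jax.numpy` sketch of that loss; `cola_loss` is an illustrative name, not the recipes/cola code, and the encoder and projection head are omitted.

```python
import jax
import jax.numpy as jnp

def cola_loss(anchors, positives, w):
    """anchors, positives: (batch, dim) embeddings; w: (dim, dim) learnable bilinear matrix.

    Row i of `positives` is the positive for row i of `anchors`; all other rows are negatives.
    """
    logits = anchors @ w @ positives.T  # (batch, batch) bilinear similarities
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    return -jnp.mean(jnp.diagonal(log_probs))  # cross-entropy with targets on the diagonal

emb = jnp.eye(4)
print(float(cola_loss(emb, emb, jnp.zeros((4, 4)))))  # log(4), about 1.386
print(float(cola_loss(emb, emb, 10.0 * jnp.eye(4))) < 0.01)  # True
```

With an all-zero `w` every candidate is equally likely, so the loss equals log(batch size); a `w` that strongly favors matching pairs drives it toward zero. Using in-batch negatives means no extra sampling step is required.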
A working implementation of SimCLR [7, 8] can be found in recipes/simclr, and pretrained models will be added soon (experiments ongoing!).
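SimCLR [7] trains with the NT-Xent (normalized temperature-scaled cross-entropy) loss over two augmented views of each example. Here is a minimal `jax.numpy` sketch; `nt_xent_loss` and the default temperature are illustrative, not the recipes/simclr implementation.

```python
import jax
import jax.numpy as jnp

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for two views z1, z2 of shape (batch, dim)."""
    n = z1.shape[0]
    z = jnp.concatenate([z1, z2], axis=0)
    z = z / jnp.linalg.norm(z, axis=-1, keepdims=True)  # unit vectors -> cosine similarity
    sim = (z @ z.T) / temperature  # (2n, 2n)
    sim = jnp.where(jnp.eye(2 * n, dtype=bool), -1e9, sim)  # exclude self-similarity
    pos = (jnp.arange(2 * n) + n) % (2 * n)  # view i pairs with view i + n (mod 2n)
    log_probs = jax.nn.log_softmax(sim, axis=-1)
    return -jnp.mean(log_probs[jnp.arange(2 * n), pos])

views = jnp.eye(4)
print(float(nt_xent_loss(views, views)) < 0.01)  # True: matching views agree
print(float(nt_xent_loss(views, jnp.roll(views, 1, axis=0))) > 1.0)  # True: mismatched views
```

Unlike the bilinear COLA score, SimCLR compares L2-normalized embeddings with a temperature, and each of the 2N views serves as an anchor against the other 2N - 2 views as negatives.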

What's coming up

Pretrained COLA models and linear probe experiments (VERY SOON!).
Better documentation and walk-throughs.
Pretrained SimCLR models.
Recipes for Speaker Recognition on VoxCeleb.
More AudioSet pretrained checkpoints for architectures already added.
Reference implementations for more neural architectures, especially Transformer-based networks.

On contributing

At the time of writing, I've been the sole person involved in the development of this work, and, quite frankly, would love to have help!
Happy to hear from open source contributors, both newbies and experienced, about their experience and needs.
Always open to hearing about possible ways to clean up or better structure the code.

References

[1] He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[2] Tan, M. and Le, Q., 2019, May. EfficientNet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR.
[3] Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T. and Xie, S., 2022. A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545.
[4] Zeghidour, N., Teboul, O., de Chaumont Quitry, F. and Tagliasacchi, M., 2021. LEAF: A learnable frontend for audio classification. In International Conference on Learning Representations.
[5] Ravanelli, M. and Bengio, Y., 2018, December. Speaker recognition from raw waveform with sincnet. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 1021-1028). IEEE.
[6] Saeed, A., Grangier, D. and Zeghidour, N., 2021, June. Contrastive learning of general-purpose audio representations. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3875-3879). IEEE.
[7] Chen, T., Kornblith, S., Norouzi, M. and Hinton, G., 2020, November. A simple framework for contrastive learning of visual representations. In International conference on machine learning (pp. 1597-1607). PMLR.

Project details

Release history

0.0.3 (Feb 28, 2022)

Download files


Source Distribution

audax-0.0.3.tar.gz (43.0 kB)

Uploaded Feb 28, 2022 Source

Built Distribution

audax-0.0.3-py3-none-any.whl (56.1 kB)

Uploaded Feb 28, 2022 Python 3

File details

Details for the file audax-0.0.3.tar.gz.

File metadata

Download URL: audax-0.0.3.tar.gz
Upload date: Feb 28, 2022
Size: 43.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for audax-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`569389e3346b8b8f871ecc03965bf5e57d55e251c25fb86ce5e3e4efd80031b1`
MD5	`f9da04806c24d86c2b4cf8f108d72385`
BLAKE2b-256	`659699c7cb0401a25f103dde38d456e1ee8d40aad99c3249eda5a0d90fd10684`


File details

Details for the file audax-0.0.3-py3-none-any.whl.

File metadata

Download URL: audax-0.0.3-py3-none-any.whl
Upload date: Feb 28, 2022
Size: 56.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for audax-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fa033fc638e183216e94a0116160fd10492cd224d106fa1c8741ddab7c87f17d`
MD5	`f7d82dcd6c4c18582c6ccb475e75df0d`
BLAKE2b-256	`5865af4aa52f8c50166c3229bf4562f8b8afe5a2ff2d24e660e4f7bba8e03478`

