audio ML for Jax

audax

Sponsors

This work would not be possible without cloud resources provided by Google's TPU Research Cloud (TRC) program. I also thank the TRC support team for quickly resolving whatever issues I had: you're awesome!

Want to become a sponsor? Feel free to reach out!

About

A home for audio ML in JAX. Has common features, popular learnable frontends, and pretrained supervised and self-supervised models. As opposed to popular frameworks, the objective is not to become an end-to-end, end-all-be-all DL framework, but instead to act as a starting point for doing things the jax way, through reference implementations and recipes, using the jax / flax / optax stack.

PS: I'm quite new to using Jax and its functional-at-heart design, so I admit the code can be a bit untidy in places. Expect changes, restructuring, and, as the official Jax repository itself says, sharp edges!

Installation

pip install audax

To install from the latest source, use the following commands:

git clone https://github.com/SarthakYadav/audax.git
cd audax
pip install -r requirements.txt
pip install .

Data pipeline

  • All training is done on custom TFRecords. Initially tried using tensorflow-datasets, but decided against it.
  • Each tfrecord example stores the audio file as an encoded PCM_16 FLAC buffer, along with label info and duration. This results in smaller tfrecord files and faster I/O compared to storing audio as a sequence of floats.
  • A step-by-step guide to setting up data can be found in recipes/data_prep, including a sample script to convert data into tfrecords.
  • More info can be found in audax.training_utils.data_v2.
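To give a rough sense of the size claim above, here is a minimal sketch that compares the raw byte count of one clip stored as float32 samples with the same clip quantized to 16-bit PCM. It does not use audax or the actual tfrecord writer; the sample rate, clip length and variable names are assumptions chosen for illustration, and FLAC's lossless compression (not shown here) would shrink the PCM_16 data further.

```python
import numpy as np

sample_rate = 16000   # assumed sample rate, for illustration only
duration_s = 10       # assumed clip length in seconds

# Synthetic clip: a 440 Hz sine wave stored as float32 samples in [-1, 1].
t = np.arange(sample_rate * duration_s) / sample_rate
wave = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

# Quantize to signed 16-bit PCM, the sample format that is then FLAC-encoded.
pcm16 = np.clip(np.round(wave * 32767.0), -32768, 32767).astype(np.int16)

float_bytes = wave.nbytes   # 4 bytes per sample
pcm_bytes = pcm16.nbytes    # 2 bytes per sample

# Decoding back to float shows the quantization error stays small.
recon = pcm16.astype(np.float32) / 32767.0
max_err = float(np.max(np.abs(recon - wave)))

print(f"float32: {float_bytes} bytes, PCM_16: {pcm_bytes} bytes, max error: {max_err:.2e}")
```

Before any FLAC compression is applied, the 16-bit representation is already half the size of the float32 one.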

What's available

Audio feature extraction

At the time of writing, jax.scipy.signal does not have a native Short-time Fourier Transform (stft) implementation.

Rather than emulating the scipy.signal implementation, which has many more bells and whistles, the stft implementation in audax.core is designed so that it can be built upon to extract spectrogram and melspectrogram features like those found in torchaudio, which are quite popular. The result is a simple implementation of stft, spectrogram and melspectrogram that is compatible with the torchaudio counterparts, as shown in the figure below.

[Figure: audax_vs_torchaudio]

Currently, spectrogram and melspectrogram features are supported. Visit audax.core.readme for more info.
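To illustrate the idea of building a spectrogram on top of a simple stft, here is a minimal sketch in plain jax.numpy. It is not the audax.core implementation and does not reproduce torchaudio's defaults (there is no centering or padding, for instance); the function names, the n_fft and hop_length values and the test tone are all assumptions made for this example.

```python
import jax
import jax.numpy as jnp


def stft(signal, n_fft=400, hop_length=160):
    """Naive STFT of a 1-D signal; returns shape (num_frames, n_fft // 2 + 1)."""
    num_frames = 1 + (signal.shape[-1] - n_fft) // hop_length
    # Gather overlapping frames with an index matrix of shape (num_frames, n_fft).
    starts = hop_length * jnp.arange(num_frames)
    idx = starts[:, None] + jnp.arange(n_fft)[None, :]
    frames = signal[idx]
    # Periodic Hann window, written out by hand.
    window = 0.5 - 0.5 * jnp.cos(2.0 * jnp.pi * jnp.arange(n_fft) / n_fft)
    return jnp.fft.rfft(frames * window, axis=-1)


def spectrogram(signal, n_fft=400, hop_length=160):
    """Power spectrogram: squared magnitude of the STFT."""
    return jnp.abs(stft(signal, n_fft, hop_length)) ** 2


sample_rate = 16000  # assumed, for illustration
t = jnp.arange(sample_rate) / sample_rate
tone = jnp.sin(2.0 * jnp.pi * 1000.0 * t)  # one second of a 1 kHz tone

spec = spectrogram(tone)  # shape (98, 201)

# The same function maps over a batch of waveforms with jax.vmap.
batch_spec = jax.vmap(spectrogram)(jnp.stack([tone, 0.5 * tone]))  # shape (2, 98, 201)
```

With these settings each frequency bin spans 16000 / 400 = 40 Hz, so the energy of the 1 kHz tone concentrates in bin 25.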

Apart from features, jax.vmap-compatible implementations of mixup and SpecAugment are also provided (unfortunately, no TimeStretch as of now).
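As a rough picture of what a jax.vmap-friendly mixup can look like, the sketch below mixes each example in a batch with a randomly chosen partner, using one Beta-distributed coefficient per example. This is an illustrative sketch, not the audax implementation: the function name, its signature and the alpha value are assumptions.

```python
import jax
import jax.numpy as jnp


def mixup(key, x, y, alpha=0.4):
    """Mix each example with a random partner from the same batch.

    x: (batch, ...) inputs, y: (batch, num_classes) one-hot or soft labels.
    alpha is a hypothetical default for the Beta(alpha, alpha) distribution.
    """
    perm_key, lam_key = jax.random.split(key)
    batch = x.shape[0]
    perm = jax.random.permutation(perm_key, batch)
    lam = jax.random.beta(lam_key, alpha, alpha, shape=(batch,))

    # Per-example convex combination, mapped over the batch axis with vmap.
    def mix_one(a, b, l):
        return l * a + (1.0 - l) * b

    x_mixed = jax.vmap(mix_one)(x, x[perm], lam)
    y_mixed = jax.vmap(mix_one)(y, y[perm], lam)
    return x_mixed, y_mixed


key = jax.random.PRNGKey(0)
x = jnp.arange(4.0)[:, None] * jnp.ones((4, 16000))  # four dummy waveforms
y = jax.nn.one_hot(jnp.arange(4), 4)                 # one-hot labels

x_mixed, y_mixed = mixup(key, x, y)
```

Because inputs and labels share the same coefficient, each mixed label row remains a valid distribution that sums to 1.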

Network architectures

Several prominent neural network architecture reference implementations are provided, with more to come. The current release has:

  • ResNets [1]
  • EfficientNet [2]
  • ConvNeXt [3]

Pretrained models can be found in the respective recipes; expect more to be added soon.

Learnable frontends

Two popular learnable feature extraction frontends are available in audax.frontends: LEAF [4] and SincNet [5]. Sample recipes, as well as pretrained models (AudioSet for now), can be found in recipes/leaf.

Self-supervised models

  • COLA [6] models on AudioSet for various aforementioned architectures can be found in recipes/cola.
  • A working implementation of SimCLR [7, 8] can be found in recipes/simclr, and pretrained models will be added soon (experiments ongoing!).

What's coming up

  • Pretrained COLA models and linear probe experiments. (VERY SOON!)
  • Better documentation and walk-throughs.
  • Pretrained SimCLR models.
  • Recipes for Speaker Recognition on VoxCeleb
  • More AudioSet pretrained checkpoints for architectures already added.
  • Reference implementations for more neural architectures, esp. Transformer based networks.

On contributing

  • At the time of writing, I've been the sole person involved in the development of this work, and quite frankly, would love to have help!
  • Happy to hear from open source contributors, both newbies and experienced, about their experience and needs.
  • Always open to hearing about possible ways to clean up/better structure code.

References

[1] He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[2] Tan, M. and Le, Q., 2019, May. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR.
[3] Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T. and Xie, S., 2022. A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545.
[4] Zeghidour, N., Teboul, O., Quitry, F., and Tagliasacchi, M., LEAF: A Learnable Frontend for Audio Classification, In International Conference on Learning Representations, 2021.
[5] Ravanelli, M. and Bengio, Y., 2018, December. Speaker recognition from raw waveform with sincnet. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 1021-1028). IEEE.
[6] Saeed, A., Grangier, D. and Zeghidour, N., 2021, June. Contrastive learning of general-purpose audio representations. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3875-3879). IEEE.
[7] Chen, T., Kornblith, S., Norouzi, M. and Hinton, G., 2020, November. A simple framework for contrastive learning of visual representations. In International conference on machine learning (pp. 1597-1607). PMLR.
