Audio embeddings based on pruned Look, Listen, and Learn (L3) models for the Edge

edgel3

Look, Listen, and Learn (L3) [3], a recently proposed state-of-the-art transfer learning technique, trains a self-supervised deep audio embedding through binary Audio-Visual Correspondence (AVC). This embedding can be used to train a variety of downstream audio classification tasks that have limited data. However, with close to 4.7 million parameters, L3-Net is 18 MB in size, making it infeasible for small edge devices such as 'motes', which use a single microcontroller and limited memory to achieve long-lived self-powered operation.

In EdgeL3 [1], we comprehensively explore the feasibility of compressing L3-Net for mote-scale inference. We used pruning, ablation, and knowledge distillation techniques to show that the originally proposed L3-Net architecture is substantially overparameterized, not only for AVC but also for the target task of sound classification, as evaluated on two popular downstream datasets, US8K and ESC50. EdgeL3, a 95.45% sparsified version of L3-Net, provides a useful reference model for approximating the L3 audio embedding for transfer learning.
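As a rough back-of-envelope check on the figures quoted above, the sketch below relates the parameter count, the model size, and the sparsity level. It assumes each weight is stored as a 32-bit float, which the text does not state, and it ignores any storage overhead a sparse format would add. All variable names are illustrative.

```python
# Figures quoted in the project description.
TOTAL_PARAMS = 4_700_000   # "close to 4.7 million parameters"
SPARSITY_PERCENT = 95.45   # share of weights removed in EdgeL3

# Assumption (not stated in the text): 32-bit floats, 4 bytes per weight.
BYTES_PER_PARAM = 4

# Size of the dense model under that assumption.
dense_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
dense_mib = dense_bytes / 2**20

# Weights left after pruning, and their raw size without index overhead.
remaining_params = round(TOTAL_PARAMS * (1 - SPARSITY_PERCENT / 100))
remaining_kib = remaining_params * BYTES_PER_PARAM / 2**10

print(f"dense model:  {dense_mib:.1f} MiB")
print(f"after pruning: {remaining_params} weights, about {remaining_kib:.0f} KiB")
```

Under these assumptions the dense size comes out close to the 18 MB quoted above, while the pruned weights alone would occupy well under 1 MiB. The real on-device footprint depends on how the sparse weights are encoded.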

edgel3 is an open-source Python library for downloading the sparsified L3 models and computing deep audio embeddings from them. The sparse audio embedding models provided have been re-trained using two different mechanisms, as described in the paper. The code for the model and training implementation can be found here.

Download the original L3 model used by EdgeL3 as a baseline here. For non-sparse models and embeddings, please refer to OpenL3 [2].

Installing EdgeL3

Dependencies

TensorFlow

Install the TensorFlow variant (CPU-only or GPU) that best fits your use case.

On most platforms, either of the following commands should properly install TensorFlow:

pip install tensorflow # CPU-only version
pip install tensorflow-gpu # GPU version

For more detailed information, please consult the TensorFlow installation documentation.

libsndfile

EdgeL3 depends on the pysoundfile module to load audio files, which in turn depends on the non-Python library libsndfile. On Windows and macOS, libsndfile is installed via pip along with pysoundfile, so you can skip this step. On Linux, however, it must be installed manually via your platform's package manager. For Debian-based distributions (such as Ubuntu), this can be done by running

apt-get install libsndfile1

For more detailed information, please consult the pysoundfile installation documentation.

Installing EdgeL3

The simplest way to install EdgeL3 is by using pip, which will also install the additional required dependencies if needed. To install EdgeL3 using pip, simply run

pip install edgel3
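After installation, a quick check like the following can confirm that Python is able to locate the packages named in the dependencies section. This is a minimal sketch using only the standard library. The helper name `check_modules` is hypothetical, and `soundfile` is the import name of the pysoundfile package.

```python
import importlib.util

# Import names of the packages discussed above.
REQUIRED_MODULES = ("tensorflow", "soundfile", "edgel3")


def check_modules(names=REQUIRED_MODULES):
    """Map each top-level module name to True if Python can locate it.

    find_spec only locates a module; it does not import or run it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_modules().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Locating `soundfile` this way does not prove that libsndfile itself is present. On Linux, a missing libsndfile would typically only surface when `soundfile` is actually imported.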

To install the latest version of EdgeL3 from source:

  1. Clone or pull the latest version:

     git clone https://github.com/ksangeeta2429/edgel3.git
    
  2. Install using pip to handle Python dependencies:

     cd edgel3
     pip install -e .

Using EdgeL3

To help you get started with EdgeL3, please see the tutorial and module usage.
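For orientation, a minimal sketch of computing an embedding for one audio file might look like the following. It assumes that edgel3 exposes a `get_embedding(audio, sr)` function returning embeddings and timestamps, as described in the tutorial; please check the module usage documentation for the exact signature and options. The helper name `embed_file` is hypothetical.

```python
def embed_file(path):
    """Return (embeddings, timestamps) for a single audio file.

    Assumes edgel3.get_embedding(audio, sr) behaves as described in the
    tutorial. Imports are deferred so that defining this helper does not
    load TensorFlow.
    """
    import soundfile as sf  # import name of the pysoundfile dependency
    import edgel3

    # Read the samples and the sample rate from disk.
    audio, sr = sf.read(path)

    # Compute embeddings with the library's default model settings.
    emb, ts = edgel3.get_embedding(audio, sr)
    return emb, ts
```

Calling `embed_file("/path/to/file.wav")` would then return the embedding array together with the timestamp of each embedding frame.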

References

Please cite the following papers when using EdgeL3 in your work:

[1] EdgeL3: Compressing L3-Net for Mote-Scale Urban Noise Monitoring
Sangeeta Kumari, Dhrubojyoti Roy, Mark Cartwright, Juan Pablo Bello, and Anish Arora.
Parallel AI and Systems for the Edge (PAISE), Rio de Janeiro, Brazil, May 2019.

[2] Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

[3] Look, Listen and Learn
Relja Arandjelović and Andrew Zisserman.
IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct. 2017.


