
WavEncoder - PyTorch backed audio encoder



WavEncoder

WavEncoder is a Python library for encoding audio signals, applying transforms for audio augmentation, and training audio classification models with a PyTorch backend.

Wav Models to be added

  • wav2vec [1]
  • wav2vec2 [2]
  • SincNet [3]
  • PASE [4]
  • MockingJay [5]
  • RawNet [6]
  • CNN-1D
  • CNN-LSTM
  • CNN-LSTM-Attn
  • CNN-Transformer

Check the Demo Colab Notebook.

Installation

Use the package manager pip to install wavencoder.

pip install fairseq
pip install wavencoder

Usage

Import a pretrained encoder, baseline models, and classifiers

import torch
import wavencoder

x = torch.randn(1, 16000) # [1, 16000]
encoder = wavencoder.models.Wav2Vec(pretrained=True)
z = encoder(x) # [1, 512, 98]

classifier = wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,                          
                                                    return_attn_weights=True, 
                                                    attn_type='soft')
y_hat, attn_weights = classifier(z) # [1, 2], [1, 98]
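The `[1, 512, 98]` shape in the comment above can be checked by hand. The sketch below assumes that `wavencoder.models.Wav2Vec` wraps the original wav2vec 1.0 feature encoder, which the wav2vec paper [1] describes as five unpadded 1-D convolutions with kernel sizes (10, 8, 4, 4, 4) and strides (5, 4, 2, 2, 2). The helper names are hypothetical and not part of wavencoder.

```python
# Kernel sizes and strides of the wav2vec 1.0 feature encoder as reported in
# the wav2vec paper; assumed (not verified) to match wavencoder's Wav2Vec.
KERNELS = (10, 8, 4, 4, 4)
STRIDES = (5, 4, 2, 2, 2)


def conv_output_length(n_samples, kernels=KERNELS, strides=STRIDES):
    """Number of output frames after a stack of unpadded 1-D convolutions."""
    length = n_samples
    for k, s in zip(kernels, strides):
        # Output length of a 1-D convolution with padding=0 and dilation=1.
        length = (length - k) // s + 1
    return length


def receptive_field_and_hop(kernels=KERNELS, strides=STRIDES):
    """Receptive field (in samples) and total stride (hop) of the stack."""
    field, hop = 1, 1
    for k, s in zip(kernels, strides):
        field += (k - 1) * hop  # each layer widens the window by (k - 1) input hops
        hop *= s                # strides multiply along the stack
    return field, hop


if __name__ == "__main__":
    print(conv_output_length(16000))  # frames for 1 s of 16 kHz audio
    print(receptive_field_and_hop())  # (window, hop) in samples
```

Under these assumptions, 16000 samples give 98 frames, and each frame covers 465 samples (about 30 ms at 16 kHz) with a hop of 160 samples (10 ms), so `z` carries roughly 100 feature vectors per second of audio.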

Use wavencoder with PyTorch nn.Sequential or custom nn.Module classes

import torch
import torch.nn as nn
import wavencoder

model = nn.Sequential(
        wavencoder.models.Wav2Vec(),
        wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,                          
                                               return_attn_weights=True, 
                                               attn_type='soft')
)

x = torch.randn(1, 16000) # [1, 16000]
y_hat, attn_weights = model(x) # [1, 2], [1, 98]

import torch
import torch.nn as nn
import wavencoder

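class AudioClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = wavencoder.models.Wav2Vec(pretrained=True)  # [B, samples] -> [B, 512, frames]
        self.classifier = nn.Linear(512, 2)  # pooled 512-dim features -> 2 class scores

    def forward(self, x):
        z = self.encoder(x)          # [B, 512, frames]
        z = torch.mean(z, dim=2)     # average over the time axis -> [B, 512]
        out = self.classifier(z)     # [B, 2] unnormalized scores (logits)
        return out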

model = AudioClassifier()
x = torch.randn(1, 16000) # [1, 16000]
y_hat = model(x) # [1, 2]
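`AudioClassifier` collapses the time axis with `torch.mean(z, dim=2)`, whereas `LSTM_Attn_Classifier(..., attn_type='soft')` returns one weight per frame (`[1, 98]`). The plain-Python sketch below illustrates the generic soft-attention pooling idea on a single `[channels][frames]` feature map: score every frame, softmax the scores over time, and take the weighted average. It is not wavencoder's implementation; the dot-product scoring against a query vector `w` is an assumption made for illustration. When all scores are equal, the weights are uniform and the result reduces to mean pooling.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(z):
    """Average a [channels][frames] map over frames (same axis as torch.mean(z, dim=2) on [B, C, T])."""
    return [sum(row) / len(row) for row in z]


def soft_attention_pool(z, w):
    """Soft attention pooling over frames.

    z: [channels][frames] features.
    w: [channels] query vector (hypothetical; learned in a real model).
    Returns (pooled [channels], weights [frames]).
    """
    n_channels, n_frames = len(z), len(z[0])
    # One score per frame: dot product of that frame's feature vector with w.
    scores = [sum(w[c] * z[c][t] for c in range(n_channels)) for t in range(n_frames)]
    weights = softmax(scores)  # non-negative, sums to 1 over frames
    pooled = [sum(weights[t] * row[t] for t in range(n_frames)) for row in z]
    return pooled, weights


if __name__ == "__main__":
    z = [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]             # 2 channels, 3 frames
    print(soft_attention_pool(z, w=[0.0, 0.0]))        # zero query: uniform weights
    print(soft_attention_pool(z, w=[0.0, 1.0])[1])     # most weight on the last frame
```

Learned, non-uniform weights let the classifier emphasize the frames that matter most for the decision, and the returned weights can be inspected directly, which is what `return_attn_weights=True` exposes.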

Train the encoder-classifier models

import torch.nn as nn
from wavencoder.models import Wav2Vec, LSTM_Attn_Classifier
from wavencoder.trainer import train, test_evaluate_classifier, test_predict_classifier

model = nn.Sequential(
    Wav2Vec(pretrained=False),
    LSTM_Attn_Classifier(512, 64, 2)
)

trainloader = ...
valloader = ...
testloader = ...

trained_model, train_dict = train(model, trainloader, valloader, n_epochs=20)
test_prediction_dict = test_predict_classifier(trained_model, testloader)

Add transforms to your DataLoader to augment or preprocess the wav signal

import torchaudio
from wavencoder.transforms import Compose, AdditiveNoise, SpeedChange, Clipping, PadCrop, Reverberation

audio, _ = torchaudio.load('test.wav')

transforms = Compose([
                    AdditiveNoise('path-to-noise', p=0.5, snr_levels=[5, 10, 15]), # add environmental noise
                    SpeedChange(factor_range=(-0.5, 0.0)), # change the speed of the signal
                    Clipping(), # clip the amplitude of the signal
                    PadCrop(48000, crop_position='random', pad_position='random') # fix the signal length by padding or cropping, depending on the wav length
                    ])

transformed_audio = transforms(audio)
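Two of these transforms are easy to reason about on plain lists of samples. The sketch below shows the standard way to mix noise at a target signal-to-noise ratio (scale the noise so that 10·log10(P_signal / P_noise) equals the requested dB value) and a fixed-length pad/crop at a random position. The function names and the zero-padding choice are assumptions for illustration; they are not wavencoder's `AdditiveNoise` or `PadCrop` implementations, which may differ in details such as how `p` and `snr_levels` are sampled.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a list of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it.

    Assumes len(noise) >= len(signal) and that the noise is not all zeros.
    """
    noise = noise[: len(signal)]
    # Choose scale so that 10 * log10(P_signal / (scale**2 * P_noise)) == snr_db.
    scale = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def pad_crop(x, length, rng=random):
    """Return exactly `length` samples: random crop if too long, random zero-pad if too short."""
    if len(x) >= length:
        start = rng.randint(0, len(x) - length)  # random crop offset
        return list(x[start : start + length])
    missing = length - len(x)
    left = rng.randint(0, missing)  # random split of the padding (hypothetical choice)
    return [0.0] * left + list(x) + [0.0] * (missing - left)


def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to `clean`, in dB."""
    residual = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s tone
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    level = rng.choice([5, 10, 15])  # pick one of several SNR levels
    noisy = add_noise_at_snr(clean, noise, level)
    print(level, measured_snr_db(clean, noisy))  # measured SNR matches the chosen level
    print(len(pad_crop(noisy, 48000, rng)), len(pad_crop(noisy, 8000, rng)))  # 48000 8000
```

Fixing every example to the same length (48000 samples is 3 s at 16 kHz) is what allows variable-length clips to be batched by a DataLoader without a custom collate function.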

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

References

[1] Wav2Vec: Unsupervised Pre-training for Speech Recognition (code: GitHub)
[2] Wav2vec 2.0: Learning the structure of speech from raw audio (code: GitHub)
[3] Speaker Recognition from Raw Waveform with SincNet (code: GitHub)
[4] Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks (code: GitHub)
[5] Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders (code: GitHub)
[6] Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms (code: GitHub)


Download files


Source Distribution

wavencoder-0.1.0.tar.gz (24.8 kB)

Built Distribution

wavencoder-0.1.0-py3-none-any.whl (29.2 kB)

File details

Details for the file wavencoder-0.1.0.tar.gz.

File metadata

  • Download URL: wavencoder-0.1.0.tar.gz
  • Upload date:
  • Size: 24.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.6

File hashes

Hashes for wavencoder-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a3b9efd52a1fecc90d57ddefd086f98e36c067061707cd4b9c6bcc010137be1d
MD5 53b72559f638532cef4ca2a4b06f6410
BLAKE2b-256 b2d76698668d9b9194fb53d422802e1aef0e2a377ba8808ba363e0d7f4033537


File details

Details for the file wavencoder-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: wavencoder-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 29.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.6

File hashes

Hashes for wavencoder-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4a54c7758734ed9784034e5fefc7e1e2ddfc94f94054595efaf42ec10bf350c0
MD5 b7774a71fe0a6a23f1436bfb114e27cd
BLAKE2b-256 a5dcdd24806aee023d6d679ca96b8aa461dc74a208d464c41c62894331fdb964

