
WavEncoder - PyTorch-backed audio encoder


WavEncoder is a Python library for encoding audio signals, applying transforms for audio augmentation, and training audio classification models with a PyTorch backend.

Package Contents

Layers

  • Attention
    • Dot
    • Soft
    • Additive
    • Multiplicative
  • SincNet layer
  • Time Delay Neural Network (TDNN)

Models

  • Pretrained
    • wav2vec
    • SincNet
    • RawNet
  • Baseline
    • 1D-CNN
    • LSTM Classifier
    • LSTM Attention Classifier

Transforms

  • Noise (Environment/Gaussian White Noise)
  • Speed Change
  • PadCrop
  • Clip
  • Reverberation
  • TimeShift
  • TimeMask
  • FrequencyMask

Trainer and utils

  • Classification Trainer
  • Classification Testing
  • Download Noise Dataset
  • Download Impulse Response Dataset
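The Noise transform above mixes a noise recording (or Gaussian white noise) into the clean waveform at a chosen signal-to-noise ratio. A minimal pure-Python sketch of SNR-based mixing — just the scaling formula, not wavencoder's actual implementation — looks like this:

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that adding it to `clean` yields the given SNR (in dB),
    then return the element-wise sum. Both inputs are plain lists of floats."""
    p_clean = sum(s * s for s in clean) / len(clean)   # mean signal power
    p_noise = sum(n * n for n in noise) / len(noise)   # mean noise power
    # Solve 10*log10(p_clean / (g^2 * p_noise)) = snr_db for the gain g.
    g = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + g * n for s, n in zip(clean, noise)]

random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=10)
```

Picking the SNR randomly per call from a list like `[5, 10, 15]` gives the snr_levels behaviour of the AdditiveNoise transform shown later.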

Wav Models to be added

  • wav2vec [1]
  • wav2vec2 [2]
  • SincNet [3]
  • PASE [4]
  • MockingJay [5]
  • RawNet [6]
  • GaborNet [7]
  • LEAF [8]
  • CNN-1D
  • CNN-LSTM
  • CNN-LSTM-Attn
  • CNN-Transformer

Check the Demo Colab Notebook.

Installation

Use the package manager pip to install wavencoder.

pip install wavencoder

Usage

Import pretrained encoders, baseline models, and classifiers

import torch
import wavencoder

x = torch.randn(1, 16000) # [1, 16000]
encoder = wavencoder.models.Wav2Vec(pretrained=True)
z = encoder(x) # [1, 512, 98]

classifier = wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,
                                                    return_attn_weights=True,
                                                    attn_type='soft')
y_hat, attn_weights = classifier(z) # [1, 2], [1, 98]
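With attn_type='soft', the classifier pools the 98 time frames into a single vector using softmax-normalized attention weights (which is why attn_weights has shape [1, 98] and sums to 1 over time). A pure-Python sketch of that pooling step — illustrative only, not wavencoder's internals:

```python
import math

def soft_attention_pool(frames, scores):
    """Collapse a sequence of feature frames into one vector using
    softmax-normalized attention scores (one score per frame)."""
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]       # non-negative, sums to 1 over time
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features each
pooled, weights = soft_attention_pool(frames, [0.1, 0.2, 0.3])
```

Frames with higher scores contribute more to the pooled vector; the returned weights are what return_attn_weights=True exposes for inspection.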

Use wavencoder with a PyTorch nn.Sequential or inside a custom nn.Module class

import torch
import torch.nn as nn
import wavencoder

model = nn.Sequential(
        wavencoder.models.Wav2Vec(),
        wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,                          
                                               return_attn_weights=True, 
                                               attn_type='soft')
)

x = torch.randn(1, 16000) # [1, 16000]
y_hat, attn_weights = model(x) # [1, 2], [1, 98]

import torch
import torch.nn as nn
import wavencoder

class AudioClassifier(nn.Module):
    def __init__(self):
        super(AudioClassifier, self).__init__()
        self.encoder = wavencoder.models.Wav2Vec(pretrained=True)
        self.classifier = nn.Linear(512, 2)

    def forward(self, x):
        z = self.encoder(x)
        z = torch.mean(z, dim=2)
        out = self.classifier(z)
        return out

model = AudioClassifier()
x = torch.randn(1, 16000) # [1, 16000]
y_hat = model(x) # [1, 2]
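The torch.mean(z, dim=2) call above collapses the [1, 512, 98] encoder output into a single 512-dimensional vector by averaging over the time axis, so a plain nn.Linear can classify it. The same reduction in plain Python, for one example:

```python
def mean_pool_time(z):
    """Average a [channels][time] feature map over the time axis,
    mirroring torch.mean(z, dim=2) for a single example."""
    return [sum(channel) / len(channel) for channel in z]

z = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 channels, 3 time steps
pooled = mean_pool_time(z)               # -> [2.0, 5.0]
```

Mean pooling discards temporal ordering, which is exactly what the attention pooling in LSTM_Attn_Classifier avoids.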

Train the encoder-classifier models

import torch.nn as nn
from wavencoder.models import Wav2Vec, LSTM_Attn_Classifier
from wavencoder.trainer import train, test_evaluate_classifier, test_predict_classifier

model = nn.Sequential(
    Wav2Vec(pretrained=False),
    LSTM_Attn_Classifier(512, 64, 2)
)

trainloader = ...
valloader = ...
testloader = ...

trained_model, train_dict = train(model, trainloader, valloader, n_epochs=20)
test_prediction_dict = test_predict_classifier(trained_model, testloader)

Add transforms to your DataLoader for augmenting/processing the wav signal

import torchaudio
from wavencoder.transforms import Compose, AdditiveNoise, SpeedChange, Clipping, PadCrop, Reverberation

audio, _ = torchaudio.load('test.wav')

transforms = Compose([
                    AdditiveNoise('path-to-noise-folder', snr_levels=[5, 10, 15], p=0.5),
                    SpeedChange(factor_range=(-0.5, 0.0), p=0.5),
                    Clipping(p=0.5),
                    PadCrop(48000, crop_position='random', pad_position='random')
                    ])

transformed_audio = transforms(audio)
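Compose applies the transforms in order, and each transform fires with its own probability p, so the same pipeline yields a different augmentation on every call. The pattern can be sketched in pure Python (the class names and lambdas below are illustrative stand-ins, not wavencoder's implementation):

```python
import random

class MaybeApply:
    """Wrap a transform so it fires with probability p, mimicking the
    `p=` argument of the transforms above (illustrative stand-in)."""
    def __init__(self, fn, p=1.0):
        self.fn, self.p = fn, p
    def __call__(self, x):
        return self.fn(x) if random.random() < self.p else x

class Compose:
    """Apply a list of transforms left to right."""
    def __init__(self, transforms):
        self.transforms = transforms
    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

random.seed(0)
pipeline = Compose([
    MaybeApply(lambda x: [v * 0.5 for v in x], p=0.5),      # stand-in for Clipping
    MaybeApply(lambda x: x + [0.0] * (8 - len(x)), p=1.0),  # stand-in for PadCrop
])
out = pipeline([1.0, 2.0, 3.0])
```

Because randomness lives inside each transform, the pipeline plugs directly into a Dataset's __getitem__ and produces fresh augmentations each epoch.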

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

References

[1] Wav2vec: Unsupervised Pre-training for Speech Recognition
[2] Wav2vec 2.0: Learning the Structure of Speech from Raw Audio
[3] Speaker Recognition from Raw Waveform with SincNet
[4] Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
[5] Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
[6] Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms

Project details

Source distribution: wavencoder-0.1.1.tar.gz (29.6 kB)
  • SHA256: d235906b81633eee2d2837a3b1fb4fc71a7b2aab7928ba8ff7063a362625b439

Built distribution: wavencoder-0.1.1-py3-none-any.whl (36.0 kB)
  • SHA256: 667e3aed0207c8f6c79aab18ffce2a3bfc5a9eb6644e4181f4aa34deb3428bbe