
WavEncoder - PyTorch-backed audio encoder


WavEncoder is a Python library for encoding audio signals, applying transforms for audio augmentation, and training audio classification models with a PyTorch backend.

Package Contents

Layers

  • Attention
    • Dot
    • Soft
    • Additive
    • Multiplicative
  • SincNet layer
  • Time Delay Neural Network (TDNN)

Models

  • Pretrained
    • wav2vec
    • SincNet
    • RawNet
  • Baseline
    • 1D-CNN
    • LSTM Classifier
    • LSTM Attention Classifier

Transforms

  • Noise (Environment/Gaussian White Noise)
  • Speed Change
  • PadCrop
  • Clip
  • Reverberation
  • TimeShift
  • TimeMask
  • FrequencyMask

Trainer and utils

  • Classification Trainer
  • Classification Testing
  • Download Noise Dataset
  • Download Impulse Response Dataset
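The Noise transform above mixes a noise recording (or Gaussian white noise) into the clean waveform at a chosen signal-to-noise ratio. A minimal pure-Python sketch of SNR-based mixing — just the scaling formula, not wavencoder's actual implementation — looks like this:

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that adding it to `clean` yields the given SNR (in dB),
    then return the element-wise sum. Both inputs are plain lists of floats."""
    p_clean = sum(s * s for s in clean) / len(clean)   # mean signal power
    p_noise = sum(n * n for n in noise) / len(noise)   # mean noise power
    # Solve 10*log10(p_clean / (g^2 * p_noise)) = snr_db for the gain g.
    g = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + g * n for s, n in zip(clean, noise)]

random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=10)
```

Picking the SNR randomly per call from a list like `[5, 10, 15]` gives the snr_levels behaviour of the AdditiveNoise transform shown later.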

Wav Models to be added

  • wav2vec [1]
  • wav2vec2 [2]
  • SincNet [3]
  • PASE [4]
  • MockingJay [5]
  • RawNet [6]
  • GaborNet [7]
  • LEAF [8]
  • CNN-1D
  • CNN-LSTM
  • CNN-LSTM-Attn
  • CNN-Transformer

Check the Demo Colab Notebook.

Installation

Use the package manager pip to install wavencoder.

pip install wavencoder

Usage

Import pretrained encoders, baseline models, and classifiers

import torch
import wavencoder

x = torch.randn(1, 16000) # [1, 16000]
encoder = wavencoder.models.Wav2Vec(pretrained=True)
z = encoder(x) # [1, 512, 98]

classifier = wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,
                                                    return_attn_weights=True,
                                                    attn_type='soft')
y_hat, attn_weights = classifier(z) # [1, 2], [1, 98]
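With attn_type='soft', the classifier pools the 98 time frames into a single vector using softmax-normalized attention weights (which is why attn_weights has shape [1, 98] and sums to 1 over time). A pure-Python sketch of that pooling step — illustrative only, not wavencoder's internals:

```python
import math

def soft_attention_pool(frames, scores):
    """Collapse a sequence of feature frames into one vector using
    softmax-normalized attention scores (one score per frame)."""
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]       # non-negative, sums to 1 over time
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features each
pooled, weights = soft_attention_pool(frames, [0.1, 0.2, 0.3])
```

Frames with higher scores contribute more to the pooled vector; the returned weights are what return_attn_weights=True exposes for inspection.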

Use wavencoder with a PyTorch nn.Sequential or inside a custom nn.Module class

import torch
import torch.nn as nn
import wavencoder

model = nn.Sequential(
        wavencoder.models.Wav2Vec(),
        wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,                          
                                               return_attn_weights=True, 
                                               attn_type='soft')
)

x = torch.randn(1, 16000) # [1, 16000]
y_hat, attn_weights = model(x) # [1, 2], [1, 98]

import torch
import torch.nn as nn
import wavencoder

class AudioClassifier(nn.Module):
    def __init__(self):
        super(AudioClassifier, self).__init__()
        self.encoder = wavencoder.models.Wav2Vec(pretrained=True)
        self.classifier = nn.Linear(512, 2)

    def forward(self, x):
        z = self.encoder(x)
        z = torch.mean(z, dim=2)
        out = self.classifier(z)
        return out

model = AudioClassifier()
x = torch.randn(1, 16000) # [1, 16000]
y_hat = model(x) # [1, 2]
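The torch.mean(z, dim=2) call above collapses the [1, 512, 98] encoder output into a single 512-dimensional vector by averaging over the time axis, so a plain nn.Linear can classify it. The same reduction in plain Python, for one example:

```python
def mean_pool_time(z):
    """Average a [channels][time] feature map over the time axis,
    mirroring torch.mean(z, dim=2) for a single example."""
    return [sum(channel) / len(channel) for channel in z]

z = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 channels, 3 time steps
pooled = mean_pool_time(z)               # -> [2.0, 5.0]
```

Mean pooling discards temporal ordering, which is exactly what the attention pooling in LSTM_Attn_Classifier avoids.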

Train the encoder-classifier models

import torch.nn as nn
from wavencoder.models import Wav2Vec, LSTM_Attn_Classifier
from wavencoder.trainer import train, test_evaluate_classifier, test_predict_classifier

model = nn.Sequential(
    Wav2Vec(pretrained=False),
    LSTM_Attn_Classifier(512, 64, 2)
)

trainloader = ...
valloader = ...
testloader = ...

trained_model, train_dict = train(model, trainloader, valloader, n_epochs=20)
test_prediction_dict = test_predict_classifier(trained_model, testloader)

Add transforms to your DataLoader for augmenting/processing the wav signal

import torchaudio
from wavencoder.transforms import Compose, AdditiveNoise, SpeedChange, Clipping, PadCrop, Reverberation

audio, _ = torchaudio.load('test.wav')

transforms = Compose([
                    AdditiveNoise('path-to-noise-folder', snr_levels=[5, 10, 15], p=0.5),
                    SpeedChange(factor_range=(-0.5, 0.0), p=0.5),
                    Clipping(p=0.5),
                    PadCrop(48000, crop_position='random', pad_position='random')
                    ])

transformed_audio = transforms(audio)
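Compose applies the transforms in order, and each transform fires with its own probability p, so the same pipeline yields a different augmentation on every call. The pattern can be sketched in pure Python (the class names and lambdas below are illustrative stand-ins, not wavencoder's implementation):

```python
import random

class MaybeApply:
    """Wrap a transform so it fires with probability p, mimicking the
    `p=` argument of the transforms above (illustrative stand-in)."""
    def __init__(self, fn, p=1.0):
        self.fn, self.p = fn, p
    def __call__(self, x):
        return self.fn(x) if random.random() < self.p else x

class Compose:
    """Apply a list of transforms left to right."""
    def __init__(self, transforms):
        self.transforms = transforms
    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

random.seed(0)
pipeline = Compose([
    MaybeApply(lambda x: [v * 0.5 for v in x], p=0.5),      # stand-in for Clipping
    MaybeApply(lambda x: x + [0.0] * (8 - len(x)), p=1.0),  # stand-in for PadCrop
])
out = pipeline([1.0, 2.0, 3.0])
```

Because randomness lives inside each transform, the pipeline plugs directly into a Dataset's __getitem__ and produces fresh augmentations each epoch.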

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

References

[1] Wav2vec: Unsupervised Pre-training for Speech Recognition
[2] Wav2vec 2.0: Learning the Structure of Speech from Raw Audio
[3] Speaker Recognition from Raw Waveform with SincNet
[4] Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
[5] Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
[6] Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms

Project details

Source distribution: wavencoder-0.1.1.tar.gz (29.6 kB)
  • SHA256: d235906b81633eee2d2837a3b1fb4fc71a7b2aab7928ba8ff7063a362625b439

Built distribution: wavencoder-0.1.1-py3-none-any.whl (36.0 kB)
  • SHA256: 667e3aed0207c8f6c79aab18ffce2a3bfc5a9eb6644e4181f4aa34deb3428bbe