WavEncoder - PyTorch backed audio encoder
WavEncoder
WavEncoder is a Python library for encoding audio signals, applying transforms for audio augmentation, and training audio classification models with a PyTorch backend.
Wav Models to be added
- wav2vec [1]
- wav2vec2 [2]
- SincNet [3]
- PASE [4]
- MockingJay [5]
- RawNet [6]
- CNN-1D
- CNN-LSTM
- CNN-LSTM-Attn
- CNN-Transformer
Check the Demo Colab Notebook.
Installation
Use the package manager pip to install wavencoder.
pip install fairseq
pip install wavencoder
Usage
Import pretrained encoder, baseline models and classifiers
import torch
import wavencoder
x = torch.randn(1, 16000) # [1, 16000]
encoder = wavencoder.models.Wav2Vec(pretrained=True)
z = encoder(x) # [1, 512, 98]
classifier = wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,
                                                    return_attn_weights=True,
                                                    attn_type='soft')
y_hat, attn_weights = classifier(z) # [1, 2], [1, 98]
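The `98` in the `[1, 512, 98]` shape above is the number of frames left after the encoder's strided convolutions. A minimal sketch of that arithmetic follows. The layer list `ASSUMED_CONV_LAYERS` is an assumption (the kernel sizes and strides of the default wav2vec feature encoder as published in fairseq), not something read from the wavencoder source, and `conv_output_length` is a hypothetical helper.

```python
# (kernel_size, stride) per conv layer. Assumed values, not taken from wavencoder.
ASSUMED_CONV_LAYERS = [(10, 5), (8, 4), (4, 2), (4, 2), (4, 2), (1, 1), (1, 1)]


def conv_output_length(n_samples, layers=ASSUMED_CONV_LAYERS):
    """Number of frames left after a stack of unpadded 1-D convolutions."""
    length = n_samples
    for kernel_size, stride in layers:
        # Standard output-length formula for a convolution without padding.
        length = (length - kernel_size) // stride + 1
    return length


print(conv_output_length(16000))  # 98, matching the shape comment above
```

Under the same assumption, the frame count grows roughly linearly with input length, about one frame per 160 samples (10 ms at 16 kHz).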
Use wavencoder with PyTorch Sequential or class modules
import torch
import torch.nn as nn
import wavencoder
model = nn.Sequential(
    wavencoder.models.Wav2Vec(),
    wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,
                                           return_attn_weights=True,
                                           attn_type='soft')
)
x = torch.randn(1, 16000) # [1, 16000]
y_hat, attn_weights = model(x) # [1, 2], [1, 98]
import torch
import torch.nn as nn
import wavencoder
class AudioClassifier(nn.Module):
    def __init__(self):
        super(AudioClassifier, self).__init__()
        self.encoder = wavencoder.models.Wav2Vec(pretrained=True)
        self.classifier = nn.Linear(512, 2)

    def forward(self, x):
        z = self.encoder(x)
        z = torch.mean(z, dim=2)
        out = self.classifier(z)
        return out

model = AudioClassifier()
x = torch.randn(1, 16000) # [1, 16000]
y_hat = model(x) # [1, 2]
Train the encoder-classifier models
import torch.nn as nn
from wavencoder.models import Wav2Vec, LSTM_Attn_Classifier
from wavencoder.trainer import train, test_evaluate_classifier, test_predict_classifier
model = nn.Sequential(
    Wav2Vec(pretrained=False),
    LSTM_Attn_Classifier(512, 64, 2)
)
trainloader = ...
valloader = ...
testloader = ...
trained_model, train_dict = train(model, trainloader, valloader, n_epochs=20)
test_prediction_dict = test_predict_classifier(trained_model, testloader)
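The three loaders above are left elided. A minimal sketch of what they could look like with plain PyTorch is shown here, assuming each batch is a `(waveform, label)` pair. That batch format is an assumption and is not confirmed by the source, so check what `train` expects before relying on it. `RandomWavDataset` is a hypothetical stand-in that produces random clips instead of real audio.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomWavDataset(Dataset):
    """Hypothetical stand-in: random 1-second clips at 16 kHz with integer labels."""

    def __init__(self, n_items=8, n_samples=16000, n_classes=2, seed=0):
        gen = torch.Generator().manual_seed(seed)
        self.wavs = torch.randn(n_items, n_samples, generator=gen)
        self.labels = torch.randint(0, n_classes, (n_items,), generator=gen)

    def __len__(self):
        return self.wavs.shape[0]

    def __getitem__(self, idx):
        # Assumed item format: (waveform [n_samples], label scalar).
        return self.wavs[idx], self.labels[idx]


trainloader = DataLoader(RandomWavDataset(n_items=8, seed=0), batch_size=4, shuffle=True)
valloader = DataLoader(RandomWavDataset(n_items=4, seed=1), batch_size=4)
testloader = DataLoader(RandomWavDataset(n_items=4, seed=2), batch_size=4)

wav_batch, label_batch = next(iter(trainloader))
print(wav_batch.shape, label_batch.shape)  # torch.Size([4, 16000]) torch.Size([4])
```

For real data, the dataset would load files (for example with `torchaudio.load`) and pad or crop them to a common length so the default collate function can stack them.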
Add Transforms to your DataLoader for Augmentation/Processing the wav signal
import torchaudio
from wavencoder.transforms import Compose, AdditiveNoise, SpeedChange, Clipping, PadCrop, Reverberation
audio, _ = torchaudio.load('test.wav')
transforms = Compose([
    AdditiveNoise('path-to-noise', p=0.5, snr_levels=[5, 10, 15]),  # add environmental noise
    SpeedChange(factor_range=(-0.5, 0.0)),  # change the speed of the signal
    Clipping(),  # clip the amplitude of the signal
    PadCrop(48000, crop_position='random', pad_position='random')  # fix the size of the signal: pad or crop depending on the wav length
])
transformed_audio = transforms(audio)
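The `snr_levels=[5, 10, 15]` argument above presumably lists target signal-to-noise ratios in dB. As an illustration of what mixing at a given SNR means, here is a dependency-free sketch based on the definition SNR_dB = 10 * log10(P_signal / P_noise). It is not the library's implementation, and `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical helpers. Plain Python lists are used to keep it self-contained; the same arithmetic applies to tensors.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal, scaled so the mix has the requested SNR in dB."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        return list(signal)  # silent noise: nothing to add
    # Solve 10 * log10(P_signal / P_noise_target) = snr_db for P_noise_target.
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, mixed):
    """SNR of a mix, treating (mixed - signal) as the noise component."""
    residual = [m - s for s, m in zip(signal, mixed)]
    return 10.0 * math.log10(mean_power(signal) / mean_power(residual))


rng = random.Random(0)
signal = [math.sin(2.0 * math.pi * 440.0 * t / 16000.0) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
mixed = mix_at_snr(signal, noise, snr_db=10.0)
print(round(measured_snr_db(signal, mixed), 3))  # close to 10.0
```

Lower SNR values mean louder noise relative to the signal, so sampling from several levels exposes the model to a range of noise conditions.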
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
License
Reference