automatic-speech-recognition·PyPI

Distill the Automatic Speech Recognition (TensorFlow)

Project description

Automatic Speech Recognition

The project aims to distill Automatic Speech Recognition research. To begin, you can load a ready-to-use pipeline with a pre-trained model. Thanks to eager execution in TensorFlow 2.0, you can freely monitor model weights, activations, or gradients.

import automatic_speech_recognition as asr

file = 'to/test/sample.wav'  # 16 kHz sample rate, 16-bit depth
sample = asr.utils.read_audio(file)
pipeline = asr.load('deepspeech2', lang='en')
pipeline.model.summary()     # TensorFlow model
sentences = pipeline.predict([sample])
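
The sample above is expected to be a 16 kHz, 16-bit WAV file. As a minimal sketch, the format can be checked with Python's standard `wave` module before the file is passed to the pipeline. The helper names `check_wav_format` and `make_silent_wav` are hypothetical and are not part of this library.

```python
import io
import wave


def check_wav_format(source, expected_rate=16000, expected_sample_width=2):
    """Return True if the WAV has the expected sample rate and bit depth.

    `source` is a file path or a binary file-like object.
    `expected_sample_width` is given in bytes, so 2 means 16-bit samples.
    """
    with wave.open(source, 'rb') as wav:
        return (wav.getframerate() == expected_rate
                and wav.getsampwidth() == expected_sample_width)


def make_silent_wav(rate, sample_width, seconds=0.1):
    """Build a short mono WAV of zero bytes in memory (demonstration only)."""
    num_frames = int(rate * seconds)
    buffer = io.BytesIO()
    with wave.open(buffer, 'wb') as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b'\x00' * num_frames * sample_width)
    buffer.seek(0)
    return buffer


print(check_wav_format(make_silent_wav(16000, 2)))  # True
print(check_wav_format(make_silent_wav(44100, 2)))  # False
```

A file that fails the check would need to be resampled or converted first; how to do that is outside the scope of this sketch.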

We support English (thanks to OpenSeq2Seq). The table below shows the evaluation results on the English benchmark LibriSpeech dev-clean. For reference, DeepSpeech (Mozilla) achieves around 7.5% WER, whereas the state of the art (RWTH Aachen University) reaches 2.3% WER (recent evaluation results can be found here). Both use an external language model to boost results. By comparison, humans achieve 5.83% WER on the same set (LibriSpeech dev-clean).

Model Name	Decoder	WER-dev
`deepspeech2`	greedy	6.71
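
For readers new to the metric, WER (word error rate) is the word-level edit distance between the reference transcript and the prediction, divided by the number of reference words. CER is the same ratio computed over characters. The following is a minimal pure-Python sketch of that definition; it is not the library's `asr.evaluate` implementation, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j items of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1       # reference item is missing
            insertion = current[j - 1] + 1   # hypothesis has an extra item
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError('reference must contain at least one word')
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError('reference must not be empty')
    return edit_distance(reference, hypothesis) / len(reference)


# One substitution (sat -> sit) and one deletion (the) over six words.
print(word_error_rate('the cat sat on the mat', 'the cat sit on mat'))
```

Benchmark figures such as the one in the table are normally aggregated over a whole dataset rather than averaged per sentence, so a per-utterance value from this sketch is only illustrative.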

It soon turns out that you need to adjust the pipeline a little. Take a look at the CTC Pipeline. The pipeline connects a neural network model with all non-differentiable transformations (feature extraction or prediction decoding). Pipeline components are independent, so you can adjust them to your needs, e.g. use more sophisticated feature extraction, different data augmentation, or add a language model decoder (static n-grams or huge transformers). You can also do much more, such as distributing the training using the Strategy or experimenting with a mixed precision policy.

import numpy as np
import tensorflow as tf
import automatic_speech_recognition as asr

dataset = asr.dataset.Audio.from_csv('train.csv', batch_size=32)
dev_dataset = asr.dataset.Audio.from_csv('dev.csv', batch_size=32)
alphabet = asr.text.Alphabet(lang='en')
features_extractor = asr.features.FilterBanks(
    features_num=160,
    winlen=0.02,
    winstep=0.01,
    winfunc=np.hanning
)
model = asr.model.get_deepspeech2(
    input_dim=160,
    output_dim=29,
    rnn_units=800,
    is_mixed_precision=False
)
optimizer = tf.optimizers.Adam(
    learning_rate=1e-4,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8
)
decoder = asr.decoder.GreedyDecoder()
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder
)
pipeline.fit(dataset, dev_dataset, epochs=25)
pipeline.save('/checkpoint')

test_dataset = asr.dataset.Audio.from_csv('test.csv')
wer, cer = asr.evaluate.calculate_error_rates(pipeline, test_dataset)
print(f'WER: {wer}   CER: {cer}')
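
The prediction decoding step mentioned above can be illustrated with greedy CTC decoding: pick the most likely label in every frame, collapse consecutive repeats, then drop the blank label. The sketch below is a pure-Python illustration of that general technique, not the code behind `asr.decoder.GreedyDecoder`. The toy alphabet and the position of the blank label are assumptions made for the example.

```python
def greedy_ctc_decode(frame_scores, alphabet, blank_index):
    """Decode per-frame label scores with the greedy (best path) CTC rule.

    frame_scores: list of frames, each a list of scores (one per label).
    alphabet:     sequence mapping label indices to characters.
    blank_index:  index of the CTC blank label (an assumption of this sketch).
    """
    # Step 1: take the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Steps 2 and 3: skip labels that repeat the previous frame, skip blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank_index:
            decoded.append(alphabet[label])
        previous = label
    return ''.join(decoded)


# Toy alphabet with three symbols; index 3 is used as the blank label.
toy_alphabet = ['a', 'b', ' ']
frames = [
    [0.9, 0.0, 0.0, 0.1],  # a
    [0.8, 0.1, 0.0, 0.1],  # a (repeat, collapsed)
    [0.1, 0.0, 0.0, 0.9],  # blank
    [0.7, 0.2, 0.0, 0.1],  # a (new symbol, because a blank came in between)
    [0.1, 0.8, 0.0, 0.1],  # b
]
print(greedy_ctc_decode(frames, toy_alphabet, blank_index=3))  # aab
```

The blank between the second and third `a` frames is what allows a doubled letter to survive the collapse step. A language model decoder would replace the per-frame argmax with a search over several candidate paths.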

Installation

You can use pip:

pip install automatic-speech-recognition

Otherwise clone the code and create a new environment via conda:

git clone https://github.com/rolczynski/Automatic-Speech-Recognition.git
cd Automatic-Speech-Recognition
conda env create -f=environment.yml     # or use: environment-gpu.yml
conda activate Automatic-Speech-Recognition

References

The fundamental repositories:

Baidu - DeepSpeech2 - A PaddlePaddle implementation of DeepSpeech2 architecture for ASR
NVIDIA - Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
RWTH Aachen University - The RWTH extensible training framework for universal recurrent neural networks
TensorFlow - The implementation of DeepSpeech2 model
Mozilla - DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
Espnet - End-to-End Speech Processing Toolkit
Sean Naren - Speech Recognition using DeepSpeech2

Moreover, you can explore GitHub using key phrases like ASR, DeepSpeech, or Speech-To-Text. The wer_are_we list, an attempt at tracking the state of the art, can be helpful too.

Project details

Release history

Version	Release date
1.0.4 (this version)	Mar 24, 2020
1.0.2	Jan 2, 2020
1.0.1	Jan 2, 2020

Download files

Source Distribution

automatic-speech-recognition-1.0.4.tar.gz (19.0 kB)

Uploaded Mar 24, 2020 Source

Built Distribution

automatic_speech_recognition-1.0.4-py3-none-any.whl (40.4 kB)

Uploaded Mar 24, 2020 Python 3

File details

Details for the file automatic-speech-recognition-1.0.4.tar.gz.

File metadata

Download URL: automatic-speech-recognition-1.0.4.tar.gz
Upload date: Mar 24, 2020
Size: 19.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.7.5

File hashes

Hashes for automatic-speech-recognition-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`dc1ad8f638e64acf05270d4ae4861cb37e01179b21f38fe1b85aed592f94f228`
MD5	`d98ecc2af5bbdcf7d58518f8c3afc497`
BLAKE2b-256	`ae68deb54f4ee1fc18abffa74626073d29dfca62316edd15b3ba8515e706d568`

File details

Details for the file automatic_speech_recognition-1.0.4-py3-none-any.whl.

File metadata

Download URL: automatic_speech_recognition-1.0.4-py3-none-any.whl
Upload date: Mar 24, 2020
Size: 40.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.7.5

File hashes

Hashes for automatic_speech_recognition-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1ae91b8f9ab6d7a7f85d52cc6e1ebefce68af1b14cd010ff8aac4ab13408a5b2`
MD5	`5f0245e548307eccc2da681f4559d442`
BLAKE2b-256	`b215802b59c8c57299bc4827fddcff91b9ecee02a3830feaa3b70dd746760d73`
