Keras (TensorFlow) implementation of Automatic Speech Recognition
Project description
DeepAsr
DeepAsr is an open-source Keras (TensorFlow) implementation of an end-to-end Automatic Speech Recognition (ASR) engine. It supports multiple speech recognition architectures.
Supported ASR architectures:
- Baidu's Deep Speech 2
- DeepAsrNetwork1
Using DeepAsr you can:
- perform speech-to-text using pre-trained models
- tune pre-trained models to your needs
- create new models on your own
DeepAsr key features:
- Multi-GPU support: You can distribute the training using a TensorFlow distribution Strategy, or experiment with a mixed precision policy.
- CuDNN support: Models use the CuDNNLSTM implementation by NVIDIA Developers. CPU devices are also supported.
- DataGenerator: Performs feature extraction during model training, which helps with large datasets.
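The idea behind the DataGenerator feature can be illustrated with a plain Python generator. This is a minimal sketch and not DeepAsr's own implementation: `batch_generator` and `fake_extract` are hypothetical names, and the real extractor would read audio files rather than produce arrays of ones.

```python
import numpy as np


def batch_generator(items, batch_size, extract):
    """Yield (padded_features, lengths) one batch at a time.

    Features are computed only when a batch is requested, so the full
    dataset never has to sit in memory as feature arrays.
    """
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        feats = [extract(item) for item in chunk]
        lengths = np.array([f.shape[0] for f in feats])
        # Zero-pad every example to the longest one in this batch.
        batch = np.zeros((len(feats), lengths.max(), feats[0].shape[1]), dtype=np.float32)
        for i, f in enumerate(feats):
            batch[i, :f.shape[0]] = f
        yield batch, lengths


def fake_extract(num_frames):
    """Hypothetical stand-in for a feature extractor: (num_frames, 4) ones."""
    return np.ones((num_frames, 4), dtype=np.float32)


# Five "utterances" of different lengths, streamed two at a time.
for batch, lengths in batch_generator([3, 5, 2, 4, 1], batch_size=2, extract=fake_extract):
    print(batch.shape, lengths)
```

Padding per batch rather than per dataset keeps short utterances from being padded to the length of the longest one in the whole corpus.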
Installation
You can use pip:
pip install deepasr
Getting started
Speech recognition is a tough task. You don't need to know all the details to use one of the pretrained models, but it is worth understanding the crucial conceptual components:
- Input: Mono, 16-bit, 16 kHz audio files (WAV or FLAC), up to 5 seconds long
- FeaturesExtractor: Converts audio files into MFCC features or a spectrogram
- Model: CTC model defined in Keras (references: [1], [2])
- Decoder: A Greedy or BeamSearch algorithm, with language model support, that decodes a sequence of probabilities using the Alphabet
- DataGenerator: Streams data to the model via a generator
- Callbacks: A set of functions that monitor the training
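To make the Decoder step more concrete, here is a minimal NumPy sketch of greedy CTC decoding: take the most likely symbol per frame, collapse consecutive repeats, then drop the blank symbol. It is not DeepAsr's `GreedyDecoder`. The 29-symbol layout (space, a to z, apostrophe, with the blank as the last index) is an assumption chosen for illustration and may not match `asr.vocab.Alphabet`.

```python
import numpy as np

# Assumed alphabet layout (hypothetical): space, a-z, apostrophe = 28 symbols,
# plus the CTC blank as index 28, giving 29 outputs in total.
ALPHABET = [" "] + [chr(c) for c in range(ord("a"), ord("z") + 1)] + ["'"]
BLANK = len(ALPHABET)


def greedy_ctc_decode(probs):
    """Decode a (time, symbols) probability matrix into text."""
    best_path = np.argmax(probs, axis=-1)
    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if index != previous and index != BLANK:
            decoded.append(ALPHABET[index])
        previous = index
    return "".join(decoded)


# One-hot "probabilities" for the frame sequence h h e _ l l _ l o o,
# where _ is the blank separating the two l's.
path = [8, 8, 5, 28, 12, 12, 28, 12, 15, 15]
print(greedy_ctc_decode(np.eye(29)[path]))
```

The blank between the two runs of `l` is what lets CTC output a doubled letter. Without it, the repeated frames would collapse into a single `l`.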
import numpy as np
import pandas as pd
import tensorflow as tf
import deepasr as asr


# get CTCPipeline
def get_config(feature_type: str = 'spectrogram', multi_gpu: bool = False):
    # audio feature extractor
    features_extractor = asr.features.preprocess(feature_type=feature_type, features_num=161,
                                                 samplerate=16000,
                                                 winlen=0.02,
                                                 winstep=0.025,
                                                 winfunc=np.hanning)

    # input label encoder
    alphabet_en = asr.vocab.Alphabet(lang='en')

    # training model
    model = asr.model.get_deepspeech2(
        input_dim=161,
        output_dim=29,
        is_mixed_precision=True
    )

    # model optimizer (learning_rate replaces the deprecated lr argument)
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=1e-4,
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-8
    )

    # output label decoder
    decoder = asr.decoder.GreedyDecoder()
    # decoder = asr.decoder.BeamSearchDecoder(beam_width=100, top_paths=1)

    # CTCPipeline
    pipeline = asr.pipeline.ctc_pipeline.CTCPipeline(
        alphabet=alphabet_en, features_extractor=features_extractor, model=model,
        optimizer=optimizer, decoder=decoder,
        sample_rate=16000, mono=True, multi_gpu=multi_gpu
    )
    return pipeline


train_data = pd.read_csv('train_data.csv')
pipeline = get_config(feature_type='fbank', multi_gpu=False)

# train asr model
history = pipeline.fit(train_dataset=train_data, batch_size=128, epochs=500)
# history = pipeline.fit_generator(train_dataset=train_data, batch_size=32, epochs=500)

# save the trained pipeline
pipeline.save('./checkpoint')
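The feature extractor above is configured with `features_num=161`, `samplerate=16000` and `winlen=0.02`. One plausible reading of those numbers, for the spectrogram feature type, is that a 20 ms window at 16 kHz holds 320 samples, and a real FFT of 320 samples yields 161 frequency bins. The NumPy sketch below illustrates that relationship. `log_spectrogram` is a hypothetical function, not the code behind `asr.features.preprocess`.

```python
import numpy as np


def log_spectrogram(signal, samplerate=16000, winlen=0.02, winstep=0.025, winfunc=np.hanning):
    """Return a (frames, bins) log power spectrogram of a 1-D signal."""
    frame_len = int(round(winlen * samplerate))    # 320 samples for the defaults
    frame_step = int(round(winstep * samplerate))  # 400 samples for the defaults
    if len(signal) < frame_len:
        # Pad very short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    # Row i holds the sample indices of frame i.
    indices = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    frames = signal[indices] * winfunc(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)  # small offset avoids log(0)


# One second of a 1 kHz tone sampled at 16 kHz.
tone = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
features = log_spectrogram(tone)
print(features.shape)  # (40, 161)
```

With `winstep` larger than `winlen`, as in the configuration above, consecutive frames do not overlap and 5 ms of audio between frames is skipped.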
A loaded pre-trained model has all components. A prediction can be made just by calling pipeline.predict().
import pandas as pd
import deepasr as asr
import numpy as np
test_data = pd.read_csv('test_data.csv')
# get a random test audio file and its transcript from the dataset
index = np.random.randint(test_data.shape[0])
data = test_data.iloc[index]
# positional access: column 0 is the audio file, column 1 is the transcript
test_file = data.iloc[0]
test_transcript = data.iloc[1]
# Test Audio file
print("Audio File:", test_file)
# Test Transcript
print("Audio Transcript:", test_transcript)
print("Transcript length:", len(test_transcript))
pipeline = asr.pipeline.load('./checkpoint')
print("Prediction:", pipeline.predict(test_file))
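Before calling `predict`, it can help to confirm that a file matches the input format listed under Getting started (mono, 16-bit, 16 kHz, up to 5 seconds). The sketch below uses only Python's standard `wave` module, so it covers WAV files and not FLAC. `check_wav` and `write_silence` are hypothetical helpers and are not part of DeepAsr.

```python
import os
import tempfile
import wave


def check_wav(path, samplerate=16000, max_seconds=5.0):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("not mono")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append("not 16-bit")
        if wav.getframerate() != samplerate:
            problems.append("sample rate is %d Hz" % wav.getframerate())
        if wav.getnframes() / wav.getframerate() > max_seconds:
            problems.append("longer than %.1f s" % max_seconds)
    return problems


def write_silence(path, seconds, samplerate=16000, channels=1):
    """Hypothetical helper: write a silent 16-bit WAV file for the demo."""
    num_frames = int(seconds * samplerate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(samplerate)
        wav.writeframes(b"\x00\x00" * num_frames * channels)


# Write one second of silence to a temporary file and check it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path, seconds=1.0)
print(check_wav(demo_path))
```

Returning a list of problems rather than raising on the first one makes it easy to report everything wrong with a file in a single pass over a dataset.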
References
The fundamental repositories:
- Baidu - DeepSpeech2 - A PaddlePaddle implementation of DeepSpeech2 architecture for ASR
- NVIDIA - Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
- TensorFlow - The implementation of DeepSpeech2 model
- Mozilla - DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
- Espnet - End-to-End Speech Processing Toolkit
- Automatic Speech Recognition - Distill the Automatic Speech Recognition research
- Python Speech Features - Speech features for ASR including MFCCs and filterbank energies
Download files
Source Distribution
File details
Details for the file deepasr-0.1.2.tar.gz.
File metadata
- Download URL: deepasr-0.1.2.tar.gz
- Upload date:
- Size: 34.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4d3184c356aaab968931c22d827e93016de94fcd5099c01cc46746399429987c
MD5 | 8e76276c4cffa7e82fecfb57847a2e8d
BLAKE2b-256 | 28f771ed972937a8860b8812b6aaddb1696bdb435eb6ab816ceeeba9aea5ea5b