A Keras (TensorFlow) implementation of Automatic Speech Recognition

Project description

DeepAsr

DeepAsr is an open-source implementation of an end-to-end Automatic Speech Recognition (ASR) engine, based on Baidu's Deep Speech 2 paper, using Keras (TensorFlow).

Using DeepAsr you can:

  • perform speech-to-text using pre-trained models
  • tune pre-trained models to your needs
  • create new models on your own

DeepAsr key features:

  • Multi-GPU support: You can distribute the training using a TensorFlow distribution Strategy, or experiment with a mixed precision policy.
  • CuDNN support: The model uses the CuDNNLSTM implementation by NVIDIA Developers. CPU devices are also supported.
  • DataGenerator: Feature extraction (on CPU) can run in parallel with model training (on GPU).

import numpy as np
import pandas as pd
import tensorflow as tf
import deepasr as asr

def get_config(features, multi_gpu):
    alphabet_en = asr.vocab.Alphabet(lang='en')
    if features == 'fbank':
        features_extractor = asr.features.FilterBanks(features_num=161,
                                                      winlen=0.02,
                                                      winstep=0.01,
                                                      winfunc=np.hanning)
    else:
        features_extractor = asr.features.Spectrogram(
            features_num=161,
            samplerate=16000,
            winlen=0.02,
            winstep=0.01,
            winfunc=np.hanning
        )
    model = asr.model.get_deepspeech2_v1(
        input_dim=161,
        output_dim=29,
        is_mixed_precision=True
        )
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=1e-4,  # `lr` is a deprecated alias in TensorFlow 2.x
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-8
        )
    decoder = asr.decoder.GreedyDecoder()

    pipeline = asr.pipeline.ctc_pipeline.CTCPipeline(
        alphabet=alphabet_en, features_extractor=features_extractor, model=model, optimizer=optimizer, decoder=decoder,
        sample_rate=16000, mono=True, multi_gpu=multi_gpu
    )
    return pipeline

def run(train_data, test_data, features='fbank', batch_size=32, epochs=10, multi_gpu=True):
    pipeline = get_config(features, multi_gpu)
    history = pipeline.fit_generator(train_data, batch_size=batch_size, epochs=epochs)
    pipeline.save('./checkpoints')
    print("Truth:", test_data['transcripts'].to_list()[0])
    print("Prediction:", pipeline.predict(test_data['path'].to_list()[0]))
    return history

train = pd.read_csv('train_data.csv')
test = pd.read_csv('test_data.csv')
run(train, test, features='fbank', batch_size=32, epochs=100, multi_gpu=True)
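The script above loads train_data.csv and test_data.csv with pandas and reads the `path` and `transcripts` columns from the test set. As a minimal sketch of what such a manifest could look like, the snippet below writes and re-reads one with the standard library. It assumes the training CSV uses the same two columns; the clip paths and transcripts are made up for illustration.

```python
import csv
import io

# Hypothetical clips and transcripts, for illustration only.
EXAMPLE_ROWS = [
    ("audio/clip_0001.wav", "hello world"),
    ("audio/clip_0002.wav", "speech recognition is a tough task"),
]


def write_manifest(rows, handle):
    """Write (path, transcript) pairs as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["path", "transcripts"])
    for path, transcript in rows:
        writer.writerow([path, transcript])


def read_manifest(handle):
    """Read the manifest back as a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


# An in-memory buffer stands in for a file such as train_data.csv.
buffer = io.StringIO()
write_manifest(EXAMPLE_ROWS, buffer)
buffer.seek(0)
manifest = read_manifest(buffer)
print(manifest[0]["path"], "->", manifest[0]["transcripts"])
```

Replacing the buffer with `open('train_data.csv', 'w', newline='')` would produce a file that `pd.read_csv` can load.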

Installation

You can use pip:

pip install deepasr

Getting started

Speech recognition is a tough task. You don't need to know all the details to use one of the pre-trained models. However, it's worth understanding the crucial conceptual components:

  • Input: Mono, 16-bit WAVE files sampled at 16 kHz (up to 5 seconds)
  • FeaturesExtractor: Converts audio files into MFCC features or a spectrogram
  • Model: A CTC model defined in Keras (references: [1], [2])
  • Decoder: A greedy algorithm, with language model support, decodes a sequence of probabilities using the Alphabet
  • DataGenerator: Streams data to the model via a generator
  • Callbacks: A set of functions that monitor the training
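To make the Input and FeaturesExtractor items more concrete, here is a small sketch using only the standard library. It checks a WAVE file against the format listed above and estimates how many analysis windows a 20 ms window with a 10 ms step would produce. The helper names `describe_wav` and `count_frames` are hypothetical and not part of DeepAsr, and the frame count assumes no padding, so a library that pads the last window may report a slightly different number.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16000      # 16 kHz
EXPECTED_CHANNELS = 1      # mono
EXPECTED_SAMPLE_WIDTH = 2  # 16-bit audio is 2 bytes per sample
MAX_SECONDS = 5.0


def describe_wav(path):
    """Return the clip duration and whether it matches the expected format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
        ok = (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and rate == EXPECTED_RATE
            and duration <= MAX_SECONDS
        )
    return {"duration": duration, "ok": ok}


def count_frames(num_samples, rate=16000, winlen=0.02, winstep=0.01):
    """Count full analysis windows, assuming no padding at the end."""
    window = int(round(winlen * rate))  # 320 samples at 16 kHz
    step = int(round(winstep * rate))   # 160 samples at 16 kHz
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // step


# Write one second of silence to a temporary file and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "silence.wav")
    with wave.open(path, "wb") as out:
        out.setnchannels(EXPECTED_CHANNELS)
        out.setsampwidth(EXPECTED_SAMPLE_WIDTH)
        out.setframerate(EXPECTED_RATE)
        out.writeframes(b"\x00\x00" * EXPECTED_RATE)
    info = describe_wav(path)

print(info, count_frames(EXPECTED_RATE))
```

A 320-sample window yields 320 // 2 + 1 = 161 bins from a real FFT of the same size, which would line up with features_num=161 if the spectrogram extractor uses an FFT the size of the window (an assumption about the library, not something stated here).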

A loaded pre-trained model has all components. Prediction can be invoked just by calling pipeline.predict().

import pandas as pd
import deepasr as asr
pipeline = asr.pipeline.get_pipeline.load('./checkpoints')
test_data = pd.read_csv('test_data.csv')
print("Truth:", test_data['transcripts'].to_list()[0])
print("Prediction:", pipeline.predict(test_data['path'].to_list()[0]))
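During prediction, the decoder turns per-frame probabilities into text. The snippet below is a rough sketch of standard greedy (best-path) CTC decoding in plain Python. It is not DeepAsr's GreedyDecoder code, and the alphabet layout and the position of the blank symbol are assumptions chosen so that the total comes to 29 outputs.

```python
from itertools import groupby

# Assumed layout: space, a-z, apostrophe, plus a trailing CTC blank (29 outputs).
ALPHABET = [" "] + [chr(c) for c in range(ord("a"), ord("z") + 1)] + ["'"]
BLANK = len(ALPHABET)  # index 28


def greedy_ctc_decode(frame_probs, alphabet=ALPHABET, blank=BLANK):
    """Decode a sequence of per-frame probability vectors into a string."""
    # 1. Pick the most likely symbol index in every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs
    ]
    # 2. Collapse runs of the same index into a single occurrence.
    collapsed = [index for index, _ in groupby(best_path)]
    # 3. Drop the blank symbol, which only separates repeated characters.
    return "".join(alphabet[i] for i in collapsed if i != blank)


def one_hot(index, size=len(ALPHABET) + 1):
    """Build a toy probability vector with all mass on one symbol."""
    frame = [0.0] * size
    frame[index] = 1.0
    return frame


# "_" marks a blank frame; the blank between the l's keeps the double letter.
frames = [
    one_hot(BLANK) if ch == "_" else one_hot(ALPHABET.index(ch))
    for ch in "hhe_ll_lo"
]
decoded = greedy_ctc_decode(frames)
print(decoded)
```

Greedy decoding only follows the single best symbol per frame; decoding with a language model would typically score several candidate paths instead.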

References

The fundamental repositories:
