TensorFlowASR

Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2

TensorFlowASR implements several automatic speech recognition architectures, including DeepSpeech2, Jasper, RNN Transducer, ContextNet and Conformer. These models can be converted to TFLite to reduce memory use and computation at deployment.

What's New?

  • (12/27/2020) Added naive token-level timestamps; see the demo with the --timestamp flag
  • (12/17/2020) Added ContextNet support http://arxiv.org/abs/2005.03191
  • (12/12/2020) Added support for masking
  • (11/14/2020) Added gradient accumulation for training with larger batch sizes
  • (11/3/2020) Reduced differences between librosa.stft and tf.signal.stft
  • (10/31/2020) Updated DeepSpeech2 and added Jasper support https://arxiv.org/abs/1904.03288
  • (10/18/2020) Added Streaming Transducer support https://arxiv.org/abs/1811.06621

Supported Models

Baselines

  • CTCModel (end-to-end models trained with CTC loss)
  • Transducer Models (end-to-end models trained with RNNT loss)

Publications

Installation

Install tensorflow>=2.3.0 or tf-nightly.

For training and testing, install from a git clone of the repository, since the required third-party packages (ctc_decoders, rnnt_loss, etc.) are installed by scripts in the repository.

Installing via PyPI

Run pip3 install -U TensorFlowASR

Installing from source

git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install .

For anaconda3:

conda create -y -n tfasr tensorflow-gpu python=3.8 # tensorflow if using CPU
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip install .
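
After installing, it can help to confirm that the environment satisfies the tensorflow>=2.3.0 requirement stated above. The following is a minimal sketch, not part of TensorFlowASR: the helper names parse_version and meets_minimum are hypothetical, and the parser only assumes that the version string starts with major.minor.patch (as release and tf-nightly strings do).

```python
import re

MIN_TF_VERSION = (2, 3, 0)  # minimum version stated in the installation notes


def parse_version(version):
    """Extract (major, minor, patch) from strings like '2.4.0' or '2.5.0-dev20201227'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum=MIN_TF_VERSION):
    """Return True if the version is at least the required minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        status = "OK" if meets_minimum(tf.__version__) else "too old"
        print(tf.__version__, status)
```

Comparing tuples of integers rather than raw strings avoids the pitfall where "2.10.0" sorts before "2.3.0" lexicographically.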

Setup training and testing

  • For datasets, see datasets

  • For training, testing and using CTC Models, run ./scripts/install_ctc_decoders.sh

  • For training Transducer Models, run export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh (Note: only export CUDA_HOME when you have CUDA)

  • For mixed precision training, use the flag --mxp when running Python scripts from examples

  • For enabling XLA, run TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script

  • For hiding warnings, run export TF_CPP_MIN_LOG_LEVEL=2 before running any examples
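
The two environment variables above can also be set from Python when launching an example script as a subprocess. This is a rough sketch under stated assumptions: the function name tf_runtime_env and its parameters are hypothetical and not part of TensorFlowASR; only the variable names and values (TF_XLA_FLAGS=--tf_xla_auto_jit=2, TF_CPP_MIN_LOG_LEVEL=2) come from the list above.

```python
import os


def tf_runtime_env(enable_xla=False, hide_warnings=True, base_env=None):
    """Return a copy of the environment with the TensorFlow variables from the setup notes."""
    env = dict(os.environ if base_env is None else base_env)
    if enable_xla:
        # Same value as in the XLA bullet; replaces any existing TF_XLA_FLAGS
        env["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"
    if hide_warnings:
        # Same value as in the warnings bullet
        env["TF_CPP_MIN_LOG_LEVEL"] = "2"
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.run, for example subprocess.run(["python3", path_to_py_script], env=tf_runtime_env(enable_xla=True)), where path_to_py_script is the script you would otherwise run from the shell. The variables need to be in place before TensorFlow is imported, which is why they are set on the child process rather than inside the script.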

TFLite Conversion

After conversion, the TFLite model behaves like a function that maps an audio signal directly to Unicode code points, which can then be converted to a string.

  1. Install tf-nightly using pip install tf-nightly
  2. Build a model with the same architecture as the trained model (if model has tflite argument, you must set it to True), then load the weights from trained model to the built model
  3. Load TFSpeechFeaturizer and TextFeaturizer to model using function add_featurizers
  4. Convert the model's function to tflite as follows:
import tensorflow as tf

func = model.make_tflite_function(greedy=True) # or False
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Allow TensorFlow ops that have no TFLite builtin equivalent
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
  5. Save the converted tflite model as follows:
import os

# Create the output directory only if the path includes one
out_dir = os.path.dirname(tflite_path)
if out_dir:
    os.makedirs(out_dir, exist_ok=True)
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
  6. Then the .tflite model is ready to be deployed
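
To illustrate the last stage of the pipeline described above (code points to string), here is a hedged sketch of running a converted model with tf.lite.Interpreter. Several things are assumptions rather than documented behavior: that the first model input is a float32 waveform, that the first output holds the Unicode code points, and that non-positive values are padding. Models with additional inputs (for example recurrent states) would need those set as well. The function names codepoints_to_text and transcribe are hypothetical.

```python
def codepoints_to_text(codepoints):
    """Join Unicode code points into a string, skipping non-positive values (assumed padding)."""
    return "".join(chr(int(cp)) for cp in codepoints if int(cp) > 0)


def transcribe(tflite_path, signal):
    """Run a converted model on one waveform (assumes a single float32 signal input)."""
    import numpy as np
    import tensorflow as tf  # imported lazily so the helper above works without TensorFlow

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    signal = np.asarray(signal, dtype=np.float32)
    # The signal length varies per utterance, so resize the input before allocating
    interpreter.resize_tensor_input(input_details[0]["index"], signal.shape)
    interpreter.allocate_tensors()
    interpreter.set_tensor(input_details[0]["index"], signal)
    interpreter.invoke()

    codepoints = interpreter.get_tensor(output_details[0]["index"])
    return codepoints_to_text(codepoints.flatten())
```

Keeping the decoding step separate from the interpreter call makes it easy to check on its own, for instance with the code points of a short Vietnamese phrase.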

Feature Extraction

See features_extraction

Augmentations

See augmentations

Training & Testing

Example YAML Config Structure

speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500

See examples for some predefined ASR models and results
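
Once such a file has been loaded into a dictionary (for example with a YAML parser), a quick structural check can catch missing sections before training starts. The sketch below is illustrative only: it treats every key in the example structure as required, which is an assumption, and the names REQUIRED_KEYS and missing_keys are hypothetical rather than part of TensorFlowASR.

```python
# Key layout copied from the example YAML structure; None marks a leaf that is not inspected further
REQUIRED_KEYS = {
    "speech_config": None,
    "model_config": None,
    "decoder_config": None,
    "learning_config": {
        "augmentations": None,
        "dataset_config": {
            "train_paths": None,
            "eval_paths": None,
            "test_paths": None,
            "tfrecords_dir": None,
        },
        "optimizer_config": None,
        "running_config": {
            "batch_size": None,
            "num_epochs": None,
            "outdir": None,
            "log_interval_steps": None,
        },
    },
}


def missing_keys(config, required=REQUIRED_KEYS, prefix=""):
    """Return dotted paths of keys from the example structure that are absent in config."""
    missing = []
    for key, children in required.items():
        path = f"{prefix}{key}"
        if not isinstance(config, dict) or key not in config:
            missing.append(path)
        elif children is not None:
            missing.extend(missing_keys(config[key], children, prefix=f"{path}."))
    return missing
```

An empty result means the dictionary has the same shape as the example; otherwise each entry names the first missing key along a path, such as learning_config.running_config.batch_size.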

Corpus Sources and Pretrained Models

For pretrained models, go to drive

English

Name          Source                           Hours
LibriSpeech   LibriSpeech                      970h
Common Voice  https://commonvoice.mozilla.org  1932h

Vietnamese

Name                                    Source                                    Hours
Vivos                                   https://ailab.hcmus.edu.vn/vivos          15h
InfoRe Technology 1                     InfoRe1 (passwd: BroughtToYouByInfoRe)    25h
InfoRe Technology 2 (used in VLSP2019)  InfoRe2 (passwd: BroughtToYouByInfoRe)    415h

German

Name          Source                            Hours
Common Voice  https://commonvoice.mozilla.org/  750h

References & Credits

  1. NVIDIA OpenSeq2Seq Toolkit
  2. https://github.com/noahchalifour/warp-transducer
  3. Sequence Transduction with Recurrent Neural Networks
  4. End-to-End Speech Processing Toolkit in PyTorch
  5. https://github.com/iankur/ContextNet

Contact

Huy Le Nguyen

Email: nlhuy.cs.16@gmail.com
