Almost State-of-the-art Automatic Speech Recognition using TensorFlow 2

Project description

TensorFlowASR :zap:


Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2

TensorFlowASR implements automatic speech recognition architectures such as DeepSpeech2 and Conformer. These models can be converted to TFLite to reduce memory footprint and computation for deployment :smile:

What's New?

  • (10/18/2020) Supported Streaming Transducer https://arxiv.org/abs/1811.06621
  • (10/15/2020) Added gradient accumulation and refactored the package to TensorFlowASR
  • (10/10/2020) Updated the documentation and uploaded the package to PyPI
  • (10/6/2020) Changed the nlpaug requirement to >=1.0.1
  • (9/18/2020) Supported word-pieces (aka subwords) using tensorflow-datasets
  • Supported transducer TFLite greedy decoding (conversion and invocation)
  • Distributed training using tf.distribute.MirroredStrategy

:yum: Supported Models

Setup Environment and Datasets

Install TensorFlow: pip3 install -U tensorflow, or pip3 install tf-nightly (required for TFLite conversion)

Install packages (choose one of these options):

  • Run pip3 install -U TensorFlowASR
  • Clone the repo and run python3 setup.py install in the repo's directory

For setting up datasets, see datasets

  • For training, testing and using CTC Models, run ./scripts/install_ctc_decoders.sh

  • For training Transducer Models, export CUDA_HOME and run ./scripts/install_rnnt_loss.sh

  • The function tensorflow_asr.utils.setup_environment() enables mixed precision when available (see the sketch after this list).

  • To enable XLA, run TF_XLA_FLAGS=--tf_xla_auto_jit=2 $python_train_script
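
For example (a minimal sketch, not the project's canonical training script), the helper is typically called at the top of a training script before any model is built; the TF_XLA_FLAGS line simply mirrors the shell flag above and is optional:

import os

# Optional: request XLA auto-clustering, mirroring the TF_XLA_FLAGS example above.
os.environ.setdefault("TF_XLA_FLAGS", "--tf_xla_auto_jit=2")

from tensorflow_asr.utils import setup_environment

# Enables mixed precision when the hardware supports it.
setup_environment()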

Clean up: python3 setup.py clean --all (this removes the contents of the build/ directory)

TFLite Conversion

After conversion, the TFLite model behaves like a function that maps an audio signal directly to Unicode code points, which can then be joined into a transcript string.

  1. Install tf-nightly using pip install tf-nightly
  2. Build a model with the same architecture as the trained model (if the model has a tflite argument, it must be set to True), then load the trained weights into that model
  3. Attach TFSpeechFeaturizer and TextFeaturizer to the model using the add_featurizers function
  4. Convert the model's function to TFLite as follows:
import tensorflow as tf

# Export the recognition function (greedy decoding here) as a concrete function
func = model.make_tflite_function(greedy=True)  # or greedy=False
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Allow select TF ops as a fallback for ops without TFLite builtins
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
  5. Save the converted TFLite model as follows:
import os

# Create the output directory if needed, then write the serialized model
if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
  6. The .tflite model is now ready to be deployed
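
As a hedged illustration of invoking the exported file (not the project's official runner), the sketch below loads it with tf.lite.Interpreter and feeds a dummy 16 kHz signal. It assumes the greedy function takes a single 1-D float32 signal input and returns Unicode code points as its first output; transducer models may additionally expect decoder states among the inputs, so check interpreter.get_input_details() for your model:

import numpy as np
import tensorflow as tf

tflite_path = "conformer.tflite"  # hypothetical path of the file saved above
interpreter = tf.lite.Interpreter(model_path=tflite_path)

signal = np.zeros(16000, dtype=np.float32)  # dummy 1-second, 16 kHz mono signal

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The signal length is dynamic, so resize the first input before allocating.
interpreter.resize_tensor_input(input_details[0]["index"], signal.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]["index"], signal)
interpreter.invoke()

# The first output holds Unicode code points; join them into a transcript.
code_points = interpreter.get_tensor(output_details[0]["index"])
print("".join(chr(c) for c in code_points))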

Features Extraction

See features_extraction

Augmentations

See augmentations

Training & Testing

Example YAML Config Structure

speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500
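
For illustration only (the package ships its own config loader), a file with this structure can also be read with plain PyYAML to inspect the running_config values; config.yml is a hypothetical filename:

import yaml

# Parse a YAML config laid out like the structure above.
with open("config.yml") as f:
    config = yaml.safe_load(f)

running = config["learning_config"]["running_config"]
print("batch size:", running["batch_size"])    # e.g. 8
print("epochs:", running["num_epochs"])        # e.g. 20
print("output dir:", running["outdir"])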

See examples for some predefined ASR models and results

Corpus Sources and Pretrained Models

For pretrained models, go to drive

English

Name         | Source                          | Hours
LibriSpeech  | LibriSpeech                     | 970h
Common Voice | https://commonvoice.mozilla.org | 1932h

Vietnamese

Name                                   | Source                                 | Hours
Vivos                                  | https://ailab.hcmus.edu.vn/vivos       | 15h
InfoRe Technology 1                    | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h
InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h

German

Name         | Source                           | Hours
Common Voice | https://commonvoice.mozilla.org/ | 750h

References & Credits

  1. NVIDIA OpenSeq2Seq Toolkit
  2. https://github.com/noahchalifour/warp-transducer
  3. Sequence Transduction with Recurrent Neural Networks
  4. End-to-End Speech Processing Toolkit in PyTorch

Project details


Download files

Download the file for your platform.

Source Distribution

TensorFlowASR-0.2.7.tar.gz (41.9 kB)

Uploaded Source

Built Distribution


TensorFlowASR-0.2.7-py3.7.egg (155.8 kB)

Uploaded Egg

File details

Details for the file TensorFlowASR-0.2.7.tar.gz.

File metadata

  • Download URL: TensorFlowASR-0.2.7.tar.gz
  • Upload date:
  • Size: 41.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.9

File hashes

Hashes for TensorFlowASR-0.2.7.tar.gz
Algorithm   | Hash digest
SHA256      | b276278b9d6d716c9a201c9171a21c048e79ca24deec593c35951365c4ddbd13
MD5         | 7118201f229c466da729c5e58f88df4c
BLAKE2b-256 | 4e823077d465699630a935370a7dbf3048deafb16de158c711412431b6da8065
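
To verify a downloaded archive against the SHA256 digest above, one option (a minimal sketch, assuming the file sits in the current directory) is a few lines of Python with hashlib:

import hashlib

# SHA256 digest published above for the source distribution.
expected = "b276278b9d6d716c9a201c9171a21c048e79ca24deec593c35951365c4ddbd13"

with open("TensorFlowASR-0.2.7.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

print("OK" if actual == expected else "Hash mismatch")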


File details

Details for the file TensorFlowASR-0.2.7-py3.7.egg.

File metadata

  • Download URL: TensorFlowASR-0.2.7-py3.7.egg
  • Upload date:
  • Size: 155.8 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.9

File hashes

Hashes for TensorFlowASR-0.2.7-py3.7.egg
Algorithm   | Hash digest
SHA256      | 0d37bc8a859417a43418ce885fc081974f57743147e9e88d298b7437f706e5bc
MD5         | 80eff013a5f9b39897b8242c0ebd6f77
BLAKE2b-256 | 9760169456055f60ff274ca3c3285191cec7ec256aff5527b602f8adcc3080d5

