Almost State-of-the-art Automatic Speech Recognition using TensorFlow 2

Project description

TensorFlowASR :zap:


Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2

TensorFlowASR implements automatic speech recognition architectures such as DeepSpeech2 and Conformer. These models can be converted to TFLite to reduce memory footprint and computation for deployment :smile:

What's New?

  • (10/18/2020) Supported Streaming Transducer https://arxiv.org/abs/1811.06621
  • (10/15/2020) Added gradient accumulation and refactored the package to TensorFlowASR
  • (10/10/2020) Updated the documentation and uploaded the package to PyPI
  • (10/6/2020) Changed the nlpaug requirement to >=1.0.1
  • (9/18/2020) Supported word-pieces (aka subwords) using tensorflow-datasets
  • Supported transducer TFLite greedy decoding (conversion and invocation)
  • Distributed training using tf.distribute.MirroredStrategy

:yum: Supported Models

Setup Environment and Datasets

Install TensorFlow: pip3 install -U tensorflow, or pip3 install tf-nightly (required for TFLite conversion)

Install packages (choose one of these options):

  • Run pip3 install -U TensorFlowASR
  • Clone the repo and run python3 setup.py install in the repo's directory

For setting up datasets, see datasets

  • For training, testing and using CTC Models, run ./scripts/install_ctc_decoders.sh

  • For training Transducer Models, export CUDA_HOME and run ./scripts/install_rnnt_loss.sh

  • The function tensorflow_asr.utils.setup_environment() enables mixed precision when available (see the sketch after this list).

  • To enable XLA, run TF_XLA_FLAGS=--tf_xla_auto_jit=2 $python_train_script
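
For example (a minimal sketch, not the project's canonical training script), the helper is typically called at the top of a training script before any model is built; the TF_XLA_FLAGS line simply mirrors the shell flag above and is optional:

import os

# Optional: request XLA auto-clustering, mirroring the TF_XLA_FLAGS example above.
os.environ.setdefault("TF_XLA_FLAGS", "--tf_xla_auto_jit=2")

from tensorflow_asr.utils import setup_environment

# Enables mixed precision when the hardware supports it.
setup_environment()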

Clean up: python3 setup.py clean --all (this removes the contents of the build/ directory)

TFLite Conversion

After conversion, the TFLite model behaves like a function that maps an audio signal directly to Unicode code points, which can then be joined into a transcript string.

  1. Install tf-nightly using pip install tf-nightly
  2. Build a model with the same architecture as the trained model (if the model has a tflite argument, it must be set to True), then load the trained weights into that model
  3. Attach TFSpeechFeaturizer and TextFeaturizer to the model using the add_featurizers function
  4. Convert the model's function to TFLite as follows:
import tensorflow as tf

# Export the recognition function (greedy decoding here) as a concrete function
func = model.make_tflite_function(greedy=True)  # or greedy=False
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Allow select TF ops as a fallback for ops without TFLite builtins
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
  5. Save the converted TFLite model as follows:
import os

# Create the output directory if needed, then write the serialized model
if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
  6. The .tflite model is now ready to be deployed
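
As a hedged illustration of invoking the exported file (not the project's official runner), the sketch below loads it with tf.lite.Interpreter and feeds a dummy 16 kHz signal. It assumes the greedy function takes a single 1-D float32 signal input and returns Unicode code points as its first output; transducer models may additionally expect decoder states among the inputs, so check interpreter.get_input_details() for your model:

import numpy as np
import tensorflow as tf

tflite_path = "conformer.tflite"  # hypothetical path of the file saved above
interpreter = tf.lite.Interpreter(model_path=tflite_path)

signal = np.zeros(16000, dtype=np.float32)  # dummy 1-second, 16 kHz mono signal

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The signal length is dynamic, so resize the first input before allocating.
interpreter.resize_tensor_input(input_details[0]["index"], signal.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]["index"], signal)
interpreter.invoke()

# The first output holds Unicode code points; join them into a transcript.
code_points = interpreter.get_tensor(output_details[0]["index"])
print("".join(chr(c) for c in code_points))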

Features Extraction

See features_extraction

Augmentations

See augmentations

Training & Testing

Example YAML Config Structure

speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500
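
For illustration only (the package ships its own config loader), a file with this structure can also be read with plain PyYAML to inspect the running_config values; config.yml is a hypothetical filename:

import yaml

# Parse a YAML config laid out like the structure above.
with open("config.yml") as f:
    config = yaml.safe_load(f)

running = config["learning_config"]["running_config"]
print("batch size:", running["batch_size"])    # e.g. 8
print("epochs:", running["num_epochs"])        # e.g. 20
print("output dir:", running["outdir"])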

See examples for some predefined ASR models and results

Corpus Sources and Pretrained Models

For pretrained models, go to drive

English

Name         | Source                          | Hours
LibriSpeech  | LibriSpeech                     | 970h
Common Voice | https://commonvoice.mozilla.org | 1932h

Vietnamese

Name                                   | Source                                 | Hours
Vivos                                  | https://ailab.hcmus.edu.vn/vivos       | 15h
InfoRe Technology 1                    | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h
InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h

German

Name         | Source                           | Hours
Common Voice | https://commonvoice.mozilla.org/ | 750h

References & Credits

  1. NVIDIA OpenSeq2Seq Toolkit
  2. https://github.com/noahchalifour/warp-transducer
  3. Sequence Transduction with Recurrent Neural Networks
  4. End-to-End Speech Processing Toolkit in PyTorch

Project details


Download files

Download the file for your platform.

Source Distribution

TensorFlowASR-0.2.7.tar.gz (41.9 kB)

Uploaded Source

Built Distribution


TensorFlowASR-0.2.7-py3.7.egg (155.8 kB)

Uploaded Egg

File details

Details for the file TensorFlowASR-0.2.7.tar.gz.

File metadata

  • Download URL: TensorFlowASR-0.2.7.tar.gz
  • Upload date:
  • Size: 41.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.9

File hashes

Hashes for TensorFlowASR-0.2.7.tar.gz
Algorithm   | Hash digest
SHA256      | b276278b9d6d716c9a201c9171a21c048e79ca24deec593c35951365c4ddbd13
MD5         | 7118201f229c466da729c5e58f88df4c
BLAKE2b-256 | 4e823077d465699630a935370a7dbf3048deafb16de158c711412431b6da8065
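
To verify a downloaded archive against the SHA256 digest above, one option (a minimal sketch, assuming the file sits in the current directory) is a few lines of Python with hashlib:

import hashlib

# SHA256 digest published above for the source distribution.
expected = "b276278b9d6d716c9a201c9171a21c048e79ca24deec593c35951365c4ddbd13"

with open("TensorFlowASR-0.2.7.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

print("OK" if actual == expected else "Hash mismatch")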


File details

Details for the file TensorFlowASR-0.2.7-py3.7.egg.

File metadata

  • Download URL: TensorFlowASR-0.2.7-py3.7.egg
  • Upload date:
  • Size: 155.8 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.9

File hashes

Hashes for TensorFlowASR-0.2.7-py3.7.egg
Algorithm   | Hash digest
SHA256      | 0d37bc8a859417a43418ce885fc081974f57743147e9e88d298b7437f706e5bc
MD5         | 80eff013a5f9b39897b8242c0ebd6f77
BLAKE2b-256 | 9760169456055f60ff274ca3c3285191cec7ec256aff5527b602f8adcc3080d5

