Almost State-of-the-art Automatic Speech Recognition using Tensorflow 2
TensorFlowASR :zap:
Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2
TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile:
What's New?
- (10/18/2020) Support Streaming Transducer https://arxiv.org/abs/1811.06621
- (10/15/2020) Add gradient accumulation and refactor to TensorFlowASR
- (10/10/2020) Update documentation and upload package to PyPI
- (10/6/2020) Change nlpaug version to >=1.0.1
- (9/18/2020) Support word-pieces (aka subwords) using tensorflow-datasets
- Support transducer TFLite greedy decoding (conversion and invocation)
- Distributed training using tf.distribute.MirroredStrategy
:yum: Supported Models
- CTCModel (End2end models using CTC Loss for training)
- Transducer Models (End2end models using RNNT Loss for training)
- Conformer Transducer (Reference: https://arxiv.org/abs/2005.08100) See examples/conformer
- Streaming Transducer (Reference: https://arxiv.org/abs/1811.06621) See examples/streaming_transducer
Setup Environment and Datasets
Install TensorFlow: pip3 install -U tensorflow, or pip3 install tf-nightly (for using TFLite)
Install packages (choose one of these options):
- Run pip3 install -U TensorFlowASR
- Clone the repo and run python3 setup.py install in the repo's directory
- For setting up datasets, see datasets
- For training, testing and using CTC Models, run ./scripts/install_ctc_decoders.sh
- For training Transducer Models, export CUDA_HOME and run ./scripts/install_rnnt_loss.sh
- The method tensorflow_asr.utils.setup_environment() enables mixed precision if available.
- To enable XLA, run TF_XLA_FLAGS=--tf_xla_auto_jit=2 $python_train_script
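The XLA command above sets TF_XLA_FLAGS (auto-clustering level 2) for a single run of the training script. When training is launched from Python rather than a shell, a minimal sketch of the same idea is to pass a modified environment to a child process. `xla_env` and `run_with_xla` are hypothetical helper names, not TensorFlowASR functions, and the script path is assumed:

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def xla_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with XLA auto-JIT enabled."""
    env = dict(os.environ if base is None else base)
    # Same flag as the shell form: TF_XLA_FLAGS=--tf_xla_auto_jit=2
    env["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"
    return env


def run_with_xla(script: str, *args: str) -> subprocess.CompletedProcess:
    """Run a (hypothetical) training script in a child process with the XLA flag set.

    The flag only reaches the child; the parent's os.environ is not modified.
    check=True raises CalledProcessError if the script exits with a non-zero code.
    """
    return subprocess.run([sys.executable, script, *args], env=xla_env(), check=True)
```

For example, `run_with_xla("train.py")` would start a hypothetical `train.py` with XLA auto-clustering enabled. Copying the environment instead of mutating `os.environ` keeps the setting scoped to the training run.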
Clean up: python3 setup.py clean --all (this will remove /build contents)
TFLite Conversion
After conversion, the TFLite model behaves like a function that maps an audio signal directly to Unicode code points, which can then be converted to a string.
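A minimal sketch of that last step, assuming the interpreter's output has already been read into a 1-D sequence of integer code points (for example a NumPy int32 array). `codepoints_to_text` is a hypothetical helper name, and running the interpreter itself is not shown:

```python
from typing import Iterable


def codepoints_to_text(codepoints: Iterable[int]) -> str:
    """Join Unicode code points (e.g. the TFLite model's output) into a string.

    int() also converts NumPy integer scalars, so an output array can be passed directly.
    """
    return "".join(chr(int(cp)) for cp in codepoints)
```

For example, `codepoints_to_text([120, 105, 110, 32, 99, 104, 224, 111])` returns `"xin chào"`; because the model emits code points rather than vocabulary indices, non-ASCII output such as Vietnamese needs no extra lookup table.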
- Install tf-nightly using pip install tf-nightly
- Build a model with the same architecture as the trained model (if the model has a tflite argument, you must set it to True), then load the weights from the trained model into the built model
- Load TFSpeechFeaturizer and TextFeaturizer into the model using the function add_featurizers
- Convert the model's function to TFLite as follows:
import tensorflow as tf

# `model`: the built model with trained weights and featurizers loaded (steps above)
func = model.make_tflite_function(greedy=True)  # or False
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Allow TensorFlow ops that have no TFLite builtin equivalent
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()  # serialized model as bytes
- Save the converted tflite model as follows:
import os

# Create the parent directory if needed ("." when tflite_path has no directory part)
os.makedirs(os.path.dirname(tflite_path) or ".", exist_ok=True)
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
- The .tflite model is then ready to be deployed
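Before deploying, a cheap sanity check can confirm that the bytes produced by `converter.convert()` (or the saved file) look like a TFLite flatbuffer. A minimal sketch, assuming a standard, non-size-prefixed flatbuffer: FlatBuffers stores a 4-byte root offset followed by a 4-byte file identifier, and the TFLite schema declares that identifier as "TFL3". The helper names are hypothetical, and this does not validate the model's contents:

```python
from pathlib import Path
from typing import Union

# Bytes 4..8 of a TFLite flatbuffer hold the schema's file identifier.
TFLITE_IDENTIFIER = b"TFL3"


def is_tflite_bytes(data: bytes) -> bool:
    """True if `data` starts like a TFLite flatbuffer (header check only)."""
    return len(data) >= 8 and data[4:8] == TFLITE_IDENTIFIER


def looks_like_tflite(path: Union[str, Path]) -> bool:
    """Read only the first 8 bytes of a file and apply the same header check."""
    with open(path, "rb") as f:
        return is_tflite_bytes(f.read(8))
```

For example, `is_tflite_bytes(tflite_model)` can run right after conversion, and `looks_like_tflite(tflite_path)` after saving, to catch a truncated or wrong file before it is shipped.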
Features Extraction
Augmentations
See augmentations
Training & Testing
Example YAML Config Structure
speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500
See examples for some predefined ASR models and results
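The nesting above can be checked before starting a long run. A minimal sketch, assuming the YAML has already been loaded into a plain dict (for example with a YAML parser, not shown). `SCHEMA`, `missing_keys` and `training_schedule` are hypothetical helpers, not TensorFlowASR APIs, and the step arithmetic assumes the last partial batch of each epoch counts as one step:

```python
import math
from typing import Any, Dict, List, Optional, Tuple

# Mirrors the example YAML layout; None marks a leaf whose contents are not checked.
SCHEMA: Dict[str, Any] = {
    "speech_config": None,
    "model_config": None,
    "decoder_config": None,
    "learning_config": {
        "augmentations": None,
        "dataset_config": {"train_paths": None, "eval_paths": None,
                           "test_paths": None, "tfrecords_dir": None},
        "optimizer_config": None,
        "running_config": {"batch_size": None, "num_epochs": None,
                           "outdir": None, "log_interval_steps": None},
    },
}


def missing_keys(config: Any, schema: Optional[Dict[str, Any]] = None, prefix: str = "") -> List[str]:
    """Return dotted paths of schema keys absent from `config`, in schema order."""
    schema = SCHEMA if schema is None else schema
    missing: List[str] = []
    for key, sub in schema.items():
        path = prefix + key
        if not isinstance(config, dict) or key not in config:
            missing.append(path)
        elif sub is not None:
            missing.extend(missing_keys(config[key], sub, path + "."))
    return missing


def training_schedule(num_utterances: int, running_config: Dict[str, Any]) -> Tuple[int, int, int]:
    """(steps per epoch, total steps, number of logging points) for a running_config."""
    steps_per_epoch = math.ceil(num_utterances / running_config["batch_size"])
    total_steps = steps_per_epoch * running_config["num_epochs"]
    log_points = total_steps // running_config["log_interval_steps"]
    return steps_per_epoch, total_steps, log_points
```

With the example `running_config` (batch_size 8, num_epochs 20, log_interval_steps 500) and a hypothetical 1,000-utterance training set, `training_schedule` gives 125 steps per epoch, 2,500 steps in total, and 5 logging points.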
Corpus Sources and Pretrained Models
For pretrained models, go to drive
English
| Name | Source | Hours |
|---|---|---|
| LibriSpeech | LibriSpeech | 970h |
| Common Voice | https://commonvoice.mozilla.org | 1932h |
Vietnamese
| Name | Source | Hours |
|---|---|---|
| Vivos | https://ailab.hcmus.edu.vn/vivos | 15h |
| InfoRe Technology 1 | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h |
| InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h |
German
| Name | Source | Hours |
|---|---|---|
| Common Voice | https://commonvoice.mozilla.org/ | 750h |
References & Credits
File details
Details for the file TensorFlowASR-0.2.7.tar.gz.
File metadata
- Download URL: TensorFlowASR-0.2.7.tar.gz
- Upload date:
- Size: 41.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b276278b9d6d716c9a201c9171a21c048e79ca24deec593c35951365c4ddbd13 |
| MD5 | 7118201f229c466da729c5e58f88df4c |
| BLAKE2b-256 | 4e823077d465699630a935370a7dbf3048deafb16de158c711412431b6da8065 |
File details
Details for the file TensorFlowASR-0.2.7-py3.7.egg.
File metadata
- Download URL: TensorFlowASR-0.2.7-py3.7.egg
- Upload date:
- Size: 155.8 kB
- Tags: Egg
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d37bc8a859417a43418ce885fc081974f57743147e9e88d298b7437f706e5bc |
| MD5 | 80eff013a5f9b39897b8242c0ebd6f77 |
| BLAKE2b-256 | 9760169456055f60ff274ca3c3285191cec7ec256aff5527b602f8adcc3080d5 |
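A downloaded file can be checked against the digests listed above. A minimal sketch using Python's standard `hashlib`, assuming BLAKE2b-256 means BLAKE2b with a 32-byte digest; `digest_chunks`, `file_digests` and `matches` are hypothetical helper names:

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Union


def new_hashers() -> Dict[str, "hashlib._Hash"]:
    """One fresh hasher per algorithm shown in the File hashes tables."""
    return {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256: BLAKE2b truncated to a 32-byte (256-bit) digest
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }


def digest_chunks(chunks: Iterable[bytes]) -> Dict[str, str]:
    """Feed every chunk to all hashers and return the hex digests."""
    hashers = new_hashers()
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def file_digests(path: Union[str, Path], chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Hash a file in 1 MiB chunks so large downloads are not read into memory at once."""
    with open(path, "rb") as f:
        return digest_chunks(iter(lambda: f.read(chunk_size), b""))


def matches(path: Union[str, Path], algorithm: str, expected_hex: str) -> bool:
    """Compare one computed digest with a published one (case-insensitive)."""
    return file_digests(path)[algorithm] == expected_hex.strip().lower()
```

For example, assuming TensorFlowASR-0.2.7.tar.gz was downloaded into the current directory, `matches("TensorFlowASR-0.2.7.tar.gz", "SHA256", "b276278b9d6d716c9a201c9171a21c048e79ca24deec593c35951365c4ddbd13")` should return True for an intact download.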