Almost State-of-the-art Automatic Speech Recognition using TensorFlow 2

TensorFlowASR :zap:

Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2

TensorFlowASR implements several automatic speech recognition architectures, including DeepSpeech2, Jasper, RNN Transducer, ContextNet, and Conformer. These models can be converted to TFLite to reduce memory and computation for deployment :smile:

:yum: Supported Models

Baselines

  • Transducer models (end-to-end models trained with RNN-T loss; currently supported: Conformer, ContextNet, Streaming Transducer)
  • CTC models (end-to-end models trained with CTC loss; currently supported: DeepSpeech2, Jasper)

Publications

Installation

For training and testing, install from a git clone so that the required third-party packages (ctc_decoders, rnnt_loss, etc.) are installed as well.

Installing from source (recommended)

git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
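The comment above sets a floor of TensorFlow 2.5.1 for the source install. The following is a minimal sketch of how that floor could be checked before installing; it is not part of TensorFlowASR. `parse_release` and `meets_minimum` are hypothetical helpers, and the sketch assumes that only the leading numeric components of the version string matter (any suffix such as `rc1` is ignored).

```python
import re


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '2.14.0rc1' -> (2, 14, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = "2.5.1") -> bool:
    """Compare versions numerically, so that 2.10.0 counts as newer than 2.5.1."""
    installed_parts = parse_release(installed)
    minimum_parts = parse_release(minimum)
    # Pad the shorter tuple with zeros so that '2.5' is treated as '2.5.0'.
    width = max(len(installed_parts), len(minimum_parts))
    installed_parts += (0,) * (width - len(installed_parts))
    minimum_parts += (0,) * (width - len(minimum_parts))
    return installed_parts >= minimum_parts
```

With TensorFlow already installed, `meets_minimum(tensorflow.__version__)` would report whether the installed version satisfies the floor. The comparison is done on integers because a plain string comparison would rank "2.10.0" below "2.5.1".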

For anaconda3:

conda create -y -n tfasr tensorflow-gpu python=3.8 # use tensorflow instead if running on CPU; this makes sure conda installs all dependencies for tensorflow
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"

Installing via PyPI

# Tensorflow 2.x (with 2.x >= 2.3)
pip3 install "TensorFlowASR[tf2.x]" # or pip3 install "TensorFlowASR[tf2.x-gpu]"

Installing for development

git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install -e ".[dev]"
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]", or ".[tf2.x-apple]" for Apple M1 machines

Installing for Apple Silicon

Because tensorflow-text is not built for Apple Silicon, it has to be installed from the prebuilt wheel file provided by sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon

git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install -e "." # or pip3 install -e ".[dev]" for development, or pip3 install "TensorFlowASR[dev]" from PyPI
pip3 install tensorflow~=2.14.0 # change minor version if you want

Run the following after installing TensorFlowASR and tensorflow as shown above:

TF_VERSION="$(python3 -c 'import tensorflow; print(tensorflow.__version__)')" && \
TF_VERSION_MAJOR="$(echo $TF_VERSION | cut -d'.' -f1,2)" && \
PY_VERSION="$(python3 -c 'import platform; major, minor, patch = platform.python_version_tuple(); print(f"{major}{minor}");')" && \
URL="https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon" && \
pip3 install "${URL}/releases/download/v${TF_VERSION_MAJOR}/tensorflow_text-${TF_VERSION_MAJOR}.0-cp${PY_VERSION}-cp${PY_VERSION}-macosx_11_0_arm64.whl"
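For readers who prefer to see how the wheel URL is assembled, the shell snippet above can be restated in Python. This is only a sketch of the same string construction: `tensorflow_text_wheel_url` is a hypothetical helper, and whether a release asset actually exists for a given TensorFlow and Python combination is not checked here.

```python
def tensorflow_text_wheel_url(tf_version: str, python_version: tuple) -> str:
    """Build the prebuilt tensorflow-text wheel URL the same way the shell snippet does."""
    # Keep only "major.minor" of the TensorFlow version (the shell snippet's cut -f1,2).
    major, minor = tf_version.split(".")[:2]
    tf_major_minor = f"{major}.{minor}"
    # CPython tag, e.g. (3, 10) -> "cp310".
    py_tag = f"cp{python_version[0]}{python_version[1]}"
    base = (
        "https://github.com/sun1638650145/"
        "Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon"
    )
    return (
        f"{base}/releases/download/v{tf_major_minor}/"
        f"tensorflow_text-{tf_major_minor}.0-{py_tag}-{py_tag}-macosx_11_0_arm64.whl"
    )
```

For example, `tensorflow_text_wheel_url("2.14.0", (3, 10))` yields a URL under `releases/download/v2.14/` ending in `tensorflow_text-2.14.0-cp310-cp310-macosx_11_0_arm64.whl`, which could then be passed to `pip3 install`.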

Running in a container

docker-compose up -d

Training & Testing Tutorial

FYI: Keras built-in training uses an infinite dataset, which avoids a potential partial final batch.
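To see why an infinite dataset sidesteps the partial batch mentioned in the note above, here is a small pure-Python illustration. It is not TensorFlowASR's input pipeline; `finite_batches` and `repeated_batches` are hypothetical helpers. In `tf.data` terms the idea would roughly correspond to calling `Dataset.repeat()` before `batch()`, which is an assumption about the mechanism rather than something stated in the source.

```python
from itertools import cycle, islice


def finite_batches(samples, batch_size):
    """One pass over the data; the last batch may be smaller than batch_size."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def repeated_batches(samples, batch_size, steps):
    """Draw `steps` full batches from an endlessly repeating stream of samples."""
    stream = cycle(samples)
    return [list(islice(stream, batch_size)) for _ in range(steps)]


samples = list(range(10))
print(finite_batches(samples, 4))       # last batch has only 2 items
print(repeated_batches(samples, 4, 3))  # every batch has 4 items; the third wraps around
```

Because the stream never ends, the number of steps per epoch has to be given explicitly, and every step receives a full batch.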

See examples for some predefined ASR models and results

Feature Extraction

See features_extraction

Augmentations

See augmentations

TFLite Conversion

After conversion, the TFLite model behaves like a function that maps an audio signal directly to text and tokens.

See tflite_convertion

Pretrained Models

Go to drive

Corpus Sources

English

| Name | Source | Hours |
|---|---|---|
| LibriSpeech | LibriSpeech | 970h |
| Common Voice | https://commonvoice.mozilla.org | 1932h |

Vietnamese

| Name | Source | Hours |
|---|---|---|
| Vivos | https://ailab.hcmus.edu.vn/vivos | 15h |
| InfoRe Technology 1 | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h |
| InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h |

How to contribute

  1. Fork the project
  2. Install for development
  3. Create a branch
  4. Make a pull request to this repo

References & Credits

  1. NVIDIA OpenSeq2Seq Toolkit
  2. https://github.com/noahchalifour/warp-transducer
  3. Sequence Transduction with Recurrent Neural Network
  4. End-to-End Speech Processing Toolkit in PyTorch
  5. https://github.com/iankur/ContextNet

Contact

Huy Le Nguyen

Email: nlhuy.cs.16@gmail.com
