ExKaldi Automatic Speech Recognition Toolkit

ExKaldi: A Python-based Extension Tool of Kaldi

The ExKaldi automatic speech recognition toolkit provides an interface between the Kaldi ASR toolkit and Python. Unlike other Kaldi wrappers, ExKaldi has the following features:

  1. Integrated APIs for building ASR systems, including feature extraction, GMM-HMM acoustic model training, N-gram language model training, decoding, and scoring.
  2. Tools for training DNN acoustic models with deep learning frameworks such as TensorFlow.
  3. Support for CTC decoding.

The goal of ExKaldi is to help developers build high-performance ASR systems easily in Python.

Installation

Current version: 1.3.5. (The toolkit has only been tested on Ubuntu >= 16. with Python 3.6, 3.7, and 3.8, using GitHub Actions.)

  1. If you have not installed the Kaldi ASR toolkit, first clone the Kaldi repository (Kaldi version 5.5 is expected).

$ git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream

Then follow these three instruction files to install and compile it.

$ less kaldi/INSTALL
$ less kaldi/tools/INSTALL
$ less kaldi/src/INSTALL

  2. Install ExKaldi, either with pip or from source.

Install with pip

$ pip install https://github.com/kpu/kenlm/archive/master.zip
$ pip install exkaldi

Install from source

$ git clone https://github.com/wangyu09/exkaldi.git
$ cd exkaldi
$ bash quick_install.sh

  3. Check that it is installed correctly.

$ python3 -c "import exkaldi"
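
A slightly more informative check than the one-line import can be sketched with the Python standard library alone. The helper name `check_module` is hypothetical and not part of ExKaldi; it only looks the packages up without importing them, so it says nothing about whether Kaldi itself is configured correctly.

```python
import importlib.util


def check_module(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # `exkaldi` and `kenlm` are the two packages installed by the pip commands.
    for name in ("exkaldi", "kenlm"):
        status = "found" if check_module(name) else "NOT found"
        print(f"{name}: {status}")
```

If either package is reported as not found, rerun the corresponding install command in the same Python environment.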

Tutorial

The exkaldi/tutorials directory contains a simple tutorial that shows how to use the ExKaldi APIs to build an ASR system from scratch. The data come from the LibriSpeech train_100_clean dataset. The tutorial covers:

  1. Extract and process MFCC features.
  2. Train and query an N-gram language model.
  3. Train a monophone GMM-HMM, build a decision tree, and train a triphone GMM-HMM.
  4. Train a DNN acoustic model with TensorFlow.
  5. Compile the WFST decoding graph.
  6. Decode with the GMM-HMM and DNN-HMM models.
  7. Process the lattice and compute the WER score.
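
To make the final scoring step concrete, here is a minimal, self-contained sketch of how a word error rate is commonly computed from reference and hypothesis transcripts. It is plain Python and does not use the ExKaldi API; the function names `edit_distance` and `wer` are illustrative assumptions, and the toolkit's own scoring may differ in details such as text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Word error rate in percent over paired transcripts."""
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    total_errors = sum(edit_distance(r.split(), h.split())
                       for r, h in zip(references, hypotheses))
    return 100.0 * total_errors / total_words


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = ["the cat sit on mat"]
    # One substitution and one deletion over six reference words.
    print(f"WER: {wer(refs, hyps):.2f}%")
```

The same function gives a phone error rate when the transcripts are phone sequences rather than word sequences.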

The ASR system built in this tutorial is only a toy model; more formal experiments are provided in exkaldi/examples. See the source code or the documentation for more information about the APIs.

Experiments

We ran several experiments to test the ExKaldi toolkit. In the results below, ExKaldi performs comparably to or better than the baseline systems.

TIMIT

1. The perplexity of various language models. All models were trained on the TIMIT training set and evaluated on the TIMIT test set. The table reports the perplexity (PPL) score; lower is better.

System (LM toolkit)       2-grams  3-grams  4-grams  5-grams  6-grams
Kaldi baseline (irstlm)   14.41    ---      ---      ---      ---
ExKaldi (srilm)           14.42    13.05    13.67    14.30    14.53
ExKaldi (kenlm)           14.39    12.75    12.75    12.70    12.25
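
For readers unfamiliar with the metric, the following sketch shows one common way to compute perplexity from per-token log-probabilities. It assumes base-10 logs, as used in ARPA-format language models, and the function name `perplexity` is hypothetical. How sentence-boundary tokens and out-of-vocabulary words are counted varies between toolkits, so this is not necessarily how the scores in the table were produced.

```python
import math


def perplexity(log10_probs):
    """Perplexity from a list of per-token base-10 log-probabilities."""
    n = len(log10_probs)
    if n == 0:
        raise ValueError("need at least one token")
    avg_log10 = sum(log10_probs) / n
    # PPL = 10 ** (-(1/N) * sum(log10 p_i))
    return 10.0 ** (-avg_log10)


if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token has perplexity 4.
    uniform = [math.log10(0.25)] * 8
    print(round(perplexity(uniform), 6))
```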

2. The phone error rate (PER) of various GMM-HMM-based systems. All systems were trained on the TIMIT training set and tested on the TIMIT test set. The language model backend used in ExKaldi is KenLM. The results suggest that KenLM is a viable backend for building the language model. Moreover, with ExKaldi we selected the N-gram order by comparing perplexity, and this improved the performance of the ASR system.

System                    mono     tri1     tri2     tri3
Kaldi baseline 2grams     32.54    26.17    23.63    21.54
ExKaldi 2grams            32.53    25.89    23.63    21.43
ExKaldi 6grams            29.83    24.07    22.40    20.01
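
Selecting the N-gram order by perplexity can be sketched as follows, using the ExKaldi perplexity scores reported in the first TIMIT table. The dictionary layout and the function name `best_order` are illustrative assumptions, not part of the ExKaldi API.

```python
# Test-set perplexity by N-gram order, copied from the perplexity table.
PPL = {
    "srilm": {2: 14.42, 3: 13.05, 4: 13.67, 5: 14.30, 6: 14.53},
    "kenlm": {2: 14.39, 3: 12.75, 4: 12.75, 5: 12.70, 6: 12.25},
}


def best_order(scores):
    """Return the N-gram order with the lowest perplexity."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    for toolkit, scores in PPL.items():
        order = best_order(scores)
        print(f"{toolkit}: best order = {order} (PPL {scores[order]})")
```

For the KenLM backend this picks the 6-gram model, which matches the "ExKaldi 6grams" row in the PER table.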

3. The phone error rate (PER) of two DNN-HMM-based systems. We trained our models with TensorFlow 2.3. The version of the PyTorch-Kaldi toolkit used in our experiment is 1.0.

System            DNN      LSTM
Kaldi baseline    18.67    ---
PyTorch-Kaldi     17.99    17.01
ExKaldi           15.13    15.01
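
To compare these systems on a common scale, the relative PER reduction can be computed from the DNN column above. This is only arithmetic on the reported numbers; the function name `relative_reduction` is a hypothetical helper.

```python
# PER of the DNN systems, copied from the table above.
PER_DNN = {"Kaldi baseline": 18.67, "PyTorch-Kaldi": 17.99, "ExKaldi": 15.13}


def relative_reduction(baseline, system):
    """Relative error reduction in percent, with respect to `baseline`."""
    return 100.0 * (baseline - system) / baseline


if __name__ == "__main__":
    for name in ("Kaldi baseline", "PyTorch-Kaldi"):
        red = relative_reduction(PER_DNN[name], PER_DNN["ExKaldi"])
        print(f"ExKaldi vs {name}: {red:.1f}% relative PER reduction")
```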
