# kaldi-adapt-lm

Adapt Kaldi-ASR nnet3 chain models from Zamia-Speech.org to a different language model.

Constructive comments, patches and pull-requests are very welcome.

Tutorial

To create the language model we would like to adapt our Kaldi model to, we first need a set of sentences. To get started, download and uncompress a generic set of sentences for your language, e.g.:

wget 'http://goofy.zamia.org/zamia-speech/misc/sentences-en.txt.xz'
unxz sentences-en.txt.xz

Now suppose the file utts.txt contains the sentences you would like the model to recognize with a higher probability than the rest. To achieve that, we add these sentences to our text corpus five times in this example:

cat utts.txt utts.txt utts.txt utts.txt utts.txt sentences-en.txt >lm.txt
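If you want to experiment with a different weighting, the repetition count can be made a parameter. The following is only a sketch: `build_corpus` is a hypothetical helper name that is not part of kaldi-adapt-lm, and it assumes plain text files with one sentence per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of kaldi-adapt-lm): print the in-domain
# sentences REPEAT times, followed by the generic sentences once.
# Usage: build_corpus REPEAT UTTS_FILE GENERIC_FILE
build_corpus() {
    repeat="$1"
    utts="$2"
    generic="$3"
    i=0
    while [ "$i" -lt "$repeat" ]; do
        cat "$utts"
        i=$((i + 1))
    done
    cat "$generic"
}

# Example call, equivalent to the cat command above:
# build_corpus 5 utts.txt sentences-en.txt >lm.txt
```

A higher count should shift more probability mass towards your own sentences; which value works best depends on your data and is something to try out.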

We also want to limit our language model to the vocabulary the audio model supports, so let’s extract the vocabulary next:

MODEL="models/kaldi-generic-en-tdnn_sp-latest"
cut -f 1 -d ' ' "${MODEL}/data/local/dict/lexicon.txt" >vocab.txt
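Because the language model is limited to this vocabulary, it can help to check whether your own sentences use words the lexicon does not know. A minimal sketch, assuming whitespace-separated words and the same spelling and casing in utts.txt as in the lexicon; `list_oov` is a hypothetical helper name, not part of kaldi-adapt-lm:

```shell
#!/bin/sh
# Hypothetical helper (not part of kaldi-adapt-lm): list every word that
# occurs in TEXT_FILE but not in VOCAB_FILE, one word per line, sorted.
# Usage: list_oov VOCAB_FILE TEXT_FILE
list_oov() {
    awk '
        # First file: remember every vocabulary entry.
        NR == FNR { vocab[$1] = 1; next }
        # Second file: collect words that are not in the vocabulary.
        { for (i = 1; i <= NF; i++) if (!($i in vocab)) oov[$i] = 1 }
        END { for (w in oov) print w }
    ' "$1" "$2" | sort
}

# Example call:
# list_oov vocab.txt utts.txt
```

Any word this prints is outside the vocabulary the audio model supports, so you may want to rephrase the affected sentences.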

With those files in place, we can now train our new language model using KenLM:

lmplz -o 4 --prune 0 1 2 3 --limit_vocab_file vocab.txt --interpolate_unigrams 0 <lm.txt >lm.arpa
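Before starting the adaptation, a quick look at the header of the ARPA file can serve as a sanity check. The sketch below assumes the standard ARPA layout, where `ngram N=COUNT` lines follow the `\data\` marker; `arpa_counts` is a hypothetical helper name, not part of KenLM or kaldi-adapt-lm.

```shell
#!/bin/sh
# Hypothetical helper: print the "ngram N=COUNT" header lines of an ARPA
# file and stop reading once the first n-gram section begins.
# Usage: arpa_counts ARPA_FILE
arpa_counts() {
    awk '
        /^ngram [0-9]+=/ { print }
        /^\\1-grams:/    { exit }
    ' "$1"
}

# Example call:
# arpa_counts lm.arpa
```

For a model trained with `-o 4`, you would expect four such lines, one for each n-gram order.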

Now we can start the Kaldi model adaptation process:

kaldi-adapt-lm ${MODEL} lm.arpa mymodel

You should now be able to find a tarball of the resulting model inside the work subdirectory.

Requirements

  • Python 2

  • Kaldi ASR

License

My own code is Apache-2.0 licensed unless otherwise noted in the script’s copyright headers.

Author

Guenter Bartsch <guenter@zamia.org>

