License
- OSI Approved :: MIT License
Operating System
- POSIX :: Linux

Rhasspy ASR Kaldi

Automatic speech recognition for the Rhasspy voice assistant using Kaldi.

Requirements

Python 3.7
Kaldi
- Expects $KALDI_DIR in environment
OpenGrm
- Expects ngram* in $PATH
Phonetisaurus
- Expects phonetisaurus-apply in $PATH
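Below is a minimal Python sketch of a pre-flight check for these requirements. The helper name `missing_requirements` is hypothetical. The specific `ngram*` programs it checks (`ngramcount`, `ngrammake`, `ngramprint`) are an assumption: they are standard OpenGrm NGram tools, but this README does not say which ones rhasspy-asr-kaldi actually calls.

```python
import os
import shutil
from typing import List


def missing_requirements() -> List[str]:
    """Return descriptions of requirements that appear to be unmet."""
    problems: List[str] = []

    # Kaldi: the package expects $KALDI_DIR to be set in the environment.
    kaldi_dir = os.environ.get("KALDI_DIR")
    if not kaldi_dir:
        problems.append("$KALDI_DIR is not set")
    elif not os.path.isdir(kaldi_dir):
        problems.append(f"$KALDI_DIR does not point to a directory: {kaldi_dir}")

    # OpenGrm and Phonetisaurus programs must be on $PATH.
    # ASSUMPTION: these three ngram* tools stand in for "ngram*"; the exact
    # set used by rhasspy-asr-kaldi is not documented here.
    for program in ("ngramcount", "ngrammake", "ngramprint", "phonetisaurus-apply"):
        if shutil.which(program) is None:
            problems.append(f"{program} not found in $PATH")

    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("Missing:", problem)
```

An empty list means every check passed. It does not prove that the tools work, only that they can be found.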

See pre-built apps for pre-compiled binaries.

Installation

$ git clone https://github.com/rhasspy/rhasspy-asr-kaldi
$ cd rhasspy-asr-kaldi
$ ./configure
$ make
$ make install

Transcribing

Use python3 -m rhasspyasr_kaldi transcribe <ARGS>

usage: rhasspy-asr-kaldi transcribe [-h] --model-dir MODEL_DIR
                                    [--graph-dir GRAPH_DIR]
                                    [--model-type MODEL_TYPE]
                                    [--frames-in-chunk FRAMES_IN_CHUNK]
                                    [wav_file [wav_file ...]]

positional arguments:
  wav_file              WAV file(s) to transcribe

optional arguments:
  -h, --help            show this help message and exit
  --model-dir MODEL_DIR
                        Path to Kaldi model directory (with conf, data)
  --graph-dir GRAPH_DIR
                        Path to Kaldi graph directory (with HCLG.fst)
  --model-type MODEL_TYPE
                        Either nnet3 or gmm (default: nnet3)
  --frames-in-chunk FRAMES_IN_CHUNK
                        Number of frames to process at a time
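Here is a sketch of assembling the transcribe command from Python, using only the flags shown in the usage text above. `transcribe_command` is a hypothetical helper. Invoking the module through `sys.executable` (rather than a literal `python3`) is our own choice. The command's output format is not described here, so the sketch does not try to parse it.

```python
import sys
from typing import List, Optional, Sequence


def transcribe_command(
    model_dir: str,
    wav_files: Sequence[str],
    graph_dir: Optional[str] = None,
    model_type: str = "nnet3",
    frames_in_chunk: Optional[int] = None,
) -> List[str]:
    """Build an argument list for `python3 -m rhasspyasr_kaldi transcribe`."""
    # The usage text only documents these two model types.
    if model_type not in ("nnet3", "gmm"):
        raise ValueError("model_type must be 'nnet3' or 'gmm'")

    args = [sys.executable, "-m", "rhasspyasr_kaldi", "transcribe", "--model-dir", model_dir]
    if graph_dir is not None:
        args += ["--graph-dir", graph_dir]
    args += ["--model-type", model_type]
    if frames_in_chunk is not None:
        args += ["--frames-in-chunk", str(frames_in_chunk)]

    # WAV files are positional arguments, so they go last.
    args += list(wav_files)
    return args
```

For example, `subprocess.run(transcribe_command("my_model", ["test.wav"], graph_dir="my_model/graph"), check=True)` would transcribe one file with the default nnet3 model type.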

For nnet3 models, the online2-tcp-nnet3-decode-faster program is used to handle streaming audio. For gmm models, audio is buffered and packaged as a WAV file before being transcribed.
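The gmm path can be illustrated with a minimal sketch that buffers raw audio chunks and packages them as an in-memory WAV file using Python's standard `wave` module. The function name and the 16 kHz, 16-bit mono defaults are assumptions for illustration, so match them to what your model expects. This is not rhasspy-asr-kaldi's own implementation.

```python
import io
import wave
from typing import Iterable


def chunks_to_wav(
    chunks: Iterable[bytes],
    sample_rate: int = 16000,  # ASSUMPTION: adjust to your model
    sample_width: int = 2,     # bytes per sample (16-bit PCM)
    channels: int = 1,
) -> bytes:
    """Buffer raw PCM chunks in memory, then package them as a WAV file."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)

    with io.BytesIO() as wav_io:
        # wave does not close a file object it did not open itself,
        # so wav_io is still readable after this inner block.
        with wave.open(wav_io, "wb") as wav_file:
            wav_file.setnchannels(channels)
            wav_file.setsampwidth(sample_width)
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(bytes(buffer))
        return wav_io.getvalue()
```

Buffering the whole utterance trades memory and latency for simplicity. That is the difference from the streaming nnet3 path.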

Training

Use python3 -m rhasspyasr_kaldi train <ARGS>

usage: rhasspy-asr-kaldi train [-h] --model-dir MODEL_DIR
                               [--graph-dir GRAPH_DIR]
                               [--intent-graph INTENT_GRAPH]
                               [--dictionary DICTIONARY]
                               [--dictionary-casing {upper,lower,ignore}]
                               [--language-model LANGUAGE_MODEL]
                               --base-dictionary BASE_DICTIONARY
                               [--g2p-model G2P_MODEL]
                               [--g2p-casing {upper,lower,ignore}]

optional arguments:
  -h, --help            show this help message and exit
  --model-dir MODEL_DIR
                        Path to Kaldi model directory (with conf, data)
  --graph-dir GRAPH_DIR
                        Path to Kaldi graph directory (with HCLG.fst)
  --intent-graph INTENT_GRAPH
                        Path to intent graph JSON file (default: stdin)
  --dictionary DICTIONARY
                        Path to write custom pronunciation dictionary
  --dictionary-casing {upper,lower,ignore}
                        Case transformation for dictionary words (training,
                        default: ignore)
  --language-model LANGUAGE_MODEL
                        Path to write custom language model
  --base-dictionary BASE_DICTIONARY
                        Paths to pronunciation dictionaries
  --g2p-model G2P_MODEL
                        Path to Phonetisaurus grapheme-to-phoneme FST model
  --g2p-casing {upper,lower,ignore}
                        Case transformation for g2p words (training, default:
                        ignore)

This will generate a custom HCLG.fst from an intent graph created using rhasspy-nlu. Your Kaldi model directory should be laid out like this:

my_model/ (--model-dir)
- conf/
  - mfcc_hires.conf
- data/
  - local/
    - dict/
      - lexicon.txt (copied from --dictionary)
    - lang/
      - lm.arpa.gz (copied from --language-model)
- graph/ (--graph-dir)
  - HCLG.fst (generated)
- model/
  - final.mdl
- phones/
  - extra_questions.txt
  - nonsilence_phones.txt
  - optional_silence.txt
  - silence_phones.txt
- online/ (nnet3 only)
- extractor/ (nnet3 only)
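The following sketch checks a model directory against the layout above before training. `missing_model_files` is a hypothetical helper. It checks only the files the layout lists as inputs, skipping `lexicon.txt`, `lm.arpa.gz` and `HCLG.fst`, which are copied or generated. It treats `online/` and `extractor/` as required only for nnet3, as the layout notes.

```python
from pathlib import Path
from typing import List

# Input files from the layout above, relative to --model-dir.
# Copied or generated files (lexicon.txt, lm.arpa.gz, HCLG.fst) are skipped.
REQUIRED_FILES = [
    "conf/mfcc_hires.conf",
    "model/final.mdl",
    "phones/extra_questions.txt",
    "phones/nonsilence_phones.txt",
    "phones/optional_silence.txt",
    "phones/silence_phones.txt",
]

# Directories the layout marks as "nnet3 only".
NNET3_ONLY_DIRS = ["online", "extractor"]


def missing_model_files(model_dir: str, model_type: str = "nnet3") -> List[str]:
    """Return layout entries that are missing under model_dir."""
    root = Path(model_dir)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    if model_type == "nnet3":
        missing += [name + "/" for name in NNET3_ONLY_DIRS if not (root / name).is_dir()]
    return missing
```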

When using the train command, you will typically specify the following arguments:

--intent-graph - path to the intent graph JSON file generated using rhasspy-nlu (read from stdin if omitted)
--model-type - either nnet3 or gmm
--model-dir - path to top-level model directory (my_model in example above)
--graph-dir - path to directory where HCLG.fst should be written (my_model/graph in example above)
--base-dictionary - pronunciation dictionary with all words from the intent graph (can be used multiple times)
--dictionary - path to write custom pronunciation dictionary (optional)
--language-model - path to write custom ARPA language model (optional)
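Here is a sketch of building the train command from Python. `train_command` is hypothetical. It uses only flags that appear in the train usage text above; `--model-type` is not among them, so it is left out. `--base-dictionary` is repeated once per dictionary, as the list allows. The casing options are left at the tool's defaults.

```python
import sys
from typing import List, Optional, Sequence


def train_command(
    model_dir: str,
    base_dictionaries: Sequence[str],
    graph_dir: Optional[str] = None,
    intent_graph: Optional[str] = None,
    dictionary: Optional[str] = None,
    language_model: Optional[str] = None,
    g2p_model: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `python3 -m rhasspyasr_kaldi train`."""
    if not base_dictionaries:
        raise ValueError("at least one --base-dictionary is required")

    args = [sys.executable, "-m", "rhasspyasr_kaldi", "train", "--model-dir", model_dir]
    optional = [
        ("--graph-dir", graph_dir),
        ("--intent-graph", intent_graph),  # if omitted, the graph is read from stdin
        ("--dictionary", dictionary),
        ("--language-model", language_model),
        ("--g2p-model", g2p_model),
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, value]

    # --base-dictionary may be given once per dictionary file.
    for path in base_dictionaries:
        args += ["--base-dictionary", path]
    return args
```

When `intent_graph` is omitted, the graph is read from stdin. You could pass it with `subprocess.run(cmd, input=graph_json_bytes, check=True)`, where `graph_json_bytes` holds the rhasspy-nlu graph JSON.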

Building From Source

rhasspy-asr-kaldi depends on the following programs that must be compiled:

Kaldi
- Speech-to-text engine
OpenGrm
- Creates ARPA language models
Phonetisaurus
- Guesses pronunciations for unknown words

Kaldi

Make sure you have the necessary dependencies installed:

sudo apt-get install \
    build-essential \
    libatlas-base-dev libatlas3-base gfortran \
    automake autoconf unzip sox libtool subversion \
    python3 python \
    git zlib1g-dev

Download Kaldi and extract it:

wget -O kaldi-master.tar.gz \
    'https://github.com/kaldi-asr/kaldi/archive/master.tar.gz'
tar -xvf kaldi-master.tar.gz

First, build Kaldi's tools:

cd kaldi-master/tools
make

Use make -j 4 if you have multiple CPU cores. This will take a long time.

Next, build Kaldi itself, starting again from the directory where you extracted the archive:

cd kaldi-master/src
./configure --shared --mathlib=ATLAS
make depend
make

Use make depend -j 4 and make -j 4 if you have multiple CPU cores. This will take a long time.

There is no installation step. The kaldi-master directory contains all the libraries and programs that Rhasspy will need to access.
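Because nothing is installed, programs must be found inside the build tree. The sketch below locates one. It assumes that `$KALDI_DIR` points at the built `kaldi-master` directory and that the program lives in Kaldi's `src/online2bin/` output directory, where `online2-tcp-nnet3-decode-faster` is built. Both are our assumptions, not statements from this README. `find_kaldi_program` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Optional


def find_kaldi_program(name: str, subdir: str = "online2bin") -> Optional[Path]:
    """Locate a compiled Kaldi program under $KALDI_DIR/src/<subdir>/.

    ASSUMPTION: $KALDI_DIR is the kaldi-master directory built as above.
    Returns None if $KALDI_DIR is unset or the program is missing or not
    executable.
    """
    kaldi_dir = os.environ.get("KALDI_DIR")
    if not kaldi_dir:
        return None
    candidate = Path(kaldi_dir) / "src" / subdir / name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None
```

For example, `find_kaldi_program("online2-tcp-nnet3-decode-faster")` returns the path to the streaming decoder used for nnet3 models, if it was built.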

See docker-kaldi for a Docker build script.

Phonetisaurus

Make sure you have the necessary dependencies installed:

sudo apt-get install build-essential

First, download and build OpenFst 1.6.2:

wget http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.6.2.tar.gz
tar -xvf openfst-1.6.2.tar.gz
cd openfst-1.6.2
./configure \
    "--prefix=$(pwd)/build" \
    --enable-static --enable-shared \
    --enable-far --enable-ngram-fsts
make
make install

Use make -j 4 if you have multiple CPU cores. This will take a long time.

Next, download and extract Phonetisaurus:

wget -O phonetisaurus-master.tar.gz \
    'https://github.com/AdolfVonKleist/Phonetisaurus/archive/master.tar.gz'
tar -xvf phonetisaurus-master.tar.gz

Finally, build Phonetisaurus (where /path/to/openfst is the openfst-1.6.2 directory from above):

cd Phonetisaurus-master
./configure \
    --with-openfst-includes=/path/to/openfst/build/include \
    --with-openfst-libs=/path/to/openfst/build/lib
make
make install

Use make -j 4 if you have multiple CPU cores. This will take a long time.

You should now be able to run the phonetisaurus-align program.

See docker-phonetisaurus for a Docker build script.

Release history

- 0.6.1 (this version): May 25, 2021
- 0.6.0: Apr 1, 2021
- 0.5.0: Oct 16, 2020
- 0.4.1: Oct 10, 2020
- 0.3.0: Jul 17, 2020
- 0.2.0: Jun 24, 2020
- 0.1.6: Jun 3, 2020
- 0.1.5: May 26, 2020
- 0.1.4: Apr 24, 2020

Download files

Source Distribution

rhasspy-asr-kaldi-0.6.1.tar.gz (13.5 kB)

Uploaded May 25, 2021

File details

Details for the file rhasspy-asr-kaldi-0.6.1.tar.gz.

File metadata

Download URL: rhasspy-asr-kaldi-0.6.1.tar.gz
Upload date: May 25, 2021
Size: 13.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.10

File hashes

Hashes for rhasspy-asr-kaldi-0.6.1.tar.gz
Algorithm	Hash digest
SHA256	`1c9eeae8f96d05da0093029dd093a8c1253fc5fd50e33b29da5a26cf5cd72ccb`
MD5	`e53cc5bbb01806dd9f62929b7d24d93b`
BLAKE2b-256	`bc9998ae0ff8b3127981da6ca6ac7bdcebd75141a3997708ed40547b54ccd870`
