Kaldi Active Grammar
Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
UNDER ACTIVE DEVELOPMENT
Normally, Kaldi decoding graphs are monolithic, require expensive up-front offline compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals to be compiled separately and stitched together dynamically at decode-time, but all the grammars are always active and capable of being recognized.
This project extends that to allow each grammar/rule to be independently marked as active/inactive dynamically on a per-utterance basis (set at the beginning of each utterance). Dragonfly is then capable of activating only the appropriate grammars for the current environment, resulting in increased accuracy due to fewer possible recognitions. Furthermore, the dictation grammar can be shared between all the command grammars, so the command grammars can be compiled quickly without needing to include large-vocabulary dictation directly.
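The effect of per-utterance activation can be illustrated with a toy model. This is a minimal sketch, not the kaldi-active-grammar API: `ToyRule` and `ToyDecoder` are hypothetical names, each "grammar" is just a set of phrases rather than a compiled FST, and `difflib` string similarity stands in for acoustic scoring. It shows why restricting the candidate set at the start of an utterance helps: phrases from inactive rules can never be recognized, so they cannot compete with the intended command.

```python
import difflib


class ToyRule(object):
    """Hypothetical stand-in for a separately compiled grammar/rule.

    A real rule would be a compiled FST; here it is just a set of phrases.
    """

    def __init__(self, name, phrases):
        self.name = name
        self.phrases = set(phrases)
        self.active = True


class ToyDecoder(object):
    """Toy model of per-utterance rule activation (NOT the KaldiAG API)."""

    def __init__(self, rules):
        self.rules = {rule.name: rule for rule in rules}

    def set_active(self, active_names):
        # Called at the start of each utterance: only the named rules stay active.
        for name, rule in self.rules.items():
            rule.active = name in active_names

    def active_phrases(self):
        # Map each phrase that an active rule can produce back to its rule name.
        return {p: r.name for r in self.rules.values() if r.active for p in r.phrases}

    def decode(self, heard):
        # Pick the closest phrase among active rules only; inactive rules can
        # never be recognized, so they can never cause a misrecognition.
        phrases = self.active_phrases()
        best = difflib.get_close_matches(heard, list(phrases), n=1, cutoff=0.6)
        if not best:
            return None
        return phrases[best[0]], best[0]


decoder = ToyDecoder([
    ToyRule("browser", ["close tab", "new tab", "reload page"]),
    ToyRule("editor", ["close tag", "save file", "undo"]),
])
decoder.set_active({"browser"})       # e.g. a browser window has focus
print(decoder.decode("close tap"))    # ('browser', 'close tab')
```

With both rules active, the misheard "close tap" is equally similar to "close tab" and "close tag", so the toy decoder's answer depends only on tie-breaking; activating just the rule for the current context removes the ambiguity.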
- The Python package includes all necessary binaries for decoding on Linux or Windows. Available on PyPI.
- A compatible general English Kaldi nnet3 chain model, trained on ~1200 hours of open audio, is available under project releases.
- An improved model is under development.
- A compatible backend for Dragonfly was developed in the kaldi branch of my fork, and has been merged as of Dragonfly v0.15.0.
- See its documentation, try out a demo, or use the loader to run all normal Dragonfly scripts.
- You can try it out easily on Windows using a simple no-install package: see the Windows packages below.
- Caster is supported as of KaldiAG v0.6.0.
- Support for KaldiAG v1.0.0 has been merged as of Dragonfly v0.18.0! Improvements include Direct Parsing, Python3, Unicode, Grammar/Rule Weights, Generalized Alternative Dictation, and various bug fixes & optimizations. For details and previous versions' improvements, see project releases.
Donations are appreciated to encourage development.
Want to get started quickly & easily on Windows? Available under project releases:
- kaldi-dragonfly-winpython: A self-contained, portable, batteries-included (Python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
- kaldi-dragonfly-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (Python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
- kaldi-caster-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (Python & libraries & model) distribution of kaldi-active-grammar + dragonfly2 + caster. Just unzip and run!
- Python 2.7 or 3.4+; 64-bit required!
- Microphone support provided by pyaudio package
- OS: Linux or Windows; macOS planned if there is interest
- Only supports Kaldi left-biphone models, specifically nnet3 chain models, with specific modifications
- ~1GB+ disk space for model plus temporary storage and cache, depending on your grammar complexity
- ~500MB+ RAM for model and grammars, depending on your model and grammar complexity
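The Python version, bitness, and OS requirements above can be checked from within the interpreter. This is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper, not part of kaldi-active-grammar, and it does not check pyaudio, disk space, RAM, or model compatibility. Bitness is taken from the size of a C pointer (`struct.calcsize("P")`), which reflects how the interpreter was built rather than the OS, and that is what matters for the 64-bit requirement.

```python
import platform
import struct
import sys


def check_requirements(version_info=None, pointer_bits=None, system=None):
    """Return a list of problems; an empty list means the basic requirements look met.

    Hypothetical helper. Arguments default to the running interpreter but can be
    overridden (e.g. for testing).
    """
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8   # C pointer size in bits: 32 or 64
    if system is None:
        system = platform.system()               # e.g. "Linux", "Windows", "Darwin"

    problems = []
    major, minor = version_info[0], version_info[1]
    # Python 2.7 or 3.4+ is required.
    if not ((major == 2 and minor >= 7) or (major, minor) >= (3, 4)):
        problems.append("Python 2.7 or 3.4+ required, found %d.%d" % (major, minor))
    # Only 64-bit interpreters are supported.
    if pointer_bits != 64:
        problems.append("64-bit Python required, found %d-bit" % pointer_bits)
    # Only Linux and Windows binaries are shipped.
    if system not in ("Linux", "Windows"):
        problems.append("unsupported OS: %s (Linux or Windows only)" % system)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING: " + problem)
```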
Install Python package, which includes necessary Kaldi binaries:
pip install kaldi-active-grammar
Download the compatible generic English Kaldi nnet3 chain model from project releases. Unzip the model and pass its directory path to the kaldi-active-grammar constructor.
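Unzipping the model and locating its directory can be scripted with the standard library. This is a sketch: `extract_model` and the archive name in the usage example are hypothetical, and it assumes the release zip contains a single top-level folder (otherwise the extraction directory itself is returned). The constructor call is deliberately omitted; see the Dragonfly backend for how the model path is actually passed in.

```python
import os
import zipfile


def extract_model(zip_path, dest_dir):
    """Unzip a downloaded model archive and return the model directory path.

    Hypothetical helper. Assumes the archive holds a single top-level folder;
    if it does not, dest_dir itself is returned.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)   # creates dest_dir and subfolders as needed
        # Zip entry names always use "/", e.g. "kaldi_model/final.mdl".
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
    if len(top_level) == 1:
        candidate = os.path.join(dest_dir, top_level.pop())
        if os.path.isdir(candidate):
            return candidate
    return dest_dir


if __name__ == "__main__":
    # Hypothetical archive name; use the file you downloaded from project releases.
    print(extract_model("kaldi_model.zip", "models"))
```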
Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.
- Errors installing
  - Make sure you're using a 64-bit Python.
  - Update your pip by running: pip install --upgrade pip
Documentation is currently sorely lacking. To see example usage, examine the backend for Dragonfly.
Issues, suggestions, and feature requests are welcome & encouraged. Pull requests are considered, but the project structure is in flux.
- David Zurow (@daanzu)
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0), with the exception of the associated binaries, whose source is currently unreleased and which are only to be used by this project. See the LICENSE.txt file for details.
If this license is problematic for you, please contact me.
- Based on and including code from Kaldi ASR, under the Apache-2.0 license.
- Code from OpenFst and the OpenFst port for Windows, under the Apache-2.0 license.
- Intel Math Kernel Library, copyright (c) 2018 Intel Corporation, under the Intel Simplified Software License.
- Modified generic English Kaldi nnet3 chain model from Zamia Speech, under the LGPL-3.0 license.
Download the wheel for your platform:
| Filename | Size | File type | Python version |
|---|---|---|---|
| kaldi_active_grammar-1.0.2-py2.py3-none-manylinux2010_x86_64.whl | 27.2 MB | Wheel | py2.py3 |
| kaldi_active_grammar-1.0.2-py2.py3-none-win_amd64.whl | 31.8 MB | Wheel | py2.py3 |
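Each wheel filename follows the naming convention from PEP 427 (name-version-pythontag-abitag-platformtag.whl), and the platform tag is what distinguishes the Linux and Windows builds. pip performs this matching automatically when you run pip install kaldi-active-grammar; the sketch below only illustrates the idea. The helper names are hypothetical, optional build tags are not handled, and the platform mapping covers just the two tags listed here.

```python
import platform


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 fields (no build-tag support)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected 5 dash-separated fields: %s" % filename)
    name, version, python_tag, abi_tag, platform_tag = parts
    return {"name": name, "version": version,
            "python": python_tag.split("."),   # "py2.py3" -> ["py2", "py3"]
            "abi": abi_tag, "platform": platform_tag}


def pick_wheel(filenames, system=None, machine=None):
    """Return the wheel whose platform tag matches this OS/CPU, or None (simplified)."""
    system = system or platform.system()                  # "Linux", "Windows", ...
    machine = (machine or platform.machine()).lower()     # "x86_64", "amd64", ...
    wanted = {
        ("Linux", "x86_64"): "manylinux2010_x86_64",
        ("Windows", "amd64"): "win_amd64",
    }.get((system, machine))
    for filename in filenames:
        if parse_wheel_filename(filename)["platform"] == wanted:
            return filename
    return None


if __name__ == "__main__":
    wheels = [
        "kaldi_active_grammar-1.0.2-py2.py3-none-manylinux2010_x86_64.whl",
        "kaldi_active_grammar-1.0.2-py2.py3-none-win_amd64.whl",
    ]
    print(pick_wheel(wheels))   # result depends on the machine running it
```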