SWIG bindings for Kaldi
This is student-driven code, so don’t expect a stable API. I’ll try to use semantic versioning, but the best way to keep functionality stable is by forking.
What is it?
Some Kaldi SWIG bindings for Python. I started this project because I wanted to seamlessly incorporate Kaldi’s I/O mechanism into the gamut of Python-based data science packages (e.g. Theano, TensorFlow, CNTK, PyTorch). The code base is expanding to wrap more of Kaldi’s feature processing and mathematical functions, and I eventually plan to add hooks for Kaldi audio features and pre-/post-processing. However, I have no plans to port any code involving modelling or decoding.
Input/Output
Most I/O can be performed with the pydrobert.kaldi.io.open function:
>>> from pydrobert.kaldi import io
>>> with io.open('scp:foo.scp', 'bm') as f:
...     for matrix in f:
...         pass  # do something
open is a factory function that determines the appropriate underlying stream to open, much like Python’s built-in open. The data types we can read (here, a BaseMatrix) are listed in pydrobert.kaldi.io.enums.KaldiDataType. Big data types, like matrices and vectors, are piped into NumPy arrays.
Passing an extended filename (e.g. a path to a file on disk, '-' for stdin/stdout, 'gzip -c a.ark.gz |', etc.) opens a stream from which data types can be read one by one, in the order they were written. Alternatively, prepending the extended filename with 'ark[,option_a[,option_b...]]:' or 'scp[,...]:' and specifying a data type opens a Kaldi table for iterator-like sequential reading (mode='r'), dict-like random-access reading (mode='r+'), or writing (mode='w').
For more information on the open function, consult its docstring. Information on Kaldi I/O can be found on the Kaldi website.
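The table-specifier syntax described above can be sketched with a small string helper. This is only an illustration: table_specifier is a hypothetical name, not part of pydrobert-kaldi, and the 't' (text mode) option is borrowed from Kaldi's own I/O conventions. The resulting string is what one would pass as the first argument to io.open.

```python
def table_specifier(table_type, xfilename, options=()):
    """Compose a Kaldi table specifier such as 'ark,t:feats.ark'.

    Hypothetical helper for illustration; not part of pydrobert-kaldi.
    """
    if table_type not in ("ark", "scp"):
        raise ValueError("table_type must be 'ark' or 'scp'")
    # Options are comma-separated and follow the table type.
    prefix = ",".join((table_type,) + tuple(options))
    # A colon separates the prefix from the extended filename.
    return prefix + ":" + xfilename


print(table_specifier("scp", "foo.scp"))            # scp:foo.scp
print(table_specifier("ark", "-", options=("t",)))  # ark,t:-
# The result would then be handed on, e.g. io.open(spec, 'bm').
```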
The submodule pydrobert.kaldi.io.corpus contains useful wrappers around Kaldi I/O to serve up batches of data to, say, a neural network:
>>> from pydrobert.kaldi.io.corpus import ShuffledData
>>> train = ShuffledData('scp:feats.scp', 'scp:labels.scp', batch_size=10)
>>> for feat_batch, label_batch in train:
...     pass  # do something
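To make the idea of serving shuffled batches concrete, here is a minimal pure-Python sketch. It is not how ShuffledData is implemented; shuffled_batches, its seed parameter, and the toy data are assumptions made for illustration. It shuffles paired features and labels together and yields them batch_size at a time.

```python
import random


def shuffled_batches(feats, labels, batch_size, seed=None):
    """Yield (feat_batch, label_batch) pairs in a shuffled order.

    Hypothetical stand-in for illustration; not pydrobert-kaldi's code.
    """
    if len(feats) != len(labels):
        raise ValueError("feats and labels must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Shuffle indices rather than the data so pairs stay aligned.
    order = list(range(len(feats)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [feats[i] for i in idx], [labels[i] for i in idx]


# Toy data: seven "utterances", each a 3-dimensional feature vector.
feats = [[float(i)] * 3 for i in range(7)]
labels = list(range(7))
for feat_batch, label_batch in shuffled_batches(feats, labels, 3, seed=0):
    print(len(feat_batch), label_batch)
```

The final batch is smaller than batch_size whenever the number of items is not a multiple of it.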
Logging and CLI
By default, Kaldi error, warning, and critical messages are piped to standard error. The pydrobert.kaldi.logging submodule provides hooks into Python’s native logging interface, the logging module. KaldiLogger can handle stack traces from Kaldi C++ code, and a variety of decorators translate Kaldi logging patterns into Python logging patterns, or vice versa.
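As a rough picture of what translating between the two logging conventions involves, the sketch below maps Kaldi message severities onto Python logging levels. The mapping and the name kaldi_severity_to_python_level are assumptions for illustration, not what pydrobert.kaldi.logging does; the Kaldi side follows Kaldi's convention of negative severities for warnings and errors, zero for info, and positive values for verbose logs.

```python
import logging


def kaldi_severity_to_python_level(severity):
    """Map a Kaldi severity integer to a Python logging level.

    Hypothetical mapping for illustration only.
    """
    if severity <= -2:   # Kaldi errors and failed assertions
        return logging.ERROR
    if severity == -1:   # Kaldi warnings
        return logging.WARNING
    if severity == 0:    # Kaldi info messages
        return logging.INFO
    return logging.DEBUG  # positive values are verbose logs


logger = logging.getLogger("kaldi-sketch")
logger.addHandler(logging.StreamHandler())  # writes to standard error
logger.setLevel(logging.INFO)

messages = [(-1, "a warning"), (0, "some info"), (2, "verbose detail")]
for severity, text in messages:
    # The verbose message is filtered out because the logger is at INFO.
    logger.log(kaldi_severity_to_python_level(severity), text)
```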
You’d likely want to explicitly handle logging when creating new Kaldi-style commands for the command line. pydrobert.kaldi.io.argparse provides KaldiParser, an ArgumentParser tailored to Kaldi inputs/outputs. It is used by a few command-line entry points added by this package. They are:
- write-table-to-pickle
Write the contents of a Kaldi table to one or more pickle files. Good for late night attempts at reaching a paper deadline.
- write-pickle-to-table
Write the contents of one or more pickle files to a Kaldi table.
- normalize-feat-lens
Ensure that features have the same length as some reference by truncating or appending frames at the end.
- compute-error-rate
Compute an error rate between reference and hypothesis texts, such as a WER or PER.
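An error rate such as WER is conventionally the edit (Levenshtein) distance between the reference and hypothesis token sequences, divided by the reference length. The following is a minimal sketch of that computation, not the actual code behind compute-error-rate; edit_distance and error_rate are illustrative names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] is the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / float(len(ref))


ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()
print(round(error_rate(ref, hyp), 3))  # 0.333 (one substitution, one deletion)
```

Splitting on words gives a WER; splitting each transcript into phones instead would give a PER with the same code.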
Installation
Check the following compatibility table to see if you can get a PyPI or Conda install going:
Platform | Arch | Python | Conda? | PyPI? |
---|---|---|---|---|
Windows | 32 | 2.7 | No | No |
Windows | 32 | 3.4 | Yes | No |
Windows | 32 | 3.5 | Yes | No |
Windows | 32 | 3.6 | Yes | No |
Windows | 32 | 3.7 | Yes | No |
Windows | 64 | 2.7 | No | No |
Windows | 64 | 3.5 | Yes | No |
Windows | 64 | 3.6 | Yes | No |
Windows | 64 | 3.7 | Yes | No |
OSX | 32 | | No | No |
OSX | 64 | 2.7 | Yes | Yes |
OSX | 64 | 3.4 | Yes | Yes |
OSX | 64 | 3.5 | Yes | Yes |
OSX | 64 | 3.6 | Yes | Yes |
OSX | 64 | 3.7 | Yes | Yes |
Linux | 32 | 2.7 | Yes | Yes |
Linux | 32 | 3.4 | Yes | Yes |
Linux | 32 | 3.5 | Yes | Yes |
Linux | 32 | 3.6 | Yes | Yes |
Linux | 32 | 3.7 | Yes | Yes |
Linux | 64 | 2.7 | Yes | Yes |
Linux | 64 | 3.4 | Yes | Yes |
Linux | 64 | 3.5 | Yes | Yes |
Linux | 64 | 3.6 | Yes | Yes |
Linux | 64 | 3.7 | Yes | Yes |
To install via conda:
conda install -c sdrobert pydrobert-kaldi
To install via pip:
pip install pydrobert-kaldi
You can also try building from source, but you’ll have to specify where your BLAS install is through an environment variable:
# for OpenBLAS
OPENBLASROOT=/path/to/openblas/install pip install \
  git+https://github.com/sdrobert/pydrobert-kaldi.git
# for MKL
MKLROOT=/path/to/mkl/install pip install \
  git+https://github.com/sdrobert/pydrobert-kaldi.git
# see setup.py for more options
License
This code is licensed under Apache 2.0.
Code found under the src/ directory has been primarily copied from Kaldi. setup.py is also strongly influenced by Kaldi’s build configuration. Kaldi is also covered by the Apache 2.0 license; its specific license file was copied into src/COPYING_Kaldi_Project to live among its fellows.