===============
pydrobert-kaldi
===============

|travis| |appveyor| |readthedocs|

`Read the latest docs <http://pydrobert-kaldi.readthedocs.io/en/latest>`_

**This is student-driven code, so don't expect a stable API. I'll try to use
semantic versioning, but the best way to keep functionality stable is by
forking.**

What is it?
-----------

Some Kaldi_ SWIG_ bindings for Python. I started this project because I wanted
to seamlessly incorporate Kaldi's I/O mechanism into the gamut of Python-based
data science packages (e.g. Theano, TensorFlow, CNTK, PyTorch). The code base
is expanding to wrap more of Kaldi's feature processing and mathematical
functions.

Eventually, I plan on adding hooks for Kaldi audio features and
pre-/post-processing. However, I have no plans to port any code involving
modelling or decoding.

Input/Output
------------

Most I/O can be performed with the ``pydrobert.kaldi.io.open`` function:

>>> from pydrobert.kaldi import io
>>> with io.open('scp:foo.scp', 'bm') as f:
...     for matrix in f:
...         pass  # do something

``open`` is a factory function that determines the appropriate underlying
stream to open, much like Python's built-in ``open``. The data types we can
read (here, a ``BaseMatrix``) are listed in
``pydrobert.kaldi.io.enums.KaldiDataType``. Big data types, like matrices and
vectors, are piped into Numpy_ arrays. Passing an extended filename (e.g.
paths to files on disk, ``'-'`` for stdin/stdout, ``'gzip -c a.ark.gz |'``,
etc.) opens a stream from which data types can be read one-by-one and in the
order they were written. Alternatively, prepending the extended filename with
``'ark[,option_a[,option_b...]]:'`` or ``'scp[,...]:'`` and specifying a data
type allows one to open a Kaldi table for iterator-like sequential reading
(``mode='r'``), dict-like random access reading (``mode='r+'``), or writing
(``mode='w'``). For more information on the ``open`` function, consult its
docstring. Information on `Kaldi I/O`_ can be found on the Kaldi website.
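
To make the difference between a plain extended filename and a table specifier
concrete, here is a minimal pure-Python sketch of how the ``ark``/``scp``
prefix could be separated from the rest of the string. ``split_specifier`` is a
hypothetical helper written for illustration; it is not part of
``pydrobert.kaldi`` and it simplifies Kaldi's real parsing rules.

```python
def split_specifier(spec):
    """Split a Kaldi-style specifier into (table_type, options, filename).

    Hypothetical helper for illustration only. ``table_type`` is 'ark' or
    'scp' for a table specifier, or None for a plain extended filename.
    """
    prefix, sep, rest = spec.partition(':')
    parts = prefix.split(',')
    # Only treat the string as a table when the text before the first
    # colon starts with 'ark' or 'scp'; anything else is a plain filename.
    if sep and parts[0] in ('ark', 'scp'):
        return parts[0], parts[1:], rest
    return None, [], spec


print(split_specifier('scp:foo.scp'))
print(split_specifier('ark,s,cs:-'))
print(split_specifier('gzip -c a.ark.gz |'))
```

Strings without a recognized prefix, such as the piped ``gzip`` command, come
back unchanged with a table type of ``None``.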

The submodule ``pydrobert.kaldi.io.corpus`` contains useful wrappers around
Kaldi I/O to serve up batches of data to, say, a neural network:

>>> from pydrobert.kaldi.io.corpus import ShuffledData
>>> train = ShuffledData('scp:feats.scp', 'scp:labels.scp', batch_size=10)
>>> for feat_batch, label_batch in train:
...     pass  # do something
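
For intuition about what serving shuffled batches involves, the following is a
rough sketch using plain Python lists in place of Kaldi tables. The function
``shuffled_batches`` and its ``seed`` parameter are assumptions made for this
example; this is not how ``ShuffledData`` is implemented.

```python
import random


def shuffled_batches(feats, labels, batch_size, seed=None):
    """Yield (feat_batch, label_batch) pairs in a random order.

    Hypothetical stand-in for illustration; it works on in-memory lists.
    """
    if len(feats) != len(labels):
        raise ValueError('feats and labels must have the same length')
    order = list(range(len(feats)))
    random.Random(seed).shuffle(order)
    # Walk the shuffled indices in strides of batch_size; the last batch
    # may be smaller than the others.
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [feats[i] for i in idx], [labels[i] for i in idx]


for feat_batch, label_batch in shuffled_batches(
        [0, 1, 2, 3, 4], ['a', 'b', 'c', 'd', 'e'], batch_size=2, seed=0):
    print(feat_batch, label_batch)
```

Indexing both lists with the same shuffled indices keeps every feature paired
with its label.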

Logging and CLI
---------------

By default, Kaldi error, warning, and critical messages are piped to standard
error. The ``pydrobert.kaldi.logging`` submodule provides hooks into Python's
native logging interface: the ``logging`` module. The ``KaldiLogger`` can
handle stack traces from Kaldi C++ code, and there are a variety of decorators
to finagle Kaldi logging patterns into Python logging patterns, or vice
versa.

You'd likely want to explicitly handle logging when creating new Kaldi-style
commands for the command line. ``pydrobert.kaldi.io.argparse`` provides
``KaldiParser``, an ``ArgumentParser`` tailored to Kaldi inputs/outputs. It is
used by a few command-line entry points added by this package. They are:

write-table-to-pickle
  Write the contents of a Kaldi table to one or more pickle files. Good for
  late night attempts at reaching a paper deadline.
write-pickle-to-table
  Write the contents of one or more pickle files to a Kaldi table.
normalize-feat-lens
  Ensure that features have the same length as some reference by truncating
  or appending frames.
compute-error-rate
  Compute an error rate between reference and hypothesis texts, such as a WER
  or PER.
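
As a reminder of what such an error rate measures, the sketch below computes
the total edit distance (substitutions, insertions, and deletions) between
reference and hypothesis token sequences, divided by the total number of
reference tokens. The names ``edit_distance`` and ``error_rate`` are
hypothetical and chosen for this example; it is not the code behind
``compute-error-rate``, and it skips any text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = cur
    return prev[-1]


def error_rate(pairs):
    """Total edits over total reference tokens for (ref, hyp) pairs."""
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return float(errors) / total


pairs = [
    ('the cat sat on the mat'.split(), 'the cat sat on mat'.split()),
    ('hello world'.split(), 'hello word'.split()),
]
print(error_rate(pairs))  # 2 edits over 8 reference words -> 0.25
```

Splitting on words gives a WER; feeding phone sequences instead would give a
PER.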

Installation
------------

Check the following compatibility table to see if you can get a PyPI_ or Conda_
install going:

+----------+------+--------+--------+-------+
| Platform | Arch | Python | Conda? | PyPI? |
+==========+======+========+========+=======+
| Windows  | 32   | 2.7    | No     | No    |
+----------+------+--------+--------+-------+
| Windows  | 32   | 3.5    | Yes    | No    |
+----------+------+--------+--------+-------+
| Windows  | 32   | 3.6    | Yes    | No    |
+----------+------+--------+--------+-------+
| Windows  | 64   | 2.7    | No     | No    |
+----------+------+--------+--------+-------+
| Windows  | 64   | 3.5    | Yes    | No    |
+----------+------+--------+--------+-------+
| Windows  | 64   | 3.6    | Yes    | No    |
+----------+------+--------+--------+-------+
| OSX      | 32   | -      | No     | No    |
+----------+------+--------+--------+-------+
| OSX      | 64   | 2.7    | Yes    | Yes   |
+----------+------+--------+--------+-------+
| OSX      | 64   | 3.5    | Yes    | Yes   |
+----------+------+--------+--------+-------+
| OSX      | 64   | 3.6    | Yes    | Yes   |
+----------+------+--------+--------+-------+
| Linux    | 32   | 2.7    | Yes    | Yes   |
+----------+------+--------+--------+-------+
| Linux    | 32   | 3.5    | Yes    | Yes   |
+----------+------+--------+--------+-------+
| Linux    | 32   | 3.6    | Yes    | Yes   |
+----------+------+--------+--------+-------+
| Linux    | 64   | 2.7    | Yes    | Yes   |
+----------+------+--------+--------+-------+
| Linux    | 64   | 3.5    | Yes    | Yes   |
+----------+------+--------+--------+-------+
| Linux    | 64   | 3.6    | Yes    | Yes   |
+----------+------+--------+--------+-------+

To install via ``conda``::

   conda install -c sdrobert pydrobert-kaldi

To install via ``pip``::

   pip install pydrobert-kaldi

You can also try building from source, but you'll have to tell the build where
your BLAS install lives through an environment variable::

   # for OpenBLAS
   OPENBLASROOT=/path/to/openblas/install pip install \
     git+https://github.com/sdrobert/pydrobert-kaldi.git
   # for MKL
   MKLROOT=/path/to/mkl/install pip install \
     git+https://github.com/sdrobert/pydrobert-kaldi.git
   # see setup.py for more options

License
-------

This code is licensed under Apache 2.0.

Code found under the ``src/`` directory has been primarily copied from Kaldi.
``setup.py`` is also strongly influenced by Kaldi's build
configuration. Kaldi is also covered by the Apache 2.0 license; its specific
license file was copied into ``src/COPYING_Kaldi_Project`` to live among its
fellows.

.. _Kaldi: http://kaldi-asr.org/
.. _`Kaldi I/O`: http://kaldi-asr.org/doc/io.html
.. _Swig: http://www.swig.org/
.. _Numpy: http://www.numpy.org/
.. _Conda: http://conda.pydata.org/docs/
.. _PyPI: https://pypi.org/
.. |travis| image:: https://travis-ci.org/sdrobert/pydrobert-kaldi.svg?branch=master
   :target: https://travis-ci.org/sdrobert/pydrobert-kaldi
   :alt: Travis Build Status
.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/lvjhj9pgju90wn8j/branch/master?svg=true
   :target: https://ci.appveyor.com/project/sdrobert/pydrobert-kaldi
   :alt: AppVeyor Build Status
.. |readthedocs| image:: https://readthedocs.org/projects/pydrobert-kaldi/badge/?version=latest
   :target: http://pydrobert-kaldi.readthedocs.io/en/latest
   :alt: Documentation Status

