
FSA/FST algorithms, intended to (eventually) be interoperable with PyTorch and similar

Project description

k2

The vision of k2 is to seamlessly integrate Finite State Automaton (FSA) and Finite State Transducer (FST) algorithms into autograd-based machine learning toolkits like PyTorch and TensorFlow. For speech recognition applications, this should make it easy to interpolate and combine training objectives such as cross-entropy, CTC, and MMI, and to jointly optimize a speech recognition system with multiple decoding passes, including lattice rescoring and confidence estimation. We hope k2 will have many other applications as well.

One of the key algorithms that we have implemented is pruned composition of a generic FSA with a "dense" FSA (i.e., one that corresponds to the log-probs of symbols at the output of a neural network). This can be used as a fast implementation of decoding for ASR, and for CTC and LF-MMI training. It won't give a direct advantage in word error rate compared with existing technology; the point is to do this in a much more general and extensible framework, to allow further development of ASR technology.
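
As a rough illustration, the sketch below wraps neural-network log-probs in a "dense" FSA, intersects them with a decoding graph under pruning, and extracts the best paths. The function names and signatures (k2.DenseFsaVec, k2.ctc_topo, k2.intersect_dense_pruned, k2.shortest_path) reflect our reading of k2's Python API; treat them as assumptions and check the documentation for details.

    import torch
    import k2

    # Neural-network output: (batch, frames, num_symbols) log-probs.
    log_probs = torch.randn(1, 100, 500).log_softmax(dim=-1)

    # One supervision segment per utterance: (fsa_index, start_frame, num_frames).
    supervision_segments = torch.tensor([[0, 0, 100]], dtype=torch.int32)
    dense_fsa_vec = k2.DenseFsaVec(log_probs, supervision_segments)

    # A decoding graph; here just a CTC topology over symbols 0..499
    # (in practice this might be a full HLG graph).
    decoding_graph = k2.ctc_topo(max_token=499)

    # Pruned intersection of the decoding graph with the dense FSA.
    lattice = k2.intersect_dense_pruned(
        decoding_graph,
        dense_fsa_vec,
        search_beam=20.0,
        output_beam=8.0,
        min_active_states=30,
        max_active_states=10000,
    )
    best_paths = k2.shortest_path(lattice, use_double_scores=True)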

Implementation

A few key points on our implementation strategy.

Most of the code is in C++ and CUDA. We implement a templated class Ragged, which is quite like TensorFlow's RaggedTensor (actually we came up with the design independently, and were later told that TensorFlow was using the same ideas). Despite a close similarity at the level of data structures, the design is quite different from that of TensorFlow and PyTorch. Most of the time we don't use composition of simple operations, but rely on C++11 lambdas defined directly in the C++ implementations of algorithms. The code in these lambdas operates directly on data pointers and, if the backend is CUDA, can run in parallel for each element of a tensor. (The C++ and CUDA code is mixed together, and the CUDA kernels get instantiated via templates.)
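
As a conceptual sketch (plain PyTorch here, not the actual C++ class): a 2-axis ragged tensor stores a flat values array plus a row_splits array of offsets, so each row is simply a slice of values.

    import torch

    # Ragged data [[1, 2], [3], [], [4, 5, 6]] stored as two dense arrays:
    values = torch.tensor([1, 2, 3, 4, 5, 6])
    row_splits = torch.tensor([0, 2, 3, 3, 6])

    # Row i is values[row_splits[i]:row_splits[i + 1]]; row 2 is empty.
    row = 3
    print(values[row_splits[row]:row_splits[row + 1]])  # tensor([4, 5, 6])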

It is difficult to adequately describe what we are doing with these Ragged objects without going through the code in detail. The algorithms look very different from the way you would code them on CPU because of the need to avoid sequential processing. We are using coding patterns that make the most expensive parts of the computations "embarrassingly parallelizable"; the only somewhat nontrivial CUDA operations are generally reduction-type operations such as exclusive prefix sum, for which we use NVIDIA's CUB library. Our design is not too specific to NVIDIA hardware, and the bulk of the code we write is fairly normal-looking C++; the nontrivial CUDA programming is mostly done via the CUB library, parts of which we wrap with our own convenient interface.
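
For instance, building the row_splits offsets from independently computed row lengths is exactly an exclusive prefix sum (cub::DeviceScan::ExclusiveSum on the GPU). The PyTorch snippet below is only meant to illustrate the operation, not the actual implementation.

    import torch

    # Per-row lengths, each computable independently (embarrassingly parallel):
    row_lengths = torch.tensor([2, 1, 0, 3])

    # An exclusive prefix sum turns the lengths into the row_splits offsets used above.
    row_splits = torch.cat([torch.zeros(1, dtype=row_lengths.dtype),
                            torch.cumsum(row_lengths, dim=0)])
    print(row_splits)  # tensor([0, 2, 3, 3, 6])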

The Finite State Automaton object is then implemented as a Ragged tensor templated on a specific data type (a struct representing an arc in the automaton).
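
From Python this is visible, for example, when building an FSA from its textual form. This is a minimal sketch; the textual format (one arc per line as "src_state dest_state label score", with the final state on the last line) and the attribute names follow k2's Python wrapper as we understand it.

    import k2

    s = '''
    0 1 10 0.1
    0 1 20 0.2
    1 2 -1 0.3
    2
    '''
    fsa = k2.Fsa.from_str(s)
    print(fsa.arcs)    # the ragged tensor of arcs, indexed [state][arc]
    print(fsa.scores)  # per-arc scores as a torch tensor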

Autograd

If you look at the code as it exists now, you won't find any references to autograd. The design is quite different from that of TensorFlow and PyTorch (which is why we didn't simply extend one of those toolkits). Instead of making autograd come from the bottom up (by making individual operations differentiable), we are implementing it from the top down, which is much more efficient in this case (and will tend to have better roundoff properties).

An example: suppose we are finding the best path of an FSA and we need derivatives. We implement this by keeping track of, for each arc in the output best path, which input arc it corresponds to. (For more complex algorithms, an arc in the output might correspond to a sum of probabilities of a list of input arcs.) We can make this compatible with PyTorch/TensorFlow autograd at the Python level by, for example, defining a Function class in PyTorch that remembers this relationship between the arcs and does the appropriate (sparse) operations to propagate back the derivatives w.r.t. the weights.
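
A minimal sketch of that idea (purely illustrative, not k2's actual code): suppose arc_map[i] gives, for output arc i of the best path, the index of the input arc it came from; the backward pass is then just a sparse scatter-add of the output-arc gradients onto the input arc scores.

    import torch

    class BestPathScore(torch.autograd.Function):
        """Illustrative only: total score of a best path, with gradients
        propagated back to the input FSA's arc scores via arc_map."""

        @staticmethod
        def forward(ctx, input_scores, arc_map):
            # arc_map[i] = index of the input arc that output arc i came from.
            ctx.arc_map = arc_map
            ctx.num_input_arcs = input_scores.numel()
            return input_scores[arc_map].sum()

        @staticmethod
        def backward(ctx, grad_output):
            grad = torch.zeros(ctx.num_input_arcs, dtype=grad_output.dtype,
                               device=grad_output.device)
            # Sparse scatter: each output arc sends its gradient to one input arc.
            grad.index_add_(0, ctx.arc_map, grad_output.expand(ctx.arc_map.numel()))
            return grad, None

Calling BestPathScore.apply(input_scores, arc_map) then gives the best path's total score, with gradients flowing only to the arcs the path actually used.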

Current state of the code

We have exposed all of the C++ code to Python with pybind11 and have finished the integration with PyTorch.

We are currently writing speech recognition recipes using k2, which are hosted in a separate repository. Please see https://github.com/k2-fsa/icefall.

Plans after initial release

We are currently trying to make k2 ready for production use (see the branch v2.0-pre).

Quick start

Want to try it out without installing anything? We have set up a Google Colab notebook. You can find more Colab notebooks using k2 in speech recognition at https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html.
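
For a local quick start, the sketch below builds two tiny acceptors and intersects them. It assumes a k2 wheel compatible with your Python and PyTorch versions is available, and that the function names (k2.Fsa.from_str, k2.arc_sort, k2.intersect) match the installed version's API.

    # pip install k2
    import k2

    # Two tiny acceptors; intersect them and print the result.
    a = k2.arc_sort(k2.Fsa.from_str('''
    0 1 1 0.1
    0 1 2 0.3
    1 2 -1 0.0
    2
    '''))
    b = k2.arc_sort(k2.Fsa.from_str('''
    0 1 2 0.5
    1 2 -1 0.0
    2
    '''))
    print(k2.intersect(a, b))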

Project details


Release history

This version

1.12

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See the tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

k2-1.12-py38-none-any.whl (60.1 MB)

Uploaded: Python 3.8

k2-1.12-py37-none-any.whl (60.1 MB)

Uploaded: Python 3.7

k2-1.12-py36-none-any.whl (60.1 MB)

Uploaded: Python 3.6

k2-1.12-cp38-cp38-macosx_10_15_x86_64.whl (1.7 MB)

Uploaded: CPython 3.8, macOS 10.15+ x86-64

k2-1.12-cp37-cp37m-macosx_10_15_x86_64.whl (1.7 MB)

Uploaded: CPython 3.7m, macOS 10.15+ x86-64

k2-1.12-cp36-cp36m-macosx_10_15_x86_64.whl (1.7 MB)

Uploaded: CPython 3.6m, macOS 10.15+ x86-64

File details

Details for the file k2-1.12-py38-none-any.whl.

File metadata

  • Download URL: k2-1.12-py38-none-any.whl
  • Upload date:
  • Size: 60.1 MB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for k2-1.12-py38-none-any.whl:
SHA256: 07d1175dff512d93492e8399f8f48d007a25bdb7f610478a9d101b4b66b3be8f
MD5: 78f982e55fb990a44c420bf8f753f59a
BLAKE2b-256: 31da2db0c2ffd15d535d3185eaf7dee98ae76f37cdfb11ff9bde9ffc19a8bc91


File details

Details for the file k2-1.12-py37-none-any.whl.

File metadata

  • Download URL: k2-1.12-py37-none-any.whl
  • Upload date:
  • Size: 60.1 MB
  • Tags: Python 3.7
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for k2-1.12-py37-none-any.whl:
SHA256: 3e284c12a69158d2570d8a2acab4a0223ff77f38c3641323f887251257a65095
MD5: 06852b04f6878d78caf42b8c1fe794fc
BLAKE2b-256: ea1c263a460112845ac726f9c5d9f5cbb20cce5b10769be3b7c74f969e0fd091


File details

Details for the file k2-1.12-py36-none-any.whl.

File metadata

  • Download URL: k2-1.12-py36-none-any.whl
  • Upload date:
  • Size: 60.1 MB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.3 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.15

File hashes

Hashes for k2-1.12-py36-none-any.whl:
SHA256: c9e80352d0c47f64c7b7e36dc0b6401b94eb8f334a60e44b2e15656552ec5cb1
MD5: 6fe012b2c3c9c4896b7361439654d287
BLAKE2b-256: 4d4fc33942f689af9e33bb8533b9fa9c3ed3e6991bf550eee707306aa1dd7c2d


File details

Details for the file k2-1.12-cp38-cp38-macosx_10_15_x86_64.whl.

File metadata

  • Download URL: k2-1.12-cp38-cp38-macosx_10_15_x86_64.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: CPython 3.8, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for k2-1.12-cp38-cp38-macosx_10_15_x86_64.whl:
SHA256: 2fe5d393360af10baf9b0d00e974d6498b072d3cf655b0183b8920e11a005de6
MD5: e0fc8f04994e67ce8bf8973ce876d894
BLAKE2b-256: 9080baa5f600b50908c0d86e67b3e7f9463a8bf22d3626c7efc487313c985ca3


File details

Details for the file k2-1.12-cp37-cp37m-macosx_10_15_x86_64.whl.

File metadata

  • Download URL: k2-1.12-cp37-cp37m-macosx_10_15_x86_64.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: CPython 3.7m, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for k2-1.12-cp37-cp37m-macosx_10_15_x86_64.whl:
SHA256: 127f23c0c17144046a4dd4012eab462e202aba8c45c844a643015201eea35c8f
MD5: 947877ca95267e01a117a0b8b04068f3
BLAKE2b-256: 7bfed29495fef390cbda93ed65b3e336d0b6c9b9106e69a83b6d7e8490341eba


File details

Details for the file k2-1.12-cp36-cp36m-macosx_10_15_x86_64.whl.

File metadata

  • Download URL: k2-1.12-cp36-cp36m-macosx_10_15_x86_64.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: CPython 3.6m, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.3 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.15

File hashes

Hashes for k2-1.12-cp36-cp36m-macosx_10_15_x86_64.whl:
SHA256: a48d7dc57d1a67010a7d4f5a8375778e7177e6fb89c3366718ae540f414b7e04
MD5: 4ff52315856744978aa7835f86cac012
BLAKE2b-256: cad1d39fcc8c3cd0fe50cf53a7c87450b37e3149733dbaf7e37860e065d56c7c

