
FSA/FST algorithms, intended to (eventually) be interoperable with PyTorch and similar

Project description

k2

The vision of k2 is to be able to seamlessly integrate Finite State Automaton (FSA) and Finite State Transducer (FST) algorithms into autograd-based machine learning toolkits like PyTorch and TensorFlow. For speech recognition applications, this should make it easy to interpolate and combine various training objectives such as cross-entropy, CTC and MMI and to jointly optimize a speech recognition system with multiple decoding passes including lattice rescoring and confidence estimation. We hope k2 will have many other applications as well.

One of the key algorithms that we have implemented is pruned composition of a generic FSA with a "dense" FSA (i.e. one that corresponds to log-probs of symbols at the output of a neural network). This can be used as a fast implementation of decoding for ASR, and for CTC and LF-MMI training. It won't give a direct advantage in Word Error Rate compared with existing technology, but the point is to do it in a much more general and extensible framework, to allow further development of ASR technology.
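To make the idea concrete, here is a rough serial sketch of what pruned composition of a graph with a "dense" FSA does. The real implementation runs in parallel on the GPU and has a very different structure; every name below (`intersect_dense_pruned`, the graph encoding, the beam logic) is illustrative, not k2's actual API.

```python
import math

# A "dense" FSA: frame-by-frame log-probs from a (pretend) neural net.
# Row t gives the log-prob of each symbol between states t and t+1.
log_probs = [
    {"a": -0.1, "b": -2.5},
    {"a": -3.0, "b": -0.2},
]

# A tiny graph FSA accepting "ab" or "aa", as {state: [(next_state, symbol, score)]}.
graph = {
    0: [(1, "a", 0.0)],
    1: [(2, "b", 0.0), (2, "a", -0.5)],
}
final_state = 2

def intersect_dense_pruned(graph, log_probs, beam=4.0):
    """Serial sketch of pruned composition: advance frame by frame,
    keeping only states whose score is within `beam` of the best."""
    active = {0: 0.0}  # graph state -> best score so far
    for frame in log_probs:
        nxt = {}
        for s, score in active.items():
            for (t, sym, w) in graph.get(s, []):
                if sym in frame:
                    cand = score + w + frame[sym]
                    if cand > nxt.get(t, -math.inf):
                        nxt[t] = cand
        if not nxt:
            return None  # everything was pruned
        best = max(nxt.values())
        active = {s: v for s, v in nxt.items() if v >= best - beam}
    return active.get(final_state)
```

With these numbers the "ab" path wins (score -0.1 + -0.2 = -0.3); the pruning beam is what keeps decoding fast when the graph is large.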

Implementation

A few key points on our implementation strategy.

Most of the code is in C++ and CUDA. We implement a templated class Ragged, which is quite like TensorFlow's RaggedTensor (actually we came up with the design independently, and were later told that TensorFlow was using the same ideas). Despite a close similarity at the level of data structures, the design is quite different from TensorFlow and PyTorch. Most of the time we don't use composition of simple operations, but rely on C++11 lambdas defined directly in the C++ implementations of algorithms. The code in these lambdas operates directly on data pointers and, if the backend is CUDA, can run in parallel for each element of a tensor. (The C++ and CUDA code is mixed together, and the CUDA kernels get instantiated via templates.)
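The underlying layout of a ragged tensor can be sketched in a few lines: a flat values array plus a row_splits array marking where each row begins and ends. This is the general row_splits/values idea shared with TensorFlow's RaggedTensor; the class and field names below are illustrative, not k2's C++ API.

```python
# Toy 2-axis ragged array in the row_splits/values style.
from dataclasses import dataclass
from typing import List

@dataclass
class Ragged:
    # length num_rows + 1; row i is values[row_splits[i]:row_splits[i+1]]
    row_splits: List[int]
    values: List[float]

    def row(self, i: int) -> List[float]:
        return self.values[self.row_splits[i]:self.row_splits[i + 1]]

# The ragged list [[1, 2], [], [3, 4, 5]] flattens to:
r = Ragged(row_splits=[0, 2, 2, 5], values=[1, 2, 3, 4, 5])
print(r.row(0))  # [1, 2]
print(r.row(1))  # []   (empty rows cost nothing in the flat layout)
print(r.row(2))  # [3, 4, 5]
```

The flat layout is what makes element-per-thread GPU lambdas possible: every element of `values` can be processed independently, with `row_splits` consulted only when row membership matters.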

It is difficult to adequately describe what we are doing with these Ragged objects without going through the code in detail. The algorithms look very different from the way you would code them on CPU because of the need to avoid sequential processing. We are using coding patterns that make the most expensive parts of the computations "embarrassingly parallelizable"; the only somewhat nontrivial CUDA operations are generally reduction-type operations such as exclusive prefix sum, for which we use NVIDIA's CUB library. Our design is not too specific to NVIDIA hardware, and the bulk of the code we write is fairly normal-looking C++; the nontrivial CUDA programming is mostly done via the CUB library, parts of which we wrap with our own more convenient interface.
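The exclusive prefix sum mentioned above is easy to state in serial form, and it is the workhorse for building ragged layouts: given a per-row element count, the scan produces the row_splits array directly. A serial reference (the GPU version, e.g. CUB's device-wide scan, computes the same thing in parallel):

```python
def exclusive_sum(counts):
    """Serial reference for an exclusive prefix sum with the total
    appended: out[i] = sum(counts[:i]), so out has len(counts) + 1
    entries and its last entry is the grand total.  This is exactly
    the row_splits array for rows of the given sizes."""
    out = [0]
    for c in counts:
        out.append(out[-1] + c)
    return out

# Rows of sizes 2, 0 and 3 get the row_splits [0, 2, 2, 5]:
print(exclusive_sum([2, 0, 3]))  # [0, 2, 2, 5]
```

This is why a parallel scan primitive is almost the only nontrivial building block needed: most ragged operations reduce to "count per row in parallel, scan, then scatter in parallel".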

The Finite State Automaton object is then implemented as a Ragged tensor templated on a specific data type (a struct representing an arc in the automaton).
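In the same spirit, an FSA-as-ragged-tensor can be sketched as a flat array of arc structs indexed by a row_splits array over source states. The field names below follow common FST conventions but should be read as an illustration, not k2's exact struct.

```python
from typing import NamedTuple, List

class Arc(NamedTuple):
    src_state: int
    dest_state: int
    label: int      # -1 conventionally marks the arc into the final state
    score: float

# An FSA as a ragged array of arcs: row i holds the arcs leaving state i.
# State 0 has two outgoing arcs, state 1 has one, state 2 is final.
row_splits = [0, 2, 3]
arcs: List[Arc] = [
    Arc(0, 1, 10, -0.5),
    Arc(0, 1, 20, -1.2),
    Arc(1, 2, -1, 0.0),
]

def arcs_leaving(state: int) -> List[Arc]:
    return arcs[row_splits[state]:row_splits[state + 1]]
```

Because the arcs of all states (and, with one more ragged axis, of all FSAs in a batch) live in one flat array, a GPU kernel can assign one thread per arc with no pointer chasing.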

Autograd

If you look at the code as it exists now, you won't find any references to autograd. The design is quite different from TensorFlow and PyTorch (which is why we didn't simply extend one of those toolkits). Instead of making autograd come from the bottom up (by making individual operations differentiable) we are implementing it from the top down, which is much more efficient in this case (and will tend to have better roundoff properties).

An example: suppose we are finding the best path of an FSA, and we need derivatives. We implement this by keeping track of, for each arc in the output best-path, which input arc it corresponds to. (For more complex algorithms an arc in the output might correspond to a sum of probabilities of a list of input arcs). We can make this compatible with PyTorch/TensorFlow autograd at the Python level, by, for example, defining a Function class in PyTorch that remembers this relationship between the arcs and does the appropriate (sparse) operations to propagate back the derivatives w.r.t. the weights.
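The backward pass implied by that bookkeeping is just a sparse scatter-add over the arc map. A minimal sketch (names like `arc_map` are descriptive, not a fixed API; in practice this would live inside a PyTorch Function's backward method):

```python
# Top-down autograd sketch: the best-path operation returns, alongside
# the output arcs, an arc_map saying which input arc each output arc
# came from.  The backward pass scatters gradients through that map.
def backprop_through_arc_map(arc_map, grad_out, num_input_arcs):
    """grad_out[i] is d(loss)/d(score of output arc i); returns
    d(loss)/d(score) for every input arc (zero for unused arcs)."""
    grad_in = [0.0] * num_input_arcs
    for out_idx, in_idx in enumerate(arc_map):
        grad_in[in_idx] += grad_out[out_idx]
    return grad_in

# Three output arcs drawn from a 5-arc input FSA; input arcs 1 and 4
# are not on the best path, so their gradients stay zero.
grads = backprop_through_arc_map(arc_map=[0, 2, 3],
                                 grad_out=[1.0, 1.0, 1.0],
                                 num_input_arcs=5)
print(grads)  # [1.0, 0.0, 1.0, 1.0, 0.0]
```

For operations where an output arc corresponds to a log-sum of several input arcs, each input arc would instead receive its posterior-weighted share of the output gradient; the sparse-scatter structure is the same.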

Current state of the code

We have wrapped all of the C++ code for Python with pybind11 and have finished the integration with PyTorch.

We are currently writing speech recognition recipes using k2, which are hosted in a separate repository. Please see https://github.com/k2-fsa/icefall.

Plans after initial release

We are currently trying to make k2 ready for production use (see the branch v2.0-pre).

Quick start

Want to try it out without installing anything? We have set up a Google Colab notebook. You can find more Colab notebooks that use k2 for speech recognition at https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html.

Project details


Release history

This version

1.17

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

k2-1.17-py38-none-any.whl (72.7 MB)

Uploaded Python 3.8

k2-1.17-py37-none-any.whl (72.7 MB)

Uploaded Python 3.7

k2-1.17-py36-none-any.whl (72.7 MB)

Uploaded Python 3.6

k2-1.17-cp38-cp38-macosx_10_15_x86_64.whl (1.9 MB)

Uploaded CPython 3.8 macOS 10.15+ x86-64

k2-1.17-cp37-cp37m-macosx_10_15_x86_64.whl (1.9 MB)

Uploaded CPython 3.7m macOS 10.15+ x86-64

k2-1.17-cp36-cp36m-macosx_10_15_x86_64.whl (1.9 MB)

Uploaded CPython 3.6m macOS 10.15+ x86-64

File details

Details for the file k2-1.17-py38-none-any.whl.

File metadata

  • Download URL: k2-1.17-py38-none-any.whl
  • Upload date:
  • Size: 72.7 MB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for k2-1.17-py38-none-any.whl
Algorithm Hash digest
SHA256 6b9346db3f27bea208a14b3a251796d9a54d75e873f9eab4fd795da2bb24524f
MD5 8b5d2661cd2056d49eec0b838ba9bdd0
BLAKE2b-256 d44c6d06d8953054df5e784d1cf8abe5b507796db3f5cbea661a357f0a3d43db

See more details on using hashes here.


File details

Details for the file k2-1.17-py37-none-any.whl.

File metadata

  • Download URL: k2-1.17-py37-none-any.whl
  • Upload date:
  • Size: 72.7 MB
  • Tags: Python 3.7
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.13

File hashes

Hashes for k2-1.17-py37-none-any.whl
Algorithm Hash digest
SHA256 381c91f522752a1677b01b15a17ea637e09576b7fe553a0d82f281a36a4aa59f
MD5 31b2699b221f73eaf9b100aaa571d260
BLAKE2b-256 1959582a4d92f63c055a4bb5165d5dbeccdd72de3765066b1df032628b25df09


File details

Details for the file k2-1.17-py36-none-any.whl.

File metadata

  • Download URL: k2-1.17-py36-none-any.whl
  • Upload date:
  • Size: 72.7 MB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.64.0 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.15

File hashes

Hashes for k2-1.17-py36-none-any.whl
Algorithm Hash digest
SHA256 bd564d91e2fe8d2596f5f1cd83f1c30c7d07c67a068f9867e3599ce85e08abfd
MD5 8ea06d4b6b9a575099c853e1658024bf
BLAKE2b-256 ad249c54339c53126709f39d2a5bf0831aa53fecf3610e16d1c029045f7a6b5a


File details

Details for the file k2-1.17-cp38-cp38-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for k2-1.17-cp38-cp38-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 eb5862d9140afd085fdf05281005902f659d6ac045ee6d8ab40fb73956fe50db
MD5 51c0b2000cb995c51bc721c1e417555c
BLAKE2b-256 48937e62ff6147a6d7dc2bf8829e144924e82794f91f016e59be629e174bdff9


File details

Details for the file k2-1.17-cp37-cp37m-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for k2-1.17-cp37-cp37m-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 a07c0fee3604545031b21e9d9da0b3369894987a830a75f7754eef7f7b7ef0de
MD5 e7c42d0f2a9a7c13030d9f5cae70bc57
BLAKE2b-256 208e9002df78ae92705ed589b625948160dcd86f632397fbb8d3615a78909789


File details

Details for the file k2-1.17-cp36-cp36m-macosx_10_15_x86_64.whl.

File metadata

  • Download URL: k2-1.17-cp36-cp36m-macosx_10_15_x86_64.whl
  • Upload date:
  • Size: 1.9 MB
  • Tags: CPython 3.6m, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.64.0 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.15

File hashes

Hashes for k2-1.17-cp36-cp36m-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 21b929534209eccf455bdd43e74b87753578e5df42f84cd3da178703f16fa6b9
MD5 1339d8a4f677c972560cd1cb21abe0c9
BLAKE2b-256 5c39d92a90b425b62b5ccdcfea1bc0560fff1eac3d19c2f8b660e103da37758e

