
FSA/FST algorithms, intended to (eventually) be interoperable with PyTorch and similar

Project description

k2

The vision of k2 is to seamlessly integrate Finite State Automaton (FSA) and Finite State Transducer (FST) algorithms into autograd-based machine learning toolkits like PyTorch and TensorFlow. For speech recognition applications, this should make it easy to interpolate and combine various training objectives such as cross-entropy, CTC, and MMI, and to jointly optimize a speech recognition system with multiple decoding passes, including lattice rescoring and confidence estimation. We hope k2 will have many other applications as well.

One of the key algorithms we have implemented is pruned composition of a generic FSA with a "dense" FSA (i.e., one that corresponds to the log-probs of symbols at the output of a neural network). This can be used as a fast implementation of decoding for ASR, and for CTC and LF-MMI training. It won't give a direct advantage in Word Error Rate over existing technology; the point is to do this in a much more general and extensible framework, to allow further development of ASR technology.
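As a rough illustration, here is how pruned composition with a dense FSA can look from the Python level. This is a hedged sketch, not a recipe: the names (DenseFsaVec, ctc_topo, intersect_dense_pruned, shortest_path, and the beam/active-state parameters) follow k2's Python API as of this release, but the shapes and values are made up, so consult the documentation for exact signatures.

```python
import torch
import k2

# Log-probs from a neural network: (num_seqs, num_frames, num_symbols).
log_probs = torch.randn(1, 100, 500).log_softmax(dim=-1)

# One row per supervised segment: [sequence_index, start_frame, duration].
supervision_segments = torch.tensor([[0, 0, 100]], dtype=torch.int32)

# The "dense" FSA wrapping the network output.
dense_fsa_vec = k2.DenseFsaVec(log_probs, supervision_segments)

# A generic FSA to compose with, e.g. a CTC topology over the 500 symbols.
decoding_graph = k2.arc_sort(k2.ctc_topo(max_token=499))

# Pruned composition: produces a lattice of surviving paths.
lattice = k2.intersect_dense_pruned(
    decoding_graph,
    dense_fsa_vec,
    search_beam=20.0,
    output_beam=8.0,
    min_active_states=30,
    max_active_states=10000,
)

# One-best decoding over the lattice.
best_paths = k2.shortest_path(lattice, use_double_scores=True)
```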

Implementation

A few key points on our implementation strategy.

Most of the code is in C++ and CUDA. We implement a templated class Ragged, which is quite like TensorFlow's RaggedTensor (actually we came up with the design independently, and were later told that TensorFlow was using the same ideas). Despite a close similarity at the level of data structures, the design is quite different from TensorFlow and PyTorch. Most of the time we don't use composition of simple operations, but rely on C++11 lambdas defined directly in the C++ implementations of algorithms. The code in these lambdas operates directly on data pointers and, if the backend is CUDA, can run in parallel for each element of a tensor. (The C++ and CUDA code is mixed together, and the CUDA kernels get instantiated via templates.)
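From the Python level, the Ragged structure is exposed as k2.RaggedTensor: a flat values array plus row-split offsets marking where each sub-list begins and ends. A minimal sketch (names per the k2 Python API; the printed output shown is indicative):

```python
import k2

# A ragged tensor with four sub-lists of different lengths (one empty).
r = k2.RaggedTensor('[ [1 2] [3] [] [4 5 6] ]')

print(r.values)               # tensor([1, 2, 3, 4, 5, 6], dtype=torch.int32)
print(r.shape.row_splits(1))  # tensor([0, 2, 3, 3, 6], dtype=torch.int32);
                              # sub-list i is values[row_splits[i]:row_splits[i+1]]
```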

It is difficult to adequately describe what we are doing with these Ragged objects without going through the code in detail. The algorithms look very different from the way you would code them on a CPU because of the need to avoid sequential processing. We use coding patterns that make the most expensive parts of the computations "embarrassingly parallelizable"; the only somewhat nontrivial CUDA operations are generally reduction-type operations such as exclusive prefix sum, for which we use NVIDIA's cub library. Our design is not too specific to NVIDIA hardware, and the bulk of the code we write is fairly normal-looking C++; the nontrivial CUDA programming is mostly done via the cub library, parts of which we wrap with our own convenient interface.
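To give a flavor of why exclusive prefix sum is the workhorse: it is what turns independently computed per-row lengths into the row-split offsets of a Ragged tensor, after which every element can be processed in parallel. A sketch in plain PyTorch (in k2 itself this step runs through cub on the GPU):

```python
import torch

# Per-row lengths, e.g. how many arcs each state produced
# (computed independently, hence in parallel).
row_lengths = torch.tensor([2, 1, 0, 3], dtype=torch.int32)

# Exclusive prefix sum: a cumulative sum shifted right, with a leading zero.
row_splits = torch.cat([
    torch.zeros(1, dtype=torch.int32),
    torch.cumsum(row_lengths, dim=0, dtype=torch.int32),
])
print(row_splits)  # tensor([0, 2, 3, 3, 6], dtype=torch.int32)

# Row i now owns values[row_splits[i]:row_splits[i+1]], so a kernel can be
# launched with one thread per element and no sequential dependencies.
```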

The Finite State Automaton object is then implemented as a Ragged tensor templated on a specific data type (a struct representing an arc in the automaton).
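Concretely, an arc carries (src_state, dest_state, label, score), and k2's textual FSA format lists one arc per line in that order, with the final state alone on the last line. A small sketch:

```python
import k2

# Three arcs plus the final state (2); by convention, label -1 marks
# arcs entering the final state.
s = '''
0 1 10 0.1
0 1 20 0.2
1 2 -1 0.3
2
'''
fsa = k2.Fsa.from_str(s)
print(fsa.arcs)  # the underlying ragged array of arcs, grouped by source state
```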

Autograd

If you look at the code as it exists now, you won't find any references to autograd. The design is quite different from TensorFlow and PyTorch (which is why we didn't simply extend one of those toolkits). Instead of making autograd come from the bottom up (by making individual operations differentiable), we are implementing it from the top down, which is much more efficient in this case (and will tend to have better roundoff properties).

An example: suppose we are finding the best path of an FSA, and we need derivatives. We implement this by keeping track of, for each arc in the output best path, which input arc it corresponds to. (For more complex algorithms, an arc in the output might correspond to a sum of probabilities of a list of input arcs.) We can make this compatible with PyTorch/TensorFlow autograd at the Python level by, for example, defining a Function class in PyTorch that remembers this relationship between the arcs and does the appropriate (sparse) operations to propagate back the derivatives w.r.t. the weights.
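A minimal sketch of that Function-class idea, with a toy stand-in for the C++/CUDA best-path algorithm (best_path_with_arc_map here is hypothetical, not part of the k2 API; the real algorithm computes the arc map on the device):

```python
import torch

def best_path_with_arc_map(arc_scores):
    # Toy stand-in: pretend arcs 0 and 2 form the best path. The real
    # algorithm would return the path's total score and, for each output
    # arc, the index of the input arc it was copied from.
    arc_map = torch.tensor([0, 2])
    return arc_scores[arc_map].sum(), arc_map

class BestPathScore(torch.autograd.Function):
    @staticmethod
    def forward(ctx, arc_scores):
        score, arc_map = best_path_with_arc_map(arc_scores)
        ctx.arc_map = arc_map
        ctx.num_arcs = arc_scores.numel()
        return score

    @staticmethod
    def backward(ctx, grad_output):
        # Sparse backward: each output arc routes its gradient to the one
        # input arc it came from; all other arcs get zero.
        grad = torch.zeros(ctx.num_arcs, dtype=grad_output.dtype,
                           device=grad_output.device)
        grad.index_add_(0, ctx.arc_map,
                        grad_output.expand(ctx.arc_map.numel()))
        return grad

arc_scores = torch.randn(5, requires_grad=True)
BestPathScore.apply(arc_scores).backward()
print(arc_scores.grad)  # 1.0 at arcs 0 and 2, 0.0 elsewhere
```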

Current state of the code

We have wrapped all of the C++ code in Python with pybind11 and have finished the integration with PyTorch.

We are currently writing speech recognition recipes using k2, which are hosted in a separate repository. Please see https://github.com/k2-fsa/icefall.

Plans after initial release

We are currently trying to make k2 ready for production use (see the branch v2.0-pre).

Quick start

Want to try it out without installing anything? We have set up a Google Colab notebook. You can find more Colab notebooks using k2 for speech recognition at https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html.
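If you would rather try it locally, a minimal first run might look like this (a sketch assuming a CPU-only `pip install k2`; see the installation docs for CUDA-enabled builds):

```python
import k2

# A tiny acyclic FSA in k2's textual format: "src dest label score" per
# arc, final state alone on the last line.
s = '''
0 1 1 0.5
1 2 -1 1.0
2
'''
fsa = k2.Fsa.from_str(s)

# Many k2 operations act on a vector of FSAs, so wrap the single FSA first.
fsa_vec = k2.create_fsa_vec([fsa])
best = k2.shortest_path(fsa_vec, use_double_scores=True)
print(best[0].scores)  # scores of the arcs on the best path
```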

Project details



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

k2-1.23.4-py3.11-none-any.whl (103.1 MB; Python 3)

k2-1.23.4-py3.10-none-any.whl (103.1 MB; Python 3)

k2-1.23.4-py3.9-none-any.whl (103.1 MB; Python 3)

k2-1.23.4-py3.8-none-any.whl (103.1 MB; Python 3)

k2-1.23.4-py3.7-none-any.whl (103.1 MB; Python 3)

k2-1.23.4-cp310-cp310-macosx_10_15_x86_64.whl (2.4 MB; CPython 3.10, macOS 10.15+ x86-64)

k2-1.23.4-cp39-cp39-macosx_10_15_x86_64.whl (2.4 MB; CPython 3.9, macOS 10.15+ x86-64)

k2-1.23.4-cp38-cp38-macosx_10_15_x86_64.whl (2.4 MB; CPython 3.8, macOS 10.15+ x86-64)

k2-1.23.4-cp37-cp37m-macosx_10_15_x86_64.whl (2.4 MB; CPython 3.7m, macOS 10.15+ x86-64)

File details

Details for the file k2-1.23.4-py3.11-none-any.whl.

File metadata

  • Download URL: k2-1.23.4-py3.11-none-any.whl
  • Size: 103.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.1

File hashes

  • SHA256: 5b5f709b12b9cd092b583984d0b7d940c9f898101259c998d68d1f4adefa48d4
  • MD5: f7a665b8bbc26d4da4f8eef11cb0d666
  • BLAKE2b-256: ecf1237d27510e0ef4d4c56b168d1b0168a0303e394827d7b041c24f0d539ef0

File details

Details for the file k2-1.23.4-py3.10-none-any.whl.

File metadata

  • Download URL: k2-1.23.4-py3.10-none-any.whl
  • Size: 103.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

  • SHA256: 1404589732fdb7f6f5bf62297a20dd73318b7b4f2e10dd52dc3d5b6669688a4c
  • MD5: 49a33278d8b155934c3edcca19d2bda3
  • BLAKE2b-256: eb5fdb4f6681451636791ae163448ddedc5e505a18b42af4dd76a09779c2fbd5

File details

Details for the file k2-1.23.4-py3.9-none-any.whl.

File metadata

  • Download URL: k2-1.23.4-py3.9-none-any.whl
  • Size: 103.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

  • SHA256: e483092fc06bbf211827474e17d71c3acee8a37ddbee32cbf2706bf179f2fa19
  • MD5: ea4cba422c1c45d2fb01c18d1117f178
  • BLAKE2b-256: 73c083cab19113349aa782286454362d38d57b2ccc0735b27e3fd14ee774dffa

File details

Details for the file k2-1.23.4-py3.8-none-any.whl.

File metadata

  • Download URL: k2-1.23.4-py3.8-none-any.whl
  • Size: 103.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

  • SHA256: 8992b1884cf7d3585e06031f017f3474854959bb7a33d4adbf5b91b538815584
  • MD5: 70f5d09eb83dc33038938b11264ee9c1
  • BLAKE2b-256: efc530b90bfffb75682e95429c692b94204160e30bcfb30be56c57c4ff79e577

File details

Details for the file k2-1.23.4-py3.7-none-any.whl.

File metadata

  • Download URL: k2-1.23.4-py3.7-none-any.whl
  • Size: 103.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.15

File hashes

  • SHA256: a29245e347344010ab0c2b325a5d5674177a72da68c482451200ba38b2fca2cf
  • MD5: 69f6ff3f2a886958639f300166dcf6c4
  • BLAKE2b-256: e8a9673e90ef37806aefeff3afff804975b8e449ebc5abe9bee7d4a5e6b411a0

File details

Details for the file k2-1.23.4-cp310-cp310-macosx_10_15_x86_64.whl.

File hashes

  • SHA256: 89d38aab1df9e5d10f59379a93f0c8b033ea10a7515083e05740afa3edf15e84
  • MD5: f2abe1bc23fe563e9b6c9fa14beee440
  • BLAKE2b-256: aa46a59d8e3031b5cc0b16f13c08b2eec64093101cc099ccd6d26620da499e92

File details

Details for the file k2-1.23.4-cp39-cp39-macosx_10_15_x86_64.whl.

File hashes

  • SHA256: 93ce8119e572c61ac442184a22c347dcd45ab0a4391d2a118d76785a079d5b03
  • MD5: 2c28062a09b44cebd0a593bd754f0e4a
  • BLAKE2b-256: 4bafc57bca3208d50ac1e5db9a57b1ce387214da2c484cd7dd0da98b0de4ebcd

File details

Details for the file k2-1.23.4-cp38-cp38-macosx_10_15_x86_64.whl.

File hashes

  • SHA256: 02e4b80885a2982f3e511b4ba133bdd9124af9ed4a01f70d147748fadffd4afd
  • MD5: 9133874b6c04516163152638bb46df12
  • BLAKE2b-256: 9c53023c26685bf076e4e2b9560d3d0973c84664982a760f7f3a953bc974e8b7

File details

Details for the file k2-1.23.4-cp37-cp37m-macosx_10_15_x86_64.whl.

File hashes

  • SHA256: a89be075fe0c5027ad9bab95d897985da1057bdc7faa67e65c7eb6c86f7f908e
  • MD5: cacb8e2bcd10b84744f67a02a1ab3619
  • BLAKE2b-256: a5a19ad1e3e449d88bc02ada1411471f5f8c08b1635f1e45f404f7e2db01bc8c
