
FSA/FST algorithms, intended to (eventually) be interoperable with PyTorch and similar

Project description

k2

The vision of k2 is to be able to seamlessly integrate Finite State Automaton (FSA) and Finite State Transducer (FST) algorithms into autograd-based machine learning toolkits like PyTorch and TensorFlow. For speech recognition applications, this should make it easy to interpolate and combine various training objectives such as cross-entropy, CTC and MMI and to jointly optimize a speech recognition system with multiple decoding passes including lattice rescoring and confidence estimation. We hope k2 will have many other applications as well.

One of the key algorithms that we have implemented is pruned composition of a generic FSA with a "dense" FSA (i.e. one that corresponds to log-probs of symbols at the output of a neural network). This can be used as a fast implementation of decoding for ASR, and for CTC and LF-MMI training. By itself this won't give a direct advantage in Word Error Rate over existing technology; the point is to do it in a much more general and extensible framework, to allow further development of ASR technology.
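
As a rough illustration, here is how this pruned composition is typically driven from Python. The entry points (k2.DenseFsaVec, k2.intersect_dense_pruned, k2.ctc_topo, k2.shortest_path) are the current Python API, but treat the details, especially the beam values, as a sketch rather than recommendations:

    import torch
    import k2

    # (N, T, C) log-probs from a neural network, e.g. after log_softmax;
    # random numbers here just to make the sketch self-contained.
    log_probs = torch.randn(1, 50, 10).log_softmax(dim=-1)

    # One row per supervision: [fsa_index, start_frame, num_frames].
    supervision_segments = torch.tensor([[0, 0, 50]], dtype=torch.int32)

    # The "dense" FSA: one arc per (frame, symbol), weighted by the log-probs.
    dense_fsa_vec = k2.DenseFsaVec(log_probs, supervision_segments)

    # A generic FSA to compose with; a bare CTC topology stands in here
    # for a full decoding graph such as HLG.
    graph = k2.arc_sort(k2.ctc_topo(max_token=9))

    # Pruned composition: the beams and active-state bounds control pruning.
    lattice = k2.intersect_dense_pruned(
        graph,
        dense_fsa_vec,
        search_beam=20.0,
        output_beam=8.0,
        min_active_states=30,
        max_active_states=10000,
    )
    best_path = k2.shortest_path(lattice, use_double_scores=True)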

Implementation

A few key points on our implementation strategy.

Most of the code is in C++ and CUDA. We implement a templated class Ragged, which is quite like TensorFlow's RaggedTensor (in fact we came up with the design independently, and were later told that TensorFlow was using the same ideas). Despite the close similarity at the level of data structures, the design is quite different from that of TensorFlow and PyTorch. Most of the time we don't use composition of simple operations, but rely on C++11 lambdas defined directly in the C++ implementations of algorithms. The code in these lambdas operates directly on data pointers and, if the backend is CUDA, it can run in parallel for each element of a tensor. (The C++ and CUDA code is mixed together, and the CUDA kernels get instantiated via templates.)
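
To make the layout concrete, here is the ragged idea sketched with plain PyTorch tensors. This illustrates the data structure itself, not k2's actual classes:

    import torch

    # A ragged array of 3 rows: [[10, 20, 30], [40], [50, 60]].
    # It is stored as a flat `values` tensor plus a `row_splits` tensor
    # marking where each row begins and ends.
    values = torch.tensor([10, 20, 30, 40, 50, 60])
    row_splits = torch.tensor([0, 3, 4, 6])

    # Row i is values[row_splits[i]:row_splits[i+1]].
    row1 = values[row_splits[1]:row_splits[2]]  # tensor([40])

    # The complementary `row_ids` view maps each element to its row,
    # which is what per-element GPU lambdas index with.
    row_ids = torch.repeat_interleave(
        torch.arange(len(row_splits) - 1), row_splits.diff()
    )
    # row_ids == tensor([0, 0, 0, 1, 2, 2])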

It is difficult to adequately describe what we are doing with these Ragged objects without going through the code in detail. The algorithms look very different from the way you would code them on CPU because of the need to avoid sequential processing. We use coding patterns that make the most expensive parts of the computations "embarrassingly parallel"; the only somewhat nontrivial CUDA operations are generally reduction-type operations such as exclusive prefix sum, for which we use NVIDIA's cub library. Our design is not too specific to NVIDIA hardware, and the bulk of the code we write is fairly normal-looking C++; the nontrivial CUDA programming is mostly done via the cub library, parts of which we wrap with our own convenient interface.
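
The following sketch (in plain PyTorch, purely for illustration) shows why exclusive prefix sum is the workhorse here: each worker computes a count in parallel, and the prefix sum turns those counts into write offsets:

    import torch

    # Typical pattern: each row (e.g. each FSA state) computes, in
    # parallel, how many outputs (e.g. arcs) it will produce ...
    counts = torch.tensor([3, 1, 0, 2])

    # ... then an exclusive prefix sum turns counts into offsets, so
    # every worker knows where its output region starts, with no locking.
    row_splits = torch.zeros(len(counts) + 1, dtype=torch.long)
    row_splits[1:] = torch.cumsum(counts, dim=0)
    # row_splits == tensor([0, 3, 4, 4, 6]); worker i writes to
    # output[row_splits[i]:row_splits[i+1]].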

The Finite State Automaton object is then implemented as a Ragged tensor templated on a specific data type (a struct representing an arc in the automaton).
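
A small example using k2's textual FSA format (one arc per line: source state, destination state, label, score; the label -1 marks entry to the final state). The comments about internal storage follow the description above:

    import k2

    # Build the string line by line to keep the format visible.
    s = '\n'.join([
        '0 1 1 0.5',   # arc: src dst label score
        '0 1 2 1.5',
        '1 2 -1 0.0',  # label -1 enters the final state
        '2',           # the final state
    ])
    fsa = k2.Fsa.from_str(s)

    # Internally the arcs form a ragged array indexed [state][arc]:
    # state 0 owns two arcs, state 1 owns one.
    print(fsa.arcs)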

Autograd

If you look at the code as it exists now, you won't find any references to autograd. The design is quite different from that of TensorFlow and PyTorch (which is why we didn't simply extend one of those toolkits). Instead of making autograd come from the bottom up (by making individual operations differentiable), we are implementing it from the top down, which is much more efficient in this case (and will tend to have better roundoff properties).

An example: suppose we are finding the best path of an FSA, and we need derivatives. We implement this by keeping track of, for each arc in the output best-path, which input arc it corresponds to. (For more complex algorithms an arc in the output might correspond to a sum of probabilities of a list of input arcs). We can make this compatible with PyTorch/TensorFlow autograd at the Python level, by, for example, defining a Function class in PyTorch that remembers this relationship between the arcs and does the appropriate (sparse) operations to propagate back the derivatives w.r.t. the weights.
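
Here is a schematic of that Python-level glue, with a toy BestPathScore Function. The arc_map (which input arc each output arc came from) is assumed to be produced by the C++ best-path algorithm; the names are illustrative, not k2's actual internals:

    import torch

    class BestPathScore(torch.autograd.Function):
        """Toy illustration: total score of a best path, differentiable
        w.r.t. the input arc scores via a stored arc map."""

        @staticmethod
        def forward(ctx, arc_scores, arc_map):
            # arc_map[i] = index into arc_scores of the input arc that
            # produced output arc i (computed by the C++/CUDA algorithm).
            ctx.arc_map = arc_map
            ctx.num_arcs = arc_scores.numel()
            return arc_scores[arc_map].sum()

        @staticmethod
        def backward(ctx, grad_out):
            # Scatter the gradient back onto the input arcs; arcs not on
            # the best path get zero gradient. This is the sparse
            # propagation described above.
            grad = torch.zeros(ctx.num_arcs, dtype=grad_out.dtype)
            grad.index_add_(0, ctx.arc_map,
                            grad_out.expand(ctx.arc_map.numel()))
            return grad, None

    arc_scores = torch.tensor([0.1, 2.0, 0.3, 1.2], requires_grad=True)
    arc_map = torch.tensor([1, 3])  # best path uses input arcs 1 and 3
    BestPathScore.apply(arc_scores, arc_map).backward()
    print(arc_scores.grad)          # tensor([0., 1., 0., 1.])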

Current state of the code

We have exposed all of the C++ code to Python with pybind11 and have finished the integration with PyTorch.

We are currently writing speech recognition recipes using k2, which are hosted in a separate repository. Please see https://github.com/k2-fsa/icefall.

Plans after initial release

We are currently trying to make k2 ready for production use (see the branch v2.0-pre).

Quick start

Want to try it out without installing anything? We have set up a Google Colab notebook. You can find more Colab notebooks using k2 in speech recognition at https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html.

Project details


Release history

This version: 1.11

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

  • k2-1.11-py38-none-any.whl (59.0 MB): Python 3.8
  • k2-1.11-py37-none-any.whl (59.0 MB): Python 3.7
  • k2-1.11-py36-none-any.whl (59.0 MB): Python 3.6
  • k2-1.11-cp38-cp38-macosx_10_15_x86_64.whl (1.7 MB): CPython 3.8, macOS 10.15+ x86-64
  • k2-1.11-cp37-cp37m-macosx_10_15_x86_64.whl (1.7 MB): CPython 3.7m, macOS 10.15+ x86-64
  • k2-1.11-cp36-cp36m-macosx_10_15_x86_64.whl (1.7 MB): CPython 3.6m, macOS 10.15+ x86-64

Details for the file k2-1.11-py38-none-any.whl

File metadata

  • Size: 59.0 MB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

  Algorithm    Hash digest
  SHA256       8024d803705a9e8017569083ce4b608dadb487e8a05ec1623dbd4215250680ab
  MD5          92b62587a6e146a112b81199322e2383
  BLAKE2b-256  7e13ea6c649708d6f07c097f590823f38288406da6d3af10e7ee0283e5fba731

Details for the file k2-1.11-py37-none-any.whl

File metadata

  • Size: 59.0 MB
  • Tags: Python 3.7
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

  Algorithm    Hash digest
  SHA256       d5c80231656fd456d5c918bbc468935ef1e8317e8b705e80b5bd8f00bb68080b
  MD5          b3750106a255f5e2f9492487d1928b9a
  BLAKE2b-256  294dd2a6488ad265df6eb16041f367d09668704e7562800301df63838f1739c2

Details for the file k2-1.11-py36-none-any.whl

File metadata

  • Size: 59.0 MB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.15

File hashes

  Algorithm    Hash digest
  SHA256       ccb63263a8fdda711fbf1652573edec39831e298a568ea5ef7b3135984fd4cae
  MD5          0f080a8501e51757a96eabc8255b650e
  BLAKE2b-256  1dc9e6881c8fdcafe83ed2289266c7f6f63e66285fc8b1c96570486e89701cd6

Details for the file k2-1.11-cp38-cp38-macosx_10_15_x86_64.whl

File metadata

  • Size: 1.7 MB
  • Tags: CPython 3.8, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

  Algorithm    Hash digest
  SHA256       66478f3944e845c07d8ebc28fee1aa2bda0d85634cb6cecef7f59ce7e3f8a7c5
  MD5          af8a082f49d7624d265ce71d97a807e2
  BLAKE2b-256  314cf16134a8e0c4d6803fa1ddded0c83e2ebc33df49e05a995f4ee94bd7d1dd

Details for the file k2-1.11-cp37-cp37m-macosx_10_15_x86_64.whl

File metadata

  • Size: 1.7 MB
  • Tags: CPython 3.7m, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

  Algorithm    Hash digest
  SHA256       fcef1ef15bff42dc72d354f62737153eba58ce30fa9011919362bc9150884a5e
  MD5          128f23e9aab97673edfab9a0dbe340de
  BLAKE2b-256  f37b0ae097c39c7afd371ee7540ffcb4db32907d70fd9ae61d83bc903c055d32

Details for the file k2-1.11-cp36-cp36m-macosx_10_15_x86_64.whl

File metadata

  • Size: 1.7 MB
  • Tags: CPython 3.6m, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.15

File hashes

  Algorithm    Hash digest
  SHA256       2daf78a4909f6b8c9e3f44d76df3afe50f981aa4db4fc82b5b5978c5f8e714b2
  MD5          35b9cf46ef57a8811db29018ac2296a1
  BLAKE2b-256  d33ff8226c812b5ba7d39b877937852614567299121b06dd86585abd3e30805b
