
FSA/FST algorithms, intended to (eventually) be interoperable with PyTorch and similar

Project description


k2

The vision of k2 is to be able to seamlessly integrate Finite State Automaton (FSA) and Finite State Transducer (FST) algorithms into autograd-based machine learning toolkits like PyTorch and TensorFlow. For speech recognition applications, this should make it easy to interpolate and combine various training objectives such as cross-entropy, CTC and MMI and to jointly optimize a speech recognition system with multiple decoding passes including lattice rescoring and confidence estimation. We hope k2 will have many other applications as well.

One of the key algorithms that we want to make efficient in the short term is pruned composition of a generic FSA with a "dense" FSA (i.e. one that corresponds to the log-probs of symbols at the output of a neural network). This can be used as a fast implementation of decoding for ASR, and for CTC and LF-MMI training. This won't give a direct advantage in terms of Word Error Rate when compared with existing technology, but the point is to do this in a much more general and extensible framework, to allow further development of ASR technology.

Implementation

A few key points on our implementation strategy.

Most of the code is in C++ and CUDA. We implement a templated class Ragged, which is quite like TensorFlow's RaggedTensor (actually we came up with the design independently, and were later told that TensorFlow was using the same ideas). Despite a close similarity at the level of data structures, the design is quite different from TensorFlow and PyTorch. Most of the time we don't use composition of simple operations, but rely on C++11 lambdas defined directly in the C++ implementations of algorithms. The code in these lambdas operates directly on data pointers and, if the backend is CUDA, they can run in parallel for each element of a tensor. (The C++ and CUDA code is mixed together, and the CUDA kernels get instantiated via templates.)
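As a rough illustration of that pattern (a simplified sketch, not k2's actual API; Eval and ScaleScores are hypothetical names): an algorithm builds a lambda indexed by element number that writes through raw pointers, and a generic helper runs it, either as a serial loop on CPU or, in the CUDA build, as a kernel with one thread per element.

    #include <cstdint>

    // Generic "run this lambda once per index" helper. The CPU version is a
    // plain loop; a CUDA backend would instead launch a kernel with one
    // thread per i.
    template <typename LambdaT>
    void Eval(int32_t n, LambdaT &lambda) {
      for (int32_t i = 0; i < n; ++i) lambda(i);
    }

    // Example algorithm body: scale every arc score in place.
    void ScaleScores(float *scores, int32_t num_arcs, float alpha) {
      auto lambda_scale = [=](int32_t i) { scores[i] *= alpha; };
      Eval(num_arcs, lambda_scale);
    }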

It is difficult to adequately describe what we are doing with these Ragged objects without going through the code in detail. The algorithms look very different from the way you would code them on CPU because of the need to avoid sequential processing. We are using coding patterns that make the most expensive parts of the computations "embarrassingly parallelizable"; the only somewhat nontrivial CUDA operations are generally reduction-type operations such as exclusive prefix sum, for which we use NVIDIA's cub library. Our design is not too specific to NVIDIA hardware, and the bulk of the code we write is fairly normal-looking C++; the nontrivial CUDA programming is mostly done via the cub library, parts of which we wrap with our own convenient interface.
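As one concrete example of such a reduction: the exclusive prefix sum is what turns per-row sizes into the row_splits array that describes a ragged tensor's layout. The sketch below (the function name is illustrative) shows the CPU version of that idea; on GPU this is the kind of step we delegate to cub's device-wide scan routines.

    #include <cstdint>
    #include <vector>

    // Exclusive prefix sum of per-row sizes, giving row_splits such that row r
    // occupies elements [row_splits[r], row_splits[r+1]) of the values array.
    std::vector<int32_t> RowSplitsFromSizes(const std::vector<int32_t> &sizes) {
      std::vector<int32_t> row_splits(sizes.size() + 1);
      int32_t sum = 0;
      for (size_t i = 0; i < sizes.size(); ++i) {
        row_splits[i] = sum;   // exclusive: the total *before* row i
        sum += sizes[i];
      }
      row_splits[sizes.size()] = sum;
      return row_splits;
    }

    // e.g. sizes {2, 0, 3}  ->  row_splits {0, 2, 2, 5}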

The Finite State Automaton object is then implemented as a Ragged tensor templated on a specific data type (a struct representing an arc in the automaton).
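Conceptually this looks something like the sketch below; the field names follow common FSA conventions and the layout is simplified, so they may not match k2's exact definitions.

    #include <cstdint>
    #include <vector>

    // An arc: a weighted, labelled transition between two states.
    struct Arc {
      int32_t src_state;
      int32_t dest_state;
      int32_t label;   // symbol on the arc
      float score;     // weight (e.g. a log-probability)
    };

    // A (CPU-only, simplified) ragged array: one variable-length sub-list per row.
    template <typename T>
    struct Ragged {
      std::vector<int32_t> row_splits;  // arcs of state s live in [row_splits[s], row_splits[s+1])
      std::vector<T> values;            // all arcs, concatenated state by state
    };

    // An FSA is then just a ragged array of arcs, indexed [state][arc].
    using Fsa = Ragged<Arc>;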

Autograd

If you look at the code as it exists now, you won't find any references to autograd. The design is quite different from that of TensorFlow and PyTorch (which is why we didn't simply extend one of those toolkits). Instead of making autograd come from the bottom up (by making individual operations differentiable), we are implementing it from the top down, which is much more efficient in this case (and will tend to have better roundoff properties).

An example: suppose we are finding the best path of an FSA, and we need derivatives. We implement this by keeping track of, for each arc in the output best-path, which input arc it corresponds to. (For more complex algorithms an arc in the output might correspond to a sum of probabilities of a list of input arcs). We can make this compatible with PyTorch/TensorFlow autograd at the Python level, by, for example, defining a Function class in PyTorch that remembers this relationship between the arcs and does the appropriate (sparse) operations to propagate back the derivatives w.r.t. the weights.
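A minimal sketch of that idea, with hypothetical names: the forward pass records an arc_map giving, for each arc of the output best path, the index of the input arc it came from, and the backward pass is then just a sparse scatter-add of gradients through that map. A PyTorch autograd Function defined at the Python level would do essentially this in its backward().

    #include <cstdint>
    #include <vector>

    // Propagate d(loss)/d(output arc score) back to the input arcs' scores,
    // given the arc_map recorded by the forward (best-path) computation.
    void BackpropBestPath(const std::vector<int32_t> &arc_map,       // output arc -> input arc
                          const std::vector<float> &out_score_grad,  // grad w.r.t. output arc scores
                          std::vector<float> *in_score_grad) {       // grad w.r.t. input arc scores
      for (size_t i = 0; i < arc_map.size(); ++i)
        (*in_score_grad)[arc_map[i]] += out_score_grad[i];
    }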

Current state of the code

A lot of the code is still unfinished (Sep 11, 2020). We finished the CPU versions of many algorithms and this code is in k2/csrc/host/; however, after that we figured out how to implement things on the GPU and decided to change the interfaces so that the CPU and GPU code would have a more unified interface. Currently in k2/csrc/ we have more GPU-oriented implementations (although these algorithms will also work on CPU). We had almost finished the Python wrapping for the older code, in the k2/python/ subdirectory, but we decided not to release code with that wrapping because it would have had to be reworked to be compatible with our GPU algorithms. Instead we will use the interfaces drafted in k2/csrc/, e.g. the Context object (which encapsulates things like memory managers from external toolkits) and the Tensor object (which can be used to wrap tensors from external toolkits), and wrap those in Python (using pybind11). The code in host/ will eventually be either deprecated, rewritten or wrapped with newer-style interfaces.

Plans for initial release

We hope to get the first version working in early October. The current short-term aim is to finish the GPU implementation of pruned composition of a normal FSA with a dense FSA, which is the same operation as decoder search in speech recognition and can be used to implement CTC training and lattice-free MMI (LF-MMI) training. The proof-of-concept that we will release initially is something like CTC, but allowing more general supervisions (general FSAs rather than linear sequences). This will work on GPU. The same underlying code will support LF-MMI, so that will be easy to implement soon after. We plan to put example code in a separate repository.

Plans after initial release

We will then gradually implement more algorithms in a way that's compatible with the interfaces in k2/csrc/. Some of them will be CPU-only to start with. The idea is to eventually have very rich capabilities for operating on collections of sequences, including methods to convert from a lattice to a collection of linear sequences and back again (for purposes of neural language model rescoring, neural confidence estimation and the like).

Quick start

Want to try it out without installing anything? We have set up a Google Colab notebook.

Caution: k2 is not nearly ready for actual use! We are still coding the core algorithms, and hope to have an early version working by early October.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

  • k2-0.3.5.dev20210608-py38-none-any.whl (55.4 MB) - Python 3.8
  • k2-0.3.5.dev20210608-py37-none-any.whl (55.4 MB) - Python 3.7
  • k2-0.3.5.dev20210608-py36-none-any.whl (55.4 MB) - Python 3.6
  • k2-0.3.5.dev20210608-cp38-cp38-macosx_10_15_x86_64.whl (1.4 MB) - CPython 3.8, macOS 10.15+ x86-64
  • k2-0.3.5.dev20210608-cp37-cp37m-macosx_10_15_x86_64.whl (1.4 MB) - CPython 3.7m, macOS 10.15+ x86-64
  • k2-0.3.5.dev20210608-cp36-cp36m-macosx_10_15_x86_64.whl (1.4 MB) - CPython 3.6m, macOS 10.15+ x86-64

File details

Details for the file k2-0.3.5.dev20210608-py38-none-any.whl.

File metadata

  • Download URL: k2-0.3.5.dev20210608-py38-none-any.whl
  • Upload date:
  • Size: 55.4 MB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.10

File hashes

Hashes for k2-0.3.5.dev20210608-py38-none-any.whl:

  • SHA256: 22c000543b863f2d0fa5093e628c52d9baa3599ff58296a4b4ec41f9d45d0d24
  • MD5: b85508e1c9d33433e7358595b8b3e51b
  • BLAKE2b-256: 95dcc7b6000d58e65b335d342954512bcc8d495845ae019c8a9165d35443e4e6


File details

Details for the file k2-0.3.5.dev20210608-py37-none-any.whl.

File metadata

  • Download URL: k2-0.3.5.dev20210608-py37-none-any.whl
  • Upload date:
  • Size: 55.4 MB
  • Tags: Python 3.7
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.7.10

File hashes

Hashes for k2-0.3.5.dev20210608-py37-none-any.whl:

  • SHA256: c2bb583f2678fe8f769f6cc3578736d3341a9ccb95598f73e306482e19eddf06
  • MD5: af058fc4f7fc724739dcfadc2ae7d9a5
  • BLAKE2b-256: 123e98a06b8e13eb7af2515c145edc64ca81d4e73b5bbbad397745fd6b6544b8


File details

Details for the file k2-0.3.5.dev20210608-py36-none-any.whl.

File metadata

  • Download URL: k2-0.3.5.dev20210608-py36-none-any.whl
  • Upload date:
  • Size: 55.4 MB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.13

File hashes

Hashes for k2-0.3.5.dev20210608-py36-none-any.whl:

  • SHA256: cab06e00fe460be2de9be11cea27c449238280674f7f7df6d36cf958ad63c2ec
  • MD5: e13b96d79030448a83106824b4c8d440
  • BLAKE2b-256: 9c39661590258fd41f4366c7b9f55e223a37332edafcfddc3c7247d60d8441da


File details

Details for the file k2-0.3.5.dev20210608-cp38-cp38-macosx_10_15_x86_64.whl.

File metadata

  • Download URL: k2-0.3.5.dev20210608-cp38-cp38-macosx_10_15_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.8, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.6

File hashes

Hashes for k2-0.3.5.dev20210608-cp38-cp38-macosx_10_15_x86_64.whl:

  • SHA256: dbb36d7d792e31e56db2b776ae5ff4a35a152796efb9b33b6debe7af7142387f
  • MD5: f198739bd21dbe0ffdc279f001285d35
  • BLAKE2b-256: 050678a924f005da6688b4c42b0ea31f7868e6d932ac1f4eac9d6fee226ed171


File details

Details for the file k2-0.3.5.dev20210608-cp37-cp37m-macosx_10_15_x86_64.whl.

File metadata

  • Download URL: k2-0.3.5.dev20210608-cp37-cp37m-macosx_10_15_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.7m, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.7.10

File hashes

Hashes for k2-0.3.5.dev20210608-cp37-cp37m-macosx_10_15_x86_64.whl:

  • SHA256: d98a0b3811e79a60f9524387d0d709cf061e12a3672e12fe5f2f7eb8a540e082
  • MD5: 454c12968869b6589d510c072f5ced80
  • BLAKE2b-256: ab3b65de1b8351d41649440d7a393da30cb498aa80c0307940a6d190fd962a4e


File details

Details for the file k2-0.3.5.dev20210608-cp36-cp36m-macosx_10_15_x86_64.whl.

File metadata

  • Download URL: k2-0.3.5.dev20210608-cp36-cp36m-macosx_10_15_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.6m, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.13

File hashes

Hashes for k2-0.3.5.dev20210608-cp36-cp36m-macosx_10_15_x86_64.whl:

  • SHA256: b9c489e68e59bee0c20ae70bcd30d0aa21eba22a05dad036f79b8e127ad23568
  • MD5: b256e9d6248fef88e5783a3440474b4e
  • BLAKE2b-256: 95b767c26af10b70d176c655192f0be1648923c4ee88829856d27274d1b6dc9e

