PyTorch edit-distance functions

Useful functions for E2E Speech Recognition training with PyTorch and CUDA.

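Here is a simple use case that combines Reinforcement Learning with the RNN-T loss, rewarding each sampled hypothesis with 1 - WER:

import torch
import torch_edit_distance

# Blank and space token ids as one-element int tensors on the GPU
blank = torch.tensor([0], dtype=torch.int).cuda()
space = torch.tensor([1], dtype=torch.int).cuda()

# Sample hypotheses from the model (model is the user's own network)
xs = model.greedy_decode(xs, sampled=True)

# Remove blank tokens (the return value is unused, so xs and xn are presumably updated in place)
torch_edit_distance.remove_blank(xs, xn, blank)

# Reward: 1 - word error rate of hypotheses xs against references ys
rewards = 1 - torch_edit_distance.compute_wer(xs, ys, xn, yn, blank, space)

# Weight the RNN-T negative log-likelihood (rnnt_loss is an external function) by the reward
nll = rnnt_loss(zs, ys, xn, yn)

loss = nll * rewards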
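
The reward in the example above is one minus the word error rate (WER). As a rough illustration of what that quantity measures, here is a minimal pure-Python sketch for a single pair of token-id sequences. It is not the package's CUDA implementation: the helper names (`split_words`, `edit_distance`, `wer_sketch`), the default ids `blank=0` and `space=1`, and the handling of an empty reference are assumptions made for this sketch only.

```python
from typing import List, Sequence, Tuple


def split_words(tokens: Sequence[int], blank: int = 0, space: int = 1) -> List[Tuple[int, ...]]:
    """Drop blank tokens and split the rest into words at the space token."""
    words, current = [], []
    for t in tokens:
        if t == blank:
            continue
        if t == space:
            if current:  # ignore empty words caused by repeated spaces
                words.append(tuple(current))
                current = []
        else:
            current.append(t)
    if current:
        words.append(tuple(current))
    return words


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Plain Levenshtein distance with unit costs, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def wer_sketch(hyp: Sequence[int], ref: Sequence[int], blank: int = 0, space: int = 1) -> float:
    """Word-level edit distance divided by the number of reference words."""
    h = split_words(hyp, blank, space)
    r = split_words(ref, blank, space)
    # Assumption of this sketch: an empty reference is treated as one word
    # to avoid division by zero.
    return edit_distance(h, r) / max(len(r), 1)


ref = [2, 3, 1, 4, 5, 1, 6]     # three words: (2, 3), (4, 5), (6,)
hyp = [2, 3, 0, 1, 4, 7, 1, 6]  # second word differs, one blank to ignore
reward = 1 - wer_sketch(hyp, ref)
```

Note that WER can exceed 1 when the hypothesis contains many extra words, so a reward defined as 1 - WER can be negative.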

levenshtein_distance

Levenshtein edit-distance with detailed statistics for insertion, deletion, and substitution operations.
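
To make the idea of per-operation statistics concrete, here is a minimal pure-Python sketch that tracks insertions, deletions, and substitutions alongside the usual dynamic-programming cost. The function name `levenshtein_stats` and the returned tuple layout are choices made for this sketch and are not taken from the package, whose batched CUDA kernel may differ in tie-breaking and output format.

```python
from typing import Sequence, Tuple


def levenshtein_stats(hyp: Sequence[int], ref: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (insertions, deletions, substitutions, reference length).

    An insertion is an extra token in hyp, a deletion is a ref token
    missing from hyp. Each cell stores (cost, ins, dels, subs).
    """
    # Row for an empty reference prefix: every hyp token is an insertion.
    prev = [(j, j, 0, 0) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        # Empty hypothesis prefix: every ref token so far is a deletion.
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            cost, ins, dels, subs = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (cost, ins, dels, subs)      # match, no extra cost
            else:
                best = (cost + 1, ins, dels, subs + 1)  # substitution
            cost, ins, dels, subs = cur[j - 1]
            if cost + 1 < best[0]:
                best = (cost + 1, ins + 1, dels, subs)  # insertion
            cost, ins, dels, subs = prev[j]
            if cost + 1 < best[0]:
                best = (cost + 1, ins, dels + 1, subs)  # deletion
            cur.append(best)
        prev = cur
    _, ins, dels, subs = prev[-1]
    return ins, dels, subs, len(ref)
```

For example, `levenshtein_stats([1, 2, 3], [1, 3])` reports one insertion, because the hypothesis contains an extra token relative to the reference. Summing the first three values gives the ordinary Levenshtein distance.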

collapse_repeated

Merge repeated tokens, useful for CTC-based models.
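
As an illustration of the operation on a single sequence, the following pure-Python sketch merges runs of identical adjacent tokens. The name `collapse_repeated_sketch` is hypothetical, and the code is not the package's batched tensor implementation.

```python
from typing import List, Sequence


def collapse_repeated_sketch(tokens: Sequence[int]) -> List[int]:
    """Keep only the first token of every run of identical adjacent tokens."""
    out: List[int] = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(t)
    return out


collapsed = collapse_repeated_sketch([1, 1, 0, 2, 2, 2, 0, 0, 2])
```

Only adjacent duplicates are merged, so a token that reappears after a different token (such as a blank) is kept.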

remove_blank

Remove unnecessary blank tokens, useful for CTC, RNN-T, RNA models.
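
A minimal pure-Python sketch of blank removal on a single sequence is shown below, together with the usual CTC post-processing order (merge repeats first, then drop blanks). The names `remove_blank_sketch` and `ctc_postprocess_sketch` and the default `blank=0` are assumptions for illustration. Unlike the call in the example at the top of this page, which does not use a return value, this sketch returns a new list.

```python
from itertools import groupby
from typing import List, Sequence


def remove_blank_sketch(tokens: Sequence[int], blank: int = 0) -> List[int]:
    """Return the tokens with every occurrence of the blank id removed."""
    return [t for t in tokens if t != blank]


def ctc_postprocess_sketch(tokens: Sequence[int], blank: int = 0) -> List[int]:
    """CTC-style decoding: merge adjacent repeats, then remove blanks."""
    merged = [key for key, _ in groupby(tokens)]
    return remove_blank_sketch(merged, blank)


no_blanks = remove_blank_sketch([0, 3, 0, 0, 4, 0])
decoded = ctc_postprocess_sketch([0, 3, 3, 0, 3, 4, 4, 0])
```

The order matters for CTC: the blank between the two runs of 3 in the second call is what keeps them as two separate output tokens.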

strip_separator

Remove leading, trailing and repeated middle separators.
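
The behaviour described above can be sketched for a single sequence in pure Python as follows. The name `strip_separator_sketch` and the default `separator=1` are hypothetical, and the package itself operates on batched tensors rather than lists.

```python
from typing import List, Sequence


def strip_separator_sketch(tokens: Sequence[int], separator: int = 1) -> List[int]:
    """Drop leading and trailing separators and squeeze repeated ones."""
    out: List[int] = []
    for t in tokens:
        # Skip a separator at the start or directly after another separator.
        if t == separator and (not out or out[-1] == separator):
            continue
        out.append(t)
    # At most one separator can remain at the end; remove it.
    if out and out[-1] == separator:
        out.pop()
    return out


stripped = strip_separator_sketch([1, 1, 5, 6, 1, 1, 1, 7, 1])
```

With the space token as separator, this leaves exactly one separator between consecutive words, which keeps word boundaries well defined for a word-level metric.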

Requirements

  • C++11 compiler (tested with GCC 5.4).
  • Python: 3.5, 3.6, 3.7 (tested with version 3.6).
  • PyTorch >= 1.0.0 (tested with version 1.1.0).
  • CUDA Toolkit (tested with version 10.0).

Install

No precompiled version of the package is distributed. The setup instructions below compile the package locally from source.

From PyPI

pip install torch_edit_distance

From GitHub

git clone https://github.com/1ytic/pytorch-edit-distance
cd pytorch-edit-distance
python setup.py install


