Kaldi alignment methods wrapped into Python

kaldialign

A small package that exposes edit distance computation functions from Kaldi. It uses the original Kaldi code and wraps it using Cython.
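For readers unfamiliar with the term, the edit (Levenshtein) distance between two sequences is the minimum number of insertions, deletions and substitutions needed to turn one into the other. The sketch below is a plain-Python illustration of that definition, not the Kaldi code this package wraps. The function name `levenshtein_total` is hypothetical, and it returns only the total, since the split into insertions, deletions and substitutions depends on tie-breaking details that can vary between implementations.

```python
def levenshtein_total(ref, hyp):
    """Return the minimum number of edits turning ref into hyp (illustrative only)."""
    # prev[j] is the distance between the ref prefix seen so far and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]  # reaching an empty hyp prefix means dropping all i ref items
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # drop r from ref
                cur[j - 1] + 1,          # add h from hyp
                prev[j - 1] + (r != h),  # match (cost 0) or substitute (cost 1)
            ))
        prev = cur
    return prev[-1]


print(levenshtein_total(['a', 'b', 'c'], ['a', 's', 'x', 'c']))  # 2
```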

Examples

  • align(seq1, seq2, epsilon) - used to obtain the alignment between two string sequences. epsilon should be a null symbol (indicating deletion/insertion) that doesn't exist in either sequence.
from kaldialign import align

EPS = '*'
a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
ali = align(a, b, EPS)
assert ali == [('a', 'a'), (EPS, 's'), ('b', 'x'), ('c', 'c')]
  • edit_distance(seq1, seq2) - used to obtain the total edit distance, as well as the number of insertions, deletions and substitutions.
from kaldialign import edit_distance

a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
results = edit_distance(a, b)
assert results == {
    'ins': 1,
    'del': 0,
    'sub': 1,
    'total': 2
}
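The two examples above are related: the counts reported by edit_distance can be read off an alignment by looking at where the epsilon symbol appears. The sketch below illustrates this without importing kaldialign. `count_errors` and `error_rate` are hypothetical helpers, not part of the package. It assumes the first sequence is the reference and the second the hypothesis, so that an epsilon on the reference side counts as an insertion, which is consistent with the example output above. The error rate uses the usual definition: total errors divided by reference length.

```python
EPS = '*'


def count_errors(ali, eps=EPS):
    """Count insertions, deletions and substitutions in a list of (ref, hyp) pairs."""
    counts = {'ins': 0, 'del': 0, 'sub': 0}
    for ref_sym, hyp_sym in ali:
        if ref_sym == eps:
            counts['ins'] += 1   # hypothesis has a symbol the reference lacks
        elif hyp_sym == eps:
            counts['del'] += 1   # reference symbol is missing from the hypothesis
        elif ref_sym != hyp_sym:
            counts['sub'] += 1   # both present but different
    counts['total'] = counts['ins'] + counts['del'] + counts['sub']
    return counts


def error_rate(counts, ref_len):
    """Total errors divided by reference length (assumes ref_len > 0)."""
    return counts['total'] / ref_len


# The alignment from the align() example above.
ali = [('a', 'a'), (EPS, 's'), ('b', 'x'), ('c', 'c')]
counts = count_errors(ali)
print(counts)                         # {'ins': 1, 'del': 0, 'sub': 1, 'total': 2}
print(error_rate(counts, ref_len=3))  # 2 errors over 3 reference symbols
```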

Motivation

The need for this package arose because practically all implementations of the Levenshtein distance differ slightly, which makes it impossible to use a scoring tool other than Kaldi and get the same error rate results. This package copies code from Kaldi directly and wraps it using Cython, avoiding the issue altogether.
