Kaldi alignment methods wrapped into Python

kaldialign

A small package that exposes edit distance computation functions from Kaldi. It uses the original Kaldi code and wraps it using Cython.

Examples

  • align(seq1, seq2, epsilon) - used to obtain the alignment between two string sequences. epsilon should be a null symbol (indicating deletion/insertion) that doesn't exist in either sequence.
from kaldialign import align

EPS = '*'
a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
ali = align(a, b, EPS)
assert ali == [('a', 'a'), (EPS, 's'), ('b', 'x'), ('c', 'c')]
  • edit_distance(seq1, seq2) - used to obtain the total edit distance, as well as the number of insertions, deletions and substitutions.
from kaldialign import edit_distance

a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
results = edit_distance(a, b)
assert results == {
    'ins': 1,
    'del': 0,
    'sub': 1,
    'total': 2
}
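
The two calls above are related: the counts reported by edit_distance can be recovered by tallying an alignment. The following is a minimal sketch of that tally in plain Python, not the package's own implementation. It assumes the first sequence is the reference, so that a pair whose first element is the epsilon symbol counts as an insertion. The names tally_alignment, ref_len and error_rate are hypothetical helpers introduced here for illustration, and the alignment is written out literally so the snippet runs without kaldialign installed.

```python
EPS = "*"


def tally_alignment(ali, eps=EPS):
    """Count error types in a list of (ref_symbol, hyp_symbol) pairs.

    Assumption: the first element of each pair comes from the reference.
    """
    counts = {"ins": 0, "del": 0, "sub": 0}
    for ref_sym, hyp_sym in ali:
        if ref_sym == eps:
            counts["ins"] += 1  # symbol appears only in the second sequence
        elif hyp_sym == eps:
            counts["del"] += 1  # symbol appears only in the first sequence
        elif ref_sym != hyp_sym:
            counts["sub"] += 1  # both present, but different
    counts["total"] = counts["ins"] + counts["del"] + counts["sub"]
    return counts


# The alignment from the align() example, written out literally.
ali = [("a", "a"), (EPS, "s"), ("b", "x"), ("c", "c")]
stats = tally_alignment(ali)

# Reference length is the number of non-epsilon symbols on the first side.
ref_len = sum(1 for ref_sym, _ in ali if ref_sym != EPS)
error_rate = stats["total"] / ref_len
```

For this alignment the tally matches the dictionary shown in the edit_distance example. Dividing the total by the reference length gives an error rate in the usual word-error-rate sense.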

Motivation

The need for this package arose from the fact that practically all implementations of the Levenshtein distance differ slightly, which makes it impossible to use a scoring tool other than Kaldi and get the same error rate results. This package copies the code from Kaldi directly and wraps it using Cython, avoiding the issue altogether.

Download files

Source Distribution

kaldialign-0.1.7.tar.gz (39.8 kB)

Uploaded Source

Built Distribution

kaldialign-0.1.7-cp36-cp36m-macosx_10_9_x86_64.whl (26.6 kB)

Uploaded CPython 3.6m macOS 10.9+ x86-64

File details

Details for the file kaldialign-0.1.7.tar.gz.

File metadata

  • Download URL: kaldialign-0.1.7.tar.gz
  • Size: 39.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0.post20201006 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.6.12

File hashes

Hashes for kaldialign-0.1.7.tar.gz
Algorithm Hash digest
SHA256 f1c58560de9f91a17cbb5b91ef6518e62e90273824c530ace8987ab5dba72f21
MD5 f8732748d0d7afd3a2ada5a5d1fb2e4c
BLAKE2b-256 b12f343c5b55c732f21d625f5c40c51ffcfdb1d3cad9dccedb9e73518bb221c0

File details

Details for the file kaldialign-0.1.7-cp36-cp36m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: kaldialign-0.1.7-cp36-cp36m-macosx_10_9_x86_64.whl
  • Size: 26.6 kB
  • Tags: CPython 3.6m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0.post20201006 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.6.12

File hashes

Hashes for kaldialign-0.1.7-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 bed46dfe31fccca487aac98fa8b04a6db20e6f43c74452a77334a9b3c4d234c0
MD5 a2c9f7cba4a13a3a328fc00eca0ecf08
BLAKE2b-256 a56001e3ae975ab17b29e51e26a557c23a9c9923a0629d27177b7eaacf8a41e1
