Kaldi alignment methods wrapped into Python

kaldialign

A small package that exposes edit distance computation functions from Kaldi. It uses the original Kaldi code and wraps it using pybind11.

Installation

pip install --verbose kaldialign

or

pip install --verbose -U git+https://github.com/pzelasko/kaldialign.git

or

git clone https://github.com/pzelasko/kaldialign.git
cd kaldialign
python3 setup.py install --verbose

Examples

  • align(seq1, seq2, epsilon) - returns the alignment between two sequences of strings as a list of pairs. epsilon is a null symbol that marks an insertion or deletion; it must not occur in either sequence.
from kaldialign import align

EPS = '*'
a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
ali = align(a, b, EPS)
assert ali == [('a', 'a'), ('b', 's'), (EPS, 'x'), ('c', 'c')]
  • edit_distance(seq1, seq2) - used to obtain the total edit distance, as well as the number of insertions, deletions and substitutions.
from kaldialign import edit_distance

a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
results = edit_distance(a, b)
assert results == {
    'ins': 1,
    'del': 0,
    'sub': 1,
    'total': 2
}
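
The two examples above are consistent with each other: counting the pairs returned by align gives the same numbers that edit_distance reports. The sketch below shows one way to do that counting in plain Python, without importing kaldialign. The helper name tally_alignment and the 'cor' key are hypothetical, introduced only for illustration; the convention that an epsilon on the left is an insertion and on the right a deletion is inferred from the examples above.

```python
EPS = '*'


def tally_alignment(ali, eps=EPS):
    """Count edit operations in a list of (seq1_item, seq2_item) pairs.

    Hypothetical helper, not part of kaldialign. Assumes eps on the left
    marks an insertion and eps on the right marks a deletion.
    """
    counts = {'ins': 0, 'del': 0, 'sub': 0, 'cor': 0}
    for left, right in ali:
        if left == eps:
            counts['ins'] += 1      # item present only in the second sequence
        elif right == eps:
            counts['del'] += 1      # item present only in the first sequence
        elif left != right:
            counts['sub'] += 1      # both present, but different
        else:
            counts['cor'] += 1      # exact match
    counts['total'] = counts['ins'] + counts['del'] + counts['sub']
    return counts


# The alignment from the align example above, written out by hand.
ali = [('a', 'a'), ('b', 's'), (EPS, 'x'), ('c', 'c')]
counts = tally_alignment(ali)
print(counts)
```

For this alignment the tally is one insertion, one substitution and no deletions, matching the edit_distance result shown above.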

Motivation

The need for this arose from the fact that practically all implementations of the Levenshtein distance differ slightly, which makes it impossible to use a scoring tool other than Kaldi and get the same error rate results. This package copies the code from Kaldi directly and wraps it using pybind11, avoiding the issue altogether.
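
For reference, an error rate is conventionally the total number of edits divided by the length of the reference sequence. The sketch below applies that definition to a dictionary shaped like the edit_distance result. It assumes the first sequence is the reference; error_rate is a hypothetical helper, not a kaldialign function, and the dictionary is written out by hand so the snippet runs without the package.

```python
def error_rate(results, ref_len):
    """Total edits divided by reference length.

    Hypothetical helper: `results` is a dict with a 'total' key, shaped
    like the edit_distance output shown in the Examples section.
    """
    if ref_len <= 0:
        raise ValueError("reference length must be positive")
    return results['total'] / ref_len


ref = ['a', 'b', 'c']
# Copied from the edit_distance example rather than computed here.
results = {'ins': 1, 'del': 0, 'sub': 1, 'total': 2}
rate = error_rate(results, len(ref))
print(f"{rate:.2%}")
```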

Download files

Source Distribution

kaldialign-0.3.tar.gz (10.0 kB)

Uploaded Source

Built Distribution

kaldialign-0.3-cp310-cp310-macosx_12_0_arm64.whl (42.9 kB)

Uploaded CPython 3.10 macOS 12.0+ ARM64

File details

Details for the file kaldialign-0.3.tar.gz.

File metadata

  • Download URL: kaldialign-0.3.tar.gz
  • Upload date:
  • Size: 10.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for kaldialign-0.3.tar.gz
Algorithm     Hash digest
SHA256        071d16089f206d6c11025858fff78bb1991b2eaea718ff13a0d9d9e69a7107ef
MD5           9a3f7dbdb5a27565faaf7419b5a017c6
BLAKE2b-256   064dcddae8f15576630c55aeadd58631d400eb366358581dd7b57e0896cba6f0


File details

Details for the file kaldialign-0.3-cp310-cp310-macosx_12_0_arm64.whl.

File hashes

Hashes for kaldialign-0.3-cp310-cp310-macosx_12_0_arm64.whl
Algorithm     Hash digest
SHA256        fa6c379d9cd4507e7771b2d804ae2f1cb151fe6646b0ba9c690d8b84cfc4c76b
MD5           2a2197ba85b5cd60b4975d39971bedf7
BLAKE2b-256   dc5e016b9fb8d72a663020eeb04515bdef63a097affd55c900c2643bd0658dee

