Kaldi alignment methods wrapped into Python

kaldialign

A small package that exposes edit distance computation functions from Kaldi. It uses the original Kaldi code and wraps it using pybind11.

Installation

conda install -c kaldialign kaldialign

or

pip install --verbose kaldialign

or

pip install --verbose -U git+https://github.com/pzelasko/kaldialign.git

or

git clone https://github.com/pzelasko/kaldialign.git
cd kaldialign
python3 -m pip install --verbose .

Examples

Alignment

align(ref, hyp, epsilon) - used to obtain the alignment between two string sequences. epsilon should be a null symbol (indicating deletion/insertion) that doesn't exist in either sequence.

from kaldialign import align

EPS = '*'
a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
ali = align(a, b, EPS)
assert ali == [('a', 'a'), ('b', 's'), (EPS, 'x'), ('c', 'c')]
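As a small illustration of how the alignment output can be read, the sketch below labels each aligned pair with its edit operation. It assumes each pair is ordered (ref symbol, hyp symbol), following the argument order of align(ref, hyp, epsilon). The helper label_pair is hypothetical and not part of kaldialign; the alignment is copied from the example above so the snippet runs without the package installed.

```python
EPS = "*"
# Alignment copied from the example above: (ref_symbol, hyp_symbol) pairs.
ali = [("a", "a"), ("b", "s"), (EPS, "x"), ("c", "c")]


def label_pair(ref_sym, hyp_sym, eps=EPS):
    """Hypothetical helper: name the edit operation for one aligned pair."""
    if ref_sym == eps:
        return "ins"  # hypothesis symbol with no reference counterpart
    if hyp_sym == eps:
        return "del"  # reference symbol missing from the hypothesis
    return "cor" if ref_sym == hyp_sym else "sub"


labels = [label_pair(r, h) for r, h in ali]
print(labels)  # ['cor', 'sub', 'ins', 'cor']
```

The one substitution and one insertion found here agree with the counts reported by edit_distance for the same pair of sequences.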

Edit distance

edit_distance(ref, hyp) - used to obtain the total edit distance, as well as the number of insertions, deletions and substitutions.

from kaldialign import edit_distance

a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
results = edit_distance(a, b)
assert results == {
    'ins': 1,
    'del': 0,
    'sub': 1,
    'total': 2
}
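The counts can be turned into an error rate by dividing the total by the reference length. The sketch below does this by hand on the result dictionary from the example above (copied as a literal so the snippet runs without the package); the variable name wer is only illustrative.

```python
ref = ["a", "b", "c"]
# Result dictionary copied from the example above.
results = {"ins": 1, "del": 0, "sub": 1, "total": 2}

# The total is the sum of the three error types.
assert results["total"] == results["ins"] + results["del"] + results["sub"]

# Error rate: total number of edits divided by the reference length.
wer = results["total"] / len(ref)
print(f"{wer:.4f}")  # 0.6667
```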

For both alignment and edit distance, you can pass sclite_mode=True to compute WER or alignments with SCLITE-style weights, i.e., a cost of 3 for each insertion or deletion and a cost of 4 for each substitution.
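To see why the weights matter, here is a minimal pure-Python dynamic-programming sketch of a weighted edit cost. It is not Kaldi's or kaldialign's code, and weighted_edit_cost is a hypothetical name; it only shows how costs of 3/3/4 can change which alignment is cheapest compared with uniform costs of 1.

```python
def weighted_edit_cost(ref, hyp, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of turning ref into hyp (illustrative DP only)."""
    # prev[j] is the cost of aligning the processed ref prefix with hyp[:j].
    prev = [j * ins_cost for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        cur = [i * del_cost]
        for j, h in enumerate(hyp, start=1):
            match_or_sub = prev[j - 1] + (0 if r == h else sub_cost)
            deletion = prev[j] + del_cost
            insertion = cur[j - 1] + ins_cost
            cur.append(min(match_or_sub, deletion, insertion))
        prev = cur
    return prev[-1]


ref, hyp = ["a", "b"], ["b", "c"]
# Uniform costs: two substitutions (1 + 1) tie with deletion + insertion (1 + 1).
uniform = weighted_edit_cost(ref, hyp)
# 3/3/4 costs: deletion + insertion (3 + 3) beats two substitutions (4 + 4).
sclite_style = weighted_edit_cost(ref, hyp, ins_cost=3, del_cost=3, sub_cost=4)
print(uniform, sclite_style)  # 2 6
```

With uniform costs the two alignments are equally good, so the choice between them is a tie-break; with the 3/3/4 weights the deletion-plus-insertion path is strictly cheaper.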

Bootstrapping method to extract WER 95% confidence intervals

bootstrap_wer_ci(ref, hyp, hyp2=None) - obtain the 95% confidence intervals for WER using the Bisani and Ney bootstrapping method.

from kaldialign import bootstrap_wer_ci

ref = [
    ("a", "b", "c"),
    ("d", "e", "f"),
]
hyp = [
    ("a", "b", "d"),
    ("e", "f", "f"),
]
ans = bootstrap_wer_ci(ref, hyp)
assert ans["wer"] == 0.4989
assert ans["ci95"] == 0.2312
assert ans["ci95min"] == 0.2678
assert ans["ci95max"] == 0.7301
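The idea behind the bootstrap can be sketched with the standard library alone: resample utterances with replacement, recompute WER on each resample, and summarize the spread. The function below is an illustration under stated assumptions, not kaldialign's implementation. The name bootstrap_wer_sketch, the replication count, the seed, and the use of 1.96 standard deviations for the interval are all assumptions, and its numbers will not match the package output digit for digit.

```python
import random
import statistics


def bootstrap_wer_sketch(errors, ref_lens, replications=5000, seed=0):
    """Illustrative bootstrap over utterances (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(errors)
    wers = []
    for _ in range(replications):
        # Resample utterance indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        total_err = sum(errors[i] for i in idx)
        total_len = sum(ref_lens[i] for i in idx)
        wers.append(total_err / total_len)
    mean = statistics.fmean(wers)
    # Assumed normal approximation: half-width of 1.96 standard deviations.
    half_width = 1.96 * statistics.pstdev(wers)
    return {
        "wer": mean,
        "ci95": half_width,
        "ci95min": mean - half_width,
        "ci95max": mean + half_width,
    }


# Per-utterance edit counts for the example above: the first hypothesis has
# 1 error, the second has 2, and each reference contains 3 words.
sketch = bootstrap_wer_sketch(errors=[1, 2], ref_lens=[3, 3])
print(sketch)
```

On this tiny input the sketch should land close to the values asserted above (a WER near 0.5 with a half-width near 0.23), although the exact digits depend on the random number generator.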

It also supports providing hypotheses from system 1 and system 2 to compute the probability of S2 improving over S1:

from kaldialign import bootstrap_wer_ci

ref = [
    ("a", "b", "c"),
    ("d", "e", "f"),
]
hyp = [
    ("a", "b", "d"),
    ("e", "f", "f"),
]
hyp2 = [
    ("a", "b", "c"),
    ("e", "e", "f"),
]
ans = bootstrap_wer_ci(ref, hyp, hyp2)

s = ans["system1"]
assert s["wer"] == 0.4989
assert s["ci95"] == 0.2312
assert s["ci95min"] == 0.2678
assert s["ci95max"] == 0.7301

s = ans["system2"]
assert s["wer"] == 0.1656
assert s["ci95"] == 0.2312
assert s["ci95min"] == -0.0656
assert s["ci95max"] == 0.3968

assert ans["p_s2_improv_over_s1"] == 1.0
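The probability of improvement can be read as the share of bootstrap resamples in which system 2 has a lower WER than system 1 on the same resampled utterances. The sketch below illustrates that paired comparison with the standard library; prob_s2_improves is a hypothetical helper and not the package's code, and the replication count and seed are assumptions.

```python
import random


def prob_s2_improves(errors1, errors2, ref_lens, replications=5000, seed=0):
    """Share of paired resamples where system 2 beats system 1 (sketch)."""
    rng = random.Random(seed)
    n = len(ref_lens)
    wins = 0
    for _ in range(replications):
        # Use the same resampled utterances for both systems (paired test).
        idx = [rng.randrange(n) for _ in range(n)]
        total_len = sum(ref_lens[i] for i in idx)
        delta_wer = sum(errors2[i] - errors1[i] for i in idx) / total_len
        if delta_wer < 0:
            wins += 1
    return wins / replications


# Per-utterance edit counts for the example above:
# system 1 makes 1 and 2 errors, system 2 makes 0 and 1 errors.
p = prob_s2_improves(errors1=[1, 2], errors2=[0, 1], ref_lens=[3, 3])
print(p)  # 1.0
```

System 2 makes one error fewer than system 1 on every utterance, so it wins in every resample, which is consistent with the value of 1.0 asserted above.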

Motivation

The need for this arose from the fact that practically all implementations of the Levenshtein distance differ slightly, making it impossible to use a scoring tool other than Kaldi and get the same error rate results. This package copies code from Kaldi directly and wraps it using pybind11, avoiding the issue altogether.
