Kaldi alignment methods wrapped into Python

Project description

kaldialign

A small package that exposes edit distance computation functions from Kaldi. It uses the original Kaldi code and wraps it using pybind11.

Installation

conda install -c kaldialign kaldialign

or

pip install --verbose kaldialign

or

pip install --verbose -U git+https://github.com/pzelasko/kaldialign.git

or

git clone https://github.com/pzelasko/kaldialign.git
cd kaldialign
python3 -m pip install --verbose .

Examples

Alignment

align(seq1, seq2, epsilon) - used to obtain the alignment between two sequences of strings, returned as a list of pairs. epsilon is a null symbol (marking a deletion or insertion) that must not occur in either sequence.

from kaldialign import align

EPS = '*'
a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
ali = align(a, b, EPS)
assert ali == [('a', 'a'), ('b', 's'), (EPS, 'x'), ('c', 'c')]
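Because epsilon must not occur in either sequence, a fixed symbol such as '*' can collide with real tokens. The helper below is a minimal, hypothetical sketch (choose_epsilon is not part of kaldialign) that picks a symbol absent from all inputs by repeating a base string until it is unused.

```python
def choose_epsilon(*seqs, base="*"):
    """Return a null symbol that occurs in none of the given sequences.

    Hypothetical helper, not part of kaldialign: it tries base, base*2,
    base*3, ... ("*", "**", "***") until the candidate is not a token.
    """
    used = set()
    for seq in seqs:
        used.update(seq)
    eps = base
    while eps in used:
        eps += base
    return eps


a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
print(choose_epsilon(a, b))  # *
```

The returned symbol can then be passed as the epsilon argument of align.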

Edit distance

edit_distance(seq1, seq2) - used to obtain the total edit distance, as well as the number of insertions, deletions and substitutions.

from kaldialign import edit_distance

a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
results = edit_distance(a, b)
assert results == {
    'ins': 1,
    'del': 0,
    'sub': 1,
    'total': 2
}
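For this example the counts can be read directly off the alignment from the previous example: a pair whose first element is epsilon is an insertion, a pair whose second element is epsilon is a deletion, and a pair of two different symbols is a substitution. The sketch below tallies those cases and derives WER as total errors divided by reference length. count_errors is a hypothetical name, not a kaldialign API, and the sketch assumes the first sequence is the reference, as in the examples above.

```python
def count_errors(alignment, eps):
    """Tally insertions, deletions and substitutions in an alignment.

    `alignment` is a list of (ref_symbol, hyp_symbol) pairs in which `eps`
    marks a missing symbol, like the output of align() above.
    """
    counts = {'ins': 0, 'del': 0, 'sub': 0}
    for ref_sym, hyp_sym in alignment:
        if ref_sym == eps:
            counts['ins'] += 1  # symbol present only in the hypothesis
        elif hyp_sym == eps:
            counts['del'] += 1  # symbol present only in the reference
        elif ref_sym != hyp_sym:
            counts['sub'] += 1  # both present but different
    counts['total'] = counts['ins'] + counts['del'] + counts['sub']
    return counts


EPS = '*'
ali = [('a', 'a'), ('b', 's'), (EPS, 'x'), ('c', 'c')]  # align(a, b, EPS) from above
ref_len = sum(1 for ref_sym, _ in ali if ref_sym != EPS)  # 3 reference words
counts = count_errors(ali, EPS)
wer = counts['total'] / ref_len
print(counts, round(wer, 4))  # {'ins': 1, 'del': 0, 'sub': 1, 'total': 2} 0.6667
```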

For both alignment and edit distance, you can pass sclite_mode=True to compute alignments or WER with SCLITE-style weights, i.e., a cost of 3 for insertions and deletions and a cost of 4 for substitutions.
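To illustrate why the weights matter, here is a textbook weighted Levenshtein dynamic program with a backtrace that counts operations. It is a sketch, not the Kaldi code that kaldialign wraps, and its tie-breaking may differ from Kaldi's; the function name and parameters are hypothetical. With unit costs, turning ['a', 'b'] into ['b', 'c'] takes two substitutions. With the SCLITE-style weights, two substitutions cost 8 while deleting 'a' and inserting 'c' costs 6, so the cheapest path changes the error breakdown even though the number of errors stays at 2.

```python
def weighted_edit_counts(ref, hyp, ins_cost=1, del_cost=1, sub_cost=1):
    """Return (min weighted cost, operation counts) for turning ref into hyp."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = cheapest way to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * del_cost
    for j in range(1, m + 1):
        cost[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else sub_cost)
            cost[i][j] = min(diag, cost[i - 1][j] + del_cost, cost[i][j - 1] + ins_cost)

    # Walk back along one optimal path, preferring match/substitution, then deletion.
    counts = {'ins': 0, 'del': 0, 'sub': 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = 0 if ref[i - 1] == hyp[j - 1] else sub_cost
            if cost[i][j] == cost[i - 1][j - 1] + step:
                if step:
                    counts['sub'] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + del_cost:
            counts['del'] += 1
            i -= 1
        else:
            counts['ins'] += 1
            j -= 1
    counts['total'] = counts['ins'] + counts['del'] + counts['sub']
    return cost[n][m], counts


print(weighted_edit_counts(['a', 'b'], ['b', 'c']))
# (2, {'ins': 0, 'del': 0, 'sub': 2, 'total': 2})
print(weighted_edit_counts(['a', 'b'], ['b', 'c'], ins_cost=3, del_cost=3, sub_cost=4))
# (6, {'ins': 1, 'del': 1, 'sub': 0, 'total': 2})
```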

Bootstrapping method to extract WER 95% confidence intervals

bootstrap_wer_ci(ref, hyp) - obtain the 95% confidence intervals for WER using the Bisani and Ney bootstrapping method.

from kaldialign import bootstrap_wer_ci

ref = [
    ("a", "b", "c"),
    ("d", "e", "f"),
]
hyp = [
    ("a", "b", "d"),
    ("e", "f", "f"),
]
ans = bootstrap_wer_ci(ref, hyp)
assert ans["wer"] == 0.4989
assert ans["ci95"] == 0.2312
assert ans["ci95min"] == 0.2678
assert ans["ci95max"] == 0.7301
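The resampling idea can be sketched in a few lines: resample whole utterances with replacement, recompute corpus WER (total errors over total reference words) for each replicate, and report the replicate mean with a half-width of 1.96 standard deviations. The normal-approximation half-width is an assumption inferred from the example above, where ci95min and ci95max equal wer minus and plus ci95 up to rounding; it is not taken from kaldialign's source. The function name, its per-utterance error-count inputs, and the replications and seed parameters are hypothetical, and because it uses Python's random module it will not reproduce the library's exact digits.

```python
import random
import statistics


def bootstrap_wer(errors, ref_lens, replications=10000, seed=0):
    """Bootstrap estimate of corpus WER with a normal-approximation 95% CI.

    errors[i] is the edit distance of utterance i and ref_lens[i] its number
    of reference words. Assumes every reference is non-empty, so no replicate
    can end up with zero reference words.
    """
    rng = random.Random(seed)
    n = len(errors)
    wers = []
    for _ in range(replications):
        # Draw n utterance indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        total_errors = sum(errors[i] for i in idx)
        total_words = sum(ref_lens[i] for i in idx)
        wers.append(total_errors / total_words)
    wer = statistics.mean(wers)
    ci95 = 1.96 * statistics.stdev(wers)
    return {"wer": wer, "ci95": ci95, "ci95min": wer - ci95, "ci95max": wer + ci95}


# Example utterances: "a b d" vs "a b c" has 1 error, "e f f" vs "d e f" has 2.
ans = bootstrap_wer(errors=[1, 2], ref_lens=[3, 3])
print({k: round(v, 4) for k, v in ans.items()})  # wer close to 0.5, ci95 close to 0.23
```

With two utterances of 3 words each, every replicate WER is 1/3, 1/2 or 2/3, which puts the mean near 0.5 and the half-width near 0.231, in line with the numbers in the example above.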

It also accepts hypotheses from two systems, S1 and S2, and computes the probability that S2 improves over S1:

from kaldialign import bootstrap_wer_ci

ref = [
    ("a", "b", "c"),
    ("d", "e", "f"),
]
hyp = [
    ("a", "b", "d"),
    ("e", "f", "f"),
]
hyp2 = [
    ("a", "b", "c"),
    ("e", "e", "f"),
]
ans = bootstrap_wer_ci(ref, hyp, hyp2)

s = ans["system1"]
assert s["wer"] == 0.4989
assert s["ci95"] == 0.2312
assert s["ci95min"] == 0.2678
assert s["ci95max"] == 0.7301

s = ans["system2"]
assert s["wer"] == 0.1656
assert s["ci95"] == 0.2312
assert s["ci95min"] == -0.0656
assert s["ci95max"] == 0.3968

assert ans["p_s2_improv_over_s1"] == 1.0
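The probability of improvement can be sketched with the same resampling scheme applied to both systems jointly: each replicate draws one set of utterance indices, compares the two systems on exactly those utterances, and the result is the fraction of replicates in which S2 makes fewer errors. Since both systems are scored against the same sampled references, comparing error totals is equivalent to comparing WERs. This is a hypothetical sketch of the technique rather than kaldialign's implementation; counting ties as "no improvement" is an assumption.

```python
import random


def prob_improvement(errors1, errors2, replications=10000, seed=0):
    """Fraction of paired bootstrap replicates in which system 2 beats system 1.

    errors1[i] and errors2[i] are the edit distances of the two systems on
    utterance i, scored against the same reference.
    """
    rng = random.Random(seed)
    n = len(errors1)
    improved = 0
    for _ in range(replications):
        # One shared draw of utterances keeps the comparison paired.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(errors1[i] - errors2[i] for i in idx)
        if delta > 0:  # S2 made fewer errors on this resample; ties do not count
            improved += 1
    return improved / replications


# Example utterances: S1 makes 1 and 2 errors, S2 makes 0 and 1.
print(prob_improvement([1, 2], [0, 1]))  # 1.0
```

Because S2 makes one error fewer than S1 on every example utterance, every replicate favors S2, which is consistent with p_s2_improv_over_s1 == 1.0 above.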

Motivation

The need for this arose from the fact that practically all implementations of the Levenshtein distance differ slightly, making it impossible to use a scoring tool other than Kaldi and get the same error rate results. This package copies code from Kaldi directly and wraps it using pybind11, avoiding the issue altogether.


Download files

Download the file for your platform.

Source Distribution

  • kaldialign-0.8.0.tar.gz (25.8 kB): source

Built Distributions

  • kaldialign-0.8.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (84.6 kB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • kaldialign-0.8.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl (90.6 kB): CPython 3.11, manylinux: glibc 2.17+ i686
  • kaldialign-0.8.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (80.3 kB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • kaldialign-0.8.0-cp311-cp311-macosx_10_9_universal2.whl (102.0 kB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)
  • kaldialign-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (84.6 kB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • kaldialign-0.8.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl (90.6 kB): CPython 3.10, manylinux: glibc 2.17+ i686
  • kaldialign-0.8.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (80.3 kB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • kaldialign-0.8.0-cp310-cp310-macosx_10_9_universal2.whl (102.0 kB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)
  • kaldialign-0.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (84.7 kB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • kaldialign-0.8.0-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl (90.5 kB): CPython 3.9, manylinux: glibc 2.17+ i686
  • kaldialign-0.8.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (80.4 kB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • kaldialign-0.8.0-cp39-cp39-macosx_10_9_universal2.whl (102.2 kB): CPython 3.9, macOS 10.9+ universal2 (ARM64, x86-64)
  • kaldialign-0.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (84.5 kB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • kaldialign-0.8.0-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl (90.5 kB): CPython 3.8, manylinux: glibc 2.17+ i686
  • kaldialign-0.8.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (80.3 kB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • kaldialign-0.8.0-cp38-cp38-macosx_10_9_universal2.whl (101.9 kB): CPython 3.8, macOS 10.9+ universal2 (ARM64, x86-64)
  • kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (84.9 kB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_i686.manylinux2014_i686.whl (91.2 kB): CPython 3.7m, manylinux: glibc 2.17+ i686
  • kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (81.6 kB): CPython 3.7m, manylinux: glibc 2.17+ ARM64
  • kaldialign-0.8.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (84.8 kB): CPython 3.6m, manylinux: glibc 2.17+ x86-64
  • kaldialign-0.8.0-cp36-cp36m-manylinux_2_17_i686.manylinux2014_i686.whl (91.1 kB): CPython 3.6m, manylinux: glibc 2.17+ i686

File details

Details for the file kaldialign-0.8.0.tar.gz.

File metadata

  • Download URL: kaldialign-0.8.0.tar.gz
  • Upload date:
  • Size: 25.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for kaldialign-0.8.0.tar.gz
Algorithm Hash digest
SHA256 dd836b0f0a63cf38109df9cb08f0f0daba45d9f2685fb22e6565d63556c55ddb
MD5 34bd9c4b9e717ee555cbfc6039ed1a72
BLAKE2b-256 8a8990bca03aa33a219ebf306dd6ff70579aa965e1a3a200f8cf455bf7f6dee9

See more details on using hashes here.
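The SHA256 digests listed on this page can be checked after download. Below is a minimal sketch using Python's standard hashlib module; the helper names sha256_hex and verify_sha256 are hypothetical. It streams the file in chunks so the whole file never needs to be held in memory.

```python
import hashlib


def sha256_hex(chunks):
    """Return the hex SHA256 digest of an iterable of byte chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def verify_sha256(path, expected_hex, chunk_size=1 << 16):
    """Return True if the file at `path` has the published SHA256 digest."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"" at EOF.
        digest = sha256_hex(iter(lambda: f.read(chunk_size), b""))
    return digest == expected_hex.lower()


# Usage, assuming the source distribution was downloaded to the current directory:
# verify_sha256("kaldialign-0.8.0.tar.gz",
#               "dd836b0f0a63cf38109df9cb08f0f0daba45d9f2685fb22e6565d63556c55ddb")
```

pip can also enforce digests itself in hash-checking mode, using requirement lines of the form kaldialign==0.8.0 --hash=sha256:<digest> together with --require-hashes.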

File details

Details for the file kaldialign-0.8.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 669bcf3c876aaf32423678266fc80981c24fd38b4090b3a527feb46b820db21b
MD5 a4381179d9cfb9e3d76d4daa6aa7bdd6
BLAKE2b-256 b35282818819a64960b520c3c4436c505629df5dfde2970d5904bf4034b8f14a

File details

Details for the file kaldialign-0.8.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for kaldialign-0.8.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 761bde90e4f3d765d2eb922f90d9a86121592e2463b3bdd5b054bdca4e94c30e
MD5 4bee8d74a6c652f21e9ccbb9c0918dac
BLAKE2b-256 39891bde13e0072ddb251fb561399b0a88134b78701bbe8cfb56a7c8355c1849

File details

Details for the file kaldialign-0.8.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 37d411035f49a57163f88219163b98dbc55baa55cd966f26af7de92537bd70be
MD5 d43be1adb9ea8e754ac82fcbfc56713b
BLAKE2b-256 e9745c2642a4fb70a5001be5a0d419251373d583a49c34eabe7a910b89db33ed

File details

Details for the file kaldialign-0.8.0-cp311-cp311-macosx_10_9_universal2.whl.

File hashes

Hashes for kaldialign-0.8.0-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 4229b02e79c0850ff1099517f0f2c1cae9451fda4b9ea738787a8026d6b92908
MD5 012a28b27f268100546ec560bae7fa9e
BLAKE2b-256 fd674498c64456dfec1e8cec725e549abf68aad95da27895cbe407655fce2fd0

File details

Details for the file kaldialign-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 48be0b099035fc30e34c24ba6c5328e71ee9e0c482b6c139dc843467c6249711
MD5 73a144cca186a4e2b543344ae235ea3b
BLAKE2b-256 f236f9809b0f28c5ffb7d947c19725cfd2063b62fa040b46af6b799b7aba0813

File details

Details for the file kaldialign-0.8.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for kaldialign-0.8.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 63e14f43e69761f3a804ce030d12b94025b3f5e859b94719dad0d6da230a132d
MD5 38d57b4005e58bcf8b42a664fdb98963
BLAKE2b-256 e8db31fa260c4b824b3ba9f2d6c9a1fedb6cfa1fea8cdb32cc99517d23bbf0bf

File details

Details for the file kaldialign-0.8.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b2252f66aed0081ad35890dd0fb63ee755289b9612e228f18ebe9dccc559e281
MD5 73f72f03ff584d8ef0350ec3a6ff30ce
BLAKE2b-256 f4d45f2145823d13c860ceec9f6d2a138e4b15cbbe5243ad7df6940a51ae5d62

File details

Details for the file kaldialign-0.8.0-cp310-cp310-macosx_10_9_universal2.whl.

File hashes

Hashes for kaldialign-0.8.0-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 fb4f126711db3d2a7b2e3afe441f3e653ab4b5bde9bd525513976eb9b9d817cf
MD5 64e47ea6620d32760ee762013978e531
BLAKE2b-256 ae27928bb13813edc88230c0d2cfc938a3d5a40a1f4897dc60048a827ce2fc55

File details

Details for the file kaldialign-0.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c8eaecf2215d78a22dd6b54052ba3d5ef8c953eeb23b2db25f44f10208c37af7
MD5 143dd6f3c1c435cc2735b5ef54f83b75
BLAKE2b-256 a3b5a6d187719adf97f1d81377cf30e91f3fecbb4f9a3498649ad13b2114cc11

File details

Details for the file kaldialign-0.8.0-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for kaldialign-0.8.0-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 c8cd3ce6fd2a1054a19de9e7978ad1b025125e9504714d91caca129926230d3f
MD5 7ef6d75a8a433bb15785aeced361d450
BLAKE2b-256 6593c8c98613ba127ac27f8357f9d31a87a7801d018cbf4c57db10e67ef27c63

File details

Details for the file kaldialign-0.8.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 3884955d22aa0e273cea94bfa9d42094ef8b2d64dae0e62ab77ef79fce23dbad
MD5 2e3df49853916dd1f1d4a35ba10c3dac
BLAKE2b-256 f25b7552fd615808dce39428a1fbb2c5880b981bc56f25a082c2ada4ff352a2e

File details

Details for the file kaldialign-0.8.0-cp39-cp39-macosx_10_9_universal2.whl.

File hashes

Hashes for kaldialign-0.8.0-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 da25b6ac6ebeb4e84ebda2a94907d5dcae9e727d5a11df4255239f9bcd91c068
MD5 28049541fb7b1b10b83c2ec3b501be67
BLAKE2b-256 4582b908c179ad17766e5f17d927dd687c38c89e6d3ac088dc9d48fff563dc36

File details

Details for the file kaldialign-0.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f74f45cc98e3f09ffe304b3f5c22e642759ea517c1aa0861a0e29c0d7a3a59fc
MD5 b7e2a42605e562008dee914c531227cd
BLAKE2b-256 95fc7b9ce4cc51272326e3f3c3b86fda51cd7ca8ad9fa7caf86e7b3294087d41

File details

Details for the file kaldialign-0.8.0-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for kaldialign-0.8.0-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 d35a033404b28d3e80320ea8bb6c68e8e4974b34d6f253bf69d1234103020495
MD5 3be0ba5e3d9198f40f15729c35cdf43f
BLAKE2b-256 de26dd3f208179c912924bd5e2c2893b019157965f5a674ff3e171523e6ddc81

File details

Details for the file kaldialign-0.8.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 e5a0535f3cecd6d126e3e118d061bda3da71cfeb70e565d0fc55567a754f872e
MD5 64e00242b1d76913fdd34d3c730f7d34
BLAKE2b-256 677bdbc456e93cbd2a49e4ade9cd03f390100d86393f9bed04ef6a9c07699052

File details

Details for the file kaldialign-0.8.0-cp38-cp38-macosx_10_9_universal2.whl.

File hashes

Hashes for kaldialign-0.8.0-cp38-cp38-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 47d7b69c6fd06585e28119c815c7828a4bac1dba4a7d680ee29cfd009082ae2e
MD5 5ffb18b1f0457cd9a8d872b8e88d5d2f
BLAKE2b-256 d59aa1f0d9302191ce8b1afb3214cede495ed9404d607b5dad504efe95e62941

File details

Details for the file kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 dd13138553d071c838f4fe0cb350113b094ba6e9269b787d94c76cd99b3fa8fd
MD5 b7963470f69818757279973ab4087787
BLAKE2b-256 90cea7f0405955352961bed90e42e7c21090758edf2e9e8ff381d1df88a971b0

File details

Details for the file kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 f693dafcb9a5efacb4d0fc7cced761b48a5a321aef4cbdcda2ee144121381aa1
MD5 53718b4ba177bb6cba1513736d87b8a2
BLAKE2b-256 dce74b6c415472983ceb514916c46bf322ecef2156b241ff3d6a3373b0d59af4

File details

Details for the file kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 5d0a10d97c7c118430ebabf6cd37898305d3ccc15dbd48992a56be085fafc5eb
MD5 23e1b2124acff7d451b53de84d982937
BLAKE2b-256 9f17ead4dabb8ea65454d92e1a7bb000d3331e6e07f84bda6cb3afc6f4da85ac

File details

Details for the file kaldialign-0.8.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for kaldialign-0.8.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d28e6ca9ef6aed6a8698bdd2da5fa5641585f5065b856e2b98895602b4078d92
MD5 fd5bb4fb3f733ceda0a89f47f2f4e15a
BLAKE2b-256 045294666563307ed2383776e9b58d36e97e846023b910f7de3e45c85278719e

File details

Details for the file kaldialign-0.8.0-cp36-cp36m-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for kaldialign-0.8.0-cp36-cp36m-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 35fa7dcd5e2d8b759890787ed8084fb6400f941d2de12ed6354e2226ffb47fd2
MD5 e41aeccc1473dd726a2757faac11edf8
BLAKE2b-256 5fea1f837d0b311f5f1e33844fe04a8f5ac91ed0bbdfd6a19d79b1b4c1340b97
