Kaldi alignment methods wrapped into Python

kaldialign

A small package that exposes edit distance computation functions from Kaldi. It uses the original Kaldi code and wraps it using pybind11.

Installation

conda install -c kaldialign kaldialign

or

pip install --verbose kaldialign

or

pip install --verbose -U git+https://github.com/pzelasko/kaldialign.git

or

git clone https://github.com/pzelasko/kaldialign.git
cd kaldialign
python3 -m pip install --verbose .

Examples

Alignment

align(ref, hyp, epsilon) - used to obtain the alignment between two string sequences. epsilon should be a null symbol (indicating deletion/insertion) that doesn't exist in either sequence.

from kaldialign import align

EPS = '*'
a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
ali = align(a, b, EPS)
assert ali == [('a', 'a'), ('b', 's'), (EPS, 'x'), ('c', 'c')]
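The pairs returned by align can be summarised into per-operation counts. The sketch below is plain Python and does not call kaldialign. classify_alignment is a hypothetical helper name, and it assumes each pair is (ref_token, hyp_token), with the epsilon symbol on the ref side marking an insertion and on the hyp side marking a deletion, as in the example above.

```python
def classify_alignment(ali, eps):
    """Count correct/substituted/inserted/deleted pairs in an alignment.

    Assumes `ali` is a list of (ref_token, hyp_token) tuples and `eps`
    is the null symbol that was passed to align().
    """
    counts = {"cor": 0, "sub": 0, "ins": 0, "del": 0}
    for ref_tok, hyp_tok in ali:
        if ref_tok == eps:
            counts["ins"] += 1   # nothing in ref, extra token in hyp
        elif hyp_tok == eps:
            counts["del"] += 1   # token in ref, nothing in hyp
        elif ref_tok == hyp_tok:
            counts["cor"] += 1
        else:
            counts["sub"] += 1
    return counts


EPS = "*"
ali = [("a", "a"), ("b", "s"), (EPS, "x"), ("c", "c")]
counts = classify_alignment(ali, EPS)
assert counts == {"cor": 2, "sub": 1, "ins": 1, "del": 0}
```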

Edit distance

edit_distance(ref, hyp) - used to obtain the total edit distance, as well as the number of insertions, deletions and substitutions.

from kaldialign import edit_distance

a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
results = edit_distance(a, b)
assert results == {
    'ins': 1,
    'del': 0,
    'sub': 1,
    'total': 2
}
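The counts above are usually reported as a word error rate (WER), i.e. the total number of edits divided by the number of reference tokens. A minimal sketch, assuming only the four dictionary keys shown above; word_error_rate is a hypothetical helper, not part of kaldialign:

```python
def word_error_rate(results, ref_len):
    """WER = (ins + del + sub) / number of reference tokens."""
    if ref_len == 0:
        raise ValueError("reference must contain at least one token")
    return results["total"] / ref_len


# Counts copied from the edit_distance example above; ref has 3 tokens.
results = {"ins": 1, "del": 0, "sub": 1, "total": 2}
wer = word_error_rate(results, ref_len=3)
assert abs(wer - 2 / 3) < 1e-12
```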

For alignment and edit distance, you can pass sclite_mode=True to compute WER or alignments based on SCLITE style weights, i.e., insertion/deletion cost 3 and substitution cost 4.
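To illustrate why the weights matter, here is a small pure-Python dynamic-programming sketch of a weighted edit cost. It is not the Kaldi code that kaldialign wraps; weighted_edit_cost and its parameter names are assumptions made for this example. With unit costs, two substitutions tie with one deletion plus one insertion; with the 3/3/4 weights the deletion plus insertion path becomes strictly cheaper, which can change the reported breakdown of errors.

```python
def weighted_edit_cost(ref, hyp, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum weighted edit cost between two token sequences."""
    # prev[j] = cost of turning an empty ref prefix into hyp[:j] (j insertions)
    prev = [j * ins_cost for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        cur = [i * del_cost]  # hyp prefix is empty: delete all i ref tokens
        for j, h in enumerate(hyp, start=1):
            match_or_sub = prev[j - 1] + (0 if r == h else sub_cost)
            deletion = prev[j] + del_cost
            insertion = cur[j - 1] + ins_cost
            cur.append(min(match_or_sub, deletion, insertion))
        prev = cur
    return prev[-1]


ref = ["a", "b"]
hyp = ["b", "c"]

# Unit costs: two substitutions (2) tie with deletion + insertion (2).
assert weighted_edit_cost(ref, hyp) == 2

# SCLITE-style weights: two substitutions cost 8, deletion + insertion cost 6.
assert weighted_edit_cost(ref, hyp, ins_cost=3, del_cost=3, sub_cost=4) == 6
```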

Compound word matching

All functions accept merge_compounds=True to allow adjacent words in either sequence to be concatenated (without separator) to match a single word in the other sequence at zero cost. This is useful whenever there are inconsistencies within transcriptions, or between training and testing conditions of a model evaluated with WER.

from kaldialign import edit_distance, align

# "white paper" (2 words) matches "whitepaper" (1 word) with 0 errors
ref = ["the", "white", "paper", "is", "good"]
hyp = ["the", "whitepaper", "is", "good"]

results = edit_distance(ref, hyp, merge_compounds=True)
assert results["total"] == 0

# Works in both directions
results = edit_distance(hyp, ref, merge_compounds=True)
assert results["total"] == 0

# Alignment shows compound matches as space-joined strings
ali = align(ref, hyp, "*", merge_compounds=True)
assert ali == [
    ("the", "the"),
    ("white paper", "whitepaper"),
    ("is", "is"),
    ("good", "good"),
]
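If you need to list which pairs in such an alignment were compound matches, one option is to compare both sides with spaces removed. The helper below is a hypothetical sketch in plain Python; it assumes that individual tokens never contain spaces themselves, so a space can only come from the joining shown above.

```python
def is_compound_match(ref_side, hyp_side):
    """True if the two sides differ only by the spaces that join merged words."""
    if ref_side == hyp_side:
        return False  # an ordinary exact match, not a compound
    return ref_side.replace(" ", "") == hyp_side.replace(" ", "")


# Alignment copied from the example above.
ali = [
    ("the", "the"),
    ("white paper", "whitepaper"),
    ("is", "is"),
    ("good", "good"),
]
compounds = [pair for pair in ali if is_compound_match(*pair)]
assert compounds == [("white paper", "whitepaper")]
```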

Bootstrapping method to extract WER 95% confidence intervals

bootstrap_wer_ci(ref, hyp, hyp2=None) - used to obtain the 95% confidence interval for WER using the Bisani and Ney bootstrapping method.

from kaldialign import bootstrap_wer_ci

ref = [
    ("a", "b", "c"),
    ("d", "e", "f"),
]
hyp = [
    ("a", "b", "d"),
    ("e", "f", "f"),
]
ans = bootstrap_wer_ci(ref, hyp)
assert ans["wer"] == 0.4989
assert ans["ci95"] == 0.2312
assert ans["ci95min"] == 0.2678
assert ans["ci95max"] == 0.7301
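The idea behind the bootstrap can be sketched in plain Python: resample utterances with replacement, recompute WER for each replicate, and report the mean plus or minus 1.96 standard deviations. This is an illustrative sketch under stated assumptions, not kaldialign's implementation; bootstrap_wer, its parameters, the default number of replications, and the seed are all hypothetical, and the input is assumed to be per-utterance (num_errors, ref_len) pairs with at least one non-empty reference. Exact values depend on the random draws, so they will not necessarily equal the numbers asserted above.

```python
import math
import random


def bootstrap_wer(per_utt, replications=10000, seed=0):
    """Bootstrap WER estimate from per-utterance (num_errors, ref_len) pairs."""
    rng = random.Random(seed)
    wers = []
    for _ in range(replications):
        # Draw as many utterances as in the original set, with replacement.
        sample = rng.choices(per_utt, k=len(per_utt))
        errors = sum(item[0] for item in sample)
        words = sum(item[1] for item in sample)
        wers.append(errors / words)
    mean = sum(wers) / len(wers)
    var = sum((w - mean) ** 2 for w in wers) / len(wers)
    ci95 = 1.96 * math.sqrt(var)
    return {
        "wer": mean,
        "ci95": ci95,
        "ci95min": mean - ci95,
        "ci95max": mean + ci95,
    }


# Edit counts worked out by hand for the ref/hyp pairs above:
# ("a","b","c") vs ("a","b","d") has 1 error, ("d","e","f") vs ("e","f","f") has 2.
ans = bootstrap_wer([(1, 3), (2, 3)])
print(ans)
```

For this input the replicate WERs can only be 2/6, 3/6, or 4/6, so the estimate lands near 0.5 with a half-width near 0.23.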

All bootstrap functions also accept merge_compounds=True.

bootstrap_wer_ci also accepts hypotheses from a second system (hyp2). In that case it reports results for both systems, together with the probability that system 2 improves over system 1:

from kaldialign import bootstrap_wer_ci

ref = [
    ("a", "b", "c"),
    ("d", "e", "f"),
]
hyp = [
    ("a", "b", "d"),
    ("e", "f", "f"),
]
hyp2 = [
    ("a", "b", "c"),
    ("e", "e", "f"),
]
ans = bootstrap_wer_ci(ref, hyp, hyp2)

s = ans["system1"]
assert s["wer"] == 0.4989
assert s["ci95"] == 0.2312
assert s["ci95min"] == 0.2678
assert s["ci95max"] == 0.7301

s = ans["system2"]
assert s["wer"] == 0.1656
assert s["ci95"] == 0.2312
assert s["ci95min"] == -0.0656
assert s["ci95max"] == 0.3968

assert ans["p_s2_improv_over_s1"] == 1.0
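The probability of improvement can be understood as a paired bootstrap: the same resampled utterances are scored for both systems, and the result is the fraction of replicates in which system 2 makes fewer errors. The sketch below is plain Python under stated assumptions and is not kaldialign's code; prob_improvement, the strict "fewer errors" comparison, the replication count, and the seed are choices made for this example. Because both systems are scored against the same resampled references, comparing error sums is equivalent to comparing WERs.

```python
import random


def prob_improvement(per_utt, replications=10000, seed=0):
    """Fraction of bootstrap replicates in which system 2 beats system 1.

    per_utt: list of (errors_sys1, errors_sys2) pairs, one per utterance.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(replications):
        # Paired resampling: each draw carries both systems' error counts.
        sample = rng.choices(per_utt, k=len(per_utt))
        errors1 = sum(item[0] for item in sample)
        errors2 = sum(item[1] for item in sample)
        if errors2 < errors1:
            wins += 1
    return wins / replications


# Hand-computed edit counts for the example above:
# utterance 1: system 1 has 1 error, system 2 has 0
# utterance 2: system 1 has 2 errors, system 2 has 1
p = prob_improvement([(1, 0), (2, 1)])
assert p == 1.0  # system 2 is better on every utterance, so on every replicate
```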

Motivation

The need for this arose from the fact that practically all implementations of the Levenshtein distance differ slightly, which makes it impossible to use a scoring tool other than Kaldi and get the same error rate results. This package copies the code from Kaldi directly and wraps it using pybind11, avoiding the issue altogether.

Download files

Source Distribution

  • kaldialign-0.10.0.tar.gz (30.4 kB): Source

Built Distributions

  • kaldialign-0.10.0-cp314-cp314-win_amd64.whl (98.6 kB): CPython 3.14, Windows x86-64
  • kaldialign-0.10.0-cp314-cp314-win32.whl (87.4 kB): CPython 3.14, Windows x86
  • kaldialign-0.10.0-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (110.5 kB): CPython 3.14, manylinux glibc 2.24+ and glibc 2.28+, x86-64
  • kaldialign-0.10.0-cp314-cp314-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (99.6 kB): CPython 3.14, manylinux glibc 2.24+ and glibc 2.28+, ARM64
  • kaldialign-0.10.0-cp313-cp313-win_amd64.whl (96.6 kB): CPython 3.13, Windows x86-64
  • kaldialign-0.10.0-cp313-cp313-win32.whl (85.3 kB): CPython 3.13, Windows x86
  • kaldialign-0.10.0-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (110.4 kB): CPython 3.13, manylinux glibc 2.24+ and glibc 2.28+, x86-64
  • kaldialign-0.10.0-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (99.4 kB): CPython 3.13, manylinux glibc 2.24+ and glibc 2.28+, ARM64
  • kaldialign-0.10.0-cp312-cp312-win_amd64.whl (96.6 kB): CPython 3.12, Windows x86-64
  • kaldialign-0.10.0-cp312-cp312-win32.whl (85.2 kB): CPython 3.12, Windows x86
  • kaldialign-0.10.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (110.4 kB): CPython 3.12, manylinux glibc 2.24+ and glibc 2.28+, x86-64
  • kaldialign-0.10.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (99.4 kB): CPython 3.12, manylinux glibc 2.24+ and glibc 2.28+, ARM64
  • kaldialign-0.10.0-cp311-cp311-win_amd64.whl (96.0 kB): CPython 3.11, Windows x86-64
  • kaldialign-0.10.0-cp311-cp311-win32.whl (84.7 kB): CPython 3.11, Windows x86
  • kaldialign-0.10.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (109.4 kB): CPython 3.11, manylinux glibc 2.24+ and glibc 2.28+, x86-64
  • kaldialign-0.10.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (99.1 kB): CPython 3.11, manylinux glibc 2.24+ and glibc 2.28+, ARM64
  • kaldialign-0.10.0-cp310-cp310-win_amd64.whl (95.0 kB): CPython 3.10, Windows x86-64
  • kaldialign-0.10.0-cp310-cp310-win32.whl (83.5 kB): CPython 3.10, Windows x86
  • kaldialign-0.10.0-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (107.9 kB): CPython 3.10, manylinux glibc 2.24+ and glibc 2.28+, x86-64
  • kaldialign-0.10.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (97.6 kB): CPython 3.10, manylinux glibc 2.24+ and glibc 2.28+, ARM64

