
Fast banded edit distance


tinyalign

A small Python module providing edit distance (aka Levenshtein distance, that is, counting insertions, deletions and substitutions) and Hamming distance computation.
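To make the two distances concrete, here is a plain-Python reference sketch of what each one counts. The names levenshtein and hamming are hypothetical and not part of tinyalign's API. Raising ValueError on strings of unequal length is an assumption of this sketch, not documented behavior of hamming_distance.

```python
def levenshtein(s: str, t: str) -> int:
    """Unbounded edit distance via the classic two-row dynamic program (hypothetical helper)."""
    # prev[j] holds the edit distance between the previous prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i] + [0] * len(t)  # turning s[:i] into "" costs i deletions
        for j, b in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1,             # delete a
                         cur[j - 1] + 1,          # insert b
                         prev[j - 1] + (a != b))  # match (0) or substitution (1)
        prev = cur
    return prev[-1]


def hamming(s: str, t: str) -> int:
    """Number of positions at which two equal-length strings differ (hypothetical helper)."""
    if len(s) != len(t):
        # Assumption of this sketch: unequal lengths are rejected
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(s, t))
```

Both agree with the examples given further down: levenshtein("banana", "ananas") is 2 and hamming("hello", "yello") is 1.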

Its main purpose is to speed up edit distance computation by letting you specify a maximum number of differences, maxdiff (banding). If that parameter is given, the returned edit distance is only accurate up to maxdiff: if the actual edit distance is higher than maxdiff, a value larger than maxdiff is returned, but not necessarily the actual edit distance.
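The banding idea can be sketched as follows, assuming the standard technique of filling only the dynamic-programming cells that lie within maxdiff of the main diagonal. This is an illustration, not tinyalign's actual implementation: the hypothetical banded_edit_distance caps its result at exactly maxdiff + 1, whereas tinyalign only promises some value larger than maxdiff.

```python
def banded_edit_distance(s: str, t: str, maxdiff: int) -> int:
    """Edit distance of s and t if it is at most maxdiff, else maxdiff + 1 (hypothetical helper)."""
    m, n = len(s), len(t)
    cap = maxdiff + 1  # stands in for "more than maxdiff"
    if abs(m - n) > maxdiff:
        return cap  # the length difference alone needs too many insertions/deletions
    # Row 0: turning "" into t[:j] costs j insertions; cells outside the band are capped
    prev = [j if j <= maxdiff else cap for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [cap] * (n + 1)  # everything outside the band stays capped
        lo, hi = max(0, i - maxdiff), min(n, i + maxdiff)
        if lo == 0:
            cur[0] = i  # deleting the first i characters of s
        for j in range(max(1, lo), hi + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         cap)
        if min(cur[lo:hi + 1]) >= cap:
            return cap  # every path through this row already exceeds maxdiff
        prev = cur
    return prev[n]
```

Any alignment that strays more than maxdiff cells from the diagonal already needs more than maxdiff insertions or deletions, so skipping those cells cannot change a result that is within the limit. The inner loop visits at most 2 * maxdiff + 1 cells per row; a tuned implementation would also avoid allocating full-length rows as this sketch does.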

For computing regular (unbounded) edit distances, or when maxdiff is less than 4, you should prefer https://github.com/fujimotos/polyleven, which is faster in those cases. When maxdiff is 4 or more, but not too close to the length of the shorter string, this module is faster.

>>> from tinyalign import edit_distance, hamming_distance
>>> edit_distance("banana", "ananas")
2
>>> hamming_distance("hello", "yello")
1
>>> edit_distance("hello", "world", maxdiff=2)
3


Download files


Source Distribution

tinyalign-0.3.dev2.tar.gz (35.1 kB)

Built Distributions

tinyalign-0.3.dev2-cp38-cp38-manylinux1_x86_64.whl (19.3 kB, CPython 3.8)

tinyalign-0.3.dev2-cp37-cp37m-manylinux1_x86_64.whl (18.7 kB, CPython 3.7m)

tinyalign-0.3.dev2-cp36-cp36m-manylinux1_x86_64.whl (18.6 kB, CPython 3.6m)
