Fast banded edit distance

tinyalign

A small Python module providing edit distance (aka Levenshtein distance, that is, counting insertions, deletions and substitutions) and Hamming distance computation.
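For reference, the two distances can be written in a few lines of plain Python. This is only an illustrative sketch of the definitions, not the module's own implementation; the names ref_hamming_distance and ref_edit_distance are hypothetical, and raising ValueError on unequal lengths is an assumption made here for the sketch.

```python
def ref_hamming_distance(s: str, t: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    # Assumption for this sketch: unequal lengths are treated as an error.
    if len(s) != len(t):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(s, t))


def ref_edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic program."""
    # prev[j] is the distance between the already processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # substitute, free if the characters match
            ))
        prev = cur
    return prev[-1]
```

The sketch runs in time proportional to len(s) * len(t), which is the cost that banding is meant to reduce.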

Its main purpose is to speed up edit distance computation by letting you specify a maximum number of differences, maxdiff (banding). If that parameter is provided, the returned edit distance is only accurate up to maxdiff: if the actual edit distance is higher than maxdiff, some value larger than maxdiff is returned, but not necessarily the actual edit distance.
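The idea behind banding can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the function name banded_edit_distance is hypothetical, and returning exactly maxdiff + 1 when the limit is exceeded is a choice made for this sketch (the module only promises some value larger than maxdiff). Only cells within maxdiff of the main diagonal are filled in, so the work per row is bounded by 2 * maxdiff + 1 cells.

```python
def banded_edit_distance(s: str, t: str, maxdiff: int) -> int:
    """Edit distance that is exact only if the true distance is <= maxdiff."""
    m, n = len(s), len(t)
    big = maxdiff + 1  # stands in for "more than maxdiff" (sketch convention)
    # The length difference is a lower bound on the edit distance.
    if abs(m - n) > maxdiff:
        return big
    # Row 0: turning the empty prefix of s into t[:j] costs j insertions.
    prev = [min(j, big) for j in range(n + 1)]
    for i in range(1, m + 1):
        # Cells outside the band keep the value big.
        cur = [big] * (n + 1)
        if i <= maxdiff:
            cur[0] = i
        lo = max(1, i - maxdiff)
        hi = min(n, i + maxdiff)
        for j in range(lo, hi + 1):
            cost = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (s[i - 1] != t[j - 1]),  # substitution or match
            )
            cur[j] = min(cost, big)  # cap so values never exceed big
        if min(cur) >= big:
            # Every cell in this row already exceeds maxdiff: stop early.
            return big
        prev = cur
    return prev[n]
```

With this sketch, banded_edit_distance("banana", "ananas", 2) gives the exact distance 2, while banded_edit_distance("hello", "world", 2) gives 3, which only signals that the true distance (4) is above the limit.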

For computing regular (unbounded) edit distances, or if your maxdiff is less than 4, you should prefer https://github.com/fujimotos/polyleven, as it is faster in those cases. When maxdiff is 4 or more, but not too close to the length of the shorter string, this module is faster.

>>> from tinyalign import edit_distance, hamming_distance
>>> edit_distance("banana", "ananas")
2
>>> hamming_distance("hello", "yello")
1
>>> edit_distance("hello", "world", maxdiff=2)
3



Files for tinyalign, version 0.2

Filename                                         Size      File type   Python version
tinyalign-0.2-cp36-cp36m-manylinux1_x86_64.whl   18.5 kB   Wheel       cp36
tinyalign-0.2-cp37-cp37m-manylinux1_x86_64.whl   18.6 kB   Wheel       cp37
tinyalign-0.2.tar.gz                             34.8 kB   Source      None