Fast Levenshtein and Damerau optimal string alignment algorithms.
editdistpy
editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.
Installation
The easiest way to install editdistpy is using pip:
pip install -U editdistpy
Usage
You can specify the max_distance you care about; if the edit distance exceeds this max_distance, -1 will be returned. Specifying a sensible max distance can result in a significant speed improvement.
You can also specify max_distance=sys.maxsize if you wish for the actual edit distance to always be computed.
Levenshtein
import sys
from editdistpy import levenshtein
string_1 = "flintstone"
string_2 = "hanson"
max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1
max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6
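To make the max_distance contract concrete, here is a minimal pure-Python sketch of a bounded Levenshtein distance. It is not editdistpy's implementation (which is a compiled port of SoftWx.Match); the function name bounded_levenshtein and its cutoff strategy are assumptions made for illustration only.

```python
import sys


def bounded_levenshtein(a: str, b: str, max_distance: int) -> int:
    """Reference Levenshtein distance; returns -1 above max_distance."""
    # The length difference is a lower bound on the distance, so this
    # check can reject a pair without filling in any table rows.
    if abs(len(a) - len(b)) > max_distance:
        return -1
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        # Row minimums never decrease, so once a whole row exceeds the
        # limit the final distance must exceed it too.
        if min(current) > max_distance:
            return -1
        previous = current
    distance = previous[-1]
    return distance if distance <= max_distance else -1


print(bounded_levenshtein("flintstone", "hanson", 2))            # -1
print(bounded_levenshtein("flintstone", "hanson", sys.maxsize))  # 6
```

With max_distance = 2 the length check alone rejects the pair, which hints at why a tight bound can save so much work.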
Damerau-Levenshtein OSA
import sys
from editdistpy import damerau_osa
string_1 = "flintstone"
string_2 = "hanson"
max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1
max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
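The pair above contains no swapped adjacent characters, so both algorithms report 6. To show where optimal string alignment differs from plain Levenshtein, here is a pure-Python reference sketch; osa_distance is a hypothetical helper written for illustration, not part of editdistpy, and it omits the max_distance cutoff for brevity.

```python
def osa_distance(a: str, b: str) -> int:
    """Reference optimal string alignment distance (full table)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        table[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                best = min(best, table[i - 2][j - 2] + 1)
            table[i][j] = best
    return table[-1][-1]


print(osa_distance("ab", "ba"))   # 1 (plain Levenshtein gives 2)
print(osa_distance("ca", "abc"))  # 3
```

The second pair is the classic case where OSA (3) exceeds the unrestricted Damerau-Levenshtein distance (2), because OSA never edits a substring more than once.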
Benchmark
A simple benchmark was run on Python 3.8.12 against editdistance, which implements the Levenshtein edit distance algorithm.
The script used by the benchmark can be found here.
For clarity, the following string pairs were used.
Short string
"short sentence with words"
"shrtsen tence wit mispeledwords"
Long string
"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"
"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"
short string
test_damerau_osa 0.925678600000083
test_levenshtein 0.6640075999998771
test_editdistance 0.9197039000000586
test_damerau_osa_early_cutoff 0.7028707999998005
test_levenshtein_early_cutoff 0.5697816000001694
long string
test_damerau_osa 7.7526998000003005
test_levenshtein 4.262871200000063
test_editdistance 1.9676684999999452
test_damerau_osa_early_cutoff 0.9891195999998672
test_levenshtein_early_cutoff 0.9085431999997127
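While max_distance=10 significantly improves the computation time, it may not be a sensible value in some cases.
editdistpy also performs relatively better on shorter strings: in the runs above, its Levenshtein implementation was faster than editdistance on the short pair but slower on the long pair when no cutoff was applied. It may therefore be the more suitable library if your use case mainly involves comparing short strings.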
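As a rough idea of how timings like these can be collected, here is a small sketch built on the standard timeit module. It is not the project's benchmark script; time_distance and placeholder_distance are hypothetical names, and the placeholder exists only so the sketch runs without editdistpy installed. To time the real library, pass levenshtein.distance or damerau_osa.distance instead.

```python
import timeit


def time_distance(distance_fn, string_1, string_2, max_distance, number=1000):
    """Return the total seconds taken by `number` calls of distance_fn."""
    return timeit.timeit(
        lambda: distance_fn(string_1, string_2, max_distance),
        number=number,
    )


def placeholder_distance(string_1, string_2, max_distance):
    # Stand-in so the sketch is self-contained; it only compares lengths
    # and is NOT an edit distance.
    difference = abs(len(string_1) - len(string_2))
    return difference if difference <= max_distance else -1


short_pair = ("short sentence with words", "shrtsen tence wit mispeledwords")
elapsed = time_distance(placeholder_distance, *short_pair, max_distance=10)
print(f"placeholder_distance {elapsed:.4f}s")
```

Absolute numbers depend heavily on the machine and the repeat count, so only comparisons made within the same run are meaningful.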
Changelog
See the changelog for a history of notable changes to editdistpy.