
Fast Levenshtein and Damerau optimal string alignment algorithms.


editdistpy

editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.

Installation


The easiest way to install editdistpy is using pip:

pip install -U editdistpy

Usage


You can specify the max_distance you care about. If the edit distance exceeds max_distance, -1 is returned. Specifying a sensible maximum distance can yield a significant speed improvement.

To always compute the actual edit distance, pass max_distance=sys.maxsize.

Levenshtein

import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6

Damerau-Levenshtein OSA

import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
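
To make the difference between the two distances concrete, here is a minimal pure-Python reference sketch. It is not how editdistpy is implemented (the library is a fast port of SoftWx.Match); the names levenshtein_ref and damerau_osa_ref are hypothetical, and the sketch only mimics the documented convention of returning -1 when the distance exceeds max_distance. It computes the full table and applies the cutoff at the end, so it shows the semantics, not the speed-up.

```python
import sys


def levenshtein_ref(a: str, b: str, max_distance: int = sys.maxsize) -> int:
    """Textbook Levenshtein distance; returns -1 if it exceeds max_distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    distance = previous[-1]
    return distance if distance <= max_distance else -1


def damerau_osa_ref(a: str, b: str, max_distance: int = sys.maxsize) -> int:
    """Optimal string alignment distance; returns -1 if it exceeds max_distance."""
    rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        rows[i][0] = i
    for j in range(len(b) + 1):
        rows[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(
                rows[i - 1][j] + 1,         # deletion
                rows[i][j - 1] + 1,         # insertion
                rows[i - 1][j - 1] + cost,  # substitution or match
            )
            # OSA additionally allows swapping two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                best = min(best, rows[i - 2][j - 2] + 1)
            rows[i][j] = best
    distance = rows[-1][-1]
    return distance if distance <= max_distance else -1


print(levenshtein_ref("abcd", "acbd"))     # 2: two substitutions
print(damerau_osa_ref("abcd", "acbd"))     # 1: one adjacent transposition
print(damerau_osa_ref("abcd", "acbd", 0))  # -1: distance exceeds max_distance
```

For "flintstone" and "hanson" no adjacent transposition applies, which is why both examples above report the same distance of 6.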

Benchmark


A simple benchmark was run on Python 3.8.12 against editdistance, which implements the Levenshtein edit distance algorithm.

The script used by the benchmark can be found here.

For clarity, the following string pairs were used.

Short string

"short sentence with words"

"shrtsen tence wit mispeledwords"

Long string

"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

short string
        test_damerau_osa               0.925678600000083
        test_levenshtein               0.6640075999998771
        test_editdistance              0.9197039000000586
        test_damerau_osa_early_cutoff  0.7028707999998005
        test_levenshtein_early_cutoff  0.5697816000001694
long string
        test_damerau_osa               7.7526998000003005
        test_levenshtein               4.262871200000063
        test_editdistance              1.9676684999999452
        test_damerau_osa_early_cutoff  0.9891195999998672
        test_levenshtein_early_cutoff  0.9085431999997127

While max_distance=10 significantly reduces the computation time, it may not be a sensible value in some cases.

Relative to editdistance, editdistpy performs better on shorter strings, so it can be the more suitable library if your use case mainly compares short strings.
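
The linked script remains the authoritative benchmark. As a rough sketch of how comparable timings could be collected with the standard timeit module, the snippet below times each function with and without a cutoff. The helper name time_calls and the iteration count are assumptions for illustration, not taken from the original script, and absolute numbers will differ by machine.

```python
import sys
import timeit


def time_calls(distance_func, string_1, string_2, max_distance, number=1000):
    """Return the total seconds taken by `number` calls to distance_func."""
    return timeit.timeit(
        lambda: distance_func(string_1, string_2, max_distance),
        number=number,
    )


try:
    from editdistpy import damerau_osa, levenshtein

    candidates = {
        "levenshtein": levenshtein.distance,
        "damerau_osa": damerau_osa.distance,
    }
except ImportError:
    # Nothing to time if editdistpy is not installed.
    candidates = {}

# The short string pair listed above.
short_pair = ("short sentence with words", "shrtsen tence wit mispeledwords")

for name, func in candidates.items():
    full = time_calls(func, *short_pair, sys.maxsize)
    cutoff = time_calls(func, *short_pair, 10)  # max_distance=10, as discussed above
    print(f"{name}: full={full:.4f}s early_cutoff={cutoff:.4f}s")
```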

Changelog


See the changelog for a history of notable changes to editdistpy.
