Fast Levenshtein and Damerau optimal string alignment algorithms.

Project description

editdistpy

editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.

Installation


The easiest way to install editdistpy is using pip:

pip install -U editdistpy

Usage


You can specify the max_distance you care about. If the edit distance exceeds max_distance, -1 is returned instead of the distance. Specifying a sensible max_distance can result in a significant speed improvement, because the computation can stop early.

To always compute the actual edit distance, pass max_distance=sys.maxsize.

Levenshtein

import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6
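The -1 return value can be easier to reason about with a reference implementation at hand. The following is a minimal pure-Python sketch of a bounded Levenshtein distance that follows the same contract as described above. It is not editdistpy's implementation, and the name `bounded_levenshtein` is hypothetical.

```python
import sys


def bounded_levenshtein(a: str, b: str, max_distance: int) -> int:
    """Return the Levenshtein distance, or -1 if it exceeds max_distance."""
    # The length difference is a lower bound on the edit distance.
    if abs(len(a) - len(b)) > max_distance:
        return -1

    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        # The final distance can never be smaller than the minimum of a row,
        # so the computation can stop once the whole row exceeds the bound.
        if min(current) > max_distance:
            return -1
        previous = current

    distance = previous[-1]
    return distance if distance <= max_distance else -1


print(bounded_levenshtein("flintstone", "hanson", 2))            # -1
print(bounded_levenshtein("flintstone", "hanson", sys.maxsize))  # 6
```

The early exit inside the loop is one simple way a small max_distance can save work. The library itself may use a different and faster strategy.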

Damerau-Levenshtein OSA

import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
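For "flintstone" and "hanson" both algorithms return the same value, because no adjacent characters are swapped. The two distances differ when a transposition is involved. The sketch below is a plain pure-Python reference, not editdistpy's code, and the function name `edit_distance` and its `transpositions` flag are hypothetical. With the flag set, it computes the optimal string alignment distance, which counts a swap of two adjacent characters as a single edit.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance, or OSA distance when transpositions=True."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # OSA extension: swap of two adjacent characters costs one edit.
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("ab", "ba"))                       # 2
print(edit_distance("ab", "ba", transpositions=True))  # 1
```

OSA does not allow a substring to be edited more than once, so it is not the same as the unrestricted Damerau-Levenshtein distance. For example, this sketch returns 3 for "ca" and "abc" even with transpositions enabled.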

Benchmark


A simple benchmark was run on Python 3.8.12 against editdistance, which implements the Levenshtein edit distance algorithm.

The script used by the benchmark can be found here.

For clarity, the following string pairs were used.

Short string

"short sentence with words"

"shrtsen tence wit mispeledwords"

Long string

"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

short string
        test_damerau_osa               0.925678600000083
        test_levenshtein               0.6640075999998771
        test_editdistance              0.9197039000000586
        test_damerau_osa_early_cutoff  0.7028707999998005
        test_levenshtein_early_cutoff  0.5697816000001694
long string
        test_damerau_osa               7.7526998000003005
        test_levenshtein               4.262871200000063
        test_editdistance              1.9676684999999452
        test_damerau_osa_early_cutoff  0.9891195999998672
        test_levenshtein_early_cutoff  0.9085431999997127

While max_distance=10 (the early_cutoff rows) significantly improves the computation time, it may not be a sensible value in some cases.

Without a cutoff, editdistpy's Levenshtein is faster than editdistance on the short string pair but slower on the long string pair. editdistpy may therefore be the more suitable library if your use case mainly involves comparing short strings.
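To reproduce this kind of comparison on your own data, a small harness built on the standard library's timeit module is usually enough. The sketch below is not the benchmark script referenced above. The helper `time_distance` and the placeholder `stand_in_distance` are hypothetical and only keep the example self-contained.

```python
import timeit


def time_distance(func, pair, number=1000):
    """Return the total seconds taken by `number` calls of func(a, b)."""
    a, b = pair
    return timeit.timeit(lambda: func(a, b), number=number)


def stand_in_distance(a, b):
    # Hypothetical placeholder, NOT a real edit distance: it only counts
    # mismatched positions plus the length difference.
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


short_pair = ("short sentence with words", "shrtsen tence wit mispeledwords")
elapsed = time_distance(stand_in_distance, short_pair, number=1000)
print(f"stand_in_distance: {elapsed:.6f} s for 1000 calls")
```

To time editdistpy itself, the placeholder could be replaced with a two-argument wrapper such as `lambda a, b: levenshtein.distance(a, b, sys.maxsize)`, or with a smaller max_distance to measure the early cutoff.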

Changelog


See the changelog for a history of notable changes to editdistpy.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • editdistpy-0.1.1.tar.gz (7.2 kB), Source

Built Distributions

  • editdistpy-0.1.1-cp38-cp38-win_amd64.whl (34.7 kB), CPython 3.8, Windows x86-64
  • editdistpy-0.1.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (123.2 kB), CPython 3.8, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
  • editdistpy-0.1.1-cp38-cp38-macosx_10_9_x86_64.whl (27.3 kB), CPython 3.8, macOS 10.9+ x86-64
  • editdistpy-0.1.1-cp37-cp37m-win_amd64.whl (34.4 kB), CPython 3.7m, Windows x86-64
  • editdistpy-0.1.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121.7 kB), CPython 3.7m, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
  • editdistpy-0.1.1-cp36-cp36m-win_amd64.whl (34.2 kB), CPython 3.6m, Windows x86-64
  • editdistpy-0.1.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (119.5 kB), CPython 3.6m, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
