Fast Levenshtein and Damerau optimal string alignment algorithms.

editdistpy

editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.

Installation


The easiest way to install editdistpy is using pip:

pip install -U editdistpy

Usage


You can specify the max_distance you care about. If the edit distance exceeds max_distance, -1 is returned. Specifying a sensible max_distance can speed up the computation significantly.

You can also specify max_distance=sys.maxsize if you want the actual edit distance to always be computed.

Levenshtein

import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6
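
To make the max_distance contract concrete, the following is a minimal pure-Python sketch of a bounded Levenshtein distance. It is not editdistpy's implementation; the name `bounded_levenshtein` and the early-exit strategy are assumptions chosen for illustration. It only mirrors the documented behaviour of returning -1 once the distance exceeds max_distance.

```python
import sys


def bounded_levenshtein(a: str, b: str, max_distance: int) -> int:
    """Reference Levenshtein distance; returns -1 when it exceeds max_distance."""
    # The length difference is a lower bound on the distance, so bail out early.
    if abs(len(a) - len(b)) > max_distance:
        return -1
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        # Row minimums never decrease, so once one exceeds the limit we can stop.
        if min(current) > max_distance:
            return -1
        previous = current
    distance = previous[-1]
    return distance if distance <= max_distance else -1


print(bounded_levenshtein("flintstone", "hanson", 2))            # -1
print(bounded_levenshtein("flintstone", "hanson", sys.maxsize))  # 6
```

With max_distance=2 the sketch returns before filling any row, because the two strings differ in length by 4. This is one reason a tight cutoff can be much cheaper than computing the full distance.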

Damerau-Levenshtein OSA

import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
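
The difference between plain Levenshtein and the OSA variant is easier to see on strings that contain a swapped pair of adjacent characters. The sketch below is a textbook pure-Python OSA distance without a cutoff; `osa_distance` is a hypothetical helper written for this illustration and does not reflect how editdistpy computes the result internally.

```python
def osa_distance(a: str, b: str) -> int:
    """Reference optimal string alignment distance (no cutoff)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Extra OSA case: transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("abcd", "abdc"))          # 1 (plain Levenshtein gives 2)
print(osa_distance("ca", "abc"))             # 3 (unrestricted Damerau-Levenshtein gives 2)
print(osa_distance("flintstone", "hanson"))  # 6
```

"flintstone" and "hanson" share no swapped adjacent pair, which is why both algorithms report 6 for the example above.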

Benchmark


A simple benchmark was run on Python 3.8.12 against editdistance, which implements the Levenshtein edit distance algorithm.

The script used by the benchmark can be found here.

For clarity, the following string pairs were used.

Short string

"short sentence with words"

"shrtsen tence wit mispeledwords"

Long string

"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

short string
        test_damerau_osa               0.925678600000083
        test_levenshtein               0.6640075999998771
        test_editdistance              0.9197039000000586
        test_damerau_osa_early_cutoff  0.7028707999998005
        test_levenshtein_early_cutoff  0.5697816000001694
long string
        test_damerau_osa               7.7526998000003005
        test_levenshtein               4.262871200000063
        test_editdistance              1.9676684999999452
        test_damerau_osa_early_cutoff  0.9891195999998672
        test_levenshtein_early_cutoff  0.9085431999997127

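While max_distance=10 (used in the early_cutoff runs) significantly reduces the computation time, it may not be a sensible value in every use case.

editdistpy performs better on shorter strings: in the results above, its Levenshtein implementation is faster than editdistance on the short pair but slower on the long pair unless an early cutoff is used. It can be the more suitable library if your use case mainly involves comparing short strings.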
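
To run a comparable measurement on your own strings, a small timeit harness along the following lines may be enough. This is not the benchmark script linked above; `run_benchmark` and `reference_levenshtein` are hypothetical names, and the pure-Python stand-in is there only so the sketch runs without installing anything. In practice you would pass the library functions you want to compare instead.

```python
import timeit
from typing import Callable, Dict, Tuple

# The short string pair listed in the benchmark description.
SHORT_PAIR = ("short sentence with words", "shrtsen tence wit mispeledwords")


def reference_levenshtein(a: str, b: str) -> int:
    """Stand-in distance function so the sketch runs without extra packages."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b), # substitution or match
            ))
        previous = current
    return previous[-1]


def run_benchmark(
    functions: Dict[str, Callable[[str, str], int]],
    pair: Tuple[str, str],
    number: int = 200,
) -> Dict[str, float]:
    """Return the total seconds taken by `number` calls of each function on `pair`."""
    first, second = pair
    return {
        name: timeit.timeit(lambda f=func: f(first, second), number=number)
        for name, func in functions.items()
    }


results = run_benchmark({"reference_levenshtein": reference_levenshtein}, SHORT_PAIR)
for name, seconds in results.items():
    print(f"{name:<32}{seconds}")
```

Timings from a harness like this depend heavily on the machine, the interpreter, and the `number` of repetitions, so they are only meaningful when compared within a single run.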

Changelog


See the changelog for a history of notable changes to editdistpy.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

editdistpy-0.1.2rc1.tar.gz (55.5 kB), Source

Built Distributions

editdistpy-0.1.2rc1-cp38-cp38-win_amd64.whl (34.7 kB), CPython 3.8, Windows x86-64

editdistpy-0.1.2rc1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (123.3 kB), CPython 3.8, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

editdistpy-0.1.2rc1-cp38-cp38-macosx_10_9_x86_64.whl (27.4 kB), CPython 3.8, macOS 10.9+ x86-64

editdistpy-0.1.2rc1-cp37-cp37m-win_amd64.whl (34.4 kB), CPython 3.7m, Windows x86-64

editdistpy-0.1.2rc1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121.8 kB), CPython 3.7m, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

editdistpy-0.1.2rc1-cp36-cp36m-win_amd64.whl (34.3 kB), CPython 3.6m, Windows x86-64

editdistpy-0.1.2rc1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (119.6 kB), CPython 3.6m, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
