
Fast Levenshtein and Damerau optimal string alignment algorithms.


editdistpy

editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.

Installation


The easiest way to install editdistpy is using pip:

pip install -U editdistpy

Usage


You can specify the max_distance you care about. If the edit distance exceeds max_distance, -1 is returned. Specifying a sensible max_distance can yield a significant speed improvement.

To always compute the actual edit distance, pass max_distance=sys.maxsize.
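The cutoff semantics can be illustrated with a small pure-Python reference. This is only a sketch of the idea, not editdistpy's actual implementation; the name bounded_levenshtein is hypothetical and is used here purely for illustration.

```python
import sys


def bounded_levenshtein(a: str, b: str, max_distance: int) -> int:
    """Reference Levenshtein distance that returns -1 above max_distance."""
    # The length difference is a lower bound on the distance, so the
    # cutoff can sometimes be applied before any table row is built.
    if abs(len(a) - len(b)) > max_distance:
        return -1
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        # Row minima never decrease, so once a whole row exceeds the
        # cutoff the final distance must exceed it too.
        if min(current) > max_distance:
            return -1
        previous = current
    distance = previous[-1]
    return distance if distance <= max_distance else -1


print(bounded_levenshtein("flintstone", "hanson", 2))            # -1
print(bounded_levenshtein("flintstone", "hanson", sys.maxsize))  # 6
```

The early exits are where a small max_distance saves work: the function can stop as soon as the bound is provably exceeded instead of filling in the whole table.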

Levenshtein

import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6

Damerau-Levenshtein OSA

import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
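For readers unfamiliar with the difference between the two metrics: optimal string alignment additionally counts a swap of two adjacent characters as a single edit, but never edits the same substring twice. The following textbook pure-Python version is a sketch for illustration only. It has no cutoff, osa_distance is a hypothetical name, and it is not how editdistpy computes the result.

```python
def osa_distance(a: str, b: str) -> int:
    """Textbook optimal string alignment distance (no cutoff)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        table[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: "xy" -> "yx" costs one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                table[i][j] = min(table[i][j], table[i - 2][j - 2] + 1)
    return table[-1][-1]


print(osa_distance("ab", "ba"))   # 1 (plain Levenshtein would give 2)
print(osa_distance("ca", "abc"))  # 3
```

The second call shows the "optimal string alignment" restriction: unrestricted Damerau-Levenshtein would give 2 for this pair, because it may transpose "ca" to "ac" and then insert "b" between the swapped characters.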

Benchmark


A simple benchmark was run on Python 3.8.12 against editdistance, which implements the Levenshtein edit distance algorithm.

The script used by the benchmark can be found here.

For clarity, the following string pairs were used.

Short string

"short sentence with words"

"shrtsen tence wit mispeledwords"

Long string

"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

short string
        test_damerau_osa               0.925678600000083
        test_levenshtein               0.6640075999998771
        test_editdistance              0.9197039000000586
        test_damerau_osa_early_cutoff  0.7028707999998005
        test_levenshtein_early_cutoff  0.5697816000001694
long string
        test_damerau_osa               7.7526998000003005
        test_levenshtein               4.262871200000063
        test_editdistance              1.9676684999999452
        test_damerau_osa_early_cutoff  0.9891195999998672
        test_levenshtein_early_cutoff  0.9085431999997127

Although max_distance=10 significantly reduces the computation time, it may not be a sensible value for every use case.

editdistpy also performs better on shorter strings, so it may be the more suitable library if your use case mainly involves comparing short strings.
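The benchmark script itself is not reproduced here. As a rough sketch of how such a comparison could be timed with the standard library, the harness below accepts any callable taking (string_1, string_2, max_distance). The names time_distance and stand_in_distance are hypothetical, and the stand-in is only a placeholder so the snippet runs without third-party packages.

```python
import timeit
from typing import Callable

DistanceFunc = Callable[[str, str, int], int]


def time_distance(func: DistanceFunc, a: str, b: str,
                  max_distance: int, number: int = 1000) -> float:
    """Return the total seconds taken by `number` calls of func."""
    return timeit.timeit(lambda: func(a, b, max_distance), number=number)


def stand_in_distance(a: str, b: str, max_distance: int) -> int:
    """Placeholder only: length gap with the same -1 cutoff convention."""
    gap = abs(len(a) - len(b))
    return gap if gap <= max_distance else -1


short_pair = ("short sentence with words", "shrtsen tence wit mispeledwords")
elapsed = time_distance(stand_in_distance, *short_pair, max_distance=10)
print(f"stand_in_distance: {elapsed:.6f} s for 1000 calls")
```

To time the real functions, a callable such as levenshtein.distance from the usage examples above could be passed in place of stand_in_distance. Absolute numbers will vary with hardware and Python version.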

Changelog


See the changelog for a history of notable changes to editdistpy.
