Fast Levenshtein and Damerau optimal string alignment algorithms.

Project description

editdistpy

editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.

Installation


The easiest way to install editdistpy is using pip:

pip install -U editdistpy

Usage


You can specify the max_distance you care about. If the edit distance exceeds max_distance, -1 is returned. Choosing a sensible max_distance can result in a significant speed improvement.

To always compute the actual edit distance, pass max_distance=sys.maxsize.
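To make the max_distance contract concrete, here is a minimal pure-Python sketch of Levenshtein distance that follows the same -1 convention. It is an illustrative stand-in, not editdistpy's compiled implementation: the name levenshtein_with_cutoff is hypothetical, and the row-minimum early exit is one common cutoff strategy that may differ from what the library does internally.

```python
import sys


def levenshtein_with_cutoff(s1: str, s2: str, max_distance: int) -> int:
    """Hypothetical reference: Levenshtein distance, or -1 if it exceeds max_distance."""
    # The length difference is a lower bound on the edit distance.
    if abs(len(s1) - len(s2)) > max_distance:
        return -1
    # prev[j] is the distance between the first i-1 characters of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i] + [0] * len(s2)
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        # Row minima never decrease, so once the whole row exceeds the cap
        # the final distance must exceed it too: stop early.
        if min(curr) > max_distance:
            return -1
        prev = curr
    distance = prev[-1]
    return distance if distance <= max_distance else -1


print(levenshtein_with_cutoff("flintstone", "hanson", 2))            # -1
print(levenshtein_with_cutoff("flintstone", "hanson", sys.maxsize))  # 6
```

The cheap length check and the per-row check are what let a small max_distance skip most of the work, which is the intuition behind the speed-up described above.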

Levenshtein

import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6

Damerau-Levenshtein OSA

import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
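Compared with Levenshtein, OSA also counts a transposition of two adjacent characters as a single edit, with the restriction that no substring is edited more than once. The plain-Python sketch below implements the textbook OSA recurrence to show both effects; osa_distance is a hypothetical name and this is not editdistpy's optimized code.

```python
def osa_distance(s1: str, s2: str) -> int:
    """Hypothetical reference: optimal string alignment (restricted Damerau-Levenshtein)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of s1[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of s2[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ab" -> "ba", counts as one edit.
            if i > 1 and j > 1 and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("ab", "ba"))   # 1 (Levenshtein gives 2)
print(osa_distance("ca", "abc"))  # 3 (unrestricted Damerau-Levenshtein gives 2)
```

The second case shows the OSA restriction: turning "ca" into "abc" would need a transposition followed by an insertion between the swapped characters, which OSA does not allow.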

Benchmark


A simple benchmark was run on Python 3.8.12 against editdistance, which implements the Levenshtein edit distance algorithm.

The script used by the benchmark can be found here.

The following string pairs were used.

Short string

"short sentence with words"

"shrtsen tence wit mispeledwords"

Long string

"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

        Test                           Short string         Long string
        test_damerau_osa               0.925678600000083    7.7526998000003005
        test_levenshtein               0.6640075999998771   4.262871200000063
        test_editdistance              0.9197039000000586   1.9676684999999452
        test_damerau_osa_early_cutoff  0.7028707999998005   0.9891195999998672
        test_levenshtein_early_cutoff  0.5697816000001694   0.9085431999997127

The *_early_cutoff tests use max_distance=10. This significantly reduces computation time, most noticeably on the long string pair, but 10 may not be a sensible cutoff for every use case.

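In this benchmark, editdistpy's Levenshtein implementation is faster than editdistance on the short string pair but roughly twice as slow on the long pair when no cutoff is used. editdistpy may therefore be the more suitable library if your use case mainly involves comparing short strings.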
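The benchmark script linked above is the reference for these numbers. As a rough illustration of how such timings can be collected, the sketch below wraps the standard-library timeit.timeit around a distance function. The time_pairs helper is hypothetical, and a trivial stand-in function is used so the block runs without extra packages; the commented lines show how levenshtein.distance and editdistance.eval could be plugged in, assuming both packages are installed.

```python
import timeit
from typing import Callable, Sequence, Tuple


# Hypothetical helper (not part of editdistpy): total seconds taken by
# `number` passes of fn(a, b) over every string pair.
def time_pairs(
    fn: Callable[[str, str], int],
    pairs: Sequence[Tuple[str, str]],
    number: int = 10_000,
) -> float:
    def run() -> None:
        for a, b in pairs:
            fn(a, b)

    return timeit.timeit(run, number=number)


short_pair = [("short sentence with words", "shrtsen tence wit mispeledwords")]

# Stand-in distance function so the sketch runs without extra packages.
print(time_pairs(lambda a, b: abs(len(a) - len(b)), short_pair, number=1_000))

# With editdistpy and editdistance installed, the comparison could look like:
# import sys
# import editdistance
# from editdistpy import levenshtein
# time_pairs(lambda a, b: levenshtein.distance(a, b, sys.maxsize), short_pair)
# time_pairs(lambda a, b: levenshtein.distance(a, b, 10), short_pair)
# time_pairs(editdistance.eval, short_pair)
```

Absolute timings depend on the machine and Python version, so only relative comparisons made within one run are meaningful.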

Changelog


See the changelog for a history of notable changes to editdistpy.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

editdistpy-0.1.3rc1.tar.gz (55.5 kB)

Uploaded Source

Built Distributions

editdistpy-0.1.3rc1-cp38-cp38-win_amd64.whl (34.7 kB)

Uploaded CPython 3.8 Windows x86-64

editdistpy-0.1.3rc1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (123.3 kB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

editdistpy-0.1.3rc1-cp38-cp38-macosx_10_9_x86_64.whl (27.4 kB)

Uploaded CPython 3.8 macOS 10.9+ x86-64

editdistpy-0.1.3rc1-cp37-cp37m-win_amd64.whl (34.4 kB)

Uploaded CPython 3.7m Windows x86-64

editdistpy-0.1.3rc1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121.8 kB)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

editdistpy-0.1.3rc1-cp36-cp36m-win_amd64.whl (34.3 kB)

Uploaded CPython 3.6m Windows x86-64

editdistpy-0.1.3rc1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (119.6 kB)

Uploaded CPython 3.6m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

File details

Details for the file editdistpy-0.1.3rc1.tar.gz.

File metadata

  • Download URL: editdistpy-0.1.3rc1.tar.gz
  • Upload date:
  • Size: 55.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.3rc1.tar.gz
Algorithm Hash digest
SHA256 cea7bb4f8c558d092c51eb750181071a7410d436a364c44de60d630f0121db75
MD5 2070bc81a678557eb97982669e03fc5b
BLAKE2b-256 57b67380038ef9c4fa4a46921209442fb08a2376fc0f0365adef6678e5b0a44e

See more details on using hashes here.

File details

Details for the file editdistpy-0.1.3rc1-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.3rc1-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 34.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.3rc1-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 07bfbb079ac0819224a1d4ffaac59186b411614f023c19b7f9c5793712f9c837
MD5 65d7080128953ef9fdec2306592fe151
BLAKE2b-256 5d6caf2a6994e6d602d4c80cd47a18888439b64ecd618c4dcbf58fbe12eea627

File details

Details for the file editdistpy-0.1.3rc1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for editdistpy-0.1.3rc1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c0bad2bc96283bb63ef5d9763ad9256362f3d60f98a0163b763dacb69c0526e7
MD5 0feeba40e6c65bdd8a9b366cc0e0bddb
BLAKE2b-256 2bd072995a7885f4b8aec70e806dfaaf9e7641ff862447318c0381e7b3b2a2dc

File details

Details for the file editdistpy-0.1.3rc1-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: editdistpy-0.1.3rc1-cp38-cp38-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 27.4 kB
  • Tags: CPython 3.8, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.3rc1-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1e76bac1893c22112e9d7de0d691e068a7b0c719dd1830281ef23e9b7cf9a82a
MD5 acecd638901162d6bcd6fc2c5a9eb796
BLAKE2b-256 db27d272b9a64eef0c4f14ba45090e1518dfad416cd46658a2527f241c6f0474

File details

Details for the file editdistpy-0.1.3rc1-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.3rc1-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 34.4 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.3rc1-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 203ece599b1c26262e4732c786a4c438b06eac912360978f664cecf6c2e0b2ab
MD5 2783c80910c8210b9cc886c8c29504db
BLAKE2b-256 cb4766777a251427d8d8125c3f7e0c119b2a3a73f6c3d25cd8995d63c11dbcac

File details

Details for the file editdistpy-0.1.3rc1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for editdistpy-0.1.3rc1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e9840dd2b03610b0b7faabc94363910f6d999cdca0c7b24a0dcddf061157c02f
MD5 3b1e03a087413c3c7d08c4e5ac16cc52
BLAKE2b-256 c001acd7540a189e26b196a40315611bac41adcd354ef8fb9bb3d94504db11b4

File details

Details for the file editdistpy-0.1.3rc1-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.3rc1-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 34.3 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.3rc1-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 f946c7acd07184325785fc89721b49238e0dd0c28703cbec4105107a8dbd6d9a
MD5 03f2dcd8dc957aa1f144d8de95226ecf
BLAKE2b-256 58b53dcff7bcb06f71d709c97e297a5df06f57310acd3e1dfa64ace7e563464d

File details

Details for the file editdistpy-0.1.3rc1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for editdistpy-0.1.3rc1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5cb7acd6930a2fad0f98a1dc0a8874e38bd638a4015a6dd2a16c38efc69750cf
MD5 50ab92459b6996504461b4e5922bbe7f
BLAKE2b-256 5a20f02da9788af46d035206aa3fc56905e3da7094552e67c72c83152df062f7
