Fast Levenshtein and Damerau optimal string alignment algorithms.

editdistpy

editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.

Installation


The easiest way to install editdistpy is using pip:

pip install -U editdistpy

Usage


You can use max_distance to specify the largest edit distance you care about. If the edit distance exceeds max_distance, -1 is returned. Choosing a sensible max_distance can significantly speed up the computation.

To always compute the actual edit distance, pass max_distance=sys.maxsize.
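To illustrate the cutoff semantics, here is a minimal pure-Python sketch. It is not editdistpy's implementation, and the function name levenshtein_with_cutoff is hypothetical. It fills the classic dynamic-programming table one row at a time and returns -1 as soon as the result is guaranteed to exceed max_distance, which suggests why a small limit can skip much of the work:

```python
import sys


def levenshtein_with_cutoff(s1: str, s2: str, max_distance: int) -> int:
    """Hypothetical sketch: Levenshtein distance, or -1 if it exceeds max_distance."""
    # The distance is at least the length difference, so bail out early.
    if abs(len(s1) - len(s2)) > max_distance:
        return -1
    # prev[j] holds the distance between the processed prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i] + [0] * len(s2)
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            )
        # Row minima never decrease from one row to the next, so once every
        # cell in a row exceeds the limit, the final distance must too.
        if min(curr) > max_distance:
            return -1
        prev = curr
    return prev[-1] if prev[-1] <= max_distance else -1


print(levenshtein_with_cutoff("flintstone", "hanson", 2))            # -1
print(levenshtein_with_cutoff("flintstone", "hanson", sys.maxsize))  # 6
```

The outputs match the library examples that follow, since the length difference between "flintstone" and "hanson" alone (4) already exceeds a limit of 2.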

Levenshtein

import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6

Damerau-Levenshtein OSA

import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
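OSA differs from plain Levenshtein in that it also counts swapping two adjacent characters as a single edit, with the restriction that no substring is edited more than once. The minimal pure-Python sketch below illustrates both points. osa_distance is a hypothetical name, the sketch omits the max_distance cutoff, and it is not editdistpy's implementation:

```python
def osa_distance(s1: str, s2: str) -> int:
    """Hypothetical sketch: optimal string alignment (restricted Damerau-Levenshtein)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # d[i][j] is the OSA distance between s1[:i] and s2[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ab" -> "ba", costs a single edit.
            if (
                i > 1
                and j > 1
                and s1[i - 1] == s2[j - 2]
                and s1[i - 2] == s2[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("ab", "ba"))              # 1 (Levenshtein gives 2)
print(osa_distance("ca", "abc"))             # 3
print(osa_distance("flintstone", "hanson"))  # 6
```

The "ca"/"abc" pair shows the OSA restriction: unrestricted Damerau-Levenshtein gives 2 (ca → ac → abc), but OSA cannot insert a character between two characters it has already transposed, so it returns 3.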

Benchmark


A simple benchmark was run on Python 3.8.12 against editdistance, which implements the Levenshtein edit distance algorithm.

The script used by the benchmark can be found here.

The following string pairs were used:

Short string

"short sentence with words"

"shrtsen tence wit mispeledwords"

Long string

"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

short string
        test_damerau_osa               0.925678600000083
        test_levenshtein               0.6640075999998771
        test_editdistance              0.9197039000000586
        test_damerau_osa_early_cutoff  0.7028707999998005
        test_levenshtein_early_cutoff  0.5697816000001694
long string
        test_damerau_osa               7.7526998000003005
        test_levenshtein               4.262871200000063
        test_editdistance              1.9676684999999452
        test_damerau_osa_early_cutoff  0.9891195999998672
        test_levenshtein_early_cutoff  0.9085431999997127

The early cutoff tests use max_distance=10. While this significantly reduces computation time, it may not be a sensible value for every use case.

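editdistpy also performs better on shorter strings: in the short string benchmark, test_levenshtein is faster than test_editdistance even without an early cutoff, whereas in the long string benchmark it is slower. editdistpy may therefore be the more suitable library if your use case mainly involves comparing short strings.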

Changelog


See the changelog for a history of notable changes to editdistpy.

Download files

Source Distribution

editdistpy-0.1.2rc3.tar.gz (55.4 kB)

Uploaded Source

Built Distributions

editdistpy-0.1.2rc3-cp38-cp38-win_amd64.whl (34.7 kB)

Uploaded CPython 3.8 Windows x86-64

editdistpy-0.1.2rc3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (123.3 kB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

editdistpy-0.1.2rc3-cp38-cp38-macosx_10_9_x86_64.whl (27.4 kB)

Uploaded CPython 3.8 macOS 10.9+ x86-64

editdistpy-0.1.2rc3-cp37-cp37m-win_amd64.whl (34.4 kB)

Uploaded CPython 3.7m Windows x86-64

editdistpy-0.1.2rc3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121.8 kB)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

editdistpy-0.1.2rc3-cp36-cp36m-win_amd64.whl (34.3 kB)

Uploaded CPython 3.6m Windows x86-64

editdistpy-0.1.2rc3-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (119.6 kB)

Uploaded CPython 3.6m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

File details

Details for the file editdistpy-0.1.2rc3.tar.gz.

File metadata

  • Download URL: editdistpy-0.1.2rc3.tar.gz
  • Size: 55.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc3.tar.gz
Algorithm Hash digest
SHA256 3f205ba8a73ae1566b587b043479d595a02587b961319b1148bf2a838ddaeb03
MD5 ad8d022caa47657bb40be70d2899a51f
BLAKE2b-256 15cdefc0f402ced25b87ed09f7592d6cc72638634d5c42c594615e3ee821d2be

See more details on using hashes here.

File details

Details for the file editdistpy-0.1.2rc3-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.2rc3-cp38-cp38-win_amd64.whl
  • Size: 34.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc3-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 57f0bc9ba984d5880c74ffd9a26100a7811b6d061e46868c7a2ec549de92acc8
MD5 5b2e0a2258d0cc37e9cf3947024f9101
BLAKE2b-256 ef7559790ebbc33a0aac755d03ddf80f54135e8301cc17bf1e0396c6f1578657

File details

Details for the file editdistpy-0.1.2rc3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for editdistpy-0.1.2rc3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 164de8fff35ab3f6ad8061ed401e09b1ffda56457d3f9b1bd75c1acd324b540d
MD5 7b663172a243883310efa64cdee8d610
BLAKE2b-256 e6c67c9f909955e3a285f1e763650dec157b27a9ecf5833b35fae6c2443bfbd4

File details

Details for the file editdistpy-0.1.2rc3-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: editdistpy-0.1.2rc3-cp38-cp38-macosx_10_9_x86_64.whl
  • Size: 27.4 kB
  • Tags: CPython 3.8, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc3-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 f1116f3e71c24518970c589db3f8d08adcd8e9831d8abc679cdf239aa6f629eb
MD5 1396b0a3295e50fe7f9a0a82e98c35bc
BLAKE2b-256 7de6e8c0b4a579a1bcc22a617360e75612c1247bb603f79656ad4a31d8108131

File details

Details for the file editdistpy-0.1.2rc3-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.2rc3-cp37-cp37m-win_amd64.whl
  • Size: 34.4 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc3-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 56b0f4fe068a87d03551fdb5cf8f146e909244ca49acf0cb9235a6a6ec8ffb3b
MD5 bcf4c83cb044063347f9f544861366f5
BLAKE2b-256 dba74b657ed1bde911479871c2047095867f322b9cbbbb117b14993f334362c7

File details

Details for the file editdistpy-0.1.2rc3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for editdistpy-0.1.2rc3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 487d42ab2066f20686b0d2eef1695668074613eb4dd770542291646088369d84
MD5 ddda771642bf44897130ad69bb819d90
BLAKE2b-256 44c1d889b12b84b8a94df6e420958fc68e577ed5c88da2a45f82706916039a49

File details

Details for the file editdistpy-0.1.2rc3-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.2rc3-cp36-cp36m-win_amd64.whl
  • Size: 34.3 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc3-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 655cd9efdb9120b8ff47ff56f0355842046f8f00f7f8b4535607787698df4da5
MD5 05e48d9b58637091a8486290c3114a4e
BLAKE2b-256 d6eebf3b88a8b87906a53fb95b2cab105b77dc70baf31f7316f2f1ba2a34d843

File details

Details for the file editdistpy-0.1.2rc3-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for editdistpy-0.1.2rc3-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 57626b1c8d56430adaf97998f42c1ad1951d0b48cf069eb6685f15060e8bd677
MD5 1d12b1c2fd8a3aaa1531a0330d0ef7b3
BLAKE2b-256 dbbfd2a07d4cc9cfb0c0be1d17cc0d12c9bd284184d482f6109cbd3d58922cee
