Fast Levenshtein and Damerau optimal string alignment algorithms.

Project description

editdistpy

editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.
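The two algorithms differ only in how they treat swapped neighbouring characters. Below is a minimal pure-Python sketch of the textbook dynamic-programming recurrences. It is not editdistpy's optimized implementation, and the function name `edit_distance` and its `transpositions` flag are hypothetical. With transpositions enabled it computes OSA, which counts swapping two adjacent characters as a single edit.

```python
def edit_distance(s1: str, s2: str, transpositions: bool = False) -> int:
    """Textbook full-matrix edit distance (illustrative sketch).

    transpositions=False: Levenshtein (insertions, deletions, substitutions).
    transpositions=True: Damerau-Levenshtein optimal string alignment (OSA),
    which also counts swapping two adjacent characters as one edit.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    # d[i][j] is the distance between the prefixes s1[:i] and s2[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free on a match)
            )
            # OSA only: the last two characters appear swapped.
            if (
                transpositions
                and i > 1
                and j > 1
                and s1[i - 1] == s2[j - 2]
                and s1[i - 2] == s2[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("flintstone", "hanson"))            # 6
print(edit_distance("ca", "ac"))                        # 2
print(edit_distance("ca", "ac", transpositions=True))   # 1
print(edit_distance("ca", "abc", transpositions=True))  # 3
```

The last line shows the "optimal string alignment" restriction: "ca" could become "abc" in two steps (ca -> ac -> abc), but that would edit the swapped substring a second time, which OSA does not allow, so it reports 3.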

Installation


The easiest way to install editdistpy is using pip:

pip install -U editdistpy

Usage


Use max_distance to set the largest edit distance you care about. If the edit distance exceeds max_distance, -1 is returned. Choosing a sensible max_distance can speed up the computation significantly.

Pass max_distance=sys.maxsize if you always want the actual edit distance to be computed.
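A minimal sketch of how a max_distance cutoff can save work, assuming a plain two-row dynamic-programming implementation. The function name `levenshtein_bounded` is hypothetical, and editdistpy's actual algorithm is more optimized. The sketch returns -1 immediately when the length difference alone exceeds max_distance, and stops as soon as every value in the current row exceeds it.

```python
import sys


def levenshtein_bounded(s1: str, s2: str, max_distance: int) -> int:
    """Return the Levenshtein distance, or -1 if it exceeds max_distance."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep s2 as the shorter string so rows stay short
    # Every extra character in the longer string costs at least one edit.
    if len(s1) - len(s2) > max_distance:
        return -1
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        curr = [i] + [0] * len(s2)
        for j, c2 in enumerate(s2, start=1):
            curr[j] = min(
                prev[j] + 1,                  # deletion
                curr[j - 1] + 1,              # insertion
                prev[j - 1] + int(c1 != c2),  # substitution (0 on a match)
            )
        # Row minima never decrease, so once the whole row exceeds the
        # limit, the final distance is guaranteed to exceed it as well.
        if min(curr) > max_distance:
            return -1
        prev = curr
    return prev[-1] if prev[-1] <= max_distance else -1


print(levenshtein_bounded("flintstone", "hanson", 2))            # -1
print(levenshtein_bounded("flintstone", "hanson", sys.maxsize))  # 6
```

In the first call the strings differ in length by 4, so the sketch returns -1 without filling in a single row.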

Levenshtein

import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6

Damerau-Levenshtein OSA

import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6

Benchmark


A simple benchmark was run on Python 3.8.12 against editdistance, which also implements the Levenshtein edit distance algorithm.

The script used by the benchmark can be found here.

The following string pairs were used:

Short string

"short sentence with words"

"shrtsen tence wit mispeledwords"

Long string

"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

                               short string        long string
test_damerau_osa               0.925678600000083   7.7526998000003005
test_levenshtein               0.6640075999998771  4.262871200000063
test_editdistance              0.9197039000000586  1.9676684999999452
test_damerau_osa_early_cutoff  0.7028707999998005  0.9891195999998672
test_levenshtein_early_cutoff  0.5697816000001694  0.9085431999997127

The *_early_cutoff tests use max_distance=10. Although this significantly reduces the computation time, 10 may not be a sensible cutoff for every use case.

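The results also show that editdistpy performs better on shorter strings: without a cutoff, its Levenshtein implementation is faster than editdistance on the short pair but more than twice as slow on the long pair. editdistpy may therefore be the more suitable library if your use case mainly involves comparing short strings.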
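To run a similar comparison on your own strings, here is a minimal timing harness built on the standard library's timeit. It is not the benchmark script mentioned above; the helper name `time_pairs` and the default repetition count are hypothetical choices. The comparison at the bottom only runs if both editdistpy and editdistance are installed, and uses lambdas to fix max_distance so every function takes just two strings.

```python
import sys
import timeit
from typing import Callable, Sequence, Tuple


def time_pairs(
    func: Callable[[str, str], int],
    pairs: Sequence[Tuple[str, str]],
    number: int = 1000,
) -> float:
    """Return the total seconds taken to call func on every pair, `number` times."""

    def run() -> None:
        for s1, s2 in pairs:
            func(s1, s2)

    return timeit.timeit(run, number=number)


short_pair = [("short sentence with words", "shrtsen tence wit mispeledwords")]

try:
    import editdistance
    from editdistpy import levenshtein
except ImportError:  # the comparison needs both packages installed
    pass
else:
    print("editdistpy:", time_pairs(lambda a, b: levenshtein.distance(a, b, sys.maxsize), short_pair))
    print("editdistpy, max_distance=10:", time_pairs(lambda a, b: levenshtein.distance(a, b, 10), short_pair))
    print("editdistance:", time_pairs(editdistance.eval, short_pair))
```

Absolute timings depend on the machine, Python version, and repetition count, so compare libraries within a single run rather than against the numbers above.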

Changelog


See the changelog for a history of notable changes to editdistpy.


Download files

Source Distribution

editdistpy-0.1.2rc2.tar.gz (54.7 kB)

Uploaded Source

Built Distributions

editdistpy-0.1.2rc2-cp38-cp38-win_amd64.whl (34.7 kB)

Uploaded CPython 3.8 Windows x86-64

editdistpy-0.1.2rc2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (123.3 kB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

editdistpy-0.1.2rc2-cp38-cp38-macosx_10_9_x86_64.whl (27.4 kB)

Uploaded CPython 3.8 macOS 10.9+ x86-64

editdistpy-0.1.2rc2-cp37-cp37m-win_amd64.whl (34.4 kB)

Uploaded CPython 3.7m Windows x86-64

editdistpy-0.1.2rc2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121.8 kB)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

editdistpy-0.1.2rc2-cp36-cp36m-win_amd64.whl (34.3 kB)

Uploaded CPython 3.6m Windows x86-64

editdistpy-0.1.2rc2-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (119.6 kB)

Uploaded CPython 3.6m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

File details

Details for the file editdistpy-0.1.2rc2.tar.gz.

File metadata

  • Download URL: editdistpy-0.1.2rc2.tar.gz
  • Size: 54.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc2.tar.gz
Algorithm Hash digest
SHA256 380dc649d3383a7c4366d5959916edd5b898332fd499be058fb9bfb0b8334790
MD5 c33945f67678bc03be72f04366750304
BLAKE2b-256 f9971467438d1e55cab1d6ec263637322b160e021cc2079d1dd168f6ef433f17

See more details on using hashes here.

File details

Details for the file editdistpy-0.1.2rc2-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.2rc2-cp38-cp38-win_amd64.whl
  • Size: 34.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc2-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 e1bff347adedabd6cfd15e5f4fc0f44608e84654f822edd5b800d0308032352b
MD5 079a38183a3902a371d53bae96fe28f2
BLAKE2b-256 e141a35733512088023fa5f584adc31cbd05bf87bd7fb308104d8847d89b7279

File details

Details for the file editdistpy-0.1.2rc2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for editdistpy-0.1.2rc2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 32e43c54e89295c26e99af26237c90086fca2bf55bc5a420ecaa95738db655d6
MD5 32f7c874298768fa0acc3a8b4e519659
BLAKE2b-256 b73b423f10837a1580bae0db59dbb8156c63727fd88249caf8e4edf121dda79c

File details

Details for the file editdistpy-0.1.2rc2-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: editdistpy-0.1.2rc2-cp38-cp38-macosx_10_9_x86_64.whl
  • Size: 27.4 kB
  • Tags: CPython 3.8, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc2-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 490cadd0331d642fc8dfed1db12528e99d16144b7f6d792839f4bd37dee10a0d
MD5 259edc3b5ad5ac778033461fd00b05c4
BLAKE2b-256 3eeb55ae1d35691c261c8f48c15bca9eae59ffb9cf364dd3900fba6f0561b56f

File details

Details for the file editdistpy-0.1.2rc2-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.2rc2-cp37-cp37m-win_amd64.whl
  • Size: 34.4 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc2-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 a0c16fa8ac1a1caddeaf850da33baf85b4bf9aa554620bbd7629df9fe0b85f46
MD5 080e1e85bbf923981c1fcef2551c845f
BLAKE2b-256 31baca7267b0956abf50ed20911a9cb8d2d4a80cbf0ee7dee5fa1d907a312d20

File details

Details for the file editdistpy-0.1.2rc2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for editdistpy-0.1.2rc2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8aec1ce6f7251d2b3fe7b518c316429be7b763d2550a2513e737e8e5a400c15f
MD5 31043ccaff5c5b1dd8d1d20e84c80ab8
BLAKE2b-256 6513f4f26ddc0f773ca372b3d862a69d0d6dbcb1c140530b36bb3e00274d46e9

File details

Details for the file editdistpy-0.1.2rc2-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: editdistpy-0.1.2rc2-cp36-cp36m-win_amd64.whl
  • Size: 34.3 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for editdistpy-0.1.2rc2-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 26cb1cecb52716c259207dfc9a40bc9873d2b857739437315eb9baa9e9f1fd11
MD5 5283f16f7268d1498b4502eed0b5dbeb
BLAKE2b-256 4c30390c4093839fc26d6a8537803bcc9bb6a27a766015b1f51ee2a5f5d7de1d

File details

Details for the file editdistpy-0.1.2rc2-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for editdistpy-0.1.2rc2-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 61dfad55449f139fee6d047b745a4d857f305edf1dae08478b08c97b5c22f95d
MD5 4bf51440bf55abf750bf91d881b6834e
BLAKE2b-256 8a34b4c2bcc9c9a285ea398fae1213b20a4d27b22d4ff2d34137c433afddade8
