Fast Levenshtein and Damerau optimal string alignment algorithms.

editdistpy

editdistpy is a fast implementation of the Levenshtein edit distance and the Damerau-Levenshtein optimal string alignment (OSA) edit distance algorithms. The original C# project can be found at SoftWx.Match.

Installation

The easiest way to install editdistpy is using pip:

pip install -U editdistpy

Usage

You can specify the max_distance you care about. If the edit distance exceeds this max_distance, -1 is returned. Specifying a sensible max_distance can result in a significant speed improvement.

You can also specify max_distance=sys.maxsize if you wish for the actual edit distance to always be computed.
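
To make the -1 convention concrete, here is a minimal pure-Python sketch of a bounded Levenshtein distance. It is not editdistpy's implementation; the name bounded_levenshtein is hypothetical, and the sketch only mirrors the behavior described above: return the distance when it is within max_distance, otherwise -1. It also illustrates why a limit can save work, since the computation can stop as soon as the limit is provably exceeded.

```python
import sys


def bounded_levenshtein(a: str, b: str, max_distance: int) -> int:
    """Reference Levenshtein distance; returns -1 if it exceeds max_distance.

    Hypothetical helper for illustration only, not part of editdistpy.
    """
    # The distance is never smaller than the length difference,
    # so this check is a cheap early exit.
    if abs(len(a) - len(b)) > max_distance:
        return -1
    # Keep the shorter string in the inner loop so each row is smaller.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        # Row minima never decrease, so once a whole row exceeds the
        # limit the final distance must exceed it as well.
        if min(current) > max_distance:
            return -1
        previous = current
    distance = previous[-1]
    return distance if distance <= max_distance else -1


print(bounded_levenshtein("flintstone", "hanson", 2))            # -1
print(bounded_levenshtein("flintstone", "hanson", sys.maxsize))  # 6
```

With max_distance=2 the length difference alone (10 versus 6 characters) is enough to return -1 without filling in a single row.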

Levenshtein

import sys

from editdistpy import levenshtein

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(levenshtein.distance(string_1, string_2, max_distance))
# expected output: 6

Damerau-Levenshtein OSA

import sys

from editdistpy import damerau_osa

string_1 = "flintstone"
string_2 = "hanson"

max_distance = 2
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: -1

max_distance = sys.maxsize
print(damerau_osa.distance(string_1, string_2, max_distance))
# expected output: 6
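
The two algorithms agree on the pair above because it contains no adjacent swapped characters. The sketch below, a plain textbook dynamic-programming version rather than editdistpy's code, shows where they differ: OSA counts an adjacent transposition as a single edit, but never edits the same substring twice. The function name reference_distance and its transpositions flag are hypothetical.

```python
def reference_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Textbook edit distance table.

    With transpositions=True this is the optimal string alignment (OSA)
    distance; with transpositions=False it is plain Levenshtein.
    Hypothetical helper for illustration only, not part of editdistpy.
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        table[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # substitution or match
            )
            # OSA extension: swap two adjacent characters in one edit.
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                best = min(best, table[i - 2][j - 2] + 1)
            table[i][j] = best
    return table[-1][-1]


print(reference_distance("ab", "ba", transpositions=False))  # 2
print(reference_distance("ab", "ba"))                        # 1
print(reference_distance("ca", "abc"))                       # 3
```

The last pair is the usual example of the OSA restriction: an unrestricted Damerau-Levenshtein distance would give 2 (swap to "ac", then insert "b" in the middle), but OSA does not allow the swapped pair to be edited again.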

Benchmark

A simple benchmark was run on Python 3.8.12 against editdistance, which implements the Levenshtein edit distance algorithm.

The script used by the benchmark can be found here.

For clarity, the following string pairs were used.

Single word (completely different)

"xabxcdxxefxgx"
"1ab2cd34ef5g6"

Single word (similar)

"example"
"samples"

Single word (identical ending)

"kdeisfnexabxcdxlskdixefxgx"
"xabxcdxlskdixefxgx"

Short string

"short sentence with words"
"shrtsen tence wit mispeledwords"

Long string

"Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod rem"
"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium"

single_dif string
        test_damerau_osa               0.5202 usec/pass 1040.36 msec total 2000000 iterations
        test_levenshtein               0.3547 usec/pass 709.40 msec total 2000000 iterations
        test_editdistance              0.6399 usec/pass 1279.81 msec total 2000000 iterations
        test_damerau_osa early_cutoff  0.5134 usec/pass 1026.72 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.3862 usec/pass 772.31 msec total 2000000 iterations
single_sim string
        test_damerau_osa               0.2983 usec/pass 596.57 msec total 2000000 iterations
        test_levenshtein               0.2433 usec/pass 486.68 msec total 2000000 iterations
        test_editdistance              0.3942 usec/pass 788.36 msec total 2000000 iterations
        test_damerau_osa early_cutoff  0.2865 usec/pass 572.90 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.2363 usec/pass 472.61 msec total 2000000 iterations
single_end string
        test_damerau_osa               0.3332 usec/pass 666.32 msec total 2000000 iterations
        test_levenshtein               0.3300 usec/pass 659.93 msec total 2000000 iterations
        test_editdistance              0.7902 usec/pass 1580.42 msec total 2000000 iterations
        test_damerau_osa early_cutoff  0.3199 usec/pass 639.74 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.3205 usec/pass 641.01 msec total 2000000 iterations
short string
        test_damerau_osa               0.9925 usec/pass 1984.97 msec total 2000000 iterations
        test_levenshtein               0.6379 usec/pass 1275.76 msec total 2000000 iterations
        test_editdistance              0.9587 usec/pass 1917.37 msec total 2000000 iterations
        test_damerau_osa early_cutoff  0.7535 usec/pass 1506.91 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.5794 usec/pass 1158.79 msec total 2000000 iterations
long string
        test_damerau_osa               8.6244 usec/pass 17248.73 msec total 2000000 iterations
        test_levenshtein               4.2367 usec/pass 8473.36 msec total 2000000 iterations
        test_editdistance              2.0407 usec/pass 4081.31 msec total 2000000 iterations
        test_damerau_osa early_cutoff  1.0795 usec/pass 2158.99 msec total 2000000 iterations
        test_levenshtein early_cutoff  0.9031 usec/pass 1806.28 msec total 2000000 iterations

Although max_distance=10 significantly reduces the computation time for the long strings, it may not be a sensible limit for every use case.

Without a cutoff, editdistpy is faster than editdistance on the shorter string pairs and slower on the long pair, so it may be the more suitable library if your use case mainly involves comparing short strings.
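
To repeat this kind of measurement on your own string pairs, a timing loop along the following lines may be enough. This is a sketch built on the standard timeit module, not the benchmark script referenced above; time_pair and the stand-in length_gap function are hypothetical names, and the output format merely imitates the listing shown earlier.

```python
import timeit


def time_pair(label, func, string_1, string_2, max_distance, iterations=100_000):
    """Time func(string_1, string_2, max_distance) and format the result.

    Hypothetical helper for illustration only.
    """
    total_seconds = timeit.timeit(
        lambda: func(string_1, string_2, max_distance),
        number=iterations,
    )
    usec_per_pass = total_seconds * 1e6 / iterations
    return (
        f"{label:<30} {usec_per_pass:.4f} usec/pass "
        f"{total_seconds * 1e3:.2f} msec total {iterations} iterations"
    )


def length_gap(string_1, string_2, max_distance):
    # Stand-in with the same call shape as a distance function, so the
    # sketch runs without third-party packages installed.
    gap = abs(len(string_1) - len(string_2))
    return gap if gap <= max_distance else -1


print(time_pair("length_gap", length_gap, "example", "samples", 10,
                iterations=10_000))
```

To time the library itself, pass levenshtein.distance or damerau_osa.distance as func, once with sys.maxsize and once with a smaller max_distance, and compare the results on the same machine.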

Changelog

See the changelog for a history of notable changes to editdistpy.

Download files

Source Distribution

  • editdistpy-0.1.3.tar.gz (57.2 kB): Source

Built Distributions

  • editdistpy-0.1.3-cp38-cp38-win_amd64.whl (33.2 kB): CPython 3.8, Windows x86-64
  • editdistpy-0.1.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (126.9 kB): CPython 3.8, manylinux (glibc 2.5+ and glibc 2.17+) x86-64
  • editdistpy-0.1.3-cp38-cp38-macosx_10_9_x86_64.whl (27.5 kB): CPython 3.8, macOS 10.9+ x86-64
  • editdistpy-0.1.3-cp37-cp37m-win_amd64.whl (32.9 kB): CPython 3.7m, Windows x86-64
  • editdistpy-0.1.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (125.5 kB): CPython 3.7m, manylinux (glibc 2.5+ and glibc 2.17+) x86-64
  • editdistpy-0.1.3-cp36-cp36m-win_amd64.whl (32.8 kB): CPython 3.6m, Windows x86-64
  • editdistpy-0.1.3-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (123.4 kB): CPython 3.6m, manylinux (glibc 2.5+ and glibc 2.17+) x86-64
