Skip to main content

Algorithm for computing edit distance

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

editdistancek

This library calculates the Levenshtein edit distance between two strings. The edit distance represents the minimum number of edits required to transform one string into the other.

Our method is inspired by the algorithms of Ukkonen and Landau, Myers, and Schmidt. These strategies offer efficient ways of calculating the edit distance.

The algorithm operates with a running time of O(d*min(n,m)) and O(d^2 + n) in expectation. Here, 'd' stands for the edit distance between the two strings, while 'n' and 'm' denote the lengths of these strings, respectively. This efficient running time ensures the algorithm remains performant even for larger strings.

Motivation

The primary objective of this library is to devise a quick solution for the bounded version of the edit distance problem. Specifically, it aims to handle scenarios where, given a threshold 'k', it returns the edit distance 'd' if 'd' is less than or equal to 'k'. Otherwise, it indicates that the distance exceeds 'k'. This approach addresses the typical shortcoming of the classic edit distance algorithm, which tends to be inefficient in these cases.

The 'bounded' aspect of this library makes it particularly useful in scenarios where there's a predefined threshold of acceptable dissimilarity. This might be beneficial in time-sensitive applications or when working with very large strings, where calculating the full Levenshtein distance would be computationally prohibitive.

P.S. The 'k' in editdistancek refers to the upper bound for the algorithm. In practical terms, this means that the algorithm only computes the Levenshtein edit distance up to this 'k' value.

Installation

TODO

License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

editdistancek_rs-0.1.0.tar.gz (17.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

editdistancek_rs-0.1.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (177.7 kB view details)

Uploaded CPython 3.12+manylinux: glibc 2.17+ x86-64

File details

Details for the file editdistancek_rs-0.1.0.tar.gz.

File metadata

  • Download URL: editdistancek_rs-0.1.0.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for editdistancek_rs-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a29cff1ff2c685c94ac1edba72d7672a6c0dbe47d20a9be9846006d8e1f5382b
MD5 20a520e72ef72d116736567ab5a1c293
BLAKE2b-256 be10c3b5aad8bbb3639f374194c498f6261328fa80615e41d0de1e9f377f757e

See more details on using hashes here.

File details

Details for the file editdistancek_rs-0.1.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for editdistancek_rs-0.1.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8f762cb01dc648e3e5b62a32b0c015b9fb227c0ac8812c578638a3079817ae14
MD5 2e0ae19745a53e8e3740bc8048ddd525
BLAKE2b-256 6e28de6f21a85c752f3cc93d356b936798e0971da01e8ef9f0ca511c938aa650

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page