Skip to main content

Algorithm for computing edit distance

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

editdistancek

This library calculates the Levenshtein edit distance between two strings. The edit distance represents the minimum number of edits required to transform one string into the other.

Our method is inspired by the algorithms of Ukkonen and Landau, Myers, and Schmidt. These strategies offer efficient ways of calculating the edit distance.

The algorithm operates with a running time of O(d*min(n,m)) and O(d^2 + n) in expectation. Here, 'd' stands for the edit distance between the two strings, while 'n' and 'm' denote the lengths of these strings, respectively. This efficient running time ensures the algorithm remains performant even for larger strings.

Motivation

The primary objective of this library is to devise a quick solution for the bounded version of the edit distance problem. Specifically, it aims to handle scenarios where, given a threshold 'k', it returns the edit distance 'd' if 'd' is less than or equal to 'k'. Otherwise, it indicates that the distance exceeds 'k'. This approach addresses the typical shortcoming of the classic edit distance algorithm, which tends to be inefficient in these cases.

The 'bounded' aspect of this library makes it particularly useful in scenarios where there's a predefined threshold of acceptable dissimilarity. This might be beneficial in time-sensitive applications or when working with very large strings, where calculating the full Levenshtein distance would be computationally prohibitive.

P.S. The 'k' in editdistancek refers to the upper bound for the algorithm. In practical terms, this means that the algorithm only computes the Levenshtein edit distance up to this 'k' value.

Installation

pip install editdistancek-rs

License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

editdistancek_rs-1.0.0.tar.gz (17.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

editdistancek_rs-1.0.0-cp312-abi3-manylinux_2_28_x86_64.whl (177.7 kB view details)

Uploaded CPython 3.12+manylinux: glibc 2.28+ x86-64

File details

Details for the file editdistancek_rs-1.0.0.tar.gz.

File metadata

  • Download URL: editdistancek_rs-1.0.0.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for editdistancek_rs-1.0.0.tar.gz
Algorithm Hash digest
SHA256 ca63927e2331c2b4cdb510ef51821e8fe6d3b54ed12d51e3bfad69b040a943b1
MD5 fdc6e0014d05b101c4beb09a49bc970f
BLAKE2b-256 d237efb811b0085c67a471b50862bb6c059ec5cc1c95e5551d1fccf8eb00fc20

See more details on using hashes here.

File details

Details for the file editdistancek_rs-1.0.0-cp312-abi3-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for editdistancek_rs-1.0.0-cp312-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 3e6c2c4229ce7e274861a12a45a1968736759afa2110ce5b1d86e2e810e108cc
MD5 748bc894b7d681a7ceab5da993ec2de1
BLAKE2b-256 1df9ac65454748e7fef42defdc1389784c9360a5fae07a1e133d1fb47e43a166

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page