Algorithm for computing edit distance
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
Project description
editdistancek
This library calculates the Levenshtein edit distance between two strings. The edit distance represents the minimum number of edits required to transform one string into the other.
Our method is inspired by the algorithms of Ukkonen and Landau, Myers, and Schmidt. These strategies offer efficient ways of calculating the edit distance.
The algorithm operates with a running time of O(d*min(n,m)) and O(d^2 + n) in expectation. Here, 'd' stands for the edit distance between the two strings, while 'n' and 'm' denote the lengths of these strings, respectively. This efficient running time ensures the algorithm remains performant even for larger strings.
Motivation
The primary objective of this library is to devise a quick solution for the bounded version of the edit distance problem. Specifically, it aims to handle scenarios where, given a threshold 'k', it returns the edit distance 'd' if 'd' is less than or equal to 'k'. Otherwise, it indicates that the distance exceeds 'k'. This approach addresses the typical shortcoming of the classic edit distance algorithm, which tends to be inefficient in these cases.
The 'bounded' aspect of this library makes it particularly useful in scenarios where there's a predefined threshold of acceptable dissimilarity. This might be beneficial in time-sensitive applications or when working with very large strings, where calculating the full Levenshtein distance would be computationally prohibitive.
P.S. The 'k' in editdistancek refers to the upper bound for the algorithm. In practical terms, this means that the algorithm only computes the Levenshtein edit distance up to this 'k' value.
Installation
pip install editdistancek-rs
License
MIT License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file editdistancek_rs-1.0.0.tar.gz.
File metadata
- Download URL: editdistancek_rs-1.0.0.tar.gz
- Upload date:
- Size: 17.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ca63927e2331c2b4cdb510ef51821e8fe6d3b54ed12d51e3bfad69b040a943b1
|
|
| MD5 |
fdc6e0014d05b101c4beb09a49bc970f
|
|
| BLAKE2b-256 |
d237efb811b0085c67a471b50862bb6c059ec5cc1c95e5551d1fccf8eb00fc20
|
File details
Details for the file editdistancek_rs-1.0.0-cp312-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: editdistancek_rs-1.0.0-cp312-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 177.7 kB
- Tags: CPython 3.12+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3e6c2c4229ce7e274861a12a45a1968736759afa2110ce5b1d86e2e810e108cc
|
|
| MD5 |
748bc894b7d681a7ceab5da993ec2de1
|
|
| BLAKE2b-256 |
1df9ac65454748e7fef42defdc1389784c9360a5fae07a1e133d1fb47e43a166
|