Minimum Edit Distance
Minimum Edit Distance in Cython
This package provides string distance functions implemented in Cython.
Edit-Based
For these metrics, smaller is better.
- levenshtein (1 for insert, 1 for delete, 1 for substitution)
- levenshtein_no_sub (1 for insert, 1 for delete, 2 for substitution)
- brew (0.1 for insert, 15 for delete, 1 for substitution)
- dameran_levenshtein (1 for insert, 1 for delete, 1 for substitution, 1 for transposition)
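The edit-based costs can be illustrated with a pure-Python sketch of the standard dynamic-programming recurrence. This is not the package's Cython code: weighted_edit_distance and osa_distance are hypothetical names, "insert" is assumed to mean adding a character of the target string, and osa_distance implements the restricted (optimal string alignment) form of Damerau-Levenshtein; the README does not say which variant dameran_levenshtein uses.

```python
def weighted_edit_distance(a, b, ins=1, dele=1, sub=1):
    """Edit distance turning a into b with fixed per-operation costs (sketch)."""
    n, m = len(a), len(b)
    # prev[j] = cost of turning the previous prefix of a into b[:j]
    prev = [j * ins for j in range(m + 1)]  # empty a: insert every char of b
    for i in range(1, n + 1):
        cur = [i * dele] + [0] * m          # empty b: delete every char of a
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            cur[j] = min(
                prev[j] + dele,                      # delete a[i-1]
                cur[j - 1] + ins,                    # insert b[j-1]
                prev[j - 1] + (0 if same else sub),  # match or substitute
            )
        prev = cur
    return prev[m]


def osa_distance(a, b):
    """Levenshtein plus adjacent transpositions, all costing 1 (restricted variant)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # swap of two adjacent characters, e.g. "ca" -> "ac"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]
```

With the default weights, "kitten" to "sitting" costs 3. With sub=2 (the levenshtein_no_sub weights) it costs 5, because a substitution is then exactly as expensive as a delete plus an insert. The brew weights make the distance strongly asymmetric: in this sketch "ab" to "abc" costs 0.1, while "abc" to "ab" costs 15. osa_distance("ca", "abc") is 3, whereas unrestricted Damerau-Levenshtein gives 2.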
Token-Based
- cosine_distance
- binary_cosine_distance
- jaccard_distance
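The token-based metrics can be sketched on lists of tokens. The README does not say how the package tokenizes strings or handles empty inputs, so this sketch assumes the caller passes token lists (for example from str.split()), that cosine_distance uses token counts while binary_cosine_distance uses token presence only, and that two empty inputs count as identical. The function names follow the README for readability; these are not the package's implementations.

```python
import math
from collections import Counter


def cosine_distance(tokens_a, tokens_b):
    """1 - cosine similarity of token-count vectors (sketch)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    if not ca and not cb:
        return 0.0  # assumption: two empty inputs are identical
    if not ca or not cb:
        return 1.0  # assumption: an empty input shares nothing
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_sq = sum(v * v for v in ca.values()) * sum(v * v for v in cb.values())
    return 1.0 - dot / math.sqrt(norm_sq)


def binary_cosine_distance(tokens_a, tokens_b):
    """Cosine distance on presence vectors: 1 - |A & B| / sqrt(|A| * |B|)."""
    sa, sb = set(tokens_a), set(tokens_b)
    if not sa and not sb:
        return 0.0
    if not sa or not sb:
        return 1.0
    return 1.0 - len(sa & sb) / math.sqrt(len(sa) * len(sb))


def jaccard_distance(tokens_a, tokens_b):
    """1 - |A & B| / |A | B| on token sets."""
    sa, sb = set(tokens_a), set(tokens_b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)
```

Repeated tokens show the difference between the count and binary variants: ["a", "a", "b"] and ["a", "b", "b"] have a count-based cosine distance of about 0.2 but a binary cosine distance of 0, because their token sets are equal.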
Sequence-Based
For these metrics, larger is better.
- longest_common_subsequence
- longest_common_substring
- Ratcliff-Obershelp
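The sequence-based measures can be sketched the same way. lcs_length, longest_common_substring, and ratcliff_obershelp are hypothetical pure-Python names, not the package's API. The Ratcliff-Obershelp score here is 2*K / (|a| + |b|), where K is the number of matched characters found by taking the longest common substring and recursing on the unmatched pieces to its left and right; ties between equally long substrings are broken by the earliest position in a, which may differ from the package.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (order kept, gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_common_substring(a, b):
    """Return (start_in_a, start_in_b, length) of the longest contiguous common run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a, 1):
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1  # extend the run ending at a[i-2], b[j-2]
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def ratcliff_obershelp(a, b):
    """Similarity 2*K / (len(a) + len(b)); K = characters matched recursively."""
    def matches(x, y):
        i, j, k = longest_common_substring(x, y)
        if k == 0:
            return 0
        # count the anchor run, then recurse on the left and right leftovers
        return k + matches(x[:i], y[:j]) + matches(x[i + k:], y[j + k:])

    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * matches(a, b) / total
```

For "WIKIMEDIA" and "WIKIMANIA", the sketch matches "WIKIM" and then "IA", so K = 7 and the score is 14/18, about 0.78. Python's standard-library difflib.SequenceMatcher.ratio() computes a closely related measure; its documentation describes the algorithm as a refinement of Ratcliff and Obershelp's "gestalt pattern matching".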
Extending and rolling your own cost functions
There are two kinds of functions used to define costs for the dynamic programming minimum edit distance algorithm. The first is ctypedef int (*cmp_func)(int c1, int c2), which compares two characters and returns a cost. The second is ctypedef int (*char_func)(int c1, int c2). By implementing your own versions of these functions (I would recommend doing it in cost.pxd and inlining the function), you can pass them to the distance solver to implement your own weighting scheme. The cmp_func can be used to weight a substitution (for example, a low cost for letters that sit next to each other on the keyboard, such as w and e, and a high cost for keys far apart, such as z and p). The char_func can be used to weight an insert or delete; for example, you could weight inserts by their Scrabble scores.
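The cmp_func/char_func idea can be mirrored in pure Python by passing cost callables into the same dynamic-programming loop. This is an analogue under stated assumptions, not the package's Cython interface: edit_distance, keyboard_sub_cost, scrabble_ins_cost, and unit_cost are hypothetical names; the insert and delete callables here take a single character (the README lists char_func with two int arguments and does not describe how the solver calls it); and keyboard adjacency is approximated on an unstaggered QWERTY grid.

```python
from typing import Callable

# Hypothetical Python stand-ins for the cmp_func / char_func hooks
SubCost = Callable[[str, str], float]
CharCost = Callable[[str], float]


def edit_distance(a: str, b: str, sub_cost: SubCost,
                  ins_cost: CharCost, del_cost: CharCost) -> float:
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # delete all of a[:i]
        d[i][0] = d[i - 1][0] + del_cost(a[i - 1])
    for j in range(1, m + 1):          # insert all of b[:j]
        d[0][j] = d[0][j - 1] + ins_cost(b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(a[i - 1]),
                d[i][j - 1] + ins_cost(b[j - 1]),
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),
            )
    return d[n][m]


# Crude QWERTY grid: row and column index of each letter (no key stagger).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def keyboard_sub_cost(c1: str, c2: str) -> float:
    if c1 == c2:
        return 0
    if c1 in POS and c2 in POS:
        (r1, k1), (r2, k2) = POS[c1], POS[c2]
        if abs(r1 - r2) <= 1 and abs(k1 - k2) <= 1:
            return 1  # neighbouring keys, e.g. w and e
    return 3          # far apart (e.g. z and p) or not on the grid


# Standard English Scrabble letter values.
SCRABBLE = {ch: score
            for score, letters in [(1, "aeilnorstu"), (2, "dg"), (3, "bcmp"),
                                   (4, "fhvwy"), (5, "k"), (8, "jx"), (10, "qz")]
            for ch in letters}


def scrabble_ins_cost(c: str) -> float:
    return SCRABBLE.get(c.lower(), 1)  # assumption: non-letters cost 1


def unit_cost(c: str) -> float:
    return 1
```

A substitution is only chosen when it is cheaper than a delete plus an insert, so with unit_cost for both, the "far key" cost of 3 is never used: "ze" to "pe" costs 2 via delete and insert, while "we" to "ee" costs 1 via the cheap neighbouring-key substitution. Weighting inserts by Scrabble score makes inserting rare letters expensive; inserting "qa" into an empty string costs 10 + 1 = 11.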
Download files
Source Distribution
File details
Details for the file string_distance-1.0.0.tar.gz.
File metadata
- Download URL: string_distance-1.0.0.tar.gz
- Size: 495.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | e001d7ef9f643416ba3b060931504cbf59bc3e19da9dbfd454337dd3db966fd2
MD5 | 666ab67b81f55415031acf9497702c6a
BLAKE2b-256 | e2c18241afb306a606aeee00a934cc77f24e9b20ec1842023289dbb54c83c7b8
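A downloaded copy of the source distribution can be checked against the SHA256 digest in the table with Python's standard hashlib module. A minimal sketch, assuming the archive sits at the path you pass in; sha256_of and matches_published_hash are hypothetical helper names.

```python
import hashlib

# SHA256 digest listed in the File hashes table for string_distance-1.0.0.tar.gz
EXPECTED_SHA256 = "e001d7ef9f643416ba3b060931504cbf59bc3e19da9dbfd454337dd3db966fd2"


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA256 digest of a file, read in chunks to keep memory use small."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    return sha256_of(path) == expected.lower()
```

For example, matches_published_hash("string_distance-1.0.0.tar.gz") should return True for an untampered download. pip's hash-checking mode (a requirements line such as string_distance==1.0.0 --hash=sha256:<digest>, installed with --require-hashes) performs the same check automatically.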