Minimum Edit Distance

Minimum Edit Distance in Cython

This package provides string distance functions implemented in Cython.

Edit Based

With these metrics, smaller is better.

  • levenshtein (1 for insert, 1 for delete, and 1 for substitution)
  • levenshtein_no_sub (1 for insert, 1 for delete, 2 for substitution)
  • brew (0.1 for insert, 15 for delete, and 1 for substitution)
  • dameran_levenshtein (1 for insert, 1 for delete, 1 for substitution, 1 for transposition)
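
The cost schemes above can be illustrated with a small pure-Python sketch of the dynamic programming recurrence. This is not the package's Cython implementation or its API; the function name edit_distance and its keyword arguments are hypothetical, and transposition is left out to keep the example short.

```python
def edit_distance(source, target, insert_cost=1, delete_cost=1, substitute_cost=1):
    """Weighted minimum edit distance (hypothetical helper, not the package API)."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]

    # First column: delete every character of the source prefix.
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + delete_cost
    # First row: insert every character of the target prefix.
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + insert_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            table[i][j] = min(
                table[i - 1][j] + delete_cost,
                table[i][j - 1] + insert_cost,
                table[i - 1][j - 1] + (0 if same else substitute_cost),
            )
    return table[-1][-1]


# Unit costs correspond to the levenshtein scheme.
print(edit_distance("kitten", "sitting"))
# A substitution cost of 2 corresponds to the levenshtein_no_sub scheme.
print(edit_distance("kitten", "sitting", substitute_cost=2))
```

With a substitution cost of 2, a substitution is never cheaper than a delete followed by an insert, which is why that scheme behaves as if substitution were not available.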

Token Based

  • cosine_distance
  • binary_cosine_distance
  • jaccard_distance
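
As a rough illustration of what token-based distances compute, here is a pure-Python sketch using the standard definitions. It is not taken from the package; the signatures, the binary flag, and the handling of empty inputs are assumptions made for this example.

```python
import math
from collections import Counter


def jaccard_distance(a_tokens, b_tokens):
    """1 - |A & B| / |A | B| over the sets of tokens (illustrative helper)."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0  # assumption: two empty inputs are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def cosine_distance(a_tokens, b_tokens, binary=False):
    """1 - cosine similarity of token count vectors (illustrative helper).

    With binary=True each token counts at most once, which is one plausible
    reading of a "binary cosine distance".
    """
    a = Counter(set(a_tokens) if binary else a_tokens)
    b = Counter(set(b_tokens) if binary else b_tokens)
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norm == 0:
        return 1.0  # assumption: an empty input is maximally distant
    return 1.0 - dot / norm


left = "the cat sat on the mat".split()
right = "the cat lay on the rug".split()
print(jaccard_distance(left, right))
print(cosine_distance(left, right))
print(cosine_distance(left, right, binary=True))
```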

Sequence Based

With these metrics, larger is better.

  • longest_common_subsequence
  • longest_common_substring
  • Ratcliff-Obershelp
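
To show why larger values mean more similar strings here, the following pure-Python sketch computes the lengths of the longest common subsequence and the longest common substring. The helper names lcs_length and common_substring_length are hypothetical stand-ins, not the package's functions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def common_substring_length(a, b):
    """Length of the longest common substring (characters must be contiguous)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            # A matching pair extends the run ending at the previous pair.
            run = prev[j - 1] + 1 if ch_a == ch_b else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


print(lcs_length("kitten", "sitting"))
print(common_substring_length("kitten", "sitting"))
```

Both helpers keep only two rows of the table, so memory use grows with the length of the second string rather than with the product of both lengths.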

Extending and rolling your own cost functions

There are two kinds of functions used to define costs for the dynamic programming minimum edit distance algorithm. The first is ctypedef int (*cmp_func)(int c1, int c2), which compares two characters and returns a cost. The second is ctypedef int (*char_func)(int c1, int c2). By implementing your own versions of these functions (I would recommend doing it in cost.pxd and inlining the function), you can pass them to the distance solver to implement your own weighting scheme. The cmp_func can be used to weight a substitution (for example, a low cost for letters next to each other on the keyboard, like w and e, and a high cost for keys far apart, like z and p). The char_func can be used to weight an insert or delete; for example, you could weight inserts by their Scrabble scores.
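
The idea of pluggable cost functions can be sketched in pure Python. Everything below is illustrative: weighted_distance, keyboard_cost, and scrabble_cost are hypothetical names, the callables take plain characters rather than the int arguments of the Cython typedefs, and the actual solver in the package may be wired differently.

```python
KEYBOARD_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")

# Standard English Scrabble letter values, grouped by score.
SCRABBLE_GROUPS = {1: "aeilnorstu", 2: "dg", 3: "bcmp", 4: "fhvwy", 5: "k", 8: "jx", 10: "qz"}
SCRABBLE = {ch: score for score, letters in SCRABBLE_GROUPS.items() for ch in letters}


def keyboard_cost(c1, c2):
    """Substitution cost: 0 if equal, 1 for neighbours on the same row, else 3."""
    if c1 == c2:
        return 0
    for row in KEYBOARD_ROWS:
        if c1 in row and c2 in row and abs(row.index(c1) - row.index(c2)) == 1:
            return 1
    return 3


def scrabble_cost(c):
    """Insert or delete cost taken from the letter's Scrabble score (default 1)."""
    return SCRABBLE.get(c, 1)


def weighted_distance(source, target, sub_cost, ins_cost, del_cost):
    """Minimum edit distance with caller-supplied cost functions."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + del_cost(source[i - 1])
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + ins_cost(target[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = min(
                table[i - 1][j] + del_cost(source[i - 1]),
                table[i][j - 1] + ins_cost(target[j - 1]),
                table[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return table[-1][-1]


# w and e are neighbours on the keyboard, z and p are far apart.
print(weighted_distance("w", "e", keyboard_cost, scrabble_cost, scrabble_cost))
print(weighted_distance("z", "p", keyboard_cost, scrabble_cost, scrabble_cost))
```

Passing the cost functions as arguments keeps the recurrence itself unchanged, so a new weighting scheme only needs new callables.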

Source Distribution

string_distance-1.0.0.tar.gz (495.3 kB)

File hashes

Hashes for string_distance-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e001d7ef9f643416ba3b060931504cbf59bc3e19da9dbfd454337dd3db966fd2
MD5 666ab67b81f55415031acf9497702c6a
BLAKE2b-256 e2c18241afb306a606aeee00a934cc77f24e9b20ec1842023289dbb54c83c7b8