pyeditdistance

A pure, minimalist Python library of various edit distance metrics. MIT-licensed, zero dependencies.

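Implemented methods: Levenshtein, normalized Levenshtein [1], Damerau-Levenshtein, Hamming, and longest common subsequence (LCS).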

Levenshtein and Damerau-Levenshtein distances use the Wagner-Fischer dynamic programming algorithm [2].
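
For readers unfamiliar with it, the Wagner-Fischer recurrence for plain Levenshtein distance can be sketched as follows. This is a minimal standalone illustration, not the library's source; the function name wagner_fischer is hypothetical, and unit costs for insertion, deletion, and substitution are assumed.

```python
def wagner_fischer(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer recurrence (hypothetical helper)."""
    # prev[j] is the distance between the prefix of `a` processed so far
    # (initially empty) and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string is i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(wagner_fischer("kitten", "sitting"))  # 3
```

Keeping only two rows of the table reduces memory to O(len(b)) while the running time stays O(len(a) * len(b)).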

Some basic unit tests can be run with pytest.

Installation

pip install pyeditdistance

Optionally, to install for the current user only:

pip install --user pyeditdistance

Usage

from pyeditdistance import distance as d

s1 = "I am Joe Bloggs"
s2 = "I am John Galt"

# Levenshtein distance
res = d.levenshtein(s1, s2) # => 8

# Normalized Levenshtein
res = d.normalized_levenshtein(s1, s2) # => 0.4324...

# Damerau-Levenshtein
s3 = "abc"
s4 = "cb"
res = d.damerau_levenshtein(s3, s4) # => 2

# Hamming distance
s5 = "abcccdeeffghh zz"
s6 = "bacccdeeffhghz z"
res = d.hamming(s5, s6) # => 6

# Longest common subsequence (LCS)
s7 = "AAGGQQERqer"
s8 = "AaQERqer"
res = d.longest_common_subsequence(s7, s8) # => 7
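
The results above can be cross-checked by hand. The sketch below reimplements three of the metrics independently of the library; all function names are hypothetical. It assumes the normalized distance follows the unit-cost formula from [1], 2d / (|s1| + |s2| + d), which reproduces the value shown above (2 * 8 / (15 + 14 + 8) = 16/37), and it assumes Hamming distance rejects strings of unequal length. The library's actual behavior may differ on both points.

```python
def normalized_sketch(distance: int, len1: int, len2: int) -> float:
    """Normalized distance with unit costs: 2d / (len1 + len2 + d)."""
    if len1 + len2 == 0:
        return 0.0  # two empty strings are identical
    return 2 * distance / (len1 + len2 + distance)


def hamming_sketch(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption: unequal lengths are an error here.
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def lcs_length_sketch(a: str, b: str) -> int:
    """Length of the longest common subsequence, using two DP rows."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend the common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one character
        prev = curr
    return prev[-1]


print(round(normalized_sketch(8, 15, 14), 4))                  # 0.4324
print(hamming_sketch("abcccdeeffghh zz", "bacccdeeffhghz z"))  # 6
print(lcs_length_sketch("AAGGQQERqer", "AaQERqer"))            # 7
```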

References

  1. L. Yujian and L. Bo, "A normalized Levenshtein distance metric," IEEE Transactions on Pattern Analysis and Machine Intelligence (2007). https://ieeexplore.ieee.org/document/4160958
  2. R. Wagner and M. Fischer, "The string-to-string correction problem," Journal of the ACM, 21(1):168-173, 1974.
