Skip to main content

Implements the Damerau–Levenshtein diff algorithm

Project description

Author:

Daniel Walker

Version:
0.1.2
Date:
2023-01-05

Overview

The dam_lev package implements the Damerau–Levenshtein diff algorithm. That is, it will take two sequences and determine the minimum number of transpositions, substitutions, insertions, and deletions needed to transform the first sequence into the second.

Usage

The package exposes a single function, dam_lev.get_changes. It takes two sequences (i.e., they must implement the __len__ and __getitem__ methods) and returns a list of dam_lev.Mutation objects. There are four subclasses of dam_lev.Mutation corresponding to the four types of transformations. For example,

diffs = dam_lev.get_changes('abcdef', 'bcedxy')
print(diffs) # [Deletion(at=0), Transposition(at=3), Substitution(at=5, at2=4), Insertion(at=6, at2=5)]

We see that the sequence of transformations is:

  • Delete the item at index 0 ('a')

  • Transpose the item at index 3 ('d') with its successor

  • Substitute the item at index 5 ('f') with the item from the second sequence at index 4 ('x')

  • Insert at index 6 the item from the second sequence at index 5 ('y')

Note the index for the transposition. Even though, after the deletion, the 'd' is at index 2, it’s at index 3 in the original version of the sequence. Likewise for the successive mutations.

Key function

You can also pass a callable as the key keyword argument to dam_lev.get_changes. Similar to list.sort, this callable will be used to compare the elements of the sequences. For example,

diffs = dam_lev.get_changes('aBc', 'AbC', key=str.upper)
print(diffs) # []

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dam_lev-0.1.2.tar.gz (3.6 kB view hashes)

Uploaded Source

Built Distribution

dam_lev-0.1.2-py3-none-any.whl (3.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page