Distance·PyPI

Levenshtein and Hamming distance computation

These details have not been verified by PyPI

Project links

Homepage

Intended Audience
- Developers
License
- OSI Approved :: GNU General Public License v2 (GPLv2)
Programming Language
- C
- Python
- Python :: 3.3

Project description

This package provides facilities for computing Levenshtein and Hamming distance between arbitrary Python objects. It is only available for Python 3.3+.

Installation

This is a C extension, so you need a C compiler available on your computer: typically Microsoft Visual C++ 2010 on Windows, and GCC on Mac and Linux. Python development files are also necessary to compile the package. On a Debian-like system, you can get all of these with:

$ apt-get install gcc python3.3-dev

Then you can do:

$ python3.3 setup.py install

Usage

First, import the module:

>>> import distance

Two functions are provided: levenshtein and hamming. Both take two arguments, the objects to compare. These can be of any type that supports the sequence protocol: Unicode strings, byte strings, lists, and tuples all work. If the objects are lists or tuples, their elements must themselves be comparable.

A typical use case is comparing single words for similarity, as in spelling-correction software:

>>> distance.levenshtein("lenvestein", "levenshtein")
3
>>> distance.hamming("hamming", "hamning")
1
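
To make the two measures concrete, here is a minimal pure-Python sketch of what they compute. It is an illustration only, not the package's C implementation: the names levenshtein_sketch and hamming_sketch are hypothetical, and raising ValueError for sequences of unequal length is an assumption of the sketch, since the package's behaviour in that case is not described here.

```python
def levenshtein_sketch(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    # previous[j] is the distance between the part of a processed so far
    # and b[:j]; initially that part is empty, so the distance is j.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # substitute x with y (free on a match)
            ))
        previous = current
    return previous[-1]


def hamming_sketch(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        # Assumption of this sketch, not documented package behaviour.
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

Because the sketch relies only on len(), iteration, and ==, it accepts the same kinds of input as described above: strings, byte strings, lists, and tuples whose elements can be compared for equality.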

Comparing lists of strings is also useful for measuring similarity between sentences, paragraphs, and so on in articles or books, for example in plagiarism detection:

>>> sent1 = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
>>> sent2 = ['the', 'lazy', 'fox', 'jumps', 'over', 'the', 'crazy', 'dog']
>>> distance.levenshtein(sent1, sent2)
3
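
The word lists above are written out by hand. In practice they would usually be derived from raw text; one possible way to do that, using only the standard library, is sketched below. The tokenize helper is hypothetical and not part of the package, and the choice to lower-case the text and drop punctuation is an assumption that may not suit every application.

```python
import re


def tokenize(text):
    """Lower-case a string and split it into a list of word tokens."""
    # \w+ matches runs of letters, digits, and underscores, so
    # punctuation and whitespace act as separators.
    return re.findall(r"\w+", text.lower())


sent1 = tokenize("The quick brown fox jumps over the lazy dog.")
sent2 = tokenize("The lazy fox jumps over the crazy dog.")
# sent1 and sent2 are now plain lists of strings, the same shape of
# input that distance.levenshtein receives in the example above.
```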

The same works for sequences of numbers or other comparable objects:

>>> distance.levenshtein([1,2,3], [1,3,2])
2

Implementation details

Unicode strings are handled separately from other sequence objects, in an efficient manner. Computing distances between lists, tuples, and byte strings is likely to be slower, particularly for byte strings, which are converted to tuples internally.
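
As a rough illustration of what converting a byte string to a tuple means at the Python level, consider the snippet below. It only shows the standard behaviour of tuple() on a bytes object; how the C extension performs the conversion internally is not shown here, and the variable names are made up for the example.

```python
data = b"abc"

# Iterating over a bytes object yields integers, so the resulting
# tuple holds byte values rather than one-character byte strings.
as_tuple = tuple(data)
print(as_tuple)  # (97, 98, 99)
```

The conversion allocates a new tuple with one element per byte, which is consistent with byte strings being slower to compare than Unicode strings.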

Project details

Release history

- 0.1.3 (Nov 21, 2013)
- 0.1.2.5 (Nov 12, 2013)
- 0.1.2 (Nov 10, 2013)
- 0.1.1 (Nov 6, 2013)
- 0.1 (Nov 3, 2013), this version

Download files

Source Distribution

distance.tar.gz (34.1 kB)

Uploaded Nov 3, 2013 Source

File details

Details for the file distance.tar.gz.

File metadata

Download URL: distance.tar.gz
Upload date: Nov 3, 2013
Size: 34.1 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for distance.tar.gz
Algorithm	Hash digest
SHA256	`5b26973dc040064f8b48ff29a4d82035076bc91056ebb1b0f446872952c42e9d`
MD5	`9d101eca8ec50f7d456ad4124baa981e`
BLAKE2b-256	`052e5dd635d1ba751fa46e10e57c9fe767cae134ff85c17e9dfbfd91bdf9ee65`
