Skip to main content
This is a pre-production deployment of Warehouse. Changes made here affect the production instance of PyPI (pypi.python.org).
Help us improve Python packaging - Donate today!

A Python implementation of the Winnowing (local algorithms for document fingerprinting)

Project Description

Winnowing

A Python implementation of the Winnowing (local algorithms for document fingerprinting)

Original Work

The original research paper can be found at http://dl.acm.org/citation.cfm?id=872770.

Usage

>>> winnow('A do run run run, a do run run')
set([(5, 23942), (14, 2887), (2, 1966), (9, 23942), (20, 1966)])

>>> winnow('run run')
set([(0, 23942)]) # match found!

Default Hash Function

Quite honestly, I did not know what hash function to use. The paper did not talk about it. So I decided to use a part of SHA-1; more precisely, the last 16 bits of the digest.

Custom Hash Function

You may use your own hash function as demonstrated below.

def hash_md5(text):
    import hashlib

    hs = hashlib.md5(text)
    hs = hs.hexdigest()
    hs = int(hs, 16)

    return hs

# Override the hash function
winnow.hash_function = hash_md5

winnow('The cake was a lie')

Lower Bound of Fingerprint Density

(TODO: Write this section)

Release History

Release History

This version
History Node

0.2.0

History Node

0.1.2

History Node

0.1.0

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting