Document fingerprint generator

Project description

  • What is it?

    The “fingerprint” module generates fingerprints of a document.

  • Fingerprint? What is that?

    A fingerprint is like the signature of a document. More specifically, in our context, it is a subset of the hash values calculated from the document.

  • Okay, I now know a bit about it. How does it calculate fingerprints?

    Generating the fingerprints of a document is a three-stage process:

    • (1st phase) generates the k-grams (contiguous substrings of length k) from the standard string

    • (2nd phase) generates a hash value for each k-gram using a rolling hash function

    • (3rd phase) selects the fingerprints from the hash values using winnowing
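
    The three phases above can be sketched in plain Python as follows. This is only an illustrative sketch, not the module's actual code: the function names (kgrams, rolling_hashes, winnow), the hash base and modulus, and the window parameter are all assumptions made for the example. The winnowing step follows a common formulation: slide a window over the hash values, keep the smallest hash in each window (the rightmost one on ties), and record it once.

```python
# Illustrative sketch only; names and parameters are assumptions,
# not the API of the "fingerprint" package.

def kgrams(text, k):
    """Phase 1: return every contiguous substring of length k."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def rolling_hashes(text, k, base=257, mod=(1 << 61) - 1):
    """Phase 2: polynomial hash of each k-gram, updated in O(1) per step."""
    n = len(text)
    if n < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, n):
        # drop the oldest character, shift, then add the newest character
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes


def winnow(hashes, window):
    """Phase 3: keep the minimum hash of each window (rightmost on ties)."""
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - window + 1):
        block = hashes[start:start + window]
        smallest = min(block)
        # position of the rightmost occurrence of the minimum in this window
        pos = start + window - 1 - block[::-1].index(smallest)
        if pos != last_pos:  # record each selected position only once
            fingerprints.append((smallest, pos))
            last_pos = pos
    return fingerprints


hashes = rolling_hashes("adorunrunrunadorunrun", k=5)
print(winnow(hashes, window=4))
```

    Each fingerprint here is a (hash value, position) pair. The rolling update avoids rehashing every k-gram from scratch, and winnowing guarantees that at least one hash is kept from every window of consecutive k-grams. If there are fewer hash values than the window size, this sketch returns no fingerprints.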

  • How can I install it on my machine?

    You can install it in two ways:

    • using source
      1. git clone git@github.com:kailashbuki/fingerprint.git

      2. cd fingerprint

      3. sudo python setup.py install

    • using pip
      1. sudo pip install fingerprint

  • Hmm! … How can I use it?

    It’s quite simple. Here’s an example:

            from fingerprint.fingerprintgenerator import file_content_refiner, FingerprintGenerator

            # You can get the standard string from a document like this:
            s = file_content_refiner("path/to/file")
            # OR pass the standard string directly if you already have one:
            s = "some sample string"

            fpg = FingerprintGenerator(input_string=s)
            fpg.generate_fingerprints()
            # print(...) with a single argument works on both Python 2 and Python 3
            print(fpg.fingerprints)

>>Feel free to contact me at kailash.buki@gmail.com<<

Download files

Source Distribution

fingerprint-0.1.0.tar.gz (4.6 kB)

Built Distribution

fingerprint-0.1.0.macosx-10.7-intel.exe (67.8 kB)
