Document fingerprint generator
What is it?
The “fingerprint” module generates fingerprints of a document.
Fingerprint!!!! :( What is that?
A fingerprint is like the signature of a document. More specifically, in our context, it is a subset of the hash values calculated from the document.
Okay, I now know a bit about it. How does it calculate fingerprints?
Generating the fingerprints of a document is a three-stage process:
(1st phase) generate the k-grams from the standard string
(2nd phase) generate a hash value for each k-gram using a rolling hash function
(3rd phase) select the fingerprints from the hash values using winnowing
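The three phases above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the module's actual implementation: the function names (kgrams, rolling_hashes, winnow) and the parameters (k, base, mod, window) are assumptions made for this example, and the input is assumed to be an already-cleaned "standard string".

```python
def kgrams(text, k=5):
    """Phase 1: return every contiguous substring of length k."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def rolling_hashes(text, k=5, base=256, mod=1000003):
    """Phase 2: hash each k-gram with a polynomial rolling hash.

    After the first k-gram, each hash is derived from the previous one
    by dropping the leading character and appending the next one.
    (base and mod are arbitrary choices for this sketch.)
    """
    if len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the leading character
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # drop the leading character
        h = (h * base + ord(text[i])) % mod      # append the next character
        hashes.append(h)
    return hashes


def winnow(hashes, window=4):
    """Phase 3: keep the minimum hash of each sliding window.

    Ties go to the rightmost minimum, and a position is recorded only
    once even if it is the minimum of several consecutive windows.
    Returns a list of (hash, position) pairs.
    """
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        # position of the rightmost occurrence of the minimum
        pos = start + (window - 1 - win[::-1].index(smallest))
        if pos != last_pos:
            fingerprints.append((smallest, pos))
            last_pos = pos
    return fingerprints


text = "adorunrunrunadorunrun"
fingerprints = winnow(rolling_hashes(text, k=5), window=4)
print(fingerprints)
```

A larger window keeps fewer fingerprints per document, while a larger k makes each fingerprint depend on a longer stretch of text; the values used here are placeholders.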
How can I install it on my machine?
You can install it in two ways:
- using source
git clone git@github.com:kailashbuki/fingerprint.git
cd fingerprint
sudo python setup.py install
- using pip
sudo pip install fingerprint
Hmm! … How can I use it?
It’s quite simple. Here’s an example for you:
    from fingerprint.fingerprintgenerator import file_content_refiner, FingerprintGenerator

    # You could get the standard string from a document as:
    s = file_content_refiner("path/to/file")
    # OR you could pass the standard string directly if you already have it:
    s = "some sample string"

    fpg = FingerprintGenerator(input_string=s)
    fpg.generate_fingerprints()
    print(fpg.fingerprints)

>>Feel free to contact me at kailash<DOT>buki<AT>gmail<DOT>com (kailash.buki@gmail.com)<<
Download files
Hashes for fingerprint-0.1.0.macosx-10.7-intel.exe
Algorithm | Hash digest
---|---
SHA256 | e0b63170cb789b8a4d73424baacbf23ae4f04e38fb594508c1da4ada65f5bec0
MD5 | e784d8189388d957290a831b3b78de50
BLAKE2b-256 | d39a0d2e477b5c6cea488bff9bfd4bdbd69e8f2840eb31073c43352b7595bb02