Create consistent and comparable fingerprints (checksums/hashes) from unordered JSON data
json-fingerprint
A JSON fingerprint consists of three parts: the version of the underlying algorithm, the hash function used, and a hex digest of the hash function output. A complete example could look like this: jfpv1$sha256$5815eb0ce6f4e5ab0a771cce2a8c5432f64222f8fd84b4cc2d38e4621fae86af.
The first part, jfpv1, indicates the algorithm version, which translates to JSON fingerprint version 1. The second part, sha256, indicates that SHA256 was the hash function used. The last part, 5815eb0ce6f4e5ab0a771cce2a8c5432f64222f8fd84b4cc2d38e4621fae86af, is a standard hex digest of the hash function output.
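The three-part layout described above can be taken apart with plain string handling. The sketch below is only an illustration: the names FingerprintParts and split_fingerprint are hypothetical helpers and are not part of the json-fingerprint package. It assumes the parts are always separated by a single '$' character, as in the example fingerprint.

```python
from typing import NamedTuple


class FingerprintParts(NamedTuple):
    """Hypothetical container for the three parts of a fingerprint."""
    version: str        # algorithm version, e.g. 'jfpv1'
    hash_function: str  # hash function name, e.g. 'sha256'
    hex_digest: str     # hex digest of the hash function output


def split_fingerprint(fingerprint: str) -> FingerprintParts:
    """Split a fingerprint string into its three '$'-separated parts."""
    parts = fingerprint.split('$')
    if len(parts) != 3:
        raise ValueError(f'Expected 3 parts separated by "$", got {len(parts)}')
    return FingerprintParts(*parts)


example = 'jfpv1$sha256$5815eb0ce6f4e5ab0a771cce2a8c5432f64222f8fd84b4cc2d38e4621fae86af'
parts = split_fingerprint(example)
print(parts.version)          # jfpv1
print(parts.hash_function)    # sha256
print(len(parts.hex_digest))  # 64 (a SHA256 hex digest has 64 characters)
```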
Installation
To install the json-fingerprint package, run pip install json-fingerprint.
Examples
The example below shows how to create and compare json fingerprints.
import json
import json_fingerprint as jfp
obj_1_str = json.dumps([3, 2, 1, {'foo': 'bar'}])
obj_2_str = json.dumps([2, {'foo': 'bar'}, 1, 3]) # Same data in different order
fp_1 = jfp.json_fingerprint(input=obj_1_str, hash_function='sha256', version=1)
fp_2 = jfp.json_fingerprint(input=obj_2_str, hash_function='sha256', version=1)
print(f'Fingerprint 1: {fp_1}')
print(f'Fingerprint 2: {fp_2}')
This prints two identical fingerprints, even though the JSON elements appear in a different order:
Fingerprint 1: jfpv1$sha256$5815eb0ce6f4e5ab0a771cce2a8c5432f64222f8fd84b4cc2d38e4621fae86af
Fingerprint 2: jfpv1$sha256$5815eb0ce6f4e5ab0a771cce2a8c5432f64222f8fd84b4cc2d38e4621fae86af
Extending the previous example, the two fingerprints can be compared directly:
if fp_1 == fp_2:
    # Do something if fingerprints match
    print('Fingerprints match')
else:
    # Do nothing or something else
    print('Fingerprints not matching')
Since JSON objects with identical data content and structure always produce identical fingerprints, the fingerprints can be used for several purposes, including finding duplicate JSON data in a larger dataset, validating or invalidating cached JSON data, and checking data integrity.
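As a rough illustration of the duplicate-finding use case, the sketch below groups JSON strings by fingerprint and reports the groups with more than one member. To keep the example runnable without the package installed, it uses a hypothetical stand-in function, standin_fingerprint, which sorts list elements and hashes the result with SHA256. This stand-in is an assumption made for the example and is not the jfpv1 algorithm, so its digests will not match real json-fingerprint output. The find_duplicates helper is likewise hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def standin_fingerprint(json_str: str) -> str:
    """Hypothetical order-insensitive hash. NOT the jfpv1 algorithm."""
    def canonical(value):
        # Recursively sort list elements so that element order is ignored.
        if isinstance(value, dict):
            return {key: canonical(val) for key, val in value.items()}
        if isinstance(value, list):
            items = [canonical(val) for val in value]
            return sorted(items, key=lambda val: json.dumps(val, sort_keys=True))
        return value

    data = canonical(json.loads(json_str))
    serialized = json.dumps(data, sort_keys=True)
    return hashlib.sha256(serialized.encode('utf-8')).hexdigest()


def find_duplicates(documents, fingerprint=standin_fingerprint):
    """Return lists of indexes whose documents share the same fingerprint."""
    groups = defaultdict(list)
    for index, doc in enumerate(documents):
        groups[fingerprint(doc)].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


documents = [
    json.dumps([3, 2, 1, {'foo': 'bar'}]),
    json.dumps({'a': 1}),
    json.dumps([2, {'foo': 'bar'}, 1, 3]),  # Same data as the first entry
]
print(find_duplicates(documents))  # [[0, 2]]
```

With the package installed, the stand-in could presumably be swapped for a small wrapper around jfp.json_fingerprint(input=..., hash_function='sha256', version=1), the call used in the example earlier.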
Hashes for json_fingerprint-0.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b6c1be1636ccf309826370f74321fa3ec4f543aff7779d3dca1461743bc01dde
MD5 | 0f62e8575f0790c2a33bea8fa746acd6
BLAKE2b-256 | 90eba81a804bfbc2ea8d16b84249f29b9e3509b7bb71b0fee4876d17fa81b40e