Command-line tool for hashing RDF definitions into resolvable identifiers. (Default: sha256)
Project description
RDF Hash
Command-line tool for hashing RDF definitions into resolvable identifiers ( sha256, md5, blake2b, etc. ).
Selected subjects are replaced with hash of their triples (Default: blank node subjects).
Set of triples on a given subject are sorted by {predicate} {object}.\n, then hashed together. The hash result replaces the subject URI (Ex: <md5:fdd61ec7cdbc7241f0289339678dd008>).
Setup
Dependencies
Getting Started
-
Install
pippackagespython3.10 -m pip install rdfhash
-
Test script
rdfhash --data="[ a <def:class:Person> ] ." --method=sha1
<sha1:f0392681a6a701d9672925133bf1207f4be9e412> a <def:class:Person> .
Command-Line Interface
rdfhash [-h] -d DATA [-f {turtle,n-triples,trig,n-quads,n3,rdf}]
[-m {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}]
[-a ACCEPT [ACCEPT ...]] [-v] [--debug] [--sparql SPARQL]
Replace selected subjects with hash of their triples (`{predicate} {object}.\n` sorted + joined).
options:
-h, --help show this help message and exit
-d DATA, --data DATA Input data. (RDF)
-f {turtle,n-triples,trig,n-quads,n3,rdf}, --format {turtle,n-triples,trig,n-quads,n3,rdf}
Input format.
-m {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}, --method {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}
Hash method.
-a ACCEPT [ACCEPT ...], --accept ACCEPT [ACCEPT ...]
Accept format.
-v, --verbose Show 'info' level logs.
--debug Show 'debug' level logs.
--sparql SPARQL, --sparql-select-subjects SPARQL
SPARQL SELECT query returning subject URIs to replace with hash of their triples. Defaults to all
blank node subjects.
Example
Test the tool out on the directory ./examples.
rdfhash --data ./examples/product_0.ttl
Blank Node Input
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix c: <def:class:> .
@prefix currency: <def:class:currency> .
@prefix p: <def:property:> .
_:xbox_series_x
rdf:type c:Product ;
p:name "Microsoft - Xbox Series X 1TB Console - Black" ;
p:url <https://www.bestbuy.com/site/microsoft-xbox-series-x-1tb-console-black/6428324.p> ;
p:available false ;
p:price [
rdf:type currency:USDollar ;
p:amount "499.99"^^xsd:decimal ;
] .
_:ps5
rdf:type c:Product ;
p:name "Sony - PlayStation 5 Console" ;
p:url <https://www.bestbuy.com/site/sony-playstation-5-console/6426149.p> ;
p:available false ;
p:price [
rdf:type currency:USDollar ;
p:amount "499.99"^^xsd:decimal ;
] .
md5 Output
<md5:e2edf345944d2d2360ca0af3a2e263e5>
a c:Product ;
p:available false ;
p:name "Microsoft - Xbox Series X 1TB Console - Black" ;
p:price <md5:230919236fbe71a692d10c9a693fdd2b> ;
p:url <https://www.bestbuy.com/site/microsoft-xbox-series-x-1tb-console-black/6428324.p> .
<md5:64c8f3c04879effcad67df5e62c00245>
a c:Product ;
p:available false ;
p:name "Sony - PlayStation 5 Console" ;
p:price <md5:230919236fbe71a692d10c9a693fdd2b> ;
p:url <https://www.bestbuy.com/site/sony-playstation-5-console/6426149.p> .
<md5:230919236fbe71a692d10c9a693fdd2b>
a currency:USDollar ;
p:amount 499.99 .
- The nested definition for
499.99USD is referenced 2 times and defined only once.
Simple time-entry data
@prefix d: <data:> .
d:TimeEntry__ps5__2020_11_12
a c:TimeEntry ;
p:date "2020-11-12"^^xsd:date ;
p:value <md5:64c8f3c04879effcad67df5e62c00245> .
d:TimeEntry__xbox_series_x__2020_10_12
a c:TimeEntry ;
p:date "2020-10-12"^^xsd:date ;
p:value <md5:e2edf345944d2d2360ca0af3a2e263e5> .
d:TimeEntry__ps5__2022_06_01
a c:TimeEntry ;
p:date "2022-06-01"^^xsd:date ;
p:value <md5:64c8f3c04879effcad67df5e62c00245> .
- If a webscraper encounters the exact same definition, output RDF will be identical. Only triples added are references to the existing triples.
Limitations
-
Named graphs are currently not supported.
-
Cannot update triples on hashed subjects.
-
Updating statements on a hashed subject will result in a hash mismatch.
-
Blank node statement input:
[ a <def:class:Person> ] .
-
Hashed subject output:
<sha1:f0392681a6a701d9672925133bf1207f4be9e412> a <def:class:Person> . -
Updating statements on hashed subject:
# Actual sha1 Result: 0c0140462cb569cb700fe5d01bf5efb3185cdb4d <sha1:f0392681a6a701d9672925133bf1207f4be9e412> a <def:class:Person> ; <def:property:age> "24"^^<http://www.w3.org/2001/XMLSchema#integer> .- Mismatch between original hash and actual hash result.
- Original:
<sha1:f0392681a6a701d9672925133bf1207f4be9e412> - Actual:
<sha1:0c0140462cb569cb700fe5d01bf5efb3185cdb4d>
- Original:
- Mismatch between original hash and actual hash result.
-
-
Cannot resolve circular dependencies between selected subjects.
_:b1 <def:property:connectedTo> _:b2 . _:b2 <def:property:connectedTo> _:b1 .
-
Using multiple hashing methods is not recommended.
_:error_multiple_hash_methods <p:0> <md5:64eee8e358fd1b6340385f4588e5536b> ; <p:1> <sha1:2408f5f487b26247f9a82a6b9ea76f21b79bb12f> .- Using multiple hashing methods can result in duplicate hashed statements.
- Sticking with 1 hashing method allows for the smallest possible graph size.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file rdfhash-0.2.2.tar.gz.
File metadata
- Download URL: rdfhash-0.2.2.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.0b3+
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
277314d3ba6fed05c6a45fc03a52e2d006a163c7a913bd7452368f7e9a60b8be
|
|
| MD5 |
3b269592ca84c5c77627fe83fb277548
|
|
| BLAKE2b-256 |
6e4652a4eb8679f426aec6cbdd0f1909b70a5d7e0a4e49e41db118cc06be86b0
|