RDF Hash

Command-line tool for hashing RDF definitions into resolvable identifiers (sha256 by default; md5, blake2b, and other methods are also supported).

Selected subjects are replaced with the hash of their triples. By default, all blank node subjects are selected.

The triples on a given subject are each written as `{predicate} {object}.\n`, sorted, and then hashed together. The resulting hash replaces the subject URI (e.g. <md5:fdd61ec7cdbc7241f0289339678dd008>).
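The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `hash_subject` is hypothetical, and the exact way rdfhash serializes each term (full IRIs versus prefixed names, whitespace, literal datatypes) is not specified here, so the digests produced by this sketch may differ from the tool's output.

```python
import hashlib


def hash_subject(pairs, method="sha256"):
    """Return a hashed identifier for one subject (illustrative only).

    `pairs` is an iterable of (predicate, object) strings that are
    already serialized as RDF terms. That serialization is an assumption.
    """
    # One `{predicate} {object}.\n` line per triple, sorted so that the
    # input order of the triples does not affect the result.
    lines = sorted(f"{predicate} {obj}.\n" for predicate, obj in pairs)
    digest = hashlib.new(method, "".join(lines).encode("utf-8")).hexdigest()
    return f"<{method}:{digest}>"


person = [
    ("<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", "<def:class:Person>"),
    ("<def:property:age>", '"24"^^<http://www.w3.org/2001/XMLSchema#integer>'),
]
print(hash_subject(person, method="md5"))
```

Because the lines are sorted before hashing, listing the same triples in a different order yields the same identifier.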

Setup

Dependencies

Getting Started

  • Install the package with pip

    python3.10 -m pip install rdfhash
    
  • Run a quick test

    rdfhash --data="[ a <def:class:Person> ] ." --method=sha1
    
    <sha1:f0392681a6a701d9672925133bf1207f4be9e412> a <def:class:Person> .
    

Command-Line Interface

rdfhash [-h] -d DATA [-f {turtle,n-triples,trig,n-quads,n3,rdf}]
        [-m {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}]
        [-a ACCEPT [ACCEPT ...]] [-v] [--debug] [--sparql SPARQL]

Replace selected subjects with hash of their triples (`{predicate} {object}.\n` sorted + joined).

options:
  -h, --help            show this help message and exit
  -d DATA, --data DATA  Input data. (RDF)
  -f {turtle,n-triples,trig,n-quads,n3,rdf}, --format {turtle,n-triples,trig,n-quads,n3,rdf}
                        Input format.
  -m {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}, --method {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}
                        Hash method.
  -a ACCEPT [ACCEPT ...], --accept ACCEPT [ACCEPT ...]
                        Accept format.
  -v, --verbose         Show 'info' level logs.
  --debug               Show 'debug' level logs.
  --sparql SPARQL, --sparql-select-subjects SPARQL
                        SPARQL SELECT query returning subject URIs to replace with hash of their triples. Defaults to all
                        blank node subjects.
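As a usage sketch, the options above can be assembled into a command from Python. The helper `build_command` and the SPARQL query are hypothetical; the help text does not say whether the selected variable needs a particular name, so treat the query as an illustration only. The command is executed only if `rdfhash` is found on the PATH.

```python
import shutil
import subprocess


def build_command(data, method="sha256", input_format=None, sparql=None):
    """Assemble an rdfhash argument list from the documented options."""
    command = ["rdfhash", "--data", data, "--method", method]
    if input_format is not None:
        command += ["--format", input_format]
    if sparql is not None:
        command += ["--sparql", sparql]
    return command


# Hypothetical query: select every subject typed as <def:class:Product>.
query = "SELECT ?s WHERE { ?s a <def:class:Product> }"
command = build_command(
    "./examples/product_0.ttl",
    method="md5",
    input_format="turtle",
    sparql=query,
)
print(command)

# Only run the tool when it is installed and on PATH.
if shutil.which("rdfhash"):
    subprocess.run(command, check=False)
```

Passing the arguments as a list avoids shell quoting problems with the braces and angle brackets in the query.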

Example

Try the tool on a file from the ./examples directory.

rdfhash --data ./examples/product_0.ttl --method md5

Blank Node Input

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

@prefix c:         <def:class:> .
@prefix currency:  <def:class:currency> .
@prefix p:         <def:property:> .

_:xbox_series_x
    rdf:type c:Product ;
    p:name "Microsoft - Xbox Series X 1TB Console - Black" ;
    p:url <https://www.bestbuy.com/site/microsoft-xbox-series-x-1tb-console-black/6428324.p> ;
    p:available false ;
    p:price [
        rdf:type currency:USDollar ;
        p:amount "499.99"^^xsd:decimal ;
    ] .

_:ps5
    rdf:type c:Product ;
    p:name "Sony - PlayStation 5 Console" ;
    p:url <https://www.bestbuy.com/site/sony-playstation-5-console/6426149.p> ;
    p:available false ;
    p:price [
        rdf:type currency:USDollar ;
        p:amount "499.99"^^xsd:decimal ;
    ] .

md5 Output

<md5:e2edf345944d2d2360ca0af3a2e263e5>
    a c:Product ;
    p:available false ;
    p:name "Microsoft - Xbox Series X 1TB Console - Black" ;
    p:price <md5:230919236fbe71a692d10c9a693fdd2b> ;
    p:url <https://www.bestbuy.com/site/microsoft-xbox-series-x-1tb-console-black/6428324.p> .

<md5:64c8f3c04879effcad67df5e62c00245>
    a c:Product ;
    p:available false ;
    p:name "Sony - PlayStation 5 Console" ;
    p:price <md5:230919236fbe71a692d10c9a693fdd2b> ;
    p:url <https://www.bestbuy.com/site/sony-playstation-5-console/6426149.p> .

<md5:230919236fbe71a692d10c9a693fdd2b>
    a currency:USDollar ;
    p:amount 499.99 .
  • The nested definition for 499.99 USD is referenced twice but defined only once, because both blank nodes hash to the same identifier.
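The deduplication noted above follows directly from content-based identifiers. A minimal sketch, assuming terms are compared as plain strings and using a hypothetical helper `hashed_id` (the real tool's term serialization may differ, so the digest here is not expected to match the output above):

```python
import hashlib


def hashed_id(pairs, method="md5"):
    """Hash sorted `{predicate} {object}.\n` lines (illustrative only)."""
    lines = sorted(f"{p} {o}.\n" for p, o in pairs)
    digest = hashlib.new(method, "".join(lines).encode("utf-8")).hexdigest()
    return f"<{method}:{digest}>"


# Two separate blank nodes with the same content, one per product.
xbox_price = [("rdf:type", "currency:USDollar"), ("p:amount", '"499.99"^^xsd:decimal')]
ps5_price = [("p:amount", '"499.99"^^xsd:decimal'), ("rdf:type", "currency:USDollar")]

graph = set()  # a set of (subject, predicate, object) triples
for price in (xbox_price, ps5_price):
    subject = hashed_id(price)
    graph.update((subject, p, o) for p, o in price)

print(len(graph))  # 2: the price definition is stored once
```

Both price nodes map to one subject, so the second node adds no new triples to the set.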

Simple time-entry data

@prefix d:  <data:> .

d:TimeEntry__ps5__2020_11_12
    a c:TimeEntry ;
    p:date "2020-11-12"^^xsd:date ;
    p:value <md5:64c8f3c04879effcad67df5e62c00245> .

d:TimeEntry__xbox_series_x__2020_10_12
    a c:TimeEntry ;
    p:date "2020-10-12"^^xsd:date ;
    p:value <md5:e2edf345944d2d2360ca0af3a2e263e5> .

d:TimeEntry__ps5__2022_06_01
    a c:TimeEntry ;
    p:date "2022-06-01"^^xsd:date ;
    p:value <md5:64c8f3c04879effcad67df5e62c00245> .
  • If a web scraper encounters the exact same definition again, the output RDF will be identical. The only triples added are references to the existing definition.

Limitations

  • Named graphs are currently not supported.

  • Cannot update triples on hashed subjects.

    • Updating statements on a hashed subject will result in a hash mismatch.

    • Blank node statement input:

      [ a <def:class:Person> ] .
      
    • Hashed subject output:

      <sha1:f0392681a6a701d9672925133bf1207f4be9e412>
          a <def:class:Person> .
      
    • Updating statements on hashed subject:

      # Actual sha1 Result: 0c0140462cb569cb700fe5d01bf5efb3185cdb4d
      
      <sha1:f0392681a6a701d9672925133bf1207f4be9e412>
          a <def:class:Person> ;
          <def:property:age> "24"^^<http://www.w3.org/2001/XMLSchema#integer> .
      
      • Mismatch between original hash and actual hash result.
        • Original: <sha1:f0392681a6a701d9672925133bf1207f4be9e412>
        • Actual: <sha1:0c0140462cb569cb700fe5d01bf5efb3185cdb4d>
  • Cannot resolve circular dependencies between selected subjects.

    _:b1 <def:property:connectedTo> _:b2 .
    _:b2 <def:property:connectedTo> _:b1 .
    
  • Using multiple hashing methods is not recommended.

    _:error_multiple_hash_methods
        <p:0> <md5:64eee8e358fd1b6340385f4588e5536b> ;
        <p:1> <sha1:2408f5f487b26247f9a82a6b9ea76f21b79bb12f> .
    
    • Using multiple hashing methods can define the same content more than once, under a different identifier for each method.
    • Sticking with one hashing method keeps the graph as small as possible.
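Two of the limitations above (hash mismatches after an update, and circular dependencies) can be checked before hashing. The following is a hedged sketch, not part of rdfhash: `hashed_id`, `is_consistent`, and `find_cycle` are hypothetical helpers, and the term serialization is an assumption, so the digests will not necessarily match the values shown above.

```python
import hashlib


def hashed_id(pairs, method):
    """Hash sorted `{predicate} {object}.\n` lines (illustrative only)."""
    lines = sorted(f"{p} {o}.\n" for p, o in pairs)
    digest = hashlib.new(method, "".join(lines).encode("utf-8")).hexdigest()
    return f"<{method}:{digest}>"


def is_consistent(subject, pairs):
    """True if `subject` (e.g. "<sha1:...>") still matches its triples."""
    method = subject[1:].split(":", 1)[0]
    return hashed_id(pairs, method) == subject


def find_cycle(references):
    """Return a list of subjects forming a cycle, or None.

    `references` maps each selected subject to the selected subjects
    that appear as objects of its triples.
    """
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for target in references.get(node, ()):
            cycle = visit(target)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in references:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


original = [("a", "<def:class:Person>")]
subject = hashed_id(original, "sha1")
updated = original + [
    ("<def:property:age>", '"24"^^<http://www.w3.org/2001/XMLSchema#integer>')
]
print(is_consistent(subject, original))  # True
print(is_consistent(subject, updated))   # False
print(find_cycle({"_:b1": ["_:b2"], "_:b2": ["_:b1"]}))  # ['_:b1', '_:b2', '_:b1']
```

A subject that fails `is_consistent` needs a new identifier, and a non-empty result from `find_cycle` marks subjects that cannot be hashed, because each one's hash would depend on the other's.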

Source Distribution

rdfhash-0.2.5.tar.gz (10.1 kB)
