Command-line tool for hashing RDF definitions into resolvable identifiers. (Default: sha256)

RDF Hash

Command-line tool for hashing RDF definitions into resolvable identifiers (sha256, md5, blake2b, etc.).

Each selected subject is replaced with a hash of its triples. By default, all blank node subjects are selected.

The triples on a given subject are each written as `{predicate} {object}.\n`, sorted, and then hashed together. The resulting hash replaces the subject URI (e.g. <md5:fdd61ec7cdbc7241f0289339678dd008>).
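The hashing step can be sketched in Python. This is a minimal illustration, not rdfhash's actual implementation: the function name `hash_subject` is hypothetical, and it assumes each predicate and object is already serialized as an N-Triples term. Because the tool's exact serialization is not documented here, the digests below are not guaranteed to match rdfhash output.

```python
import hashlib


def hash_subject(predicate_objects, method="sha256"):
    """Return a `<method:hexdigest>` identifier for one subject's triples.

    `predicate_objects` is a list of (predicate, object) string pairs.
    Hypothetical helper: it mirrors the description above, not the tool's code.
    """
    # One line per triple, in the `{predicate} {object}.\n` shape.
    lines = [f"{p} {o}.\n" for p, o in predicate_objects]
    # Sorting makes the digest independent of the input order.
    payload = "".join(sorted(lines))
    digest = hashlib.new(method, payload.encode("utf-8")).hexdigest()
    return f"<{method}:{digest}>"


first = hash_subject([("<p:1>", "<o:1>"), ("<p:0>", "<o:0>")])
second = hash_subject([("<p:0>", "<o:0>"), ("<p:1>", "<o:1>")])
print(first == second)  # True: the order of the triples does not matter
```

Sorting before hashing is what lets two sources that list the same statements in a different order arrive at the same identifier.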

Setup

Dependencies

Getting Started

  • Install pip packages

    python3.10 -m pip install rdfhash
    
  • Test script

    rdfhash --data="[ a <def:class:Person> ] ." --method=sha1
    
    <sha1:f0392681a6a701d9672925133bf1207f4be9e412> a <def:class:Person> .
    

Command-Line Interface

rdfhash [-h] -d DATA [-f {turtle,n-triples,trig,n-quads,n3,rdf}]
        [-m {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}]
        [-a ACCEPT [ACCEPT ...]] [-v] [--debug] [--sparql SPARQL]

Replace selected subjects with hash of their triples (`{predicate} {object}.\n` sorted + joined).

options:
  -h, --help            show this help message and exit
  -d DATA, --data DATA  Input data. (RDF)
  -f {turtle,n-triples,trig,n-quads,n3,rdf}, --format {turtle,n-triples,trig,n-quads,n3,rdf}
                        Input format.
  -m {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}, --method {md5,sha1,sha224,sha256,sha384,sha512,sha3_224,sha3_256,sha3_384,sha3_512,blake2b,blake2s}
                        Hash method.
  -a ACCEPT [ACCEPT ...], --accept ACCEPT [ACCEPT ...]
                        Accept format.
  -v, --verbose         Show 'info' level logs.
  --debug               Show 'debug' level logs.
  --sparql SPARQL, --sparql-select-subjects SPARQL
                        SPARQL SELECT query returning subject URIs to replace with hash of their triples. Defaults to all
                        blank node subjects.
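
When calling the tool from a script, the options above can be assembled into an argument list. The sketch below is only an illustration: `rdfhash_command` is a hypothetical helper, the SPARQL query is an assumed example, and whether the tool expects a particular variable name in the SELECT clause is not stated in the help text.

```python
import shlex


def rdfhash_command(data, method="sha256", input_format=None, sparql=None):
    """Build an argv list for the rdfhash CLI (hypothetical helper).

    Only flags listed in the help text are used.
    """
    argv = ["rdfhash", "--data", data, "--method", method]
    if input_format is not None:
        argv += ["--format", input_format]
    if sparql is not None:
        argv += ["--sparql", sparql]
    return argv


# Assumed example query: select only subjects typed as def:class:Product.
query = "SELECT ?s WHERE { ?s a <def:class:Product> . }"
argv = rdfhash_command("./examples/product_0.ttl", method="md5", sparql=query)

# Print a shell-quoted command line; pass `argv` to subprocess.run to execute it.
print(shlex.join(argv))
```

Building a list rather than a single string avoids shell-quoting problems with the braces and angle brackets in the query.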

Example

Try the tool on a file from the ./examples directory.

rdfhash --data ./examples/product_0.ttl --method md5

Blank Node Input

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

@prefix c:         <def:class:> .
@prefix currency:  <def:class:currency> .
@prefix p:         <def:property:> .

_:xbox_series_x
    rdf:type c:Product ;
    p:name "Microsoft - Xbox Series X 1TB Console - Black" ;
    p:url <https://www.bestbuy.com/site/microsoft-xbox-series-x-1tb-console-black/6428324.p> ;
    p:available false ;
    p:price [
        rdf:type currency:USDollar ;
        p:amount "499.99"^^xsd:decimal ;
    ] .

_:ps5
    rdf:type c:Product ;
    p:name "Sony - PlayStation 5 Console" ;
    p:url <https://www.bestbuy.com/site/sony-playstation-5-console/6426149.p> ;
    p:available false ;
    p:price [
        rdf:type currency:USDollar ;
        p:amount "499.99"^^xsd:decimal ;
    ] .

md5 Output

<md5:e2edf345944d2d2360ca0af3a2e263e5>
    a c:Product ;
    p:available false ;
    p:name "Microsoft - Xbox Series X 1TB Console - Black" ;
    p:price <md5:230919236fbe71a692d10c9a693fdd2b> ;
    p:url <https://www.bestbuy.com/site/microsoft-xbox-series-x-1tb-console-black/6428324.p> .

<md5:64c8f3c04879effcad67df5e62c00245>
    a c:Product ;
    p:available false ;
    p:name "Sony - PlayStation 5 Console" ;
    p:price <md5:230919236fbe71a692d10c9a693fdd2b> ;
    p:url <https://www.bestbuy.com/site/sony-playstation-5-console/6426149.p> .

<md5:230919236fbe71a692d10c9a693fdd2b>
    a currency:USDollar ;
    p:amount 499.99 .
  • The nested definition for 499.99 USD is referenced twice but defined only once.
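
The deduplication of the nested price can be sketched as follows. This is an illustration under assumptions, not the tool's code: `hash_subject` and `price_triples` are hypothetical helpers, the terms are hand-written N-Triples strings, and sha256 is used instead of md5, so the identifiers will differ from the output shown above.

```python
import hashlib


def hash_subject(predicate_objects, method="sha256"):
    """Hypothetical helper: hash sorted `{predicate} {object}.\n` lines."""
    lines = sorted(f"{p} {o}.\n" for p, o in predicate_objects)
    digest = hashlib.new(method, "".join(lines).encode("utf-8")).hexdigest()
    return f"<{method}:{digest}>"


def price_triples():
    # The nested blank node from the input, with its prefixes expanded.
    return [
        ("<def:property:amount>", '"499.99"^^<http://www.w3.org/2001/XMLSchema#decimal>'),
        ("<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", "<def:class:currencyUSDollar>"),
    ]


graph = {}  # hashed identifier -> its (predicate, object) pairs
product_ids = []
names = ("Microsoft - Xbox Series X 1TB Console - Black", "Sony - PlayStation 5 Console")
for name in names:
    price = price_triples()
    price_id = hash_subject(price)  # the nested node is hashed first
    graph[price_id] = price         # same key both times, so it is stored once
    product = [
        ("<def:property:name>", f'"{name}"'),
        ("<def:property:price>", price_id),  # the parent refers to the hash
    ]
    product_id = hash_subject(product)
    graph[product_id] = product
    product_ids.append(product_id)

print(len(graph))  # 3: two products and one shared price
```

Hashing the nested node before its parent is what allows the parent's own hash to include the stable identifier of the price.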

Simple time-entry data

@prefix d:  <data:> .

d:TimeEntry__ps5__2020_11_12
    a c:TimeEntry ;
    p:date "2020-11-12"^^xsd:date ;
    p:value <md5:64c8f3c04879effcad67df5e62c00245> .

d:TimeEntry__xbox_series_x__2020_10_12
    a c:TimeEntry ;
    p:date "2020-10-12"^^xsd:date ;
    p:value <md5:e2edf345944d2d2360ca0af3a2e263e5> .

d:TimeEntry__ps5__2022_06_01
    a c:TimeEntry ;
    p:date "2022-06-01"^^xsd:date ;
    p:value <md5:64c8f3c04879effcad67df5e62c00245> .
  • If a web scraper encounters the exact same definition again, the hashed output is identical. The only triples added are references to the existing hashed subjects.

Limitations

  • Named graphs are currently not supported.

  • Cannot update triples on hashed subjects.

    • Updating statements on a hashed subject will result in a hash mismatch.

    • Blank node statement input:

      [ a <def:class:Person> ] .
      
    • Hashed subject output:

      <sha1:f0392681a6a701d9672925133bf1207f4be9e412>
          a <def:class:Person> .
      
    • Updating statements on hashed subject:

      # Actual sha1 Result: 0c0140462cb569cb700fe5d01bf5efb3185cdb4d
      
      <sha1:f0392681a6a701d9672925133bf1207f4be9e412>
          a <def:class:Person> ;
          <def:property:age> "24"^^<http://www.w3.org/2001/XMLSchema#integer> .
      
      • Mismatch between original hash and actual hash result.
        • Original: <sha1:f0392681a6a701d9672925133bf1207f4be9e412>
        • Actual: <sha1:0c0140462cb569cb700fe5d01bf5efb3185cdb4d>
  • Cannot resolve circular dependencies between selected subjects.

    _:b1 <def:property:connectedTo> _:b2 .
    _:b2 <def:property:connectedTo> _:b1 .
    
  • Using multiple hashing methods is not recommended.

    _:error_multiple_hash_methods
        <p:0> <md5:64eee8e358fd1b6340385f4588e5536b> ;
        <p:1> <sha1:2408f5f487b26247f9a82a6b9ea76f21b79bb12f> .
    
    • Mixing hashing methods can produce duplicate hashed statements, because the same definition gets a different identifier under each method.
    • Sticking with one hashing method keeps the graph as small as possible.
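
The hash mismatch and circular dependency limitations above can be checked before publishing data. The sketch below is a hedged illustration rather than part of rdfhash: `hash_subject`, `is_consistent`, and `has_cycle` are hypothetical names, and the serialization is assumed, so only the comparisons are meaningful, not the digests themselves.

```python
import hashlib


def hash_subject(predicate_objects, method="sha1"):
    """Hypothetical helper: hex digest of sorted `{predicate} {object}.\n` lines."""
    lines = sorted(f"{p} {o}.\n" for p, o in predicate_objects)
    return hashlib.new(method, "".join(lines).encode("utf-8")).hexdigest()


def is_consistent(identifier, predicate_objects):
    """Check that a `<method:digest>` identifier still matches its triples."""
    method, _, digest = identifier.strip("<>").partition(":")
    return hash_subject(predicate_objects, method) == digest


def has_cycle(edges):
    """Return True if selected subjects reference each other circularly.

    `edges` maps a subject to the selected subjects that appear as its objects.
    """
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still being explored
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in edges)


person = [("<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", "<def:class:Person>")]
original = f"<sha1:{hash_subject(person)}>"
updated = person + [
    ("<def:property:age>", '"24"^^<http://www.w3.org/2001/XMLSchema#integer>'),
]

print(is_consistent(original, person))   # True
print(is_consistent(original, updated))  # False: adding a statement changes the hash
print(has_cycle({"_:b1": {"_:b2"}, "_:b2": {"_:b1"}}))  # True
```

A cycle cannot be hashed because each subject's hash would depend on the other's, so neither identifier can be computed first.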
