
A Store back-end for rdflib to allow for reading and querying HDT documents


Read and query HDT documents with ease in Python

Online Documentation

Requirements

  • Python version 3.6.4 or higher

  • pip

  • gcc/clang with C++11 support

  • Python development headers: you should have the Python.h header available on your system. For example, for Python 3.6, install the python3.6-dev package on Debian/Ubuntu systems.

Then install the pybind11 library:

pip install pybind11
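Before building, it can help to confirm that the interpreter meets the requirements listed above. The following is a minimal sketch, not part of the project: the helper names `version_ok` and `python_header_available` are hypothetical, and the check assumes Python.h lives in the include directory reported by the standard `sysconfig` module.

```python
import os
import sys
import sysconfig


def version_ok(minimum=(3, 6, 4)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:3] >= minimum


def python_header_available() -> bool:
    """Return True if Python.h exists in this interpreter's include directory."""
    # Assumption: the headers are installed where sysconfig says they should be.
    include_dir = sysconfig.get_paths().get("include", "")
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    print("Python version OK:", version_ok())
    print("Python.h found:", python_header_available())
```

If the second check prints False, installing the development package mentioned above is the likely fix.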

Installation

Installation in a virtualenv is strongly advised!

Manual installation

git clone https://github.com/Callidon/pyHDT
cd pyHDT/
./install.sh

Getting started

from hdt import HDTDocument

# Load an HDT file.
# Missing indexes are generated automatically; pass False as the second argument to disable this
document = HDTDocument("test.hdt")

# Display some metadata about the HDT document itself
print("nb triples: %i" % document.total_triples)
print("nb subjects: %i" % document.nb_subjects)
print("nb predicates: %i" % document.nb_predicates)
print("nb objects: %i" % document.nb_objects)
print("nb shared subject-object: %i" % document.nb_shared)

# Fetch all triples that match { ?s ?p ?o }
# Use empty strings ("") to indicate variables
triples, cardinality = document.search_triples("", "", "")

print("cardinality of { ?s ?p ?o }: %i" % cardinality)
for triple in triples:
  print(triple)

# Search also supports limit and offset
triples, cardinality = document.search_triples("", "", "", limit=10, offset=100)
# etc ...
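The `limit` and `offset` arguments can be combined to walk through a large result set page by page. The sketch below is one possible way to do that, under stated assumptions: `iter_pages` is a hypothetical helper, and `FakeDocument` is a hypothetical in-memory stand-in (not the real `HDTDocument`) so the example runs without an HDT file. The stand-in treats `limit=0` as "no limit", which is an assumption about the real library.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def iter_pages(document, subject: str = "", predicate: str = "", obj: str = "",
               page_size: int = 100) -> Iterator[List[Triple]]:
    """Yield lists of at most page_size triples until the pattern is exhausted."""
    offset = 0
    while True:
        triples, _cardinality = document.search_triples(
            subject, predicate, obj, limit=page_size, offset=offset)
        page = list(triples)
        if not page:
            break
        yield page
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size


class FakeDocument:
    """Hypothetical stand-in for HDTDocument, for illustration only."""

    def __init__(self, triples: List[Triple]):
        self._triples = triples

    def search_triples(self, subject, predicate, obj, limit=0, offset=0):
        # Empty strings act as variables, as in the example above.
        pattern = (subject, predicate, obj)
        matches = [t for t in self._triples
                   if all(q == "" or q == v for q, v in zip(pattern, t))]
        end = offset + limit if limit > 0 else None
        return iter(matches[offset:end]), len(matches)


doc = FakeDocument([("s%d" % i, "p", "o") for i in range(5)])
pages = list(iter_pages(doc, predicate="p", page_size=2))
print([len(page) for page in pages])
```

With a real document, `iter_pages(document, page_size=1000)` would be used the same way, provided `search_triples` behaves as shown in the example above.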

Handling non UTF-8 strings in python

If the HDT document was encoded with a non-UTF-8 encoding, the previous code will not work correctly and will raise a UnicodeDecodeError. The pybind11 documentation on strings gives more details on how C++ strings are converted to Python str.

To handle this, the HDT document API provides a bytes-returning counterpart for each of these methods:

  • search_triples_bytes(...) returns an iterator of triples as (py::bytes, py::bytes, py::bytes)

  • search_join_bytes(...) returns an iterator of sets of solution mappings as py::set(py::bytes, py::bytes)

  • convert_tripleid_bytes(...) returns a triple as (py::bytes, py::bytes, py::bytes)

  • convert_id_bytes(...) returns a py::bytes

Parameters and documentation are the same as for the standard versions.

from hdt import HDTDocument

# Load an HDT file.
# Missing indexes are generated automatically; pass False as the second argument to disable this
document = HDTDocument("test.hdt")
it = document.search_triples_bytes("", "", "")

for s, p, o in it:
  print(s, p, o) # print b'...', b'...', b'...'
  # now decode it, or handle any error
  try:
    s, p, o = s.decode('UTF-8'), p.decode('UTF-8'), o.decode('UTF-8')
  except UnicodeDecodeError as err:
    # try another codec
    pass
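The "try another codec" step above can be sketched as a small fallback helper. This is only an illustration: `decode_term` and `decode_triple` are hypothetical names, and the codec order (UTF-8 first, then Latin-1) is an assumption that should be adapted to how the HDT file was actually produced.

```python
from typing import Iterable, Tuple


def decode_term(raw: bytes, encodings: Iterable[str] = ("utf-8", "latin-1")) -> str:
    """Decode raw with the first codec that succeeds, else replace bad bytes."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be decoded and mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace")


def decode_triple(triple: Tuple[bytes, bytes, bytes]) -> Tuple[str, str, str]:
    """Decode each term of a bytes triple independently."""
    s, p, o = triple
    return decode_term(s), decode_term(p), decode_term(o)


# The object below contains a Latin-1 encoded "é", which is not valid UTF-8.
print(decode_triple((b"http://example.org/s", b"http://example.org/p", b'"caf\xe9"')))
```

Because Latin-1 maps every byte to a character, it never fails, so it belongs at the end of the codec list; a wrong guess produces mojibake, not an exception.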


