Read and query HDT documents with ease in Python
Requirements
Python version 3.6.4 or higher
gcc/clang with c++11 support
Python development headers
> You should have the Python.h header available on your system.
> For example, for Python 3.6, install the python3.6-dev package on Debian/Ubuntu systems.
Then, install the pybind11 library
pip install pybind11
Installation
Installation in a virtualenv is strongly advised!
Pip install (recommended)
pip install hdt
Manual installation
git clone https://github.com/Callidon/pyHDT
cd pyHDT/
./install.sh
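After either installation route, a quick check that the interpreter can see both packages may save some debugging. The sketch below relies only on the standard library; `is_installed` is a hypothetical helper name, not part of pyHDT.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # pybind11 is the build requirement, hdt is the package itself.
    for name in ("pybind11", "hdt"):
        print(name, "found" if is_installed(name) else "missing")
```

Run it from inside the virtualenv you installed into, so that it inspects the same environment.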
Getting started
from hdt import HDTDocument
# Load an HDT file.
# Missing indexes are generated automatically; pass False as the second argument to disable this
document = HDTDocument("test.hdt")
# Display some metadata about the HDT document itself
print("nb triples: %i" % document.total_triples)
print("nb subjects: %i" % document.nb_subjects)
print("nb predicates: %i" % document.nb_predicates)
print("nb objects: %i" % document.nb_objects)
print("nb shared subject-object: %i" % document.nb_shared)
# Fetch all triples that match { ?s ?p ?o }
# Use empty strings ("") to indicate variables
triples, cardinality = document.search_triples("", "", "")
print("cardinality of { ?s ?p ?o }: %i" % cardinality)
for triple in triples:
    print(triple)
# Search also supports limit and offset
triples, cardinality = document.search_triples("", "", "", limit=10, offset=100)
# etc ...
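The limit and offset arguments can be combined to walk through a large result set page by page. The following is a minimal sketch, not part of pyHDT: `iter_pages` is a hypothetical helper, and it assumes that `search_triples` yields at most `limit` triples per call and an empty iterator once `offset` is past the last result.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def iter_pages(document, subject: str = "", predicate: str = "", obj: str = "",
               page_size: int = 100) -> Iterator[List[Triple]]:
    """Yield successive pages of triples matching a pattern.

    `document` must expose search_triples(s, p, o, limit=..., offset=...)
    as in the example above. This helper is illustrative only.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        triples, _cardinality = document.search_triples(
            subject, predicate, obj, limit=page_size, offset=offset)
        page = list(triples)
        if page:
            yield page
        # Assumption: a short or empty page means there are no more results.
        if len(page) < page_size:
            return
        offset += page_size
```

Typical usage would be `for page in iter_pages(document, page_size=10): ...`, which keeps only one page of triples in memory at a time.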
Handling non UTF-8 strings in Python
If the HDT document has been encoded with a non UTF-8 encoding, the previous code won't work correctly and will raise a UnicodeDecodeError. The pybind11 documentation on string conversions gives more details on how C++ strings are converted to Python str.
To handle this, the HDT document API is doubled with bytes-returning variants:
- search_triples_bytes(...) returns an iterator of triples as (py::bytes, py::bytes, py::bytes)
- search_join_bytes(...) returns an iterator of sets of solution mappings as py::set(py::bytes, py::bytes)
- convert_tripleid_bytes(...) returns a triple as (py::bytes, py::bytes, py::bytes)
- convert_id_bytes(...) returns a py::bytes
Parameters and documentation are the same as for the standard versions.
from hdt import HDTDocument
# Load an HDT file.
# Missing indexes are generated automatically; pass False as the second argument to disable this
document = HDTDocument("test.hdt")
it = document.search_triples_bytes("", "", "")
for s, p, o in it:
    print(s, p, o)  # prints b'...', b'...', b'...'
    # now decode it, or handle any error
    try:
        s, p, o = s.decode('UTF-8'), p.decode('UTF-8'), o.decode('UTF-8')
    except UnicodeDecodeError as err:
        # try other codecs
        pass
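The "try other codecs" step can be sketched as a small fallback decoder. This uses only the standard library; `decode_term` and `decode_triple` are hypothetical helper names, and the default codec order is an assumption that should be adapted to the encodings you expect in your data.

```python
from typing import Iterable, Tuple


def decode_term(raw: bytes, encodings: Iterable[str] = ("utf-8", "latin-1")) -> str:
    """Decode one RDF term, trying each codec in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached when every listed codec failed (latin-1 accepts any byte):
    # keep the term readable by substituting undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def decode_triple(triple: Tuple[bytes, bytes, bytes]) -> Tuple[str, str, str]:
    """Decode the subject, predicate and object of a bytes triple."""
    s, p, o = triple
    return decode_term(s), decode_term(p), decode_term(o)
```

Inside the loop above, `s, p, o = decode_triple((s, p, o))` would then replace the try/except block. Note that a permissive codec such as latin-1 never fails, so it can silently produce wrong characters if the data was written with a different encoding.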
Download files
Source Distribution
File details
Details for the file hdt-2.3.tar.gz.
File metadata
- Download URL: hdt-2.3.tar.gz
- Upload date:
- Size: 229.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 103eac995122a9109408bfcfa5d7508d09a25ab7cf954c541b48a27ebb01c2f9 |
MD5 | e4ebb2bc4c2290617248d2eb5c820cef |
BLAKE2b-256 | 518241f1e4a131881da64a1ab2c4675dd93020a1a7109be08a2eb790cb6b92c6 |