SQLite-backed RDFLib store

RDFLib-SQLite3

An SQLite-backed RDFLib store.

RDFLib-SQLite3 allows RDFLib RDF graphs to be persisted in an SQLite database. Furthermore, it allows full-text and geospatial indexing using the SQLite FTS5 and R*Tree modules.

Usage

from rdflib import Graph
import rdflib_sqlite3


# Create a Graph backed by an SQLite database
g = Graph("SQLite3")
# Open and create the database. See https://www.sqlite.org/uri.html for the URI format.
g.open("file:my-rdf.sqlite", create=True)

# do stuff with the graph...
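
The URI format linked above supports query parameters such as `mode`. As a minimal sketch using only the standard-library `sqlite3` module (not RDFLib-SQLite3 itself), the following shows how `mode=rwc` and `mode=ro` behave. Whether `g.open` forwards such parameters unchanged to SQLite is an assumption, and the table name `demo` and the temporary path are purely illustrative.

```python
import os
import sqlite3
import tempfile

# Hypothetical database path inside a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "my-rdf.sqlite")

# mode=rwc: open read-write and create the file if it does not exist.
rw = sqlite3.connect(f"file:{db_path}?mode=rwc", uri=True)
rw.execute("CREATE TABLE demo (x INTEGER)")
rw.commit()
rw.close()

# mode=ro: open the existing file read-only; writes are rejected.
ro = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
try:
    ro.execute("INSERT INTO demo VALUES (1)")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
finally:
    ro.close()
```

A read-only URI of this kind can be handy for inspecting a stored graph without risking accidental modification.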

Goals

RDFLib-SQLite3's primary goal is stability. RDFLib-SQLite3 should remain usable with minimal maintenance for many years, and the data format should be designed for long-term readability.

SQLite is a suitable backend as it uses a stable file format, offers long-term support, is well tested, and is widely used. RDFLib-SQLite3 uses the SQLite bindings that are provided as part of the Python 3 standard library, which will most likely be included in future versions of Python 3.

Database Schema

The RDF graph is persisted in the SQLite database using two tables:

  • rdf_term:

     CREATE TABLE IF NOT EXISTS rdf_term (
     	id INTEGER PRIMARY KEY,
     	term BLOB UNIQUE
     );
    

    This is a mapping from RDF terms encoded using RDF/CBOR to integer identifiers.

  • rdf_triple:

     CREATE TABLE IF NOT EXISTS rdf_triple (
     	subject INTEGER NOT NULL REFERENCES rdf_term ON DELETE RESTRICT,
     	predicate INTEGER NOT NULL REFERENCES rdf_term ON DELETE RESTRICT,
     	object INTEGER NOT NULL REFERENCES rdf_term ON DELETE RESTRICT
     );
    

    Holds triples, with each triple element being an identifier as stored in the rdf_term table. Additional indices are defined on the rdf_triple table for efficient querying (and for ensuring uniqueness of triples).
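
To illustrate how the two tables work together, here is a minimal sketch using the standard-library `sqlite3` module. Several things in it are assumptions rather than the project's actual code: the helper names `intern_term` and `add_triple`, the index name `rdf_triple_spo` (the real index definitions are not shown here), and the term encoding, where plain UTF-8 byte strings stand in for RDF/CBOR. The columns are written with `NOT NULL` constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces REFERENCES clauses when foreign keys are enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE IF NOT EXISTS rdf_term (
    id INTEGER PRIMARY KEY,
    term BLOB UNIQUE
);
CREATE TABLE IF NOT EXISTS rdf_triple (
    subject INTEGER NOT NULL REFERENCES rdf_term ON DELETE RESTRICT,
    predicate INTEGER NOT NULL REFERENCES rdf_term ON DELETE RESTRICT,
    object INTEGER NOT NULL REFERENCES rdf_term ON DELETE RESTRICT
);
-- Hypothetical index that also enforces uniqueness of triples.
CREATE UNIQUE INDEX IF NOT EXISTS rdf_triple_spo
    ON rdf_triple (subject, predicate, object);
""")


def intern_term(conn, encoded: bytes) -> int:
    """Return the integer id for an encoded term, inserting it if new."""
    conn.execute("INSERT OR IGNORE INTO rdf_term (term) VALUES (?)", (encoded,))
    row = conn.execute(
        "SELECT id FROM rdf_term WHERE term = ?", (encoded,)
    ).fetchone()
    return row[0]


def add_triple(conn, s: bytes, p: bytes, o: bytes) -> None:
    """Store a triple as three term ids; duplicates are silently ignored."""
    ids = tuple(intern_term(conn, t) for t in (s, p, o))
    conn.execute("INSERT OR IGNORE INTO rdf_triple VALUES (?, ?, ?)", ids)


# Placeholder encoding: these byte strings stand in for RDF/CBOR terms.
alice = b"<http://example.org/alice>"
knows = b"<http://example.org/knows>"
bob = b"<http://example.org/bob>"

add_triple(conn, alice, knows, bob)
add_triple(conn, alice, knows, bob)  # duplicate, ignored by the unique index

term_count = conn.execute("SELECT COUNT(*) FROM rdf_term").fetchone()[0]
triple_count = conn.execute("SELECT COUNT(*) FROM rdf_triple").fetchone()[0]

# Deleting a term that a triple still uses is refused (ON DELETE RESTRICT).
try:
    conn.execute("DELETE FROM rdf_term WHERE term = ?", (bob,))
    delete_refused = False
except sqlite3.IntegrityError:
    delete_refused = True
```

Because each distinct term is stored once and referenced by id, the triple table stays compact, and the `ON DELETE RESTRICT` clauses keep terms from being removed while triples still point at them.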

Limitations

  • No support for Quads
  • No support for REGEXTerm, Date?, DateRange? queries

TODOs

  • Triple removal
  • Tests
  • Database destruction (destroy method)
  • Database garbage collection (gc method)
  • Full-text search
  • Geospatial queries
  • Make SPARQL queries more efficient. RDFLib provides a SPARQL implementation that works with RDFLib-SQLite3. Unfortunately, performance is very limited as the SPARQL implementation does everything in Python. It would be much more efficient to offload query optimization, joins and even recursive queries to SQLite. This amounts to writing a SPARQL implementation that knows how to take advantage of SQLite.
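
The idea of offloading joins and recursive queries to SQLite, mentioned in the last TODO item, can be sketched as follows. This is not an existing feature of RDFLib-SQLite3; it is a hand-written illustration of how two SPARQL patterns might map to SQL over the schema above. The term values are placeholder byte strings rather than RDF/CBOR, and the fixed ids are chosen for readability only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rdf_term (id INTEGER PRIMARY KEY, term BLOB UNIQUE);
CREATE TABLE rdf_triple (
    subject INTEGER NOT NULL REFERENCES rdf_term,
    predicate INTEGER NOT NULL REFERENCES rdf_term,
    object INTEGER NOT NULL REFERENCES rdf_term
);
""")

# Placeholder terms (plain bytes instead of RDF/CBOR), with ids 1..5.
names = [b"knows", b"a", b"b", b"c", b"d"]
terms = {name: i for i, name in enumerate(names, start=1)}
conn.executemany(
    "INSERT INTO rdf_term (id, term) VALUES (?, ?)",
    [(i, name) for name, i in terms.items()],
)
k = terms[b"knows"]
# A chain: a knows b, b knows c, c knows d.
conn.executemany(
    "INSERT INTO rdf_triple VALUES (?, ?, ?)",
    [
        (terms[b"a"], k, terms[b"b"]),
        (terms[b"b"], k, terms[b"c"]),
        (terms[b"c"], k, terms[b"d"]),
    ],
)

# SPARQL: SELECT ?x ?z WHERE { ?x :knows ?y . ?y :knows ?z }
# Two triple patterns sharing ?y become a self-join on rdf_triple.
two_hop = conn.execute(
    """
    SELECT sx.term, sz.term
    FROM rdf_triple t1
    JOIN rdf_triple t2 ON t2.subject = t1.object
    JOIN rdf_term sx ON sx.id = t1.subject
    JOIN rdf_term sz ON sz.id = t2.object
    WHERE t1.predicate = :k AND t2.predicate = :k
    ORDER BY sx.term
    """,
    {"k": k},
).fetchall()

# SPARQL: SELECT ?z WHERE { :a :knows+ ?z }
# A transitive property path becomes a recursive common table expression.
# UNION (rather than UNION ALL) drops duplicates, so cycles terminate.
reachable = conn.execute(
    """
    WITH RECURSIVE reach(node) AS (
        SELECT object FROM rdf_triple
        WHERE subject = :start AND predicate = :k
        UNION
        SELECT t.object
        FROM rdf_triple t JOIN reach r ON t.subject = r.node
        WHERE t.predicate = :k
    )
    SELECT term FROM reach JOIN rdf_term ON rdf_term.id = reach.node
    ORDER BY term
    """,
    {"start": terms[b"a"], "k": k},
).fetchall()
```

In both cases the join work happens inside SQLite on integer ids, and terms are only decoded for the final result rows, which is where the expected efficiency gain over evaluating the patterns in Python would come from.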

Related Software

Publishing to PyPI

Make sure the version is set properly in pyproject.toml and rdflib_sqlite3/__init__.py.

pip install build twine

# Build the package
python -m build

# Upload using twine
twine upload dist/*

Acknowledgments

This software was initially developed as part of the SNSF-Ambizione funded research project "Computing the Social. Psychographics and Social Physics in the Digital Age".

License

AGPL-3.0-or-later

