Convert ArangoDB graphs to RDF & vice-versa.
ArangoRDF
Convert RDF Graphs to ArangoDB, and vice-versa.
About RDF
RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.
RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.
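As a rough illustration of the triple model described above, the sketch below stores a few triples as plain Python tuples and walks them as a directed, labeled graph. The example URIs and the `outgoing` helper are made up for this illustration; they are not part of ArangoRDF or rdflib.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object); the URIs are illustrative only.
triples = [
    ("http://example.org/Bob", "http://example.org/knows", "http://example.org/Alice"),
    ("http://example.org/Bob", "http://example.org/name", '"Bob"'),
    ("http://example.org/Alice", "http://example.org/knows", "http://example.org/Carol"),
]

# Adjacency view: subject node -> list of (edge label, target node).
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))


def outgoing(node):
    """Return the labeled edges that leave `node` (hypothetical helper)."""
    return graph.get(node, [])


for label, target in outgoing("http://example.org/Bob"):
    print(f"Bob --[{label}]--> {target}")
```

The predicate plays the role of the edge label, which is why the same data can be read either as a list of statements or as a graph.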
Resources to get started:
Installation
Latest Release
pip install arango-rdf
Current State
pip install git+https://github.com/ArangoDB-Community/ArangoRDF
Quickstart
Run the full version with Google Colab:
from rdflib import Graph
from arango import ArangoClient
from arango_rdf import ArangoRDF
db = ArangoClient(hosts="http://localhost:8529").db("_system", username="root", password="")
adbrdf = ArangoRDF(db)
g = Graph()
g.parse("https://raw.githubusercontent.com/stardog-union/stardog-tutorials/master/music/beatles.ttl")
# RDF to ArangoDB
###################################################################################
# 1.1: RDF-Topology Preserving Transformation (RPT)
adbrdf.rdf_to_arangodb_by_rpt("Beatles", g, overwrite_graph=True)
# 1.2: Property Graph Transformation (PGT)
adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, overwrite_graph=True)
g = adbrdf.load_meta_ontology(g)
# 1.3: RPT w/ Graph Contextualization
adbrdf.rdf_to_arangodb_by_rpt("Beatles", g, contextualize_graph=True, overwrite_graph=True)
# 1.4: PGT w/ Graph Contextualization
adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, contextualize_graph=True, overwrite_graph=True)
# 1.5: PGT w/ ArangoDB Document-to-Collection Mapping Exposed
adb_mapping = adbrdf.build_adb_mapping_for_pgt(g)
print(adb_mapping.serialize())
adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, adb_mapping, contextualize_graph=True, overwrite_graph=True)
# ArangoDB to RDF
###################################################################################
# Start from scratch!
g = Graph()
g.parse("https://raw.githubusercontent.com/stardog-union/stardog-tutorials/master/music/beatles.ttl")
adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, overwrite_graph=True)
# 2.1: Via Graph Name
g2, adb_mapping_2 = adbrdf.arangodb_graph_to_rdf("Beatles", Graph())
# 2.2: Via Collection Names
g3, adb_mapping_3 = adbrdf.arangodb_collections_to_rdf(
    "Beatles",
    Graph(),
    v_cols={"Album", "Band", "Class", "Property", "SoloArtist", "Song"},
    e_cols={"artist", "member", "track", "type", "writer"},
)
print(len(g2), len(adb_mapping_2))
print(len(g3), len(adb_mapping_3))
print('--------------------')
print(g2.serialize())
print('--------------------')
print(adb_mapping_2.serialize())
print('--------------------')
Development & Testing
git clone https://github.com/ArangoDB-Community/ArangoRDF
cd ArangoRDF
- (create virtual environment of choice)
pip install -e .[dev]
- (create an ArangoDB instance with method of choice)
pytest --url <> --dbName <> --username <> --password <>
Note: A pytest parameter can be omitted if the endpoint is using its default value:
def pytest_addoption(parser):
    parser.addoption("--url", action="store", default="http://localhost:8529")
    parser.addoption("--dbName", action="store", default="_system")
    parser.addoption("--username", action="store", default="root")
    parser.addoption("--password", action="store", default="")
Additional Info: RDF to ArangoDB
RDF-to-ArangoDB functionality has been implemented using concepts described in the paper Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches.
In other words, ArangoRDF offers 2 RDF-to-ArangoDB transformation methods:
- RDF-topology Preserving Transformation (RPT): ArangoRDF.rdf_to_arangodb_by_rpt()
- Property Graph Transformation (PGT): ArangoRDF.rdf_to_arangodb_by_pgt()
RPT preserves the RDF Graph structure by transforming each RDF Statement into an ArangoDB Edge.
PGT, on the other hand, ensures that Datatype Property Statements are mapped as ArangoDB Document Properties.
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:book ex:publish_date "1963-03-22"^^xsd:date .
ex:book ex:pages "100"^^xsd:integer .
ex:book ex:cover 20 .
ex:book ex:index 55 .
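To make the difference concrete, here is a minimal sketch of how the four statements above could be shaped under each approach. It is plain Python, not ArangoRDF's implementation: the field names (`from`, `to`, `label`, `uri`) and the helpers `local_name`, `rpt_shape`, and `pgt_shape` are assumptions chosen for illustration, and the documents ArangoRDF actually writes have their own key and attribute layout.

```python
EX = "http://example.org/"

# The four statements above, with literal objects already converted to Python values.
statements = [
    (EX + "book", EX + "publish_date", "1963-03-22"),
    (EX + "book", EX + "pages", 100),
    (EX + "book", EX + "cover", 20),
    (EX + "book", EX + "index", 55),
]


def local_name(uri):
    """Text after the last '/' or '#' (a simplification of URI local names)."""
    return uri.replace("#", "/").rsplit("/", 1)[-1]


def rpt_shape(stmts):
    """RPT-style: one edge per statement, so each literal ends up as its own node."""
    return [{"from": s, "to": o, "label": p} for s, p, o in stmts]


def pgt_shape(stmts):
    """PGT-style: literal-valued statements collapse into properties of one document."""
    docs = {}
    for s, p, o in stmts:
        docs.setdefault(s, {"uri": s})[local_name(p)] = o
    return list(docs.values())


print(len(rpt_shape(statements)), "edges under the RPT-style shape")
print(pgt_shape(statements))
```

Under the RPT-style shape the book is connected to four separate literal nodes, while under the PGT-style shape a single document carries `publish_date`, `pages`, `cover`, and `index` as properties.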
RPT
The ArangoRDF.rdf_to_arangodb_by_rpt method will store the RDF Resources of your RDF Graph under the following ArangoDB Collections:
- {graph_name}_URIRef: The Document collection for `rdflib.term.URIRef` resources.
- {graph_name}_BNode: The Document collection for `rdflib.term.BNode` resources.
- {graph_name}_Literal: The Document collection for `rdflib.term.Literal` resources.
- {graph_name}_Statement: The Edge collection for all triples/quads.
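The naming scheme above can be sketched as a small function. This is only an illustration under simplifying assumptions: terms are given as strings in N-Triples-like notation, and `rpt_collection` is a hypothetical helper, not an ArangoRDF function (the library itself works with rdflib term objects).

```python
def rpt_collection(graph_name, term):
    """Pick the RPT document collection for a term in N-Triples-like notation."""
    if term.startswith("<") and term.endswith(">"):
        kind = "URIRef"   # e.g. <http://example.org/book>
    elif term.startswith("_:"):
        kind = "BNode"    # e.g. _:b0
    elif term.startswith('"'):
        kind = "Literal"  # e.g. "100"^^xsd:integer
    else:
        raise ValueError(f"unrecognized term: {term!r}")
    return f"{graph_name}_{kind}"


for term in ("<http://example.org/book>", "_:b0", '"100"^^xsd:integer'):
    print(term, "->", rpt_collection("Beatles", term))
```

Every statement itself would go to the single `{graph_name}_Statement` edge collection, regardless of the kinds of terms it connects.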
PGT
In contrast to RPT, the ArangoRDF.rdf_to_arangodb_by_pgt method relies on the nature of the RDF Resource/Statement to determine which ArangoDB Collection it belongs to. This is referred to as the ArangoDB Collection Mapping Process. This process relies on 2 fundamental URIs:
- <http://www.arangodb.com/collection> (adb:collection)
  - Any RDF Statement of the form <http://example.com/Bob> <adb:collection> "Person" will map the Subject to the ArangoDB "Person" document collection.
- <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> (rdf:type)
  - This strategy is divided into 3 cases:
    - If an RDF Resource only has one rdf:type statement, then the local name of the RDF Object is used as the ArangoDB Document Collection name. For example, <http://example.com/Bob> <rdf:type> <http://example.com/Person> would create a JSON Document for <http://example.com/Bob>, and place it under the Person Document Collection. NOTE: The RDF Object will also have its own JSON Document created, and will be placed under the "Class" Document Collection.
    - If an RDF Resource has multiple rdf:type statements, with some (or all) of the RDF Objects of those statements belonging in an rdfs:subClassOf Taxonomy, then the local name of the "most specific" Class within the Taxonomy is used (i.e. the Class with the biggest depth). If there is a tie between 2+ Classes, then the URIs are alphabetically sorted & the first one is picked.
    - If an RDF Resource has multiple rdf:type statements, with none of the RDF Objects of those statements belonging in an rdfs:subClassOf Taxonomy, then the URIs are alphabetically sorted & the first one is picked. The local name of the selected URI will be designated as the Document collection for that Resource.
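The selection rules above can be sketched in plain Python. This is not ArangoRDF's implementation, only an illustration under stated assumptions: `pick_collection`, `depth`, and `local_name` are hypothetical helpers, each class has at most one rdfs:subClassOf parent, an explicit adb:collection statement is assumed to win over rdf:type, and a resource with neither statement returns None because the text above does not cover that case.

```python
def local_name(uri):
    """Text after the last '/' or '#' (a simplification of URI local names)."""
    return uri.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]


def depth(cls, subclass_of):
    """Number of rdfs:subClassOf hops from `cls` up to a root class."""
    hops, seen = 0, {cls}
    while cls in subclass_of and subclass_of[cls] not in seen:
        cls = subclass_of[cls]
        seen.add(cls)
        hops += 1
    return hops


def pick_collection(resource, adb_collection, rdf_types, subclass_of):
    # Assumption: an explicit adb:collection statement takes precedence.
    if resource in adb_collection:
        return adb_collection[resource]
    types = sorted(rdf_types.get(resource, ()))  # alphabetical order of URIs
    if not types:
        return None  # case not described in the text above
    if len(types) == 1:
        return local_name(types[0])  # case 1: a single rdf:type
    parents = set(subclass_of.values())
    in_taxonomy = [t for t in types if t in subclass_of or t in parents]
    if in_taxonomy:
        # Case 2: deepest class wins; `types` is sorted, so ties resolve alphabetically.
        deepest = max(depth(t, subclass_of) for t in in_taxonomy)
        return local_name(next(t for t in in_taxonomy if depth(t, subclass_of) == deepest))
    return local_name(types[0])  # case 3: no taxonomy, first URI alphabetically


EXC = "http://example.com/"
subclass_of = {EXC + "Student": EXC + "Person"}
rdf_types = {
    EXC + "Bob": {EXC + "Person"},
    EXC + "Alice": {EXC + "Person", EXC + "Student"},
    EXC + "Carol": {EXC + "Zebra", EXC + "Apple"},
}
adb_collection = {EXC + "Dave": "Human"}

for name in ("Bob", "Alice", "Carol", "Dave"):
    print(name, "->", pick_collection(EXC + name, adb_collection, rdf_types, subclass_of))
```

In this example data, Alice lands in "Student" because Student sits one level below Person in the taxonomy, while Carol falls back to alphabetical order because neither of her types appears in any rdfs:subClassOf statement.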