Skip to main content

Exposes RDF datasets from sparql endpoints for relational learning models in convenient formats

Project description

RDF2SRL

This package exposes RDF data from sparql database engines for relational learning models.

It provides some convenience functions that send sparql queries in http requests for both public and private sparql endpoints.

Starting from version 1.0.3, a SmartRDFGraphDataset class is added that performs client-side optimization of the Sparql query processing. This improves the performance if the configuration of the public endpoint isn't tuned for a specific query.

Installation

You cna install the requirements using

pip install -r requiremtns.txt

You can install the RDF2SRL package from PyPI:

pip install RDF2SRL

Getting Started

Collecting Statistics about the data

We can use this package to get some statistics about the DBpedia dataset. Let's use the DBpedia public endpoint provided by OpenLink Virtuoso

First, import the RDFGraphDataset class from the python package rdf2srl and initialize the RDFGraphDataset class with the endpoint URI and the graph URI

from rdf2srl import RDFGraphDataset
loader = RDFGraphDataset(sparql_endpoint="http://dbpedia.org/sparql", graph_name='http://dbpedia.org/')

Now, Let's find the number of (subject, predicate, object) triples in the DBpedia graph:

num_triples = loader.num_triples()

To find the number of (subject, predicate, object) triples where the object is another entity, find the number of entity to entity triples.

num_e2e_triples = loader.num_entity2entity_triples()

To find the number of (subject, predicate, object) triples where the object is a literal value, find the number of entity to entity triples.

num_e2l_triples = loader.num_entity2literal_triples()

Loading the dataset into a dataframe

We can also use the package to access the entities in the graph. A useful format for relational learning models is a dictionary that maps each entity to an index that starts from 0 to n_entities-1. Other available formats are pandas dataframes and python lists.

entity2idx = loader.entities('dict')

Similarly, we can get all the entity-to-entity predicates in the graph. A useful format for relational learning models is a dictionary that maps each predicate to an index that starts from 0 to n_relations-1. Other available formats are pandas dataframes and python lists.

relation2idx = loader.relations('dict')

Now, we can get the triples in the dataset in a dataframe of three columns: [subject, object, predicate] where the values inside the tuples represent the indices in entity2idx and relation2idx. The other available format is python list. The dataframe can be saved in csv file if output_dir is passed.

triples = loader.triples('list')

list of the convenience functions available:

RDFGraphDataset.num_entities()  
RDFGraphDataset.num_predicates()  
RDFGraphDataset.num_relations()  
RDFGraphDataset.num_attributes()  
RDFGraphDataset.num_attr_literal_pairs()  
RDFGraphDataset.num_triples()  
RDFGraphDataset.num_entity2literal_triples()  
RDFGraphDataset.num_entity2entity_triples()  
RDFGraphDataset.num_rdf_type_triples()  
RDFGraphDataset.predicates(format) # where format is one of ['dict', 'df', 'list']  
RDFGraphDataset.relations(format) # where format is one of ['dict', 'df', 'list']  
RDFGraphDataset.attributes(format) # where format is one of ['dict', 'df', 'list']  
RDFGraphDataset.entities(format) # where format is one of ['dict', 'df', 'list']  
RDFGraphDataset.attr_literal_pairs()  
RDFGraphDataset.triples(format) # where format is one of ['df', 'list']  
RDFGraphDataset.entity2entity_triples(format) # where format is one of ['df', 'list']  
RDFGraphDataset.entity2literal_triples(format) # where format is one of ['df', 'list']  
RDFGraphDataset.subjects(predicate)  
RDFGraphDataset.objects(predicate)  
RDFGraphDataset.predicates_freq()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rdf2srl-1.0.11-py3-none-any.whl (13.0 kB view details)

Uploaded Python 3

File details

Details for the file rdf2srl-1.0.11-py3-none-any.whl.

File metadata

  • Download URL: rdf2srl-1.0.11-py3-none-any.whl
  • Upload date:
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.1

File hashes

Hashes for rdf2srl-1.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 a56497642245da696d6256e8cc1156d1276a786c85e95f4ce5bfbc2e2edcd035
MD5 b9e562ab868a5a0c82e1087d86d3f984
BLAKE2b-256 ab12311f1754dd105ef9896c63addd62f95d1d474d30a48233b33badd927165f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page