RDFIngest

License: GPL v3

RDFIngest - A simple tool for ingesting local and remote RDF data sources into a triplestore.

WARNING: This project is in an early stage of development and should be used with caution.

Requirements

  • Python >= 3.11

Installation

RDFIngest is available on PyPI:

pip install rdfingest

The RDFIngest CLI can also be installed with pipx:

pipx install rdfingest

For installation from source, either use poetry or run pip install . from the package folder.

Usage

RDFIngest reads two YAML files:

  • a config file for obtaining triplestore credentials and
  • a registry which defines the RDF sources to be ingested.

Example config:

service:
  endpoint: "https://sometriplestore.endpoint"
  user: "admin"
  password: "supersecretpassword123"

Example registry:

graphs:
  - source: https://someremote.ttl
    graph_id: https://somenamedgraph.id

  - source: [
    somelocal.ttl,
    https://someotherremote.ttl
    ]
    graph_id: https://someothernamedgraph.id
    
  - source: https://someremote.trig
  
  - source: [
    https://someotherremote.trig,
    someotherlocal.ttl,
    yetanotherremote.ttl
    ]
    graph_id: https://yetanothernamedgraph.id

RDFIngest parses all registered RDF sources and ingests the data as named graphs into the specified triplestore by executing POST requests for every source.
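
Because a registry entry's source may be either a single string or a list, it can help to normalize entries before processing them. The following is a minimal sketch under stated assumptions, not RDFIngest's actual implementation: RegistryEntry and normalize_registry are hypothetical names, and the input is the plain mapping a YAML parser would return for a registry file.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical container for one item under `graphs`."""
    sources: tuple[str, ...]
    graph_id: Optional[str] = None


def normalize_registry(registry: dict[str, Any]) -> list[RegistryEntry]:
    """Turn the parsed registry mapping into a uniform list of entries."""
    entries = []
    for item in registry.get("graphs", []):
        source = item["source"]
        # `source` may be a single string or a list of strings.
        sources = (source,) if isinstance(source, str) else tuple(source)
        entries.append(RegistryEntry(sources=sources, graph_id=item.get("graph_id")))
    return entries


# The mapping below is what a YAML parser would return for a two-entry registry.
parsed = {
    "graphs": [
        {"source": "https://someremote.ttl", "graph_id": "https://somenamedgraph.id"},
        {"source": ["somelocal.ttl", "https://someotherremote.ttl"],
         "graph_id": "https://someothernamedgraph.id"},
    ]
}

entries = normalize_registry(parsed)
for entry in entries:
    print(entry.graph_id, entry.sources)
```

Storing sources as a tuple keeps every entry uniform, so later steps never need to check whether they received a string or a list.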

By default, a SPARQL DROP operation is also run for every graph ID before POSTing.
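
To make the DROP-then-POST sequence concrete, here is a rough sketch of what the two operations could look like for a single named graph. It only builds the SPARQL Update string and an unsent HTTP request. The helper names, the use of SILENT, and the Graph Store Protocol style ?graph= parameter are assumptions for illustration; the request shape RDFIngest actually sends is not documented here and may differ per triplestore.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_drop_update(graph_id: str) -> str:
    """SPARQL 1.1 Update statement that removes one named graph."""
    # SILENT keeps the update from failing when the graph does not exist yet.
    return f"DROP SILENT GRAPH <{graph_id}>"


def build_post_request(endpoint: str, graph_id: str, turtle: bytes) -> Request:
    """Prepare (but do not send) a POST that adds Turtle data to a named graph."""
    # Assumption: the store accepts Graph Store Protocol style `?graph=` URLs.
    url = f"{endpoint}?{urlencode({'graph': graph_id})}"
    return Request(url, data=turtle, method="POST",
                   headers={"Content-Type": "text/turtle"})


graph_id = "https://somenamedgraph.id"
drop = build_drop_update(graph_id)
request = build_post_request("https://sometriplestore.endpoint", graph_id,
                             b"<urn:s> <urn:p> <urn:o> .")
print(drop)
print(request.get_method(), request.full_url)
```

Dropping first means a re-run replaces the graph's contents instead of appending to them, which is presumably why the DROP is the default.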

Contextless RDF sources (triple formats such as Turtle) require a graph_id. RDF Datasets/Quad formats do not require a graph_id field, because they carry their own named graph identifiers.

For Datasets, the default graph is ignored, at least for now: running automated DROP and/or POST operations on a remote default graph is considered somewhat dangerous.
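
The effect of ignoring the default graph can be illustrated with plain tuples. This is a simplified sketch, not the tool's code: quads are modelled as (subject, predicate, object, graph) tuples with None standing in for the default graph, and group_named_graphs is a hypothetical helper.

```python
from collections import defaultdict
from typing import Optional

# A quad is (subject, predicate, object, graph); graph is None for the default graph.
Quad = tuple[str, str, str, Optional[str]]
Triple = tuple[str, str, str]


def group_named_graphs(quads: list[Quad]) -> dict[str, list[Triple]]:
    """Group triples by named graph, skipping the default graph entirely."""
    graphs: dict[str, list[Triple]] = defaultdict(list)
    for s, p, o, g in quads:
        if g is None:
            # Default-graph statements are left out, so no DROP or POST targets them.
            continue
        graphs[g].append((s, p, o))
    return dict(graphs)


quads: list[Quad] = [
    ("urn:s1", "urn:p", "urn:o1", "https://graph.one"),
    ("urn:s2", "urn:p", "urn:o2", None),  # default graph: ignored
    ("urn:s3", "urn:p", "urn:o3", "https://graph.two"),
    ("urn:s4", "urn:p", "urn:o4", "https://graph.one"),
]

named = group_named_graphs(quads)
for graph_id, triples in named.items():
    print(graph_id, len(triples))
```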

Namespaces are one honking great idea -- let's do more of those!

The tool accepts both local and remote RDF data sources.

Entry example

Consider the following entry:

graphs:
  - source: [
    https://someremote.trig,
    somelocal.ttl,
    anotherremote.ttl
    ]
    graph_id: https://somenamedgraph.id/

In this case, every named graph in the Dataset https://someremote.trig is ingested under its own named graph identifier, while somelocal.ttl and anotherremote.ttl are ingested into the named graph https://somenamedgraph.id/.
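
The resolution rule for a mixed entry can be sketched as a small planning function. Everything here is illustrative: plan_entry and is_dataset are hypothetical names, and detecting quad formats by file extension is an assumption made for the sketch, not necessarily how RDFIngest decides.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumption: quad formats are recognised by file extension in this sketch.
QUAD_SUFFIXES = {".trig", ".nq"}
FROM_DATASET = "named graphs from the dataset"


def is_dataset(source: str) -> bool:
    """Guess whether a local path or URL points to a quad-format file."""
    path = urlparse(source).path or source
    return PurePosixPath(path).suffix.lower() in QUAD_SUFFIXES


def plan_entry(sources: list[str], graph_id: Optional[str]) -> dict[str, str]:
    """Map every source of one registry entry to its target graph."""
    plan = {}
    for source in sources:
        if is_dataset(source):
            # Datasets bring their own graph identifiers; graph_id is not used.
            plan[source] = FROM_DATASET
        elif graph_id is None:
            raise ValueError(f"{source} is contextless and needs a graph_id")
        else:
            plan[source] = graph_id
    return plan


plan = plan_entry(
    ["https://someremote.trig", "somelocal.ttl", "anotherremote.ttl"],
    "https://somenamedgraph.id/",
)
for source, target in plan.items():
    print(f"{source} -> {target}")
```

Raising an error for a contextless source without a graph_id mirrors the requirement stated above; whether the real tool fails, warns, or skips in that case is not specified here.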

CLI

Run the rdfingest command.

rdfingest --config ./config.yaml --registry ./registry.yaml

The --config and --registry options default to ./config.yaml and ./registry.yaml.

Also see rdfingest --help.

RDFIngest class

Point an RDFIngest instance to a config file and a registry and invoke run_ingest.

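# Assumed import path; adjust if the class is exposed elsewhere in the package.
from rdfingest import RDFIngest

rdfingest = RDFIngest(
    config="./config.yaml",
    registry="./registry.yaml",
    drop=True,
    debug=False
)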

rdfingest.run_ingest()
