A tool for uploading RDF data to SPARQL endpoints

RDF Uploader

Knowledge graph developers often work with several kinds of triple stores within a single project. Each store handles endpoints, authentication, and named graphs differently, which complicates uploads. RDF Uploader addresses this with a consistent interface for popular stores: MarkLogic, Blazegraph, Neptune, RDFox, and Stardog.

Unlike typical RDFLib-based applications, such as those that use RDFLib's Graph class and upload one triple at a time, RDF Uploader supports batch uploads and concurrency. Batching keeps large files from overloading the server, whereas sending an entire multi-gigabyte file in a single curl request might crash it. Concurrency also improves performance in clustered triple store environments by running multiple uploads at the same time.

Many stores offer high-throughput loading methods, but these are often unique to each store and require direct server access to load from local files. RDF Uploader uses plain HTTP requests, so it avoids these complexities and dependencies. This keeps it lightweight and easy to integrate into existing workflows, and it removes the hassle of dealing with different endpoint implementations.

Features
Installation & Quick Start
Usage Guide
Configuration
Command Line Reference
Environment Variables
Programmatic Usage
License

Features

Ingest RDF data into SPARQL endpoints using asynchronous operations
Support for multiple RDF stores (MarkLogic, Blazegraph, Neptune, RDFox, and Stardog)
Authentication support for secure endpoints
Content type detection and customization
Concurrent uploads with configurable limits
Batching of RDF statements for efficient processing
Verbose output for detailed logging
Support for named graphs

Installation & Quick Start

Choose your preferred method:

pip

pip install rdf-uploader
rdf-uploader file.ttl --endpoint http://localhost:3030/dataset/sparql

pipx (without permanent installation)

pipx run rdf-uploader file.ttl --endpoint http://localhost:3030/dataset/sparql

Homebrew

The Homebrew formula for rdf-uploader lives in the private tap vladistan/homebrew-gizmos. This separate tap is required because the package is still new and hasn't yet met the popularity and stability thresholds for inclusion in homebrew-core. Use the following commands to install it from the private tap:

brew tap vladistan/homebrew-gizmos
brew install rdf-uploader

# Quick test
rdf-uploader file.ttl --endpoint http://localhost:3030/dataset/sparql

Docker

docker run -v $(pwd):/data vladistan/rdf-uploader:latest /data/file.ttl --endpoint http://localhost:3030/dataset/sparql

With Environment Variables

export RDF_ENDPOINT=http://localhost:3030/dataset/sparql
rdf-uploader file.ttl

With .envrc File

Create .envrc with your configuration, then run:

# .envrc file content
export RDF_ENDPOINT="http://localhost:3030/dataset/sparql"

# Command to run
rdf-uploader file.ttl

Usage Guide

Basic Operations

Upload a single file:

rdf-uploader file.ttl --endpoint http://localhost:3030/dataset/sparql

Upload multiple files:

rdf-uploader file1.ttl file2.n3 --endpoint http://localhost:3030/dataset/sparql

Use a named graph:

rdf-uploader file.ttl --endpoint http://localhost:3030/dataset/sparql --graph http://example.org/graph
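For background on what a named graph target can look like on the wire, here is a minimal sketch based on the W3C SPARQL 1.1 Graph Store HTTP Protocol, which addresses a named graph with a `graph` query parameter and the default graph with `default`. This is an assumption for illustration only: the function name `graph_store_url` and the example URL are hypothetical, and individual stores (and RDF Uploader itself) may address graphs differently.

```python
from typing import Optional
from urllib.parse import urlencode


def graph_store_url(endpoint: str, graph: Optional[str] = None) -> str:
    """Build a Graph Store Protocol style URL (hypothetical helper).

    A named graph becomes ?graph=<percent-encoded IRI>;
    no graph means the default graph, addressed with ?default.
    """
    query = urlencode({"graph": graph}) if graph else "default"
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"


# Hypothetical graph store URL, used only to show the resulting shape.
print(graph_store_url("http://localhost:3030/dataset/data", "http://example.org/graph"))
print(graph_store_url("http://localhost:3030/dataset/data"))
```

The graph IRI is percent-encoded because it travels inside another URL's query string.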

Authentication

With credentials:

rdf-uploader file.ttl --endpoint http://localhost:3030/dataset/sparql --username myuser --password mypass

Content Types & Format

Explicitly specify content type:

rdf-uploader file.ttl --content-type "text/turtle"

Supported formats (auto-detected by extension):

.ttl, .turtle: text/turtle
.nt: application/n-triples
.n3: text/n3
.nq, .nquads: application/n-quads
.rdf, .xml: application/rdf+xml
.jsonld: application/ld+json
.json: application/rdf+json
.trig: application/trig
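
The extension table above can be expressed as a simple lookup. The sketch below is not the tool's implementation; `guess_content_type` is a hypothetical helper, and treating extensions case-insensitively is an assumption. It shows how an explicit content type could override detection, mirroring the --content-type option.

```python
from pathlib import Path
from typing import Optional

# Extension-to-content-type table copied from the list above.
CONTENT_TYPES = {
    ".ttl": "text/turtle",
    ".turtle": "text/turtle",
    ".nt": "application/n-triples",
    ".n3": "text/n3",
    ".nq": "application/n-quads",
    ".nquads": "application/n-quads",
    ".rdf": "application/rdf+xml",
    ".xml": "application/rdf+xml",
    ".jsonld": "application/ld+json",
    ".json": "application/rdf+json",
    ".trig": "application/trig",
}


def guess_content_type(path: Path, override: Optional[str] = None) -> str:
    """Return the override if given, otherwise look up the file suffix."""
    if override:
        return override
    suffix = path.suffix.lower()  # assumption: extensions match case-insensitively
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"Cannot detect content type for {path.name!r}")
    return CONTENT_TYPES[suffix]


print(guess_content_type(Path("file.ttl")))
print(guess_content_type(Path("dump.dat"), override="application/n-triples"))
```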

Performance Options

Control concurrency: The --concurrent option allows you to specify the number of concurrent upload operations. For example, using --concurrent 10 will enable the uploader to process up to 10 files simultaneously, which can significantly speed up the upload process when dealing with multiple files.

rdf-uploader *.ttl --concurrent 10
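
One common way to cap the number of simultaneous operations in asynchronous Python is an `asyncio.Semaphore`. The sketch below illustrates that general technique under stated assumptions: `upload_all` and `fake_upload` are hypothetical names, the fake upload only sleeps, and nothing here is confirmed to match how RDF Uploader enforces its limit internally.

```python
import asyncio
from typing import Awaitable, Callable, List


async def upload_all(
    files: List[str],
    upload_one: Callable[[str], Awaitable[None]],
    max_concurrent: int = 5,
) -> None:
    """Run upload_one for every file, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(name: str) -> None:
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            await upload_one(name)

    await asyncio.gather(*(guarded(name) for name in files))


# Stand-in for a real upload: it only records how many calls overlap.
in_flight = 0
peak = 0


async def fake_upload(name: str) -> None:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network time
    in_flight -= 1


files = [f"file{i}.ttl" for i in range(20)]
asyncio.run(upload_all(files, fake_upload, max_concurrent=10))
print(f"peak concurrent uploads: {peak}")
```

The recorded peak never exceeds the configured limit, however many files are queued.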

Enable verbose output: The --verbose option provides detailed output during the upload process. This can be useful for debugging or monitoring the progress of the upload, as it will display additional information about each step the uploader takes.

rdf-uploader file.ttl --verbose

Set batch size: The --batch-size option lets you define the number of RDF statements to be included in each batch during the upload. For instance, --batch-size 5000 will group the RDF data into batches of 5000 statements, which can help manage memory usage and optimize performance for large datasets.

rdf-uploader file.ttl --batch-size 5000
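
To make the idea of batching concrete, the following sketch splits a stream of statements into fixed-size groups. It assumes N-Triples input, where each line is one statement, so lines can stand in for statements. The helper name `batched` is hypothetical, and the tool's actual batching strategy may differ.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(statements: Iterable[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size statements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(statements)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 12,000 synthetic N-Triples lines, grouped as --batch-size 5000 would suggest.
lines = [f'<http://example.org/s{i}> <http://example.org/p> "{i}" .' for i in range(12000)]
sizes = [len(batch) for batch in batched(lines, batch_size=5000)]
print(sizes)  # [5000, 5000, 2000]
```

Because the generator pulls statements lazily, only one batch needs to be held in memory at a time when the input is itself a lazy stream, such as an open file.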

Configuration

RDF Uploader offers three ways to configure parameters, with the following priority:

Command-line arguments (highest priority)
Environment variables (checked if CLI args not provided)
.envrc file (checked if environment variables not set)
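
The priority order above amounts to a three-step fallback. The sketch below illustrates it under stated assumptions: `parse_envrc` and `resolve` are hypothetical helpers, and the parser only understands simple `export NAME=value` lines like the ones shown in this document, not full shell syntax.

```python
import os
from typing import Dict, Mapping, Optional


def parse_envrc(text: str) -> Dict[str, str]:
    """Parse simple `export NAME=value` lines; anything else is skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("export ") or "=" not in line:
            continue
        name, _, value = line[len("export "):].partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(
    name: str,
    cli_value: Optional[str],
    environ: Mapping[str, str],
    envrc: Mapping[str, str],
) -> Optional[str]:
    """Return the first value found: CLI argument, then environment, then .envrc."""
    if cli_value is not None:
        return cli_value
    if name in environ:
        return environ[name]
    return envrc.get(name)


envrc = parse_envrc('# .envrc file content\nexport RDF_ENDPOINT="http://localhost:3030/dataset/sparql"\n')
# No CLI value given, so the environment is consulted before the .envrc values.
print(resolve("RDF_ENDPOINT", None, os.environ, envrc))
```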

Endpoint Types

The tool supports these endpoints with optimized handling:

marklogic
neptune
blazegraph
rdfox
stardog
generic (default)

Specify the endpoint type:

rdf-uploader file.ttl --type stardog

Endpoint-specific Variables

When an endpoint type is specified, type-specific variables take precedence:

# Generic endpoint (fallback)
export RDF_ENDPOINT=http://localhost:3030/dataset/sparql

# Type-specific endpoint (takes precedence when --type marklogic is used)
export MARKLOGIC_ENDPOINT=http://marklogic-server:8000/v1/graphs
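
The precedence between type-specific and generic variables can be sketched as a lookup with a fallback. This is an illustration only: `endpoint_from_env` is a hypothetical helper, and deriving the variable name by upper-casing the endpoint type is an assumption inferred from the variable names listed in this document.

```python
from typing import Mapping, Optional


def endpoint_from_env(endpoint_type: str, environ: Mapping[str, str]) -> Optional[str]:
    """Prefer <TYPE>_ENDPOINT, then fall back to the generic RDF_ENDPOINT."""
    specific = f"{endpoint_type.upper()}_ENDPOINT"
    return environ.get(specific) or environ.get("RDF_ENDPOINT")


env = {
    "RDF_ENDPOINT": "http://localhost:3030/dataset/sparql",
    "MARKLOGIC_ENDPOINT": "http://marklogic-server:8000/v1/graphs",
}
print(endpoint_from_env("marklogic", env))  # type-specific value wins
print(endpoint_from_env("stardog", env))    # falls back to RDF_ENDPOINT
```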

Command Line Reference

Basic Syntax

rdf-uploader [OPTIONS] FILES...

Options Reference

Category	Option	Short	Description	Default
Files	`FILES...`		One or more RDF files to upload	(required)
Endpoint	`--endpoint`	`-e`	SPARQL endpoint URL	(required)
	`--type`	`-t`	Endpoint type	`generic`
	`--graph`	`-g`	Named graph to upload to	Default graph
	`--store-name`	`-s`	RDFox datastore name	(required for RDFox)
Auth	`--username`	`-u`	Username
	`--password`	`-p`	Password
Content	`--content-type`		Content type for RDF data	Auto-detected
Performance	`--concurrent`	`-c`	Max concurrent uploads	5
	`--batch-size`	`-b`	Triples per batch	1000
Output	`--verbose`	`-v`	Enable detailed output	`False`

Environment Variables

General Configuration

# Generic endpoint URL and auth
export RDF_ENDPOINT=http://localhost:3030/dataset/sparql
export RDF_USERNAME=myuser
export RDF_PASSWORD=mypass

Endpoint-specific Configuration

# MarkLogic
export MARKLOGIC_ENDPOINT=http://marklogic-server:8000/v1/graphs
export MARKLOGIC_USERNAME=mluser
export MARKLOGIC_PASSWORD=mlpass

# Neptune
export NEPTUNE_ENDPOINT=https://your-neptune-instance.amazonaws.com:8182/sparql
export NEPTUNE_USERNAME=neptuneuser
export NEPTUNE_PASSWORD=neptunepass

# Blazegraph
export BLAZEGRAPH_ENDPOINT=http://blazegraph-server:9999/blazegraph/sparql
export BLAZEGRAPH_USERNAME=bguser
export BLAZEGRAPH_PASSWORD=bgpass

# RDFox
export RDFOX_ENDPOINT=http://rdfox-server:12110/datastores/default/content
export RDFOX_USERNAME=rdfoxuser
export RDFOX_PASSWORD=rdfoxpass
export RDFOX_STORE_NAME=mystore

# Stardog
export STARDOG_ENDPOINT=https://your-stardog-instance:5820/database
export STARDOG_USERNAME=sduser
export STARDOG_PASSWORD=sdpass

Programmatic Usage

Use the library in your Python code:

import asyncio
from pathlib import Path

from rdf_uploader.uploader import upload_rdf_file
from rdf_uploader.endpoints import EndpointType


async def main() -> None:
    # With explicit parameters
    await upload_rdf_file(
        file_path=Path("file.ttl"),
        endpoint="http://localhost:3030/dataset/sparql",
        endpoint_type=EndpointType.GENERIC,
        username="myuser",
        password="mypass",
    )

    # Using environment variables (for example RDF_ENDPOINT, RDF_USERNAME, RDF_PASSWORD)
    await upload_rdf_file(
        file_path=Path("file.ttl"),
        endpoint_type=EndpointType.GENERIC,
    )


# upload_rdf_file is awaited, so it has to run inside an event loop
asyncio.run(main())

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

0.18.8

Sep 6, 2025

0.18.7

Sep 6, 2025

0.18.6

May 17, 2025

0.18.5

May 10, 2025

This version

0.18.3

May 9, 2025

0.18.2

Apr 19, 2025

0.18.0

Apr 19, 2025

0.17.5

Apr 19, 2025

0.17.4

Apr 19, 2025

0.17.3

Apr 18, 2025

0.17.2

Apr 18, 2025

0.17.0

Apr 18, 2025

0.16.3

Apr 18, 2025

0.16.2

Apr 18, 2025

0.16.0

Apr 18, 2025

0.15.7

Apr 18, 2025

0.15.6

Apr 18, 2025

0.15.0

Apr 13, 2025

0.1.0

Mar 20, 2025

Download files

Source Distribution

rdf_uploader-0.18.3.tar.gz (7.6 MB)

Uploaded May 9, 2025 Source

Built Distribution

rdf_uploader-0.18.3-py3-none-any.whl (13.1 kB)

Uploaded May 9, 2025 Python 3

File details

Details for the file rdf_uploader-0.18.3.tar.gz.

File metadata

Download URL: rdf_uploader-0.18.3.tar.gz
Upload date: May 9, 2025
Size: 7.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.28.1

File hashes

Hashes for rdf_uploader-0.18.3.tar.gz
Algorithm	Hash digest
SHA256	`f95d5205f66f4cb2f7824bd8c7d66896163d93b84cac7f3074a71014de4f8c33`
MD5	`225a690f620d2a0afed9894a0b57ca04`
BLAKE2b-256	`d00816472b568dd6c401275c32392b297305b74f88a072866da1eae5e5f70521`

File details

Details for the file rdf_uploader-0.18.3-py3-none-any.whl.

File metadata

Download URL: rdf_uploader-0.18.3-py3-none-any.whl
Upload date: May 9, 2025
Size: 13.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.28.1

File hashes

Hashes for rdf_uploader-0.18.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`983a996383e33e72cd0169abb64272f6df05ec1b50f2b00cf2b498b8e7985c3c`
MD5	`6f1d3a6d0c05c91c7dc2a81908369171`
BLAKE2b-256	`ac231da7ad58a33f90a3be370fd03881c84359600c8ced12d8462d2531d429f9`
