
Wraps several RDF schema solver tools


RDFSolve


Extract RDF schemas from SPARQL endpoints and convert them to multiple formats (VoID, LinkML, JSON-LD, SHACL, RDF-config).

Installation

uv pip install rdfsolve

# or with plain pip
pip install rdfsolve

Quick Start

CLI

Extract schema and convert to multiple formats:

# Discover existing VoID metadata (fast)
rdfsolve discover --endpoint https://sparql.rhea-db.org/sparql

# Extract schema (uses discovered VoID if available)
rdfsolve extract --endpoint https://sparql.rhea-db.org/sparql \
  --output-dir ./output

# Export to different formats
rdfsolve export --void-file ./output/void_description.ttl \
  --format all --output-dir ./output

Extract Command Options:

# Force fresh generation (bypasses discovered VoID)
rdfsolve extract --endpoint URL --force-generate

# Custom naming and URIs
rdfsolve extract --endpoint URL \
  --dataset-name mydata \
  --void-base-uri "http://example.org/mydata/well-known/void"

# Filter specific graphs
rdfsolve extract --endpoint URL \
  --graph-uri http://example.org/graph1 \
  --graph-uri http://example.org/graph2

Export Formats:

  • csv - Schema patterns table
  • jsonld - JSON-LD representation
  • linkml - LinkML YAML schema
  • shacl - SHACL shapes for RDF validation
  • rdfconfig - RDF-config YAML files (model, prefix, endpoint)
  • coverage - Pattern frequency analysis
  • all - All formats (default)

Export with custom LinkML schema:

rdfsolve export --void-file void_description.ttl \
  --format linkml \
  --schema-name custom_schema \
  --schema-uri "http://example.org/schemas/custom" \
  --schema-description "Custom schema description"

Export SHACL shapes for RDF validation:

# Export closed SHACL shapes (strict validation)
rdfsolve export --void-file void_description.ttl \
  --format shacl \
  --shacl-closed \
  --shacl-suffix Shape

# Export open SHACL shapes (flexible validation)
rdfsolve export --void-file void_description.ttl \
  --format shacl \
  --shacl-open

SHACL (Shapes Constraint Language) shapes define constraints on RDF data and can be used to validate RDF instances against the extracted schema. Closed shapes only allow properties explicitly defined in the schema, while open shapes are more permissive.
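For illustration, a closed and an open shape might look like the following Turtle sketch (class and property names are hypothetical, not taken from any real endpoint):

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/> .

# Closed shape: instances may use ONLY the listed properties (rdf:type is ignored).
ex:ProteinShape a sh:NodeShape ;
    sh:targetClass ex:Protein ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:path ex:name ; sh:datatype xsd:string ] .

# Open shape: the same property constraint, but additional properties are tolerated.
ex:GeneShape a sh:NodeShape ;
    sh:targetClass ex:Gene ;
    sh:property [ sh:path ex:symbol ; sh:datatype xsd:string ] .
```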

Export RDF-config files:

rdfsolve export --void-file void_description.ttl \
  --format rdfconfig \
  --endpoint-url https://sparql.example.org/sparql \
  --graph-uri http://example.org/graph \
  --output-dir ./output

Creates a directory {dataset}_config/ containing:

  • model.yml - Class and property structure
  • prefix.yml - Namespace prefix definitions
  • endpoint.yml - SPARQL endpoint configuration

This structure is required by the rdf-config tool.
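For a dataset named mydata (name illustrative), the resulting layout would be:

```
mydata_config/
├── model.yml
├── prefix.yml
└── endpoint.yml
```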

Count instances per class:

rdfsolve count --endpoint URL --output counts.csv

Service graph filtering:

By default, extract and count exclude Virtuoso system graphs and well-known URIs. Use --include-service-graphs to include them.

Python API

from rdfsolve.api import (
    generate_void_from_endpoint,
    load_parser_from_graph,
    count_instances_per_class,
    to_shacl_from_file,
    to_rdfconfig_from_file,
)

# Generate VoID from endpoint
void_graph = generate_void_from_endpoint(
    endpoint_url="https://sparql.example.org/",
    graph_uris=["http://example.org/graph"],
    void_base_uri="http://example.org/void",  # Custom partition URIs
)

# Load parser and extract schema
parser = load_parser_from_graph(void_graph)

# Export to different formats
schema_df = parser.to_schema()  # Pandas DataFrame
schema_jsonld = parser.to_jsonld()  # JSON-LD
linkml_yaml = parser.to_linkml_yaml(
    schema_name="my_schema",
    schema_base_uri="http://example.org/schemas/my_schema"
)

# Export to SHACL shapes for validation
shacl_ttl = parser.to_shacl(
    schema_name="my_schema",
    schema_base_uri="http://example.org/schemas/my_schema",
    closed=True,  # Closed shapes for strict validation
    suffix="Shape",  # Append "Shape" to class names
)

# Or use the convenience function
shacl_ttl = to_shacl_from_file(
    "void_description.ttl",
    schema_name="my_schema",
    closed=True,
)

# Export to RDF-config format
rdfconfig = to_rdfconfig_from_file(
    "void_description.ttl",
    endpoint_url="https://sparql.example.org/",
    graph_uri="http://example.org/graph",
)
# Save to {dataset}_config/ directory structure
from pathlib import Path

config_dir = Path("dataset_config")
config_dir.mkdir(exist_ok=True)
for name in ("model", "prefix", "endpoint"):
    (config_dir / f"{name}.yml").write_text(rdfconfig[name])

# Count instances per class
class_counts = count_instances_per_class(
    "https://sparql.example.org/",
    graph_uris=["http://example.org/graph"],
)
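The class counts can then be persisted with the standard library. A minimal sketch, assuming the result behaves like a mapping from class URI to integer count (the exact return type is an assumption; check the API reference):

```python
import csv

# Hypothetical result shape: {class_uri: instance_count}.
class_counts = {
    "http://example.org/Protein": 1200,
    "http://example.org/Gene": 300,
}

with open("counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["class", "count"])
    # Write classes in descending order of instance count.
    for uri, n in sorted(class_counts.items(), key=lambda kv: kv[1], reverse=True):
        writer.writerow([uri, n])
```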

Features

  • Extract RDF schemas from SPARQL endpoints using VoID partitions
  • Discover existing VoID metadata or generate fresh
  • Export to multiple formats: CSV, JSON-LD, LinkML, SHACL, RDF-config, coverage analysis
  • SHACL shapes generation for RDF data validation
  • RDF-config export for schema documentation (compatible with rdf-config tool)
  • Customizable dataset naming and VoID partition URIs
  • Service graph filtering (excludes Virtuoso system graphs by default)
  • Instance counting per class with optional sampling


License

MIT License - see LICENSE for details.

Powered by the Bioregistry

