
Synthetic RDF graph generator based on SHACL constraints.

Project description

RDFGraphGen: A Synthetic RDF Graph Generator based on SHACL Constraints

This is a Python package for generating synthetic RDF knowledge graphs based on SHACL constraints.

The Shapes Constraint Language (SHACL) is a W3C standard for validating data in RDF graphs against a set of constraining shapes. Although the main purpose of SHACL is the validation of existing RDF data, we envisioned and implemented a reverse role for it: we use SHACL shape definitions as the starting point for generating synthetic data for an RDF graph. This addresses the lack of available RDF datasets in many RDF-based application development processes.

The generation process extracts the constraints from the SHACL shapes, converts them into rules, and then generates artificial data for a predefined number of RDF entities based on these rules. RDFGraphGen is intended for generating small, medium or large RDF knowledge graphs for benchmarking, testing, quality control, training and similar tasks in applications from the RDF, Linked Data and Semantic Web domains.
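The three steps described above (extract constraints, convert them into rules, generate data) can be illustrated with a small, self-contained sketch. This is not RDFGraphGen's implementation: the parsing step is skipped by assuming the constraints were already extracted into plain dicts keyed by SHACL parameter names, and the helper names (to_rule, generate_object, generate_entities) and the example.org IRIs are hypothetical. Only integers and plain strings are handled.

```python
import random

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_rule(constraint):
    """Step 2 (hypothetical): turn one extracted property constraint, a dict
    keyed by SHACL parameter names, into a rule with defaults filled in."""
    min_count = constraint.get("sh:minCount", 0)
    return {
        "path": constraint["sh:path"],
        "datatype": constraint.get("sh:datatype", XSD + "string"),
        "min_count": min_count,
        "max_count": constraint.get("sh:maxCount", max(min_count, 1)),
        "low": constraint.get("sh:minInclusive", 0),
        "high": constraint.get("sh:maxInclusive", 100),
    }


def generate_object(rule, rng):
    """Return one N-Triples object term that satisfies the rule."""
    if rule["datatype"] == XSD + "integer":
        value = rng.randint(rule["low"], rule["high"])
        return f'"{value}"^^<{rule["datatype"]}>'
    # Fallback for this sketch: a random plain (xsd:string) literal.
    return f'"value-{rng.randint(0, 10**6)}"'


def generate_entities(target_class, constraints, n, seed=0):
    """Step 3: generate n entities of target_class as sorted N-Triples lines."""
    rng = random.Random(seed)
    rules = [to_rule(c) for c in constraints]
    triples = set()  # a graph holds each distinct triple only once
    for i in range(n):
        subject = f"<http://example.org/entity/{i}>"
        triples.add(f"{subject} <{RDF_TYPE}> <{target_class}> .")
        for rule in rules:
            # Pick a cardinality within [sh:minCount, sh:maxCount].
            for _ in range(rng.randint(rule["min_count"], rule["max_count"])):
                obj = generate_object(rule, rng)
                triples.add(f"{subject} <{rule['path']}> {obj} .")
    return sorted(triples)


# Step 1 is assumed done: constraints already extracted from a Person shape.
person_constraints = [
    {"sh:path": "http://schema.org/name", "sh:minCount": 1, "sh:maxCount": 1},
    {"sh:path": "http://schema.org/age", "sh:datatype": XSD + "integer",
     "sh:minInclusive": 18, "sh:maxInclusive": 99, "sh:maxCount": 1},
]

for line in generate_entities("http://schema.org/Person", person_constraints, 2):
    print(line)
```

Each entity always gets exactly one schema:name (sh:minCount 1, sh:maxCount 1) and zero or one schema:age between 18 and 99; the fixed seed makes runs reproducible.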

Usage

The following function can be used to generate RDF data:

generate_rdf("input-shape.ttl", "output-graph.ttl", number_of_entities)

  • input-shape.ttl is the path to a Turtle file that contains the SHACL shapes
  • output-graph.ttl is the path to the Turtle file where the generated RDF entities will be stored
  • number_of_entities is the number of RDF entities to generate
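For reference, a minimal input-shape.ttl might look like the one written by the sketch below. The shape itself (a schema:Person node shape with a name and a birth date) and the ex: namespace are illustrative assumptions, not files shipped with the package; whether each constraint is honored depends on the limitations listed under Remarks.

```python
from pathlib import Path

# Hypothetical example shapes graph in Turtle syntax.
SHAPE_TTL = """\
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/shapes/> .

ex:PersonShape
    a sh:NodeShape ;                 # needed for the shape to be recognized
    sh:targetClass schema:Person ;
    sh:property [
        sh:path schema:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path schema:birthDate ;
        sh:datatype xsd:date ;
        sh:maxCount 1 ;
    ] .
"""


def write_shape_file(path="input-shape.ttl"):
    """Write the example shapes to a Turtle file and return its path."""
    path = Path(path)
    path.write_text(SHAPE_TTL, encoding="utf-8")
    return path
```

Calling write_shape_file() creates input-shape.ttl in the current directory, which can then be passed as the first argument, for example: rdfgen input-shape.ttl output-graph.ttl 100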

Installation

RDFGraphGen is available on PyPI: https://pypi.org/project/rdf-graph-gen/

To install it, use:

pip install rdf-graph-gen

After installation, this package can be used as a command-line tool:

rdfgen input-shape.ttl output-graph.ttl number-of-entities

The parameters here are the same as the ones described above.
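When graphs of several sizes are needed (for example for benchmarking), the command-line tool can be driven from a script. The following is a sketch using only the standard library; it assumes rdfgen is on PATH and accepts exactly the three positional arguments shown above. The helper names rdfgen_command and run_rdfgen are hypothetical.

```python
import shutil
import subprocess


def rdfgen_command(shape_file, output_file, number_of_entities):
    """Build the argument list for: rdfgen <shapes> <output> <count>."""
    return ["rdfgen", str(shape_file), str(output_file), str(int(number_of_entities))]


def run_rdfgen(shape_file, output_file, number_of_entities):
    """Run the rdfgen CLI; raise if it is not installed or exits non-zero."""
    if shutil.which("rdfgen") is None:
        raise FileNotFoundError("rdfgen not found on PATH; run: pip install rdf-graph-gen")
    cmd = rdfgen_command(shape_file, output_file, number_of_entities)
    subprocess.run(cmd, check=True)


# Example (commented out so nothing runs on import): small, medium, large graphs.
# for n in (100, 1_000, 10_000):
#     run_rdfgen("input-shape.ttl", f"output-graph-{n}.ttl", n)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with file paths that contain spaces.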

Examples

Examples of SHACL shapes based on Schema.org and other types, along with generated synthetic RDF graphs based on these shapes, can be found in the generated examples directory in this repo.

Publications

Remarks

  • A SHACL shape must be declared with 'a sh:NodeShape' (i.e. typed as sh:NodeShape) in order to be recognized as a node shape.
  • sh:severity is ignored because it only affects validation results and carries no information relevant to data generation.
  • SHACL Property Paths are not supported.
  • sh:datatype can take many different values, and not all of them are recognized.
  • sh:nodeKind is ignored.
  • For properties with a sh:minCount constraint, the number of generated triples can sometimes be smaller than the defined minimum count. This happens because the generator occasionally produces the same triple more than once, and an RDF graph stores each distinct triple only once.
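The sh:minCount shortfall in the last remark comes from duplicates collapsing in the graph. One possible workaround, sketched below under the assumption that values are produced by some random generator function, is to keep drawing until enough distinct values exist. This is an illustration of the idea, not how RDFGraphGen behaves; distinct_values and the ex: terms are hypothetical.

```python
import random


def distinct_values(generate, min_count, max_attempts=1000):
    """Call generate() until min_count distinct values have been collected.

    A graph keeps each (subject, predicate, object) triple only once, so
    repeated values would leave fewer triples than sh:minCount asks for.
    """
    values = set()
    attempts = 0
    while len(values) < min_count:
        if attempts == max_attempts:
            raise RuntimeError("value space too small to reach min_count")
        values.add(generate())
        attempts += 1
    return values


rng = random.Random(42)
# Naive approach: draw 4 ratings from 1..5; duplicate triples silently collapse.
naive = {("ex:item1", "ex:rating", rng.randint(1, 5)) for _ in range(4)}
print(len(naive))  # may be less than 4
# Distinct approach: always yields exactly 4 triples.
objects = distinct_values(lambda: rng.randint(1, 5), min_count=4)
fixed = {("ex:item1", "ex:rating", o) for o in objects}
print(len(fixed))  # 4
```

The max_attempts guard matters when the value space is smaller than sh:minCount (for example a sh:in list with fewer entries), where no amount of redrawing can succeed.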



Download files


Source Distribution

rdf_graph_gen-1.1.3.tar.gz (729.3 kB)

Uploaded Source

Built Distribution


rdf_graph_gen-1.1.3-py3-none-any.whl (733.0 kB)

Uploaded Python 3

File details

Details for the file rdf_graph_gen-1.1.3.tar.gz.

File metadata

  • Download URL: rdf_graph_gen-1.1.3.tar.gz
  • Upload date:
  • Size: 729.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.18

File hashes

Hashes for rdf_graph_gen-1.1.3.tar.gz
  • SHA256: c374ab0895e2601f44d573bf4bc63414e0ccc13ef7d7e98ea6692fdf0fba638c
  • MD5: 2b63506d210998e4f7bf5a0142d938d5
  • BLAKE2b-256: b1a99be89a316f7a67f80d690c2d7ec8f7939ff29fc8648175cbbb6fc25b9923


File details

Details for the file rdf_graph_gen-1.1.3-py3-none-any.whl.

File metadata

  • Download URL: rdf_graph_gen-1.1.3-py3-none-any.whl
  • Upload date:
  • Size: 733.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.18

File hashes

Hashes for rdf_graph_gen-1.1.3-py3-none-any.whl

  • SHA256: 4baf0d3d7a10cdbc3d4fd307e772be4a0407258a8e4ed758588940665ca25f42
  • MD5: 24fdcb1e98350df29ad0e3037a0b2855
  • BLAKE2b-256: 70ae76bb7d62aab3136fcfb14eeb57c8c788ce21439223daea87b6989c1a45ea
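A downloaded file can be checked against the SHA256 digests listed above with Python's hashlib. The sketch below reads the file in chunks so large archives need not fit in memory; the helper names (sha256_hex, sha256_of, verify) are hypothetical, and the expected digests are copied from the hash tables on this page. pip can also enforce such digests itself in hash-checking mode (--require-hashes with --hash=sha256:... entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digests published for release 1.1.3.
EXPECTED = {
    "rdf_graph_gen-1.1.3.tar.gz":
        "c374ab0895e2601f44d573bf4bc63414e0ccc13ef7d7e98ea6692fdf0fba638c",
    "rdf_graph_gen-1.1.3-py3-none-any.whl":
        "4baf0d3d7a10cdbc3d4fd307e772be4a0407258a8e4ed758588940665ca25f42",
}


def sha256_hex(chunks):
    """Hash an iterable of byte chunks and return the hex SHA-256 digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk."""
    with open(path, "rb") as f:
        return sha256_hex(iter(lambda: f.read(chunk_size), b""))


def verify(path):
    """True if the file's digest matches the one published for its file name."""
    return sha256_of(path) == EXPECTED[Path(path).name]
```

For example, verify("rdf_graph_gen-1.1.3-py3-none-any.whl") returns True only if the local wheel is byte-for-byte identical to the published one.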

