Synthetic RDF graph generator based on SHACL shapes.
RDFGraphGen: A Synthetic RDF Graph Generator based on SHACL Shapes
RDFGraphGen is a Python package for generating synthetic RDF knowledge graphs from SHACL shapes.
The Shapes Constraint Language (SHACL) is a W3C standard for validating data in RDF graphs against a set of constraining shapes. Although the main purpose of SHACL is the validation of existing RDF data, many RDF-based application development processes suffer from a lack of available RDF datasets. To address this, we envisioned and implemented a reverse role for SHACL: we use SHACL shape definitions as the starting point for generating synthetic data for an RDF graph.
The generation process extracts the constraints from the SHACL shapes, converts them into rules, and then generates artificial data for a predefined number of RDF entities based on these rules. RDFGraphGen is intended for generating small, medium, or large RDF knowledge graphs for benchmarking, testing, quality control, training, and similar uses in applications from the RDF, Linked Data, and Semantic Web domains.
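The three steps described above (constraints, rules, generated entities) can be illustrated with a small standard-library sketch. This is not RDFGraphGen's implementation: the rule dictionary is a hand-written, hypothetical stand-in for what would be extracted from a SHACL shape, and the names `PERSON_RULES`, `generate_value`, and `generate_entities` are assumptions made for this example only.

```python
import random

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical rules, written by hand here; a real generator would derive
# them from sh:targetClass, sh:path, sh:datatype, sh:minCount, and so on.
PERSON_RULES = {
    "target_class": "http://schema.org/Person",
    "properties": [
        {"path": "http://schema.org/name", "datatype": "xsd:string",
         "min_count": 1, "max_count": 1},
        {"path": "http://schema.org/age", "datatype": "xsd:integer",
         "min_count": 0, "max_count": 1,
         "min_inclusive": 18, "max_inclusive": 90},
    ],
}


def generate_value(rule, rng):
    """Produce one literal that satisfies the rule's datatype and range."""
    if rule["datatype"] == "xsd:integer":
        low = rule.get("min_inclusive", 0)
        high = rule.get("max_inclusive", 100)
        return rng.randint(low, high)
    # Fallback for xsd:string: a short random lowercase token.
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))


def generate_entities(rules, count, seed=0):
    """Return (subject, predicate, object) triples for `count` entities."""
    rng = random.Random(seed)
    triples = []
    for i in range(count):
        subject = f"http://example.org/entity/{i}"
        triples.append((subject, RDF_TYPE, rules["target_class"]))
        for rule in rules["properties"]:
            # Emit between min_count and max_count values for this property.
            n_values = rng.randint(rule["min_count"], rule["max_count"])
            for _ in range(n_values):
                triples.append((subject, rule["path"], generate_value(rule, rng)))
    return triples


if __name__ == "__main__":
    for triple in generate_entities(PERSON_RULES, count=3):
        print(triple)
```

In this sketch the `count` argument plays the role of the predefined number of RDF entities; how the real package maps its scale factor to entity counts is not specified here.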
Usage
The following function can be used to generate RDF data:
generate_rdf(input-shape.ttl, output-graph.ttl, scale-factor)
- input-shape.ttl is a Turtle file that contains SHACL shapes
- output-graph.ttl is a Turtle file that will store the generated RDF entities
- scale-factor determines the size of the generated RDF graph
Installation
RDFGraphGen is available on PyPI: https://pypi.org/project/rdf-graph-gen/
To install it, use:
pip install rdf-graph-gen
After installation, this package can be used as a command line tool:
rdfgen input-shape.ttl output-graph.ttl scale-factor
Some optional parameters are also available. To list them, run:
rdfgen --help
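As a rough end-to-end illustration, the sketch below writes a minimal SHACL shape to disk and invokes the `rdfgen` command with the argument order shown above. The shape itself, the `ex:` namespace, the scale factor of 10, and the helper names `build_command` and `run_generator` are assumptions for this example; only the `rdfgen input-shape.ttl output-graph.ttl scale-factor` form comes from the documentation. If `rdfgen` is not on the PATH, the command is built but not run.

```python
import shutil
import subprocess
from pathlib import Path

# Illustrative shape only: it declares 'a sh:NodeShape', uses a plain
# predicate path, and a common sh:datatype, in line with the Remarks section.
SHAPE_TTL = """\
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass schema:Person ;
    sh:property [
        sh:path schema:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1
    ] .
"""


def build_command(shape_file, output_file, scale_factor):
    """Assemble the documented order: shape file, output file, scale factor."""
    return ["rdfgen", str(shape_file), str(output_file), str(scale_factor)]


def run_generator(workdir, scale_factor=10):
    """Write the shape file, then call rdfgen if it is installed."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    shape_file = workdir / "input-shape.ttl"
    output_file = workdir / "output-graph.ttl"
    shape_file.write_text(SHAPE_TTL, encoding="utf-8")

    command = build_command(shape_file, output_file, scale_factor)
    if shutil.which("rdfgen") is None:
        # Package not installed: return the command without running it.
        return command, None
    result = subprocess.run(command, capture_output=True, text=True)
    return command, result.returncode


if __name__ == "__main__":
    command, returncode = run_generator("rdfgen-demo")
    print(" ".join(command), "->", returncode)
```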
Examples
Examples of SHACL shapes based on Schema.org and other types, along with generated synthetic RDF graphs based on these shapes, can be found in the generated examples directory in this repo.
Publications
- (preprint) Marija Vecovska, Milos Jovanovik. "RDFGraphGen: A Synthetic RDF Graph Generator based on SHACL Constraints". arXiv:2407.17941.
Remarks
- A SHACL shape must include the statement 'a sh:NodeShape' in order to be recognized as a Node Shape.
- sh:severity is ignored because it carries no information useful for data generation.
- Only predicate paths are supported at this time.
- Most common sh:datatype scenarios are supported.
- Currently sh:nodeKind is ignored.
File details
Details for the file rdf_graph_gen-1.1.5.tar.gz.
File metadata
- Download URL: rdf_graph_gen-1.1.5.tar.gz
- Upload date:
- Size: 731.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bcf9eec072afa18f7f60b2d12adcda91c34a358b36a12447258b262d4feb86e6 |
| MD5 | c1af8db8207c1448cf919484899d0d6f |
| BLAKE2b-256 | a70ee4e27c8070ed3c829322283f6e872acd84c017350ad04cef216f6e25a87a |
File details
Details for the file rdf_graph_gen-1.1.5-py3-none-any.whl.
File metadata
- Download URL: rdf_graph_gen-1.1.5-py3-none-any.whl
- Upload date:
- Size: 735.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 265d73aca0dbde72de9c1ac268458fe236df240c94c983bdf6ae696e65f8b63c |
| MD5 | a373499e6567f24b89b9e1062a9a0150 |
| BLAKE2b-256 | 240a4049b94dcfc526beb46de11405d1bf6556cc2e228a083c3866878c9559d3 |