Python SHACL Validator
pySHACL
A Python validator for SHACL.
This is a pure Python module which allows for the validation of RDF graphs against Shapes Constraint Language (SHACL) graphs. This module uses the rdflib Python library for working with RDF and is dependent on the OWL-RL Python module for OWL2 RL Profile based expansion of data graphs.
This module is developed to adhere to the SHACL Recommendation:
Holger Knublauch; Dimitris Kontokostas. Shapes Constraint Language (SHACL). 20 July 2017. W3C Recommendation. URL: https://www.w3.org/TR/shacl/ ED: https://w3c.github.io/data-shapes/shacl/
Community for Help and Support
The SHACL community has a Discord server for discussion of topics around SHACL and the SHACL specification.
Use this invitation link to join the server: https://discord.gg/RTbGfJqdKB
There is a #pyshacl channel for discussion of this Python library, and you can ask for general SHACL help too.
Installation
Install with PIP (Using the Python3 pip installer pip3)
$ pip3 install pyshacl
Or in a python virtualenv (these example commandline instructions are for a Linux/Unix based OS)
$ python3 -m virtualenv --python=python3 .venv
$ source ./.venv/bin/activate
$ pip3 install pyshacl
To exit the virtual environment:
$ deactivate
Command Line Use
For command line use: (these example commandline instructions are for a Linux/Unix based OS)
$ pyshacl -s /path/to/shapesGraph.ttl -m -i rdfs -a -j -f human /path/to/dataGraph.ttl
To validate multiple data graphs in combine mode (default):
$ pyshacl -s /path/to/shapesGraph.ttl /path/to/dataGraph1.ttl /path/to/dataGraph2.ttl
To validate multiple data graphs independently:
$ pyshacl --validate-each -s /path/to/shapesGraph.ttl /path/to/dataGraph1.ttl /path/to/dataGraph2.ttl
Where:
- -s is an (optional) path to the shapes graph to use
- -e is an (optional) path to an extra ontology graph to import
- -i is the pre-inferencing option
- -f is the ValidationReport output format (human = human-readable validation report)
- --validate-each validates each data graph independently when multiple inputs are provided
- -m enable the meta-shacl feature
- -a enable SHACL Advanced Features
- -j enable SHACL-JS Features (if pyshacl[js] is installed)
System exit codes are:
0 = DataGraph is Conformant
1 = DataGraph is Non-Conformant
2 = The validator encountered a RuntimeError (check stderr output for details)
3 = Not-Implemented; The validator encountered a SHACL feature that is not yet implemented.
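A script that shells out to the CLI can branch on these exit codes. The following is a minimal sketch, assuming the pyshacl executable is on the PATH; the names describe_exit_code and run_pyshacl are illustrative helpers and not part of PySHACL.

```python
import subprocess

# Exit codes as documented above.
EXIT_CODE_MEANINGS = {
    0: "DataGraph is Conformant",
    1: "DataGraph is Non-Conformant",
    2: "RuntimeError (check stderr output for details)",
    3: "Not-Implemented SHACL feature",
}


def describe_exit_code(code):
    """Map a pyshacl exit code to a short description."""
    return EXIT_CODE_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_pyshacl(shapes_path, data_path):
    """Run the CLI (assumed to be on PATH) and return (code, description, stderr)."""
    completed = subprocess.run(
        ["pyshacl", "-s", shapes_path, data_path],
        capture_output=True,
        text=True,
    )
    code = completed.returncode
    return code, describe_exit_code(code), completed.stderr
```

Treating code 1 separately from codes 2 and 3 lets a caller distinguish "the data is invalid" from "the validation could not be carried out".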
Full CLI Usage options:
$ pyshacl -h
$ python3 -m pyshacl -h
usage: pyshacl [-h] [-s [SHACL]] [-e [ONT]] [-i {none,rdfs,owlrl,both}] [-m]
[-im] [-a] [-j] [-it] [--abort] [--allow-info] [-w]
[--max-depth [MAX_DEPTH]] [-d] [--validate-each]
[-f {human,table,turtle,xml,json-ld,nt,n3}]
[-df {auto,turtle,xml,json-ld,nt,n3}]
[-sf {auto,turtle,xml,json-ld,nt,n3}]
[-ef {auto,turtle,xml,json-ld,nt,n3}] [-V] [-o [OUTPUT]]
[--server]
DataGraph [DataGraph ...]
PySHACL 0.27.0 command line tool.
positional arguments:
DataGraph The file(s) containing the Target Data Graph.
optional arguments:
--server Ignore all the rest of the options, start the HTTP Server.
-h, --help show this help message and exit
-s [SHACL], --shacl [SHACL]
A file containing the SHACL Shapes Graph.
-e [ONT], --ont-graph [ONT]
A file path or URL to a document containing extra
ontological information. RDFS and OWL definitions from this
are used to inoculate the DataGraph.
-i {none,rdfs,owlrl,both}, --inference {none,rdfs,owlrl,both}
Choose a type of inferencing to run against the Data
Graph before validating.
-m, --metashacl Validate the SHACL Shapes graph against the shacl-
shacl Shapes Graph before validating the Data Graph.
-im, --imports Allow import of sub-graphs defined in statements with
owl:imports.
-a, --advanced Enable features from the SHACL Advanced Features
specification.
-j, --js Enable features from the SHACL-JS Specification.
-it, --iterate-rules Run Shape's SHACL Rules iteratively until the
data_graph reaches a steady state.
--abort Abort on first invalid data.
--allow-info, --allow-infos
Shapes marked with severity of Info will not cause
result to be invalid.
-w, --allow-warning, --allow-warnings
Shapes marked with severity of Warning or Info will
not cause result to be invalid.
--max-depth [MAX_DEPTH]
The maximum number of SHACL shapes "deep" that the
validator can go before reaching an "endpoint"
constraint.
-d, --debug Output additional verbose runtime messages.
--validate-each Validate each data graph independently when multiple
inputs are provided.
--focus [FOCUS] Optional IRIs of focus nodes from the DataGraph, the shapes will
validate only these node. Comma-separated list.
--shape [SHAPE] Optional IRIs of a NodeShape or PropertyShape from the SHACL
ShapesGraph, only these shapes will be used to validate the
DataGraph. Comma-separated list.
-f {human,table,turtle,xml,json-ld,nt,n3}, --format {human,table,turtle,xml,json-ld,nt,n3}
Choose an output format. Default is "human".
-df {auto,turtle,xml,json-ld,nt,n3}, --data-file-format {auto,turtle,xml,json-ld,nt,n3}
Explicitly state the RDF File format of the input
DataGraph file. Default="auto".
-sf {auto,turtle,xml,json-ld,nt,n3}, --shacl-file-format {auto,turtle,xml,json-ld,nt,n3}
Explicitly state the RDF File format of the input
SHACL file. Default="auto".
-ef {auto,turtle,xml,json-ld,nt,n3}, --ont-file-format {auto,turtle,xml,json-ld,nt,n3}
Explicitly state the RDF File format of the extra
ontology file. Default="auto".
-V, --version Show PySHACL version and exit.
-o [OUTPUT], --output [OUTPUT]
Send output to a file (defaults to stdout).
--server Ignore all the rest of the options, start the HTTP
Server. Same as `pyshacl_server`.
Python Module Use
For basic use of this module, you can just call the validate function of the pyshacl module like this:
from pyshacl import validate
data_graph = "some-data.ttl"
shacl_graph = "some-shacl.ttl"
ont_graph = "some-ontology.ttl"
r = validate(data_graph,
      shacl_graph=shacl_graph,
      ont_graph=ont_graph,
      inference='rdfs',
      abort_on_first=False,
      allow_infos=False,
      allow_warnings=False,
      meta_shacl=False,
      advanced=False,
      js=False,
      debug=False)
conforms, results_graph, results_text = r
To validate multiple data graphs in combine mode (default):
from pyshacl import validate
data_graphs = ["data1.ttl", "data2.ttl", "data3.ttl"]
conforms, results_graph, results_text = validate(data_graphs, shacl_graph="shapes.ttl")
To validate each data graph independently:
from pyshacl import validate_each
data_graphs = ["data1.ttl", "data2.ttl", "data3.ttl"]
results = validate_each(data_graphs, shacl_graph="shapes.ttl")
for graph_id, (conforms, results_graph, results_text) in results.items():
    print(graph_id, conforms)
Where:
- data_graph is an rdflib Graph object, file path, or a sequence of those to be validated
- shacl_graph is an rdflib Graph object or file path or Web URL of the graph containing the SHACL shapes to validate with, or None if the SHACL shapes are included in the data_graph.
- ont_graph is an rdflib Graph object or file path or Web URL of a graph containing extra ontological information, or None if not required. RDFS and OWL definitions from this are used to inoculate the DataGraph.
- inference is a Python string value to indicate whether or not to perform OWL inferencing expansion of the data_graph before validation. Options are 'rdfs', 'owlrl', 'both', or 'none'. The default is 'none'.
- abort_on_first (optional) bool value to indicate whether or not the program should abort after encountering the first validation failure or to continue. Default is to continue.
- allow_infos (optional) bool value, Shapes marked with severity of Info will not cause result to be invalid.
- allow_warnings (optional) bool value, Shapes marked with severity of Warning or Info will not cause result to be invalid.
- meta_shacl (optional) bool value to indicate whether or not the program should enable the Meta-SHACL feature. Default is False.
- advanced: (optional) bool value to enable SHACL Advanced Features
- js: (optional) bool value to enable SHACL-JS Features (if pyshacl[js] is installed)
- debug (optional) bool value to indicate whether or not the program should emit debugging output text, including violations that didn't lead to non-conformance overall. So when debug is True, don't judge conformance by the absence of violation messages. Default is False.
Some other optional keyword variables available on the validate function:
- data_graph_format: Override the format detection for the given data graph source file.
- shacl_graph_format: Override the format detection for the given shacl graph source file.
- ont_graph_format: Override the format detection for the given extra ontology graph source file.
- iterate_rules: Iterate SHACL Rules until steady state is found (only works with advanced mode).
- do_owl_imports: Enable the feature to allow the import of subgraphs using owl:imports for the shapes graph and the ontology graph. Note, you explicitly cannot use this on the target data graph.
- serialize_report_graph: Convert the report results_graph into a serialised representation (for example, 'turtle')
- check_dash_result: Check the validation result against the given expected DASH test suite result.
- multi_data_graphs_mode: When passing a sequence of data graphs, choose "combine" or "validate_each".
Return value:
- a three-component tuple containing:
  - conforms: a bool, indicating whether the data_graph conforms to the shacl_graph
  - results_graph: a Graph object built according to the SHACL specification's Validation Report scheme
  - results_text: Python string representing a verbose textual representation of the Validation Report
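As one way to consume the return tuple, the sketch below turns it into a CLI-style status (0 for conformant, 1 for non-conformant). The names exit_status and check_file are illustrative and not part of PySHACL; check_file assumes pyshacl is installed and imports it lazily so that exit_status can be used on its own.

```python
import sys


def exit_status(result):
    """Turn validate()'s (conforms, results_graph, results_text) tuple into 0 or 1."""
    conforms, results_graph, results_text = result
    if not conforms:
        # results_text is the verbose, human-readable Validation Report.
        print(results_text, file=sys.stderr)
    return 0 if conforms else 1


def check_file(data_path, shapes_path):
    """Validate one file and return a CLI-style status (requires pyshacl)."""
    # Imported lazily so exit_status() stays usable without pyshacl installed.
    from pyshacl import validate

    return exit_status(validate(data_path, shacl_graph=shapes_path))
```

A caller could then write sys.exit(check_file("some-data.ttl", "some-shacl.ttl")) to mirror the exit codes of the command line tool for the conformant and non-conformant cases.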
Python Module Call
You can get an equivalent of the Command Line Tool using the Python3 executable by doing:
$ python3 -m pyshacl
Errors
Under certain circumstances pySHACL can produce a Validation Failure. This is a formal error defined by the SHACL specification and is required to be produced as a result of specific conditions within the SHACL graph that lead to the inability to complete the validation.
If the validator produces a Validation Failure, the results_graph variable returned by the validate() function will be an instance of ValidationFailure.
See the message attribute on that instance to get more information about the validation failure.
Other errors the validator can generate:
- ShapeLoadError: This error is thrown when a SHACL Shape in the SHACL graph is in an invalid state and cannot be loaded into the validation engine.
- ConstraintLoadError: This error is thrown when a SHACL Constraint Component is in an invalid state and cannot be loaded into the validation engine.
- ReportableRuntimeError: An error occurred for a different reason, and the reason should be communicated back to the user of the validator.
- RuntimeError: The validator encountered a situation that caused it to throw an error, but the reason does not concern the user.
Unlike ValidationFailure, these errors are not passed back as a result by the validate() function, but thrown as exceptions by the validation engine and must be caught in a try ... except block.
In the case of ShapeLoadError and ConstraintLoadError, see the str() string representation of the exception instance for the error message along with a link to the relevant section in the SHACL spec document.
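The two error paths described above (a returned ValidationFailure versus raised exceptions) can be handled together. This is a hedged sketch, not the library's prescribed pattern: it assumes that ShapeLoadError, ConstraintLoadError and ReportableRuntimeError are subclasses of RuntimeError and that ValidationFailure is an exception type, and the name validate_or_explain and its validate_fn parameter are hypothetical (the parameter exists only so a stand-in validator can be supplied).

```python
def validate_or_explain(data_graph, shacl_graph, validate_fn=None):
    """Return (conforms, message); conforms is None if validation could not complete."""
    if validate_fn is None:
        # Lazy import: only needed when no stand-in validator is supplied.
        from pyshacl import validate as validate_fn

    try:
        conforms, results_graph, results_text = validate_fn(
            data_graph, shacl_graph=shacl_graph
        )
    except RuntimeError as exc:
        # Assumption: the pySHACL load/runtime errors subclass RuntimeError,
        # so this one clause covers all of the raised error types.
        return None, f"{type(exc).__name__}: {exc}"

    # A Validation Failure is returned in place of the results graph, not raised.
    # Assumption: ValidationFailure is an Exception subclass.
    if isinstance(results_graph, Exception):
        detail = getattr(results_graph, "message", results_graph)
        return None, f"Validation Failure: {detail}"

    return conforms, results_text
```

Returning None for conforms keeps "could not validate" distinct from "validated and found non-conformant", which a plain bool cannot express.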
Focus Node Filtering, and Shape Selection
PySHACL v0.27.0 and above has two powerful new features:
- Focus Node Filtering
  - You can pass in a list of focus nodes to the validator, and it will only validate those focus nodes.
  - Note, you still need to use a SHACL ShapesGraph, and the Shapes still need to target the focus nodes.
  - This feature will filter the Shapes' targeted focus nodes to include only those that are in the list of specified focus nodes.
- SHACL Shape selection
  - You can pass in a list of SHACL Shapes to the validator, and it will use only those Shapes for validation.
  - This is useful for testing new shapes in your shapes graph, or for many other procedure-driven use cases.
- Combined Shape Selection with Focus Node filtering
  - The combination of the above two new features is especially powerful.
  - If you give the validator a list of Shapes to use, and a list of focus nodes, the validator will operate in a highly-targeted mode: it feeds those focus nodes directly into those given Shapes for validation.
  - In this mode, the selected SHACL Shape does not need to specify any focus-targeting mechanisms of its own.
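On the command line these features correspond to the --focus and --shape options listed in the usage text earlier, each taking a comma-separated list of IRIs. The sketch below only assembles such a command; build_pyshacl_args is an illustrative helper and the example.org IRIs are placeholders.

```python
def build_pyshacl_args(shapes_path, data_path, focus_nodes=(), shapes=()):
    """Build a pyshacl argument list with optional --focus / --shape filters."""
    args = ["pyshacl", "-s", shapes_path]
    if focus_nodes:
        # --focus takes a comma-separated list of focus node IRIs.
        args += ["--focus", ",".join(focus_nodes)]
    if shapes:
        # --shape takes a comma-separated list of NodeShape/PropertyShape IRIs.
        args += ["--shape", ",".join(shapes)]
    args.append(data_path)
    return args


# Placeholder IRIs, for illustration only.
command = build_pyshacl_args(
    "shapes.ttl",
    "data.ttl",
    focus_nodes=["http://example.org/alice", "http://example.org/bob"],
    shapes=["http://example.org/PersonShape"],
)
```

The resulting list can be passed to subprocess.run(command) when the pyshacl executable is available.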
SPARQL Remote Graph Mode
PySHACL now has a built-in SPARQL Remote Graph Mode, which allows you to validate a data graph that is stored on a remote server.
- In this mode, PySHACL operates strictly in read-only mode, and does not modify the remote data graph.
- Some features are disabled when using the SPARQL Remote Graph Mode:
  - "rdfs" and "owl" inferencing is not allowed (because the remote graph is read-only, it cannot be expanded)
  - Extra Ontology file (Inoculation or Mix-In mode) is disabled (because the remote graph is read-only)
  - SHACL Rules (Advanced mode SPARQL-Rules) are not allowed (because the remote graph is read-only)
  - All SHACL-JS features are disabled (this is not safe when operating on a remote graph)
  - "inplace" mode is disabled (actually all operations on the remote data graph are inherently performed in-place)
Inference and Rules
PySHACL can perform inference - creation of new data using rules - according to the SHACL Advanced Features - Rules specification.
The shacl_rules function can be used like this:
from pyshacl import shacl_rules
data_graph = "some-data.ttl"
shacl_graph = "some-shacl.ttl"
output_graph = shacl_rules(data_graph, shacl_graph=shacl_graph, advanced=True)
In the code above, the output_graph will contain the original RDF triples in the data_graph as well as new triples generated by the shacl_graph.
See the example file examples/rules_inference.py for a working example of PySHACL performing two kinds of SHACL inference.
Integrated OpenAPI-3.0-compatible HTTP REST Service
PySHACL now has a built-in validation service, exposed via an OpenAPI3.0-compatible REST API.
Due to the additional dependencies required to run, this feature is an optional extra.
You must first install PySHACL with the http extra option enabled:
$ pip3 install -U pyshacl[http]
When that is installed, you can start the service by executing the CLI entrypoint:
$ pyshacl --server
# or
$ pyshacl_server
# or
$ python3 -m pyshacl server
# or
$ docker run --rm -e PYSHACL_SERVER=TRUE -i -t docker.io/ashleysommer/pyshacl:latest
By default, this will run the service on localhost address 127.0.0.1 on port 8099.
To view the SwaggerUI documentation for the service, navigate to http://127.0.0.1:8099/docs/swagger and for the ReDoc version, go to http://127.0.0.1:8099/docs/redoc.
To view the OpenAPI3 schema see http://127.0.0.1:8099/docs/openapi.json
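To check from a script that the service is up, you can fetch the OpenAPI schema from the URL given above. This is a minimal sketch using only the Python standard library; service_urls and fetch_openapi_schema are illustrative names, and the defaults mirror the address and port stated above.

```python
import json
from urllib.request import urlopen


def service_urls(host="127.0.0.1", port=8099):
    """Return the documentation URLs of a PySHACL HTTP service."""
    base = f"http://{host}:{port}"
    return {
        "swagger": f"{base}/docs/swagger",
        "redoc": f"{base}/docs/redoc",
        "openapi": f"{base}/docs/openapi.json",
    }


def fetch_openapi_schema(host="127.0.0.1", port=8099, timeout=5):
    """Download and parse the OpenAPI schema from a running service."""
    with urlopen(service_urls(host, port)["openapi"], timeout=timeout) as response:
        return json.load(response)
```

fetch_openapi_schema raises an error from urllib if nothing is listening, so it is only useful once the server has been started.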
Configuring the HTTP REST Service
- You can force PySHACL CLI to start up in HTTP Server mode by passing environment variable PYSHACL_SERVER=TRUE. This is useful in a containerised service, where you will only be running PySHACL in this mode.
- PYSHACL_SERVER_LISTEN=1.2.3.4 listen on a different IP Address or hostname
- PYSHACL_SERVER_PORT=8080 listen on given different TCP PORT
- PYSHACL_SERVER_HOSTNAME=example.org when you are hosting the server behind a reverse-proxy or in a containerised environment, use this so PySHACL server knows what your externally facing hostname is
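When launching the server from Python, these variables can be set on the child process rather than globally. The sketch below assumes pyshacl[http] is installed and the pyshacl executable is on the PATH; server_environment and start_server are illustrative names, not part of PySHACL.

```python
import os
import subprocess


def server_environment(listen=None, port=None, hostname=None, base_env=None):
    """Return an environment mapping with the PySHACL server settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    if listen is not None:
        env["PYSHACL_SERVER_LISTEN"] = str(listen)
    if port is not None:
        env["PYSHACL_SERVER_PORT"] = str(port)
    if hostname is not None:
        env["PYSHACL_SERVER_HOSTNAME"] = str(hostname)
    return env


def start_server(**settings):
    """Start the CLI in server mode with the given settings (requires pyshacl[http])."""
    return subprocess.Popen(["pyshacl", "--server"], env=server_environment(**settings))
```

For example, start_server(listen="0.0.0.0", port=8080) returns a Popen handle that can later be stopped with terminate().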
Windows CLI
PyInstaller can be used to create an executable for Windows that has the same characteristics as the Linux/Mac CLI program.
The necessary .spec file is already included in pyshacl/pyshacl-cli.spec.
The pyshacl-cli.spec PyInstaller spec file creates a .exe for the pySHACL Command Line utility. See above for the pySHACL command line util usage instructions.
See the PyInstaller installation guide for info on how to install PyInstaller for Windows.
Once you have pyinstaller, use pyinstaller to generate the pyshacl.exe CLI file like so:
$ cd src/pyshacl
$ pyinstaller pyshacl-cli.spec
This will output pyshacl.exe in the dist directory in src/pyshacl.
You can now run the pySHACL Command Line utility via pyshacl.exe.
Docker
Pull the official docker image from Dockerhub:
docker pull docker.io/ashleysommer/pyshacl:latest
Or build the image yourself, from the PySHACL repository with docker build . -t pyshacl.
You can now run PySHACL inside a container; but you need to mount the data you want to validate.
For example, to validate graph.ttl against shacl.ttl, run:
docker run --rm -i -t --mount type=bind,src=`pwd`,dst=/data pyshacl -s /data/shacl.ttl /data/graph.ttl
Compatibility
PySHACL is a Python3 library. For best compatibility use Python v3.8 or greater. Python3 v3.7 or below is not supported and this library does not work on Python v2.7.x or below.
PySHACL is a PEP518 & PEP517 project; it uses pyproject.toml and poetry to manage dependencies, build and install.
For best compatibility when installing from PyPI with pip, upgrade to pip v20.0.2 or above.
- If you're on Ubuntu 18.04 or older, you will need to run sudo pip3 install --upgrade pip to get the newer version.
Features
A features matrix is kept in the FEATURES file.
Changelog
A comprehensive changelog is kept in the CHANGELOG file.
Benchmarks
This project includes a script to measure the difference in performance of validating the same source graph that has been inferenced using each of the four different inferencing options. Run it on your computer to see how fast the validator operates for you.
License
This repository is licensed under Apache License, Version 2.0. See the LICENSE deed for details.
Contributors
See the CONTRIBUTORS file.
Citation
DOI: 10.5281/zenodo.4750840 (For all versions/latest version)
Contacts
Lead Developer
Ashley Sommer
Software Engineer
Department of Climate Change, Energy, the Environment and Water
Brisbane, Qld, Australia
Ashley.Sommer@dcceew.gov.au
https://orcid.org/0000-0003-0590-0131
Support developer
Nicholas Car
Data Architect
KurrawongAI
Brisbane, Qld, Australia
nick@kurrawong.ai
http://orcid.org/0000-0002-8742-7730