Ontology Maintenance and Release Tool
ontology-toolkit
Maintain version and dependency info in RDF ontologies.
Installation

To install the toolkit, run `pip install .` from the root directory, which will install the `onto_tool` command and all its dependencies into your environment.
```
$ onto_tool -h
usage: onto_tool [-h] {update,export,bundle,graphic} ...

Ontology toolkit.

positional arguments:
  {update,export,bundle,graphic}
                        sub-command help
    update              Update versions and dependencies
    export              Export ontology
    bundle              Bundle ontology for release
    graphic             Create PNG graphic and dot file from OWL files

optional arguments:
  -h, --help            show this help message and exit
```
Sub-Commands

Update

The `update` sub-command modifies ontology version and dependency information.
```
$ onto_tool update -h
usage: onto_tool update [-h] [-f {xml,turtle,nt} | -i] [--debug] [-o OUTPUT]
                        [-b [{all,strict}]] [--retain-definedBy]
                        [--versioned-definedBy] [-v SET_VERSION]
                        [--version-info [VERSION_INFO]]
                        [-d DEPENDENCY VERSION]
                        [ontology [ontology ...]]

positional arguments:
  ontology              Ontology file or directory containing OWL files

optional arguments:
  -h, --help            show this help message and exit
  -f {xml,turtle,nt}, --format {xml,turtle,nt}
                        Output format
  -i, --in-place        Overwrite each input file with update, preserving
                        format
  --debug               Emit verbose debug output
  -o OUTPUT, --output OUTPUT
                        Path to output file. Will be ignored if --in-place is
                        specified.
  -b [{all,strict}], --defined-by [{all,strict}]
                        Add rdfs:isDefinedBy to every resource defined. If
                        the (default) "strict" argument is provided, only
                        owl:Class, owl:ObjectProperty, owl:DatatypeProperty,
                        owl:AnnotationProperty and owl:Thing entities will be
                        annotated. If "all" is provided, every entity that
                        has any properties other than rdf:type will be
                        annotated. Will override any existing
                        rdfs:isDefinedBy annotations on the affected entities
                        unless --retain-definedBy is specified.
  -v SET_VERSION, --set-version SET_VERSION
                        Set the version of the defined ontology
  --version-info [VERSION_INFO]
                        Adjust versionInfo, defaults to "Version X.x.x"
  -d DEPENDENCY VERSION, --dependency-version DEPENDENCY VERSION
                        Update the import of DEPENDENCY to VERSION
```
Export

The `export` sub-command will transform the ontology into the desired format, and remove version information, as required by tools such as TopBraid Composer.
```
usage: onto_tool export [-h] [-f {xml,turtle,nt} | -c CONTEXT] [--debug]
                        [-o OUTPUT] [-s] [-m IRI VERSION] [-b [{all,strict}]]
                        [--retain-definedBy] [--versioned-definedBy]
                        [ontology [ontology ...]]

positional arguments:
  ontology              Ontology file or directory containing OWL files

optional arguments:
  -h, --help            show this help message and exit
  -f {xml,turtle,nt}, --format {xml,turtle,nt}
                        Output format
  -c CONTEXT, --context CONTEXT
                        Export as N-Quads in CONTEXT.
  --debug               Emit verbose debug output
  -o OUTPUT, --output OUTPUT
                        Path to output file.
  -s, --strip-versions  Remove versions from imports.
  -m IRI VERSION, --merge IRI VERSION
                        Merge all inputs into a single ontology with the
                        given IRI and version
  -b [{all,strict}], --defined-by [{all,strict}]
                        Add rdfs:isDefinedBy to every resource defined. If
                        the (default) "strict" argument is provided, only
                        owl:Class, owl:ObjectProperty, owl:DatatypeProperty,
                        owl:AnnotationProperty and owl:Thing entities will be
                        annotated. If "all" is provided, every entity that
                        has any properties other than rdf:type will be
                        annotated.
  --retain-definedBy    When merging ontologies, retain existing values of
                        rdfs:isDefinedBy
```
Graphic

The `graphic` sub-command will create either

- a comprehensive diagram showing ontology modules with their classes, object properties and individuals, together with the path of imports, or (if the `wee` option is selected) a simple diagram of the ontology import hierarchy, or
- a diagram of the use of classes and object and data properties in a triple store.

Graphics are exported both as `png` files and as a `dot` file. The `dot` file can be used with Graphviz or with web tools such as Dot Viewer.
```
usage: onto_tool graphic [-h] [-e ENDPOINT] [--schema | --data] [--debug]
                         [-o OUTPUT] [--show-shacl]
                         [--instance-limit INSTANCE_LIMIT]
                         [--predicate-threshold PREDICATE_THRESHOLD]
                         [--include [INCLUDE [INCLUDE ...]] |
                         --include-pattern [INCLUDE_REGEX [INCLUDE_REGEX ...]]
                         | --exclude [EXCLUDE [EXCLUDE ...]] |
                         --exclude-pattern
                         [EXCLUDE_REGEX [EXCLUDE_REGEX ...]]] [-v VERSION]
                         [-w]
                         [ontology [ontology ...]]

positional arguments:
  ontology              Ontology file, directory or name pattern

optional arguments:
  -h, --help            show this help message and exit
  -e ENDPOINT, --endpoint ENDPOINT
                        URI of SPARQL endpoint to use to gather data
  --schema              Generate ontology import graph (default)
  --data                Analyze instances for types and links
  --debug               Emit verbose debug output
  -o OUTPUT, --output OUTPUT
                        Output directory for generated graphics
  --show-shacl          Attempts to discover which classes and properties
                        have corresponding SHACL shapes and colors them green
                        on the graph. This detection relies on the presence
                        of sh:targetClass targeting, and can be confused by
                        complex logical shapes or Advanced SHACL features
                        such as SPARQL queries.
  -v VERSION, --version VERSION
                        Version to place in graphic
  -w, --wee             a version of the graphic with only core information
                        about ontology and imports

Sampling Limits:
  --instance-limit INSTANCE_LIMIT
                        Specify a limit on how many triples to consider that
                        use any one predicate to find (default 500000). This
                        option may result in an incomplete version of the
                        diagram, missing certain links.
  --predicate-threshold PREDICATE_THRESHOLD
                        Ignore predicates which occur fewer than
                        PREDICATE_THRESHOLD times (default 10)

Filters (only one can be used):
  --include [INCLUDE [INCLUDE ...]]
                        If specified for --schema, only ontologies matching
                        the specified URIs will be shown in full detail. If
                        specified with --data, only triples in the named
                        graphs mentioned will be considered (this also
                        excludes any triples in the default graph).
  --include-pattern [INCLUDE_REGEX [INCLUDE_REGEX ...]]
                        If specified for --schema, only ontologies matching
                        the specified URI pattern will be shown in full
                        detail. If specified with --data, only triples in the
                        named graphs matching the pattern will be considered
                        (this also excludes any triples in the default
                        graph). For large graphs this option is significantly
                        slower than using --include.
  --exclude [EXCLUDE [EXCLUDE ...]]
                        If specified for --schema, ontologies matching the
                        specified URIs will be omitted from the graph. If
                        specified with --data, triples in the named graphs
                        mentioned will be excluded (this also excludes any
                        triples in the default graph).
  --exclude-pattern [EXCLUDE_REGEX [EXCLUDE_REGEX ...]]
                        If specified for --schema, ontologies matching the
                        specified URI pattern will be omitted from the graph.
                        If specified with --data, triples in the named graphs
                        matching the pattern will be ignored (this also
                        excludes any triples in the default graph). For large
                        graphs this option is significantly slower than using
                        --exclude.
```
Bundle

The `bundle` sub-command supports creating an ontology deployment containing both RDF and non-RDF artifacts for delivery or web hosting.
```
$ onto_tool bundle -h
usage: onto_tool bundle [-h] [--debug] [-v VARIABLE VALUE] bundle

positional arguments:
  bundle                JSON or YAML bundle definition

optional arguments:
  -h, --help            show this help message and exit
  --debug               Emit verbose debug output
  -v VARIABLE VALUE, --variable VARIABLE VALUE
                        Set value of VARIABLE to VALUE
```
The bundle definition is either YAML or JSON, and contains the following sections:
Variable definition

```yaml
variables:
  name: "gist"
  version: "X.x.x"
  input: "."
  rdf-toolkit: "{input}/tools/rdf-toolkit.jar"
  output: "{name}{version}_webDownload"
```

Variables are initialized with the default values provided, but can be overridden via the `--variable` command line option. Values can reference other values using the `{name}` template syntax.
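The `{name}` references behave like Python `str.format` templating. As a rough sketch of how values that reference other values can be expanded (this is an illustration, not the tool's internal code; the version value `9.0.0` is made up):

```python
# Illustrative sketch of bundle variable resolution: values may reference
# other variables with {name} syntax, so expand until nothing changes.
defaults = {
    "name": "gist",
    "version": "9.0.0",  # illustrative; the README uses "X.x.x"
    "output": "{name}{version}_webDownload",
}

def resolve(variables: dict) -> dict:
    """Repeatedly expand {name} references until all values are literal."""
    resolved = dict(variables)
    changed = True
    while changed:
        changed = False
        for key, value in resolved.items():
            expanded = value.format(**resolved)
            if expanded != value:
                resolved[key] = expanded
                changed = True
    return resolved

print(resolve(defaults)["output"])  # gist9.0.0_webDownload
```

A `--variable` command-line option would simply override an entry in this dictionary before resolution.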
Tool definition

All tools require a `name` by which they are referenced in `transform` actions. Three different tool types are supported:

- Java tools (`type: "Java"`) require a path to the executable Jar file specified via the `jar` option, and a list of `arguments` that will be applied to each file processed. The `inputFile` and `outputFile` variables will be bound during execution, but other variables can be used to construct the arguments.

  ```yaml
  tools:
    - name: "serializer"
      type: "Java"
      jar: "{rdf-toolkit}"
      arguments:
        - "-tfmt"
        - "rdf-xml"
        - "-sdt"
        - "explicit"
        - "-dtd"
        - "-ibn"
        - "-s"
        - "{inputFile}"
        - "-t"
        - "{outputFile}"
  ```
- Shell tools (`type: "shell"`) execute a command specified via a list of `arguments` that will be applied to each file processed. The `inputFile` and `outputFile` variables will be bound during execution, but other variables can be used to construct the arguments.

  ```yaml
  tools:
    - name: "java_version"
      type: "shell"
      arguments:
        - "java"
        - "-version"
  ```
- SPARQL tools (`type: "sparql"`) apply a SPARQL Update query to each input file and serialize the resulting graph into the output file. The RDF format is preserved unless overridden with the `format` option. If the query is specified inline, template substitution will be applied to it, so bundle variables can be used, but double braces (`{{` instead of `{`, `}}` instead of `}`) have to be used to escape actual braces.

  ```yaml
  - name: "add-language-en"
    type: "sparql"
    query: >
      prefix skos: <http://www.w3.org/2004/02/skos/core#>
      DELETE {{ ?subject skos:prefLabel ?nolang . }}
      INSERT {{ ?subject skos:prefLabel ?withlang }}
      where {{
        ?subject skos:prefLabel ?nolang .
        FILTER(lang(?nolang) = '')
        BIND(STRLANG(?nolang, '{lang}') as ?withlang)
      }}
  ```
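The double-brace escaping works exactly like Python's `str.format`: `{{` and `}}` collapse to literal braces while `{lang}` is substituted. A minimal illustration (not the tool's code):

```python
# {{ and }} escape literal braces in a format template; {lang} is substituted.
template = "INSERT {{ ?s skos:prefLabel ?l }} BIND(STRLANG(?x, '{lang}'))"
print(template.format(lang="en"))
# INSERT { ?s skos:prefLabel ?l } BIND(STRLANG(?x, 'en'))
```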
Actions

Actions are executed in the order they are listed. Each action must have an `action` attribute, and any action can contain a `message` attribute, the contents of which will be emitted as an `INFO`-level log message prior to the execution of the action.
Basic File Manipulation

- `mkdir`, which requires a `directory` attribute to specify the path of the directory to be created (only if it doesn't already exist).
- `copy`, which copies files into the bundle, and supports the following arguments:
  - `source`, `target` and `includes` - if `includes` is not present, `source` and `target` are both assumed to be file paths to a single file. If `includes` is provided, `source` and `target` are assumed to be directories, and each member of the `includes` list a glob pattern inside the `source` directory.
  - `rename` - If provided, must contain `from` and `to` attributes. When specified, each file is renamed as it is copied, where `from` is treated as a Python regular expression applied to the base name of the source file, and `to` is the substitution string which replaces it in the name of the target file. Backreferences are available for capturing groups, e.g.

    ```yaml
    rename:
      from: "(.*)\\.owl"
      to: "\\g<1>{version}.owl"
    ```

    will add a version number to the base name of each `.owl` file. Further documentation on Python regular expression replace functionality can be found here.
  - `replace` - If provided, must contain `from` and `to` attributes. When specified, each file is processed after being copied, and each instance of the `from` pattern is replaced with the `to` string in the file contents. Python regular expression syntax and backreferences are supported as shown in the `rename` documentation.
- `move`, which moves files according to the provided options, which are identical to the ones supported by `copy`.
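The `rename` substitution follows Python `re.sub` semantics. A quick sketch of how the example pattern above renames a file, with `9.0.0` standing in for the `{version}` variable (filename and version are illustrative):

```python
import re

# The "from" pattern captures the base name; "to" appends the version.
old = "gistCore.owl"
new = re.sub(r"(.*)\.owl", r"\g<1>9.0.0.owl", old)
print(new)  # gistCore9.0.0.owl
```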
RDF Transformation

- `definedBy`, which inspects each input file to identify a single defined ontology, and then adds a `rdfs:isDefinedBy` property to every `owl:Class`, `owl:ObjectProperty`, `owl:DatatypeProperty` and `owl:AnnotationProperty` defined in the file, referencing the identified ontology. Existing `rdfs:isDefinedBy` values are removed prior to the addition. Input and output file specification options are identical to those used by the `copy` action.
- `export`, which functions similarly to the command-line export functionality, gathering one or more input ontologies and exporting them as a single file, with some optional transformations, depending on the following specified options:
  - `source`, `target` and `includes` - if `includes` is not present, `source` is assumed to be the path to a single file. If `includes` is provided, `source` is assumed to be a directory, and each member of the `includes` list a glob pattern inside the `source` directory. `target` is always treated as a single file path.
  - `merge` - if provided, it must have two mandatory fields, `iri` and `version`. In this case, all ontologies declared in the input files are removed, and a single new ontology, specified by the `iri`, is created, using `version` to build `owl:versionInfo` and `owl:versionIRI`. Any imports on the removed ontologies which are not satisfied internally are transferred to the new ontology.
  - `definedBy` - has two possible values, `strict` and `all`. If provided, a `rdfs:isDefinedBy` is added to all non-blank node subjects in the exported RDF, linking them to the ontology defined in the combined graph. If more than one ontology is defined, the export will fail. If `strict` is specified, only classes and properties will be annotated, whereas `all` does not filter by type.
  - `retainDefinedBy` - by default, `definedBy` will override any existing `rdfs:isDefinedBy` annotations, but if this option is provided, existing annotations will be left in place.
  - `format` - One of `turtle`, `xml`, or `nt` (N-Triples), specifies the output format for the export. The default output format is `turtle`.
  - `context` - If provided, generates an N-Quads export with the `context` argument as the name of the graph. When this option is present, the value of `format` is ignored.
  - `compress` - when this is `true`, the output is gzip-compressed.
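Putting those options together, an `export` action might look like the following sketch (the paths, message and IRI are illustrative, not taken from any distributed bundle):

```yaml
- action: "export"
  message: "Exporting merged ontology"
  source: "{input}/ontology"
  includes:
    - "*.ttl"
  target: "{output}/merged{version}.ttl"
  merge:
    iri: "https://example.com/ontology/merged"
    version: "{version}"
  definedBy: "strict"
  format: "turtle"
  compress: false
```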
- `transform`, which applies the specified tool to a set of input files, and supports the following arguments:
  - `tool`, which references the `name` of a tool which must be defined in the `tools` section.
  - `source`, `target` and `includes`, which function just like they do for the `copy` and `move` actions, with each input and output path bound into the `inputFile` and `outputFile` variables before the tool arguments are interpreted.
  - `replace` and `rename`, which are applied after the tool invocation, and work as described above.
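For example, a `transform` action invoking the `serializer` tool from the tool-definition example might be written as follows (directory layout is an assumption for illustration):

```yaml
- action: "transform"
  message: "Serializing ontologies to RDF/XML"
  tool: "serializer"
  source: "{input}/ontology"
  target: "{output}/ontology"
  includes:
    - "*.owl"
  rename:
    from: "(.*)\\.owl"
    to: "\\g<1>{version}.owl"
```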
- `sparql`, which reads RDF files provided via the `source` and `includes` options and executes a SPARQL query on the resulting combined graph.
  - If the `query` option is a valid file path, the query is read from that file; otherwise the contents of the `query` option are interpreted as the query.
  - `SELECT` query results are stored in the file specified via `target` as a CSV.
  - RDF results from a `CONSTRUCT` query are stored as either Turtle, RDF/XML or N-Triples, depending on the `format` option (`turtle`, `xml`, or `nt`).
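A `sparql` action exporting `SELECT` results as CSV might look like this sketch (the query file name and paths are illustrative assumptions):

```yaml
- action: "sparql"
  message: "Extracting class labels"
  source: "{input}/ontology"
  includes:
    - "*.ttl"
  query: "queries/list_labels.rq"
  target: "{output}/labels.csv"
```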
Utility Actions

- `markdown` transforms a `.md` file referenced in `source` into an HTML output specified in `target`.
- `graph` reads RDF files provided via the `source` and `includes` options and generates a graphical representation of the ontology, as in the `graphic` sub-command described above. Both `.dot` and `.png` outputs are written to the directory specified in the `target` option, and `title` and `version` attributes configure the title on the generated graph. If `compact` is specified as `True`, a concise graph including only ontology names and imports is generated.
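An illustrative `graph` action using those options (paths and title are assumptions, not from a distributed bundle):

```yaml
- action: "graph"
  message: "Generating ontology diagram"
  source: "{input}/ontology"
  includes:
    - "*.ttl"
  target: "{output}/graphics"
  title: "Example Ontology"
  version: "{version}"
  compact: True
```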
Validation

The `verify` action reads RDF files provided via the `source` and `includes` options and performs validation on the resulting combined graph. If the validation fails, the bundle process exits with a non-zero status and does not execute subsequent actions. The type of verification performed depends on the value of the `type` option:
- If `type` is `select`, one or more SPARQL `SELECT` queries are executed against the graph, and the first query to return a non-empty result will terminate the bundle. The results of the query will be output to the log, and also written as CSV to a file path specified by the `target` option, if provided. Queries can be specified in one of two ways (only one can be present):
  - If the `query` option is a valid file path, the query is read from that file; otherwise the contents of the `query` option are interpreted as the query, e.g.

    ```yaml
    query: >
      prefix skos: <http://www.w3.org/2004/02/skos/core#>
      select ?unlabeled where {{
        ?unlabeled a ?type .
        filter not exists {{ ?unlabeled skos:prefLabel ?label }}
      }}
    ```
  - If `queries` is provided, a list of queries will be built from the `source` and `includes` sub-options. The queries will be executed in the order specified. If `stopOnFail` is omitted or is `true`, the first query that produces a failing result will cause `verify` to abort. If `stopOnFail` is `false`, all queries will be executed regardless of failures, and the value of `target` is treated as a directory where the results of each failing query will be written.

    ```yaml
    - action: 'verify'
      type: 'select'
      source: '{input}'
      includes:
        - 'verify_data.ttl'
      target: '{output}/verify_select_results'
      stopOnFail: false
      queries:
        source: '{input}'
        includes:
          - 'verify_*_select_query.rq'
    ```
- If `type` is `ask`, one or more SPARQL `ASK` queries will be executed. Queries are specified similarly to the `select` validation. Unless `stopOnFail` is set to `false`, the first query producing a result that does not match the required `expected` option will terminate the bundle. For example:

  ```yaml
  actions:
    - action: 'verify'
      type: 'ask'
      source: '{input}'
      includes:
        - 'verify_data.ttl'
      queries:
        source: '{input}'
        includes:
          - '*_ask_query.rq'
      expected: false
  ```
- If `type` is `shacl`, a SHACL shape graph will be constructed from the file specified via the `shapes` option (which must have a `source`, and optionally `includes`), with the bundle terminating only if any `sh:Violation` results are present, unless the `failOn` option specifies otherwise. The report is emitted to the log, and saved as Turtle to the path specified in the `target` option if it's provided. For example:

  ```yaml
  - action: 'verify'
    type: 'shacl'
    inference: 'rdfs'
    source: '{input}'
    includes:
      - 'verify_data.ttl'
    target: '{output}/verify_shacl_errors.ttl'
    failOn: "warning"
    shapes:
      source: '{input}/verify_shacl_shapes.ttl'
  ```

  If the `inference` option is provided, the reasoner will be run on the graph prior to applying the SHACL rules. The valid values are `rdfs`, `owlrl`, `both`, or `none` (default).
- If `type` is `construct`, the queries are expected to `CONSTRUCT` a SHACL ValidationReport. The validation will be considered a failure if the resulting graph is non-empty. `target`, `stopOnFail` and `query`/`queries` are handled the same as for `select` validation, and `failOn` is used to determine which violations will terminate execution.