Prez Manifest
This repository contains the prezmanifest Python package that provides a series of functions to work with Prez Manifests.
Contents
- What is a Prez Manifest?
- Functions
- Use
- Testing
- Extending
- License
- Contact
- Background concepts & other resources
- Case Studies
What is a Prez Manifest?
A Prez Manifest is an RDF file that describes and links to a set of resources that can be loaded into an RDF database for the Prez graph database publication system to provide access to. The Prez Manifest specification is online at: https://prez.dev/manifest/.
Functions
The functions provided are:
- validate
- performs SHACL validation on the Manifest, followed by existence checking for each resource - are they reachable by this script on the file system or over the Internet? It will also check any Conformance Claims given in the Manifest
- label
- lists all the IRIs for elements within a Manifest's Resources that don't have labels. Given a source of additional labels, such as the KurrawongAI Semantic Background, it can try to extract any missing labels and insert them into a Manifest as an additional labelling resource
- document
- table: can create a Markdown or ASCIIDOC table of Resources from a Prez Manifest file for use in README files in repositories
- catalogue: add the IRIs of resources within a Manifest's 'Resource Data' object to a catalogue RDF file
- load
- extracts all the content of all Resources listed in a Prez Manifest and loads it into either a single RDF multi-graph ('quads') file or an RDF DB instance using the Graph Store Protocol
- sync
- compares the artifacts listed in a Manifest with copies of them previously loaded into a SPARQL Endpoint and reports, per artifact, whether it should be uploaded, downloaded, added or left alone
Installation
This Python package is intended to be used on the command line on Linux/UNIX-like systems and/or as a Python library, called directly from other Python code.
Library
It is available on PyPI at https://pypi.org/project/prezmanifest/ so it can be installed using Poetry, PIP, etc. We recommend uv as the package manager we find easiest to work with.
Command Line
To make the command line script pm available, you first need to install uv (see the uv installation instructions), then:
uv tool install prezmanifest
Now you can invoke pm anywhere in your terminal, as long as ~/.local/bin/ is in your PATH.
Latest
You can also install the latest, unstable, version from its version control repository: https://github.com/Kurrawong/prez-manifest/, but we make prezmanifest releases often, so the latest code shouldn't ever be too far ahead of the most recent release.
Use
[!TIP] See the Case Study: Establish below for a short description of the establishment of a new catalogue using prezmanifest.
Library
Install as above and then, in your Python code, import the functions you want to use. Currently, these are the public functions:
from prezmanifest.validator import validate
from prezmanifest.labeller import LabellerOutputTypes, label
from prezmanifest.documentor import table, catalogue
from prezmanifest.loader import load
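For example, here is a minimal sketch of validating one of the demo manifests shipped in this repository's tests. It assumes validate() accepts a path to a manifest file and raises an error on failure - check the function's docstring for the actual signature.

```python
from pathlib import Path

from prezmanifest.validator import validate

# One of the demo manifests in this repository's tests
manifest = Path("tests/demo-vocabs/manifest.ttl")

validate(manifest)  # assumed to raise if the manifest is invalid
print("Manifest is valid")
```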
Command Line
All the functions of the library are made available as a command line application called pm. After installation, as above, you can inspect the command line tool by asking for "help" like this:
pm -h
Which will print something like this:
PrezManifest top-level Command Line Interface. Ask for help (-h) for each Command
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version -v │
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ validate Validate the structure and content of a Prez Manifest │
│ sync Synchronize a Prez Manifest's resources with loaded copies of them in a SPARQL Endpoint │
│ label Discover labels missing from data in a Prez Manifest and patch them │
│ document Create documentation from a Prez Manifest │
│ load Load a Prez Manifest's content into a file or DB │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
To find out more about each Command, ask for help like this - for load:
pm load -h
[!TIP] See the Case Study: Sync below for a description of the different ways to sync
Testing
Run uv run pytest, or the Poetry etc. equivalent, to execute pytest. You must have Docker Desktop running to allow all loader tests to be executed, as some use temporary test containers.
Extending
Many functions have been placed into prezmanifest/utils.py and hopefully extensions can be made to individual functions there. For example, to extend the criteria prezmanifest uses to judge the newness of a local vs. a remote artifact for the sync function, see the compare_version_indicators() function.
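For instance, here is a hedged sketch of wrapping that function to inspect its decisions without changing them. *args/**kwargs are used because the real signature isn't reproduced here, and whether sync picks up the wrapper depends on how it imports the function:

```python
import prezmanifest.utils as pm_utils

_original = pm_utils.compare_version_indicators

def compare_and_log(*args, **kwargs):
    """Pass-through wrapper that logs each version comparison."""
    result = _original(*args, **kwargs)
    print(f"compare_version_indicators{args} -> {result}")
    return result

# Replace the module-level function so callers that look it up via
# prezmanifest.utils see the wrapper
pm_utils.compare_version_indicators = compare_and_log
```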
License
This code is available for reuse according to the BSD 3-Clause License: https://opensource.org/license/bsd-3-clause
© 2024-2025 KurrawongAI
Contact
For all matters, please contact:
KurrawongAI
info@kurrawong.ai
https://kurrawong.ai
Background concepts & other resources
The main documentation for Prez Manifests - what they are, how to make them etc. - is online at https://prez.dev; however, two concepts referred to above are also summarised here.
Conformance Claims
A claim that some data conforms to a standard or a profile. In a Prez Manifest, this indicates that a Resource should, and is expected to, conform to a standard.
See the various Manifest files in tests/demo-vocabs/ for examples of Conformance Claims in use for individual resources or all resources, e.g. tests/demo-vocabs/manifest-conformance.ttl.
KurrawongAI Semantic Background
KurrawongAI makes available labels for all the elements of about 100 well-known ontologies and vocabularies at https://demo.dev.kurrawong.ai/catalogs/exm:demo-vocabs. You can use this as a source of labels (a SPARQL Endpoint) with which to patch content in Manifests that is missing labels.
Case Studies
Case Study: Establish
The Indigenous Studies Unit Catalogue is a new catalogue of resources - books, articles, boxes of archived documents - produced by the Indigenous Studies Unit at the University of Melbourne.
The catalogue is available online via an instance of the Prez system at https://data.idnau.org and the content is managed in the GitHub repository https://github.com/idn-au/isu-catalogue.
The catalogue container object is constructed as a schema:DataCatalog (and also a dcat:Catalog, for compatibility with legacy systems) containing multiple schema:CreativeWork instances with subtyping to indicate 'book', 'artwork' etc.
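For illustration, a hedged rdflib sketch of that dual-typing pattern (the IRIs below are invented placeholders, not the catalogue's real ones):

```python
from rdflib import RDF, Graph, Namespace, URIRef

SDO = Namespace("https://schema.org/")
DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
cat = URIRef("https://example.com/isu-catalogue")  # placeholder IRI
g.add((cat, RDF.type, SDO.DataCatalog))
g.add((cat, RDF.type, DCAT.Catalog))  # for legacy-system compatibility

work = URIRef("https://example.com/resource/book-1")  # placeholder IRI
g.add((work, RDF.type, SDO.CreativeWork))
g.add((work, RDF.type, SDO.Book))  # subtype indicating 'book'
```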
The source of the catalogue metadata is the static RDF file _background/catalogue-metadata.ttl that was handwritten. The source of the resources' information is the CSV file _background/datasets.csv which was created by hand during a visit to the Indigenous Studies Unit. This CSV information was converted to RDF files in resources/ using the custom script _background/resources_make.py.
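A hedged sketch of the general shape of such a conversion (the column names and output path here are invented - the real logic lives in _background/resources_make.py):

```python
import csv

from rdflib import RDF, Graph, Literal, Namespace, URIRef

SDO = Namespace("https://schema.org/")

g = Graph()
with open("_background/datasets.csv") as f:
    for row in csv.DictReader(f):
        iri = URIRef(row["iri"])  # hypothetical column name
        g.add((iri, RDF.type, SDO.CreativeWork))
        g.add((iri, SDO.name, Literal(row["title"])))  # hypothetical column name

g.serialize("resources/resources.ttl", format="turtle")
```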
After creation of the catalogue container object's metadata and the primary resource information, prezmanifest was used to improve the presentation of the data in Prez in the following ways:
- A manifest file was created
  - based on the example in this repository in tests/demo-vocabs/manifest.ttl - the example was copy 'n pasted with only minor changes, see manifest.ttl in the ISU catalogue repo
  - the initial manifest file was validated with prezmanifest/validator: pm validate isu-catalogue/manifest.ttl
- A labels file was automatically generated using prezmanifest/labeller
  - using the KurrawongAI Semantic Background as a source of labels
  - using the command pm label rdf isu-catalogue/manifest.ttl http://demo.dev.kurrawong.ai/sparql > labels.ttl
  - the file, labels.ttl, was stored in the ISU Catalogue repo's _background/ folder and indicated in the manifest file with the role of Incomplete Catalogue And Resource Labels, as it doesn't provide all missing labels
  - note that this storage could have been done automatically using the pm label manifest command
- IRIs still missing labels were determined
  - using prezmanifest/labeller again with the command pm label iris isu-catalogue/manifest.ttl > iris.txt, all IRIs still missing labels were listed
- Labels for remaining IRIs were manually created
  - there were only 7 important IRIs (as opposed to system objects that don't need labels) that still needed labels. These were manually created in the file _background/labels-manual.ttl
  - the manual labels file was added to the catalogue's manifest, also with a role of Incomplete Catalogue And Resource Labels
- A final missing labels test was performed
  - running pm label iris isu-catalogue/manifest.ttl > iris.txt again indicated no important IRIs were still missing labels
- The catalogue was enhanced
  - pm document catalogue isu-catalogue/manifest.ttl was run to add all the resources of the catalogue to the catalogue.ttl file
- The manifest was documented
  - using prezmanifest/documentor, a Markdown table of the manifest's content was created using the command pm document table isu-catalogue/manifest.ttl
  - the output of this command - a Markdown table - is visible in the ISU Catalogue repo's README file
- The catalogue was prepared for upload
  - pm load file isu-catalogue/manifest.ttl isu-catalogue.trig was run - it produced a single trig file, isu-catalogue.trig, containing RDF graphs which can easily be uploaded to the database delivering the catalogue
  - pm load sparql isu-catalogue/manifest.ttl http://a-sparql-endpoint.com/ds -u username -p password could have been run to load the content directly into the ISU RDF DB, if it had been available
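The same steps can also be driven from Python using the public functions listed above. A hedged sketch, assuming each function accepts a manifest path and that load also accepts a destination file - check the actual signatures before relying on this:

```python
from pathlib import Path

from prezmanifest.documentor import catalogue, table
from prezmanifest.loader import load
from prezmanifest.validator import validate

manifest = Path("isu-catalogue/manifest.ttl")

validate(manifest)                    # check the manifest first
catalogue(manifest)                   # add resource IRIs to the catalogue file
print(table(manifest))                # Markdown table for the README
load(manifest, "isu-catalogue.trig")  # single quads file, ready for upload
```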
Case Study: Sync
If I have a manifest locally, I can load it into a remote SPARQL Endpoint like this:
pm load sparql {PATH-TO-MANIFEST} {SPARQL-ENDPOINT}
Going forward, I don't have to blow away all the content in the SPARQL Endpoint and reload everything whenever I have content changes; instead I can use the sync command.
sync compares "version indicators" per artifact, determines which is more recent and then reports on whether the local artifact should be uploaded, the remote one downloaded, or whether there are new artifacts present locally or remotely.
The tests/test_sync/ directory in this repository contains a local and a remote manifest and content. Following the logic in the testing function tests/test_sync/test_sync.py::test_sync, if the remote manifest is loaded, as per pm load sparql tests/test_sync/remote/manifest.ttl {SPARQL-ENDPOINT}, and sync is then run like this:
pm sync tests/test_sync/local/manifest.ttl {SPARQL-ENDPOINT}
you will see a report like this:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Artifact ┃ Main Entity ┃ Direction ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ /../../../artifact4.ttl │ http://example.com/dataset/4 │ upload │
│ /../../../artifact5.ttl │ http://example.com/dataset/5 │ add-remotely │
│ /../../../artifact6.ttl │ http://example.com/dataset/6 │ download │
│ /../../../artifact7.ttl │ http://example.com/dataset/7 │ upload │
│ /../../../artifact9.ttl │ http://example.com/dataset/9 │ same │
│ /../../../artifacts/artifact1.ttl │ http://example.com/dataset/1 │ same │
│ /../../../artifacts/artifact2.ttl │ http://example.com/dataset/2 │ upload │
│ /../../../artifacts/artifact3.ttl │ http://example.com/dataset/3 │ upload │
│ /../../../catalogue.ttl │ https://example.com/sync-test │ same │
│ http://example.com/dataset/8 │ http://example.com/dataset/8 │ add-locally │
└───────────────────────────────────┴───────────────────────────────┴──────────────┘
This is telling you, per artifact, what sync will do.
- the local copy of artifact4.ttl is newer than the remote one, so it wants to "upload"
- the remote location is missing artifact5.ttl, so it wants to upload that too
- artifact9 is the "same" - no action required
- artifact6.ttl is newer remotely, it should be downloaded
You can choose to have sync carry out all these actions or only some - the default is all - by setting the update_remote and related input parameters. Setting them all to False will cause sync to do nothing and only report what it would otherwise do, e.g.:
pm sync tests/test_sync/local/manifest.ttl http://localhost:3030/test/ False False False False
Other than doing all this "manually" - interactively, on the command line - I might want to use sync in Python application code or cloud infracode scripting.
For use in Python applications, just import prezmanifest - uv add prezmanifest etc. - and use it, as per the use of sync in tests/test_sync/test_sync.py::test_sync.
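A hedged sketch of such a call - the import path and keyword argument names here are assumptions based on the CLI flags above, and tests/test_sync/test_sync.py::test_sync remains the authoritative example:

```python
from prezmanifest.syncer import sync  # import path assumed

# Report only: all action flags off, mirroring the CLI example above
report = sync(
    "tests/test_sync/local/manifest.ttl",
    "http://localhost:3030/test/",
    update_remote=False,  # named in the text above; the others are assumed
    update_local=False,
    add_remote=False,
    add_local=False,
)
print(report)
```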
For use in infracode, note that the pm sync function can return the table above as JSON by setting the response format input parameter, -f.
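For example, a hedged sketch of driving pm sync from a Python script and parsing the JSON report (the exact value accepted by -f isn't documented here, so "json" is an assumption - see pm sync -h):

```python
import json
import subprocess

proc = subprocess.run(
    [
        "pm", "sync",
        "tests/test_sync/local/manifest.ttl",
        "http://localhost:3030/test/",
        "-f", "json",  # response format flag per above; value assumed
    ],
    capture_output=True,
    text=True,
    check=True,
)

for entry in json.loads(proc.stdout):  # one entry per artifact
    print(entry)
```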