A Python package that provides a series of functions to work with Prez Manifests.
Prez Manifest
This repository contains the prezmanifest Python package that provides a series of functions to work with Prez
Manifests.
Contents
- What is a Prez Manifest?
- Functions
- Use
- Testing
- Extending
- License
- Contact
- Background concepts & other resources
- Case Studies
What is a Prez Manifest?
A Prez Manifest is an RDF file that describes and links to a set of resources that can be loaded into an RDF database for the Prez graph database publication system to provide access to. The Prez Manifest specification is online at: https://prez.dev/manifest/.
Functions
The functions provided are:
- validate
  - performs SHACL validation on the Manifest, followed by existence checking for each resource (are they reachable by this script on the file system or over the Internet?). Will also check any Conformance Claims given in the Manifest
- label
  - lists all the IRIs for elements within a Manifest's Resources that don't have labels. Given a source of additional labels, such as the KurrawongAI Semantic Background, it can try to extract any missing labels and insert them into a Manifest as an additional labelling resource
- document
  - table: can create a Markdown or AsciiDoc table of Resources from a Prez Manifest file for use in README files in repositories
  - catalogue: add the IRIs of resources within a Manifest's 'Resource Data' object to a catalogue RDF file
- load
  - extract all the content of all Resources listed in a Prez Manifest and load it into either a single RDF multi-graph ('quads') file or into an RDF DB instance by using the Graph Store Protocol
- sync
  - synchronises some kinds of resources listed in a Manifest with versions of them in a SPARQL Endpoint
  - acts as load if run against an empty SPARQL Endpoint
  - does not yet load background resources
Installation
This Python package is intended to be used as a Python library, called directly from other Python code, or on the command line on Linux/UNIX-like systems.
Library
It is available on PyPI at https://pypi.org/project/prezmanifest/ so it can be installed using Poetry, pip etc. We recommend uv as that's the package manager we find easiest to work with.
Command Line
To make available the command line script pm you need to first install UV, see
the uv installation instructions, then:
uv tool install prezmanifest
Now you can invoke pm anywhere in your terminal as long as ~/.local/bin/ is in your PATH.
Latest
You can also always install the latest, unstable, release from its version control repository: https://github.com/Kurrawong/prez-manifest/, but we make prezmanifest releases often, so the latest shouldn't ever be too far ahead of the most recent release.
Use
[!TIP] See the Case Study: Establish below for a short description of the establishment of a new catalogue using prezmanifest.
Library
Install as above and then, in your Python code, import the functions you want to use. Currently, these are the public functions:
from prezmanifest.validator import validate
from prezmanifest.labeller import LabellerOutputTypes, label
from prezmanifest.documentor import table, catalogue
from prezmanifest.loader import load
from prezmanifest.syncer import sync
Command Line
All the functions of the library are made available as a command line application called pm. After installation, as
above, you can inspect the command line tool by asking for "help" like this:
pm -h
Which will print something like this:
PrezManifest top-level Command Line Interface. Ask for help (-h) for each Command
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version -v │
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ validate Validate the structure and content of a Prez Manifest │
│ sync Synchronize a Prez Manifest's resources with loaded copies of them in a SPARQL Endpoint │
│ label Discover labels missing from data in a in a Prez Manifest and patch them │
│ document Create documentation from a Prez Manifest │
│ load Load a Prez Manifest's content into a file or DB │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
To find out more about each Command, ask for help like this - for load:
pm load -h
Logging
You can control the verbosity of the command line tool by setting the PM_LOG_LEVEL environment variable to one of
Python's standard logging levels: DEBUG, INFO, WARNING, ERROR, or CRITICAL. The default level is WARNING.
For example, to see detailed debug output:
PM_LOG_LEVEL=DEBUG pm load file my-manifest.ttl output.trig
Or for informational messages:
PM_LOG_LEVEL=INFO pm validate my-manifest.ttl
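When pm is driven from a Python script rather than a shell, the same variable can be passed through the child process environment. The following is a minimal sketch, assuming pm is installed and on the PATH. The helper names pm_env and run_pm are hypothetical and not part of prezmanifest.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Optional

# The standard logging level names accepted by PM_LOG_LEVEL.
VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def pm_env(level: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with PM_LOG_LEVEL set (hypothetical helper)."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown log level: {level}")
    env = dict(os.environ if base is None else base)
    env["PM_LOG_LEVEL"] = level
    return env


def run_pm(args: List[str], level: str = "WARNING") -> Optional[int]:
    """Run `pm <args>` with the given log level and return its exit code.

    Returns None without running anything if pm is not found on the PATH.
    """
    if shutil.which("pm") is None:
        return None
    return subprocess.run(["pm", *args], env=pm_env(level)).returncode


if __name__ == "__main__":
    # Equivalent to: PM_LOG_LEVEL=DEBUG pm load file my-manifest.ttl output.trig
    run_pm(["load", "file", "my-manifest.ttl", "output.trig"], level="DEBUG")
```

Copying the environment rather than mutating os.environ keeps the log level scoped to the single pm invocation.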
[!TIP] See the Case Study: Sync below for a description of the different ways to sync
Testing
Run uv run pytest, or the equivalent for Poetry etc., to execute the tests. You must have Docker Desktop running to allow all
loader tests to be executed, as some use temporary test containers.
Extending
Many functions have been placed into prezmanifest/utils.py and hopefully extensions can be made to individual
functions there.
For example, to extend the criteria prezmanifest uses to judge the newness of a local vs. a remote artifact for the
sync function, see the compare_version_indicators() function.
License
This code is available for reuse according to the BSD 3-Clause License.
© 2024-2025 KurrawongAI
Contact
For all matters, please contact:
KurrawongAI
info@kurrawong.ai
https://kurrawong.ai
Background concepts & other resources
The admin documentation for Prez Manifests - what they are, how to make them etc. - is online at https://prez.dev. Two concepts referred to above are also summarised here.
Conformance Claims
A claim that some data conforms to a standard or a profile. In Prez Manifest, this indicates that a Resource is expected to conform to a standard.
See the various Manifest files in tests/demo-vocabs/ for examples of them in use for individual resources or all
resources, e.g. tests/demo-vocabs/manifest-conformance.ttl
KurrawongAI Semantic Background
KurrawongAI makes available labels for all the elements of about 100 well-known ontologies and vocabularies at KurrawongAI Semantic Background. You can use this as a source (SPARQL Endpoint) of labels to patch content in Manifests that is missing labels.
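To illustrate what "missing labels" and "patching" mean here, the following minimal sketch lists the IRIs in some data that have no label and then pulls matching labels from a second source. It is not the prezmanifest labeller implementation: triples are modelled as plain tuples rather than an RDF graph, only rdfs:label is treated as a label (an assumption), and all function names and example data are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def is_iri(term: str) -> bool:
    # Simplification: anything that looks like an HTTP(S) IRI is an IRI,
    # everything else is treated as a literal.
    return term.startswith(("http://", "https://"))


def iris_missing_labels(triples):
    """Return the set of IRIs used anywhere in the triples that have no rdfs:label."""
    mentioned, labelled = set(), set()
    for s, p, o in triples:
        mentioned.update(t for t in (s, p, o) if is_iri(t))
        if p == RDFS_LABEL:
            labelled.add(s)
    return mentioned - labelled


def labels_from_background(missing, background):
    """Select label triples from a background source for the IRIs that need them."""
    return [(s, p, o) for s, p, o in background if p == RDFS_LABEL and s in missing]


# Made-up example data
data = [
    ("http://example.com/dataset/1", RDF_TYPE, "http://schema.org/Dataset"),
    ("http://example.com/dataset/1", RDFS_LABEL, "Dataset One"),
]
background = [
    ("http://schema.org/Dataset", RDFS_LABEL, "Dataset"),
    (RDF_TYPE, RDFS_LABEL, "type"),
    ("http://example.com/other", RDFS_LABEL, "Other"),
]

missing = iris_missing_labels(data)
patch = labels_from_background(missing, background)
still_missing = iris_missing_labels(data + patch)
```

In this example the background source supplies labels for two of the three unlabelled IRIs, so one IRI (rdfs:label itself) would still need a manually created label.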
Case Studies
Case Study: Establish
The Indigenous Studies Unit Catalogue is a new catalogue of resources - books, articles, boxes of archived documents - produced by the Indigenous Studies Unit at the University of Melbourne.
The catalogue is available online via an instance of the Prez system at https://data.idnau.org and the content is managed in the GitHub repository https://github.com/idn-au/isu-catalogue.
The catalogue container object is constructed as a schema:DataCatalog (and also a dcat:Catalog, for compatibility
with legacy systems) containing multiple schema:CreativeWork instances with subtyping to indicate 'book', 'artwork'
etc.
The source of the catalogue metadata is the static RDF file _background/catalogue-metadata.ttl that was handwritten.
The source of the resources' information is the CSV file _background/datasets.csv which was created by hand during a
visit to the Indigenous Studies Unit. This CSV information was converted to RDF files in resources/ using the custom
script _background/resources_make.py.
After creation of the catalogue container object's metadata and the primary resource information, prezmanifest was used to improve the presentation of the data in Prez in the following ways:
- A manifest file was created
  - based on the example in this repository in tests/demo-vocabs/manifest.ttl
  - the example was copy 'n pasted with only minor changes, see manifest.ttl in the ISU catalogue repo
  - the initial manifest file was validated with prezmanifest/validator: pm validate isu-catalogue/manifest.ttl
- A labels file was automatically generated using prezmanifest/labeller
  - using the KurrawongAI Semantic Background as a source of labels
  - using the command pm label rdf isu-catalogue/manifest.ttl http://demo.dev.kurrawong.ai/sparql > labels.ttl
  - the file, labels.ttl, was stored in the ISU Catalogue repo _background/ folder and indicated in the manifest file with the role of Incomplete Catalogue And Resource Labels as it doesn't provide all missing labels
    - note that this storage could have been done automatically using the pm label manifest command
- IRIs still missing labels were determined
  - using prezmanifest/labeller again with the command pm label iris isu-catalogue/manifest.ttl > iris.txt, all IRIs still missing labels were listed
- Labels for remaining IRIs were manually created
  - there were only 7 important IRIs (as opposed to system objects that don't need labels) that still needed labels. These were manually created in the file _background/labels-manual.ttl
  - the manual labels file was added to the catalogue's manifest, also with a role of Incomplete Catalogue And Resource Labels
- A final missing labels test was performed
  - running pm label iris isu-catalogue/manifest.ttl > iris.txt again indicated no important IRIs were still missing labels
- The catalogue was enhanced
  - pm document catalogue isu-catalogue/manifest.ttl was run to add all the resources of the catalogue to the catalogue.ttl file
- The manifest was documented
  - using prezmanifest/documentor, a Markdown table of the manifest's content was created using the command pm document table isu-catalogue/manifest.ttl
  - the output of this command - a Markdown table - is visible in the ISU Catalogue repo's README file.
- The catalogue was prepared for upload
  - pm load file isu-catalogue/manifest.ttl isu-catalogue.trig was run
    - it produced a single trig file isu-catalogue.trig containing RDF graphs which can easily be uploaded to the database delivering the catalogue
  - pm load sparql isu-catalogue/manifest.ttl http://a-sparql-endpoint.com/ds -u username -p password could have been run to load the content directly into the ISU RDF DB, if it had been available
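The command-line steps of this case study could also be scripted. The sketch below is a hypothetical wrapper, not part of prezmanifest: it only assembles the pm commands quoted in the case study and, unless dry_run is turned off, reports them rather than running them. Where the case study redirects output (for example > labels.ttl), the sketch writes the captured standard output to that file instead.

```python
import shlex
import subprocess
from pathlib import Path

MANIFEST = "isu-catalogue/manifest.ttl"

# (argv, file to receive stdout or None), in the order used in the case study
STEPS = [
    (["pm", "validate", MANIFEST], None),
    (["pm", "label", "rdf", MANIFEST, "http://demo.dev.kurrawong.ai/sparql"], "labels.ttl"),
    (["pm", "label", "iris", MANIFEST], "iris.txt"),
    (["pm", "document", "catalogue", MANIFEST], None),
    (["pm", "document", "table", MANIFEST], None),
    (["pm", "load", "file", MANIFEST, "isu-catalogue.trig"], None),
]


def describe(argv, stdout_file=None):
    """Render a step as the equivalent shell command line."""
    line = shlex.join(argv)
    return f"{line} > {shlex.quote(stdout_file)}" if stdout_file else line


def run_steps(steps=STEPS, dry_run=True):
    """Run (or, by default, only list) each step; stop at the first failure."""
    lines = []
    for argv, stdout_file in steps:
        lines.append(describe(argv, stdout_file))
        if dry_run:
            continue
        # Capture stdout only when it is to be saved to a file
        result = subprocess.run(
            argv,
            check=True,
            text=True,
            stdout=subprocess.PIPE if stdout_file else None,
        )
        if stdout_file:
            Path(stdout_file).write_text(result.stdout)
    return lines


if __name__ == "__main__":
    for line in run_steps():
        print(line)
```

Using check=True means a failed validation stops the script before any later step runs.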
Case Study: Sync
If I have a manifest locally, I can load it into a remote SPARQL Endpoint like this:
pm load sparql {PATH-TO-MANIFEST} {SPARQL-ENDPOINT}
Going forward, I don't have to blow away all the content in the SPARQL Endpoint and reload everything whenever I have
content changes. Instead, I can use the sync command.
sync compares "version indicators" per artifact, determines which is more recent and then reports on whether the local
artifact should be uploaded, the remote one downloaded or whether there are new artifacts present locally or remotely.
The tests/test_sync/ directory in this repository contains a local and a remote manifest and content. Following
the logic in the testing function tests/test_sync/test_sync.py::test_sync, if the remote manifest is loaded, as per
pm load sparql tests/test_sync/remote/manifest.ttl {SPARQL-ENDPOINT} and then sync is run like this:
pm sync tests/test_sync/local/manifest.ttl {SPARQL-ENDPOINT}
You will see a report like this:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Artifact ┃ Main Entity ┃ Direction ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ /../../../artifact4.ttl │ http://example.com/dataset/4 │ upload │
│ /../../../artifact5.ttl │ http://example.com/dataset/5 │ add-remotely │
│ /../../../artifact6.ttl │ http://example.com/dataset/6 │ download │
│ /../../../artifact7.ttl │ http://example.com/dataset/7 │ upload │
│ /../../../artifact9.ttl │ http://example.com/dataset/9 │ same │
│ /../../../artifacts/artifact1.ttl │ http://example.com/dataset/1 │ same │
│ /../../../artifacts/artifact2.ttl │ http://example.com/dataset/2 │ upload │
│ /../../../artifacts/artifact3.ttl │ http://example.com/dataset/3 │ upload │
│ /../../../catalogue.ttl │ https://example.com/sync-test │ same │
│ http://example.com/dataset/8 │ http://example.com/dataset/8 │ add-locally │
└───────────────────────────────────┴───────────────────────────────┴──────────────┘
This is telling you, per artifact, what sync will do.
- the local copy of artifact4.ttl is newer than the remote one, so it wants to "upload"
- the remote location is missing artifact5.ttl, so it wants to upload that too ("add-remotely")
- artifact9.ttl is the "same" - no action required
- artifact6.ttl is newer remotely, it should be downloaded
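The per-artifact decision shown in the report can be sketched as follows. This is a minimal illustration, not the prezmanifest implementation: it assumes each version indicator is a single comparable value (here a date), with None standing for an artifact that is absent on one side. The function name sync_direction and the example dates are hypothetical; the library's own comparison logic is in compare_version_indicators().

```python
from datetime import date
from typing import Optional


def sync_direction(local: Optional[date], remote: Optional[date]) -> str:
    """Return the Direction value for one artifact, using the labels from the report."""
    if local is None and remote is None:
        raise ValueError("artifact is absent on both sides")
    if remote is None:
        return "add-remotely"  # present locally only
    if local is None:
        return "add-locally"  # present remotely only
    if local > remote:
        return "upload"  # local copy is newer
    if remote > local:
        return "download"  # remote copy is newer
    return "same"  # no action required


if __name__ == "__main__":
    # Made-up version indicators for illustration
    print(sync_direction(date(2025, 2, 1), date(2025, 1, 1)))
    print(sync_direction(date(2025, 1, 1), None))
```

Real version indicators may not be single dates, which is why the comparison is a natural extension point.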
You can choose to have sync carry out all of these actions or only some of them - the default is all - by setting the update_remote
and related input parameters. Setting all of them to False causes sync to make no changes and only report what it would
otherwise do, e.g.:
pm sync tests/test_sync/local/manifest.ttl http://localhost:3030/test/ False False False False
Other than doing all this "manually" - interactively, on the command line - I might want to use sync in Python
application code or in cloud infracode scripting.
For use in Python applications, just import prezmanifest - uv add prezmanifest etc. - and use, as per the use of
sync in tests/test_sync/test_sync.py::test_sync.
For use in infracode, note that the pm sync function can return the table above in JSON by setting the
response format input parameter, -f.
Release Procedure
- format code: task format
- pass tests: task test
- update version in pyproject.toml
- commit all updates: git commit -a -m "..."
- make GitHub release
  - this will trigger the pypi.yml workflow to publish to PyPI