
A Python package that provides a series of functions to work with Prez Manifests.

Prez Manifest

This repository contains the prezmanifest Python package that provides a series of functions to work with Prez Manifests.

What is a Prez Manifest?

A Prez Manifest is an RDF file that describes and links to a set of resources that can be loaded into an RDF database, which the Prez graph database publication system then provides access to. The Prez Manifest specification is online at: https://prez.dev/manifest/.

Functions

The functions provided are:

  • validate
    • performs SHACL validation on the Manifest, followed by existence checking for each resource: are they reachable by this script on the file system or over the Internet? It will also check any Conformance Claims given in the Manifest
  • label
    • lists all the IRIs for elements within a Manifest's Resources that don't have labels. Given a source of additional labels, such as the KurrawongAI Semantic Background, it can try to extract any missing labels and insert them into a Manifest as an additional labelling resource
  • document
    • table: can create a Markdown or AsciiDoc table of Resources from a Prez Manifest file for use in README files in repositories
    • catalogue: add the IRIs of resources within a Manifest's 'Resource Data' object to a catalogue RDF file
  • load
    • extract all the content of all Resources listed in a Prez Manifest and load it into either a single RDF multi-graph ('quads') file or into an RDF DB instance by using the Graph Store Protocol
  • sync
    • synchronises some kinds of resources listed in a Manifest with versions of them in a SPARQL Endpoint
    • acts as load if run against an empty SPARQL Endpoint
    • does not yet load background resources
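
The existence checking that validate performs can be pictured with the short sketch below. It is only an illustration under stated assumptions, not prezmanifest's implementation: the helper name is_reachable is hypothetical, local locations are assumed to be relative to the folder containing the manifest, and remote ones are probed with an HTTP HEAD request.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def is_reachable(location: str, base_dir: Path, timeout: float = 10.0) -> bool:
    """Return True if a resource location can be found.

    Hypothetical helper for illustration only; not part of prezmanifest.
    """
    scheme = urlparse(location).scheme
    if scheme in ("http", "https"):
        # Remote resource: ask for the headers only, not the whole file
        try:
            request = Request(location, method="HEAD")
            with urlopen(request, timeout=timeout) as response:
                return response.status < 400
        except (OSError, ValueError):
            # Covers URLError/HTTPError (both OSError subclasses) and bad URLs
            return False
    # Assumption: anything else is a path relative to the manifest's folder
    return (base_dir / location).exists()
```

Called as is_reachable("artifacts/artifact1.ttl", Path("tests/test_sync/local")), it would report whether that file exists next to the manifest.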

Installation

This Python package is intended to be used as a Python library, called directly from other Python code, or on the command line on Linux/UNIX-like systems.

Library

It is available on PyPI at https://pypi.org/project/prezmanifest/ so it can be installed using Poetry, pip etc. We recommend UV as that's the package manager we find easiest to work with.

Command Line

To make available the command line script pm you need to first install UV, see the uv installation instructions, then:

uv tool install prezmanifest

Now you can invoke pm anywhere in your terminal as long as ~/.local/bin/ is in your PATH.

Latest

You can also always install the latest, unstable, release from its version control repository: https://github.com/Kurrawong/prez-manifest/, but we make prezmanifest releases often, so the latest shouldn't ever be too far ahead of the most recent release.

Use

[!TIP] See the Case Study: Establish below for a short description of the establishment of a new catalogue using prezmanifest.

Library

Install as above and then, in your Python code, import the functions you want to use. Currently, these are the public functions:

from prezmanifest.validator import validate
from prezmanifest.labeller import LabellerOutputTypes, label
from prezmanifest.documentor import table, catalogue
from prezmanifest.loader import load
from prezmanifest.syncer import sync

Command Line

All the functions of the library are made available as a command line application called pm. After installation, as above, you can inspect the command line tool by asking for "help" like this:

pm -h

Which will print something like this:

PrezManifest top-level Command Line Interface. Ask for help (-h) for each Command                        
                                                                                                          
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version  -v                                                                                          │
│ --help     -h        Show this message and exit.                                                       │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ validate   Validate the structure and content of a Prez Manifest                                       │
│ sync       Synchronize a Prez Manifest's resources with loaded copies of them in a SPARQL Endpoint     │
│ label      Discover labels missing from data in a in a Prez Manifest and patch them                    │
│ document   Create documentation from a Prez Manifest                                                   │
│ load       Load a Prez Manifest's content into a file or DB                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯

To find out more about each Command, ask for help like this - for load:

pm load -h

Logging

You can control the verbosity of the command line tool by setting the PM_LOG_LEVEL environment variable to one of Python's standard logging levels: DEBUG, INFO, WARNING, ERROR, or CRITICAL. The default level is WARNING.

For example, to see detailed debug output:

PM_LOG_LEVEL=DEBUG pm load file my-manifest.ttl output.trig

Or for informational messages:

PM_LOG_LEVEL=INFO pm validate my-manifest.ttl
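
If you want the same behaviour in your own scripts that use the library, the sketch below shows one way an environment variable such as PM_LOG_LEVEL can be mapped onto Python's logging levels. The function name level_from_env is hypothetical and this is not necessarily how pm does it internally; the fallback to WARNING mirrors the default stated above.

```python
import logging
import os

# The five standard level names listed above, mapped to their numeric values
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def level_from_env(var_name: str = "PM_LOG_LEVEL") -> int:
    """Read a level name from the environment; fall back to WARNING.

    Hypothetical helper for illustration only; not part of prezmanifest.
    """
    name = os.environ.get(var_name, "WARNING").strip().upper()
    # Unknown or misspelled names also fall back to WARNING
    return LEVELS.get(name, logging.WARNING)


if __name__ == "__main__":
    logging.basicConfig(level=level_from_env())
    logging.getLogger(__name__).info("shown only when the level is INFO or DEBUG")
```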

[!TIP] See the Case Study: Sync below for a description of the different ways to sync

Testing

Run uv run pytest, or Poetry etc. equivalents, to execute pytest. You must have Docker Desktop running to allow all loader tests to be executed as some use temporary test containers.

Extending

Many functions have been placed into prezmanifest/utils.py and hopefully extensions can be made to individual functions there.

For example, to extend the criteria prezmanifest uses to judge the newness of a local versus a remote artifact for the sync function, see the compare_version_indicators() function.

License

This code is available for reuse according to the BSD 3-Clause License.

© 2024-2025 KurrawongAI

Contact

For all matters, please contact:

KurrawongAI
info@kurrawong.ai
https://kurrawong.ai

Background concepts & other resources

The admin documentation for Prez Manifests (what they are, how to make them etc.) is online at https://prez.dev. Two concepts referred to above are also summarised here.

Conformance Claims

A claim that some data conforms to a standard or a profile. In a Prez Manifest, a Conformance Claim indicates that a Resource is expected to conform to a given standard.

See the various Manifest files in tests/demo-vocabs/ for examples of them in use for individual resources or all resources, e.g. tests/demo-vocabs/manifest-conformance.ttl

KurrawongAI Semantic Background

KurrawongAI makes available labels for all the elements of about 100 well-known ontologies and vocabularies at KurrawongAI Semantic Background. You can use this as a source (SPARQL Endpoint) of labels with which to patch content in Manifests that is missing labels.

Case Studies

Case Study: Establish

The Indigenous Studies Unit Catalogue is a new catalogue of resources - books, articles, boxes of archived documents - produced by the Indigenous Studies Unit at the University of Melbourne.

The catalogue is available online via an instance of the Prez system at https://data.idnau.org and the content is managed in the GitHub repository https://github.com/idn-au/isu-catalogue.

The catalogue container object is constructed as a schema:DataCatalog (and also a dcat:Catalog, for compatibility with legacy systems) containing multiple schema:CreativeWork instances with subtyping to indicate 'book', 'artwork' etc.

The source of the catalogue metadata is the static RDF file _background/catalogue-metadata.ttl that was handwritten.

The source of the resources' information is the CSV file _background/datasets.csv which was created by hand during a visit to the Indigenous Studies Unit. This CSV information was converted to RDF files in resources/ using the custom script _background/resources_make.py.

After creation of the catalogue container object's metadata and the primary resource information, prezmanifest was used to improve the presentation of the data in Prez in the following ways:

  1. A manifest file was created
    • based on the example in this repository in tests/demo-vocabs/manifest.ttl
    • the example was copy 'n pasted with only minor changes, see manifest.ttl in the ISU catalogue repo
    • the initial manifest file was validated with prezmanifest/validator: pm validate isu-catalogue/manifest.ttl
  2. A labels file was automatically generated using prezmanifest/labeller
    • using the KurrawongAI Semantic Background as a source of labels
    • using the command pm label rdf isu-catalogue/manifest.ttl http://demo.dev.kurrawong.ai/sparql > labels.ttl
    • the file, labels.ttl, was stored in the ISU Catalogue repo's _background/ folder and indicated in the manifest file with the role of Incomplete Catalogue And Resource Labels as it doesn't provide all missing labels
      • note that this storage could have been done automatically using the pm label manifest command
  3. IRIs still missing labels were determined
    • using prezmanifest/labeller again with the command pm label iris isu-catalogue/manifest.ttl > iris.txt, all IRIs still missing labels were listed
  4. Labels for remaining IRIs were manually created
    • there were only 7 important IRIs (as opposed to system objects that don't need labels) that still needed labels. These were manually created in the file _background/labels-manual.ttl
    • the manual labels file was added to the catalogue's manifest, also with a role of Incomplete Catalogue And Resource Labels
  5. A final missing labels test was performed
    • running pm label iris isu-catalogue/manifest.ttl > iris.txt again indicated no important IRIs were still missing labels
  6. The catalogue was enhanced
    • pm document catalogue isu-catalogue/manifest.ttl was run to add all the resources of the catalogue to the catalogue.ttl file
  7. The manifest was documented
    • using prezmanifest/documentor, a Markdown table of the manifest's content was created using the command pm document table isu-catalogue/manifest.ttl
    • the output of this command - a Markdown table - is visible in the ISU Catalogue repo's README file.
  8. The catalogue was prepared for upload
    • pm load file isu-catalogue/manifest.ttl isu-catalogue.trig was run
    • it produced a single trig file isu-catalogue.trig containing RDF graphs which can easily be uploaded to the database delivering the catalogue
    • pm load sparql isu-catalogue/manifest.ttl http://a-sparql-endpoint.com/ds -u username -p password could have been run to load the content directly into the ISU RDF DB, if it had been available

Case Study: Sync

If I have a manifest locally, I can load it into a remote SPARQL Endpoint like this:

pm load sparql {PATH-TO-MANIFEST} {SPARQL-ENDPOINT}

Going forward, I don't have to blow away all the content in the SPARQL Endpoint and reload everything whenever I have content changes. Instead, I can use the sync command.

sync compares "version indicators" per artifact, determines which is more recent and then reports on whether the local artifact should be uploaded, the remote one downloaded or whether there are new artifacts present locally or remotely.
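
As a rough illustration of that comparison, the sketch below picks a direction for one artifact from two version indicators. It assumes the indicators are plain datetimes and uses a hypothetical function name, sync_direction; the real criteria live in prezmanifest's compare_version_indicators() and may take other kinds of indicator into account. The direction labels are the ones that appear in the sync report shown later in this section.

```python
from datetime import datetime
from typing import Optional


def sync_direction(local: Optional[datetime], remote: Optional[datetime]) -> str:
    """Choose a sync direction for one artifact.

    Hypothetical sketch: None means the artifact is absent on that side.
    Both datetimes must be naive, or both timezone-aware, to be comparable.
    """
    if local is None and remote is None:
        raise ValueError("artifact is absent both locally and remotely")
    if remote is None:
        return "add-remotely"  # only the local side has it
    if local is None:
        return "add-locally"  # only the remote side has it
    if local > remote:
        return "upload"  # local copy is more recent
    if remote > local:
        return "download"  # remote copy is more recent
    return "same"


if __name__ == "__main__":
    print(sync_direction(datetime(2025, 3, 1), datetime(2024, 11, 20)))
```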

The tests/test_sync/ directory in this repository contains a local and a remote manifest and content. Following the logic in the testing function tests/test_sync/test_sync.py::test_sync, if the remote manifest is loaded, as per pm load sparql tests/test_sync/remote/manifest.ttl {SPARQL-ENDPOINT} and then sync is run like this:

pm sync tests/test_sync/local/manifest.ttl {SPARQL-ENDPOINT}

You will see a report like this:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Artifact                          ┃ Main Entity                   ┃ Direction    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ /../../../artifact4.ttl           │ http://example.com/dataset/4  │ upload       │
│ /../../../artifact5.ttl           │ http://example.com/dataset/5  │ add-remotely │
│ /../../../artifact6.ttl           │ http://example.com/dataset/6  │ download     │
│ /../../../artifact7.ttl           │ http://example.com/dataset/7  │ upload       │
│ /../../../artifact9.ttl           │ http://example.com/dataset/9  │ same         │
│ /../../../artifacts/artifact1.ttl │ http://example.com/dataset/1  │ same         │
│ /../../../artifacts/artifact2.ttl │ http://example.com/dataset/2  │ upload       │
│ /../../../artifacts/artifact3.ttl │ http://example.com/dataset/3  │ upload       │
│ /../../../catalogue.ttl           │ https://example.com/sync-test │ same         │
│ http://example.com/dataset/8      │ http://example.com/dataset/8  │ add-locally  │
└───────────────────────────────────┴───────────────────────────────┴──────────────┘

This is telling you, per artifact, what sync will do.

  • the local copy of artifact4.ttl is newer than the remote one, so it wants to "upload"
  • the remote location is missing artifact5.ttl, so it wants to add it there ("add-remotely")
  • artifact9 is the "same" - no action required
  • artifact6.ttl is newer remotely, it should be downloaded

You can choose to have sync carry out all of these actions or only some of them (the default is all) by setting the input parameters, update_remote and so on. Setting all of them to False will cause sync to do nothing and only report what it would otherwise do, e.g.:

pm sync tests/test_sync/local/manifest.ttl http://localhost:3030/test/ False False False False

Other than doing all this "manually" - interactively, on the command line - I might want to use sync in Python application code or cloud infracode scripting.

For use in Python applications, just import prezmanifest - uv add prezmanifest etc. - and use, as per the use of sync in tests/test_sync/test_sync.py::test_sync.

For use in infracode, note that the pm sync function can return the table above in JSON by setting the response format input parameter, -f.

Release Procedure

  • format code: task format
  • pass tests: task test
  • update version in pyproject.toml
  • commit all updates: git commit -am "..."
  • make GitHub release
    • this will trigger pypi.yml workflow to publish to PyPI
