
SKOS vocabulary management tool by NFDI4Cat.



SKOS vocabulary management with GitHub & Excel

Overview

For voc4cat, a term collection for catalysis created in NFDI4Cat, we developed a toolbox for collaboratively maintaining SKOS vocabularies on GitHub, using Excel (xlsx files) as a user-friendly interface. It consists of several parts:

  • voc4cat-tool (this package)
    • A command-line tool to convert vocabularies from Excel/xlsx to SKOS (turtle/rdf) and to validate them. Validation includes formal SHACL profile checks as well as additional custom validation. The voc4cat tool can be run locally but is also well suited for integration into CI pipelines. It was inspired by RDFLib/VocExcel, and parts of the VocExcel codebase were merged into this repository (see git history).
  • voc4cat-template
    • A GitHub project template for managing SKOS vocabularies with GitHub-based workflows, including automation with gh-actions.
  • voc4cat
    • A SKOS vocabulary for the catalysis disciplines that uses the voc4cat workflow in practice.

Command-line tool voc4cat

voc4cat was mainly developed for use in gh-actions, but it is also useful as a locally installed command-line tool. It has the following features:

  • Convert between SKOS vocabularies in Excel/xlsx format and RDF (turtle) format in both directions.
  • Check/validate SKOS vocabularies in RDF/turtle format with the vocpub SHACL profile.
  • Manage vocabulary metadata (title, description, creator, publisher, etc.) via a configuration file.
  • Extract provenance information (created, updated) from the git history.
  • Allocate ID ranges to contributors and track their contributions.
  • Check xlsx vocabulary files for errors or incorrect use of IDs (voc4cat uses pydantic for this validation).
  • Generate documentation from a SKOS/turtle vocabulary file using pyLODE.
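The ID-range idea in the list above can be illustrated with a small sketch. It does not reproduce voc4cat's internals: the real ranges are defined in idranges.toml, and xlsx files are validated with pydantic, whereas the `IdRange` class and the `find_overlaps` and `check_id_use` helpers below are hypothetical, standard-library-only stand-ins. They assume that each contributor owns one inclusive range of integer IDs and show two checks: no two ranges may overlap, and an ID entered by a contributor must lie within that contributor's range.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class IdRange:
    """Hypothetical inclusive range of numeric concept IDs owned by one contributor."""

    contributor: str
    first_id: int
    last_id: int

    def contains(self, concept_id: int) -> bool:
        return self.first_id <= concept_id <= self.last_id


def find_overlaps(ranges: list[IdRange]) -> list[tuple[str, str]]:
    """Return pairs of contributors whose ranges share at least one ID."""
    overlaps = []
    for a, b in combinations(ranges, 2):
        # Two inclusive intervals overlap if each starts before the other ends.
        if a.first_id <= b.last_id and b.first_id <= a.last_id:
            overlaps.append((a.contributor, b.contributor))
    return overlaps


def check_id_use(ranges: list[IdRange], contributor: str, concept_id: int) -> bool:
    """True if concept_id lies inside a range allocated to this contributor."""
    return any(r.contributor == contributor and r.contains(concept_id) for r in ranges)


# Hypothetical allocation; carol's range deliberately overlaps bob's.
ranges = [
    IdRange("alice", 1, 99),
    IdRange("bob", 100, 199),
    IdRange("carol", 150, 249),
]
print(find_overlaps(ranges))
print(check_id_use(ranges, "alice", 42), check_id_use(ranges, "alice", 120))
```

Keeping ranges disjoint is what lets several contributors mint IDs in parallel without coordinating each new concept.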

The RDF produced by version 1.x of voc4cat-tool is well aligned with the data model of Skosmos 3 (even deprecations are correctly visualized). We recommend Skosmos if you want to provide an API for your vocabulary or a richer user interface than the static HTML documentation.

Installation

voc4cat is platform-independent and works on Windows, Linux, and macOS. It requires Python 3.10 or newer.

If you only want to use the command-line interface, we strongly recommend installing it with uv tool or pipx. Both simplify installing and managing Python command-line applications.

uv tool install voc4cat

or

pipx install voc4cat

To validate the successful installation, run

voc4cat --version

The available commands and options can be explored via the help system:

voc4cat --help

You can optionally install the "assistant" extra, which uses sentence-transformers for concept similarity analysis. Because it adds over 100 MB to the download, it is not included in the default installation. To include it, modify the command (for uv tool) as follows:

uv tool install "voc4cat[assistant]"
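This README does not describe how the assistant scores similarity. Embeddings produced by sentence-transformers are commonly compared with cosine similarity, so the sketch below assumes that metric and uses small made-up vectors in place of real model output. The `cosine_similarity` and `most_similar_pairs` helpers are hypothetical and not part of voc4cat; they only show the general idea of flagging concept pairs whose embeddings point in nearly the same direction.

```python
import math
from itertools import combinations


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def most_similar_pairs(
    embeddings: dict[str, list[float]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return concept pairs with similarity >= threshold, highest score first."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Made-up 3-dimensional "embeddings"; a real model produces much longer vectors.
toy = {
    "catalyst": [0.9, 0.1, 0.0],
    "catalytic agent": [0.85, 0.15, 0.05],
    "reactor": [0.1, 0.2, 0.95],
}
for a, b, score in most_similar_pairs(toy, threshold=0.9):
    print(f"{a} ~ {b}: {score:.3f}")
```

Pairs above the threshold are candidates for a human to review as possible duplicates or near-synonyms; the threshold value here is arbitrary.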

Alternatively, you can install voc4cat using pip like any other Python package.

To install with all development tools, run pip install ".[dev]" from the root of a clone of the repository.

Getting started

See the Documentation for detailed guidance.

To create a new vocabulary, first set up a configuration file idranges.toml for your vocabulary (see example). This file defines vocabulary metadata and ID ranges for contributors. Then create an xlsx-template:

voc4cat template --config myvocab/idranges.toml --outdir myvocab/

This creates myvocab.xlsx (named after your vocabulary) with the structure for entering concepts.

Convert the vocabulary file from xlsx to SKOS/turtle format:

voc4cat convert --config myvocab/idranges.toml myvocab/myvocab.xlsx

A turtle file myvocab.ttl is created in the same directory.

The reverse is also possible. Create an xlsx file from a turtle vocabulary:

voc4cat convert --config myvocab/idranges.toml --outdir myvocab/ myvocab/myvocab.ttl

Migrating from older versions

Vocabularies created with voc4cat-tool v0.10.x or earlier (format "043") can be converted to the v1.0 format. See migrating to v1.0 for details.

Feedback and code contributions

We highly appreciate your feedback. Please create an issue on GitHub.

Before you contribute code, we suggest first creating an issue to get early feedback on your ideas before investing too much time.

By contributing, you agree that your contributions are licensed under the project's BSD-3-Clause license.

Contributors

For details, see the Zenodo record.

A big thanks to our GitHub contributors:

Voc4Cat-tool contributors

Figure made with contrib.rocks.

Acknowledgement

This work was funded by the German Research Foundation (DFG) through the project "NFDI4Cat - NFDI for Catalysis-Related Sciences" (DFG project no. 441926934), within the National Research Data Infrastructure (NFDI) programme of the Joint Science Conference (GWK).

This project includes the vocpub SHACL profile, which is licensed under the Creative Commons Attribution 4.0 International License (CC-BY 4.0) and was created by Nicholas J. Car. A copy of the license can be found at: https://creativecommons.org/licenses/by/4.0/.

