SKOS vocabulary management tool by NFDI4Cat.
SKOS vocabulary management with GitHub & Excel
Overview
For voc4cat, a term collection for catalysis created in NFDI4Cat, we developed a toolbox for collaboratively maintaining SKOS vocabularies on GitHub using Excel (xlsx files) as a user-friendly interface. It consists of several parts:
- voc4cat-tool (this package)
- A command-line tool to convert vocabularies from Excel/xlsx to SKOS (turtle/rdf) and validate the vocabulary. Validation includes formal SHACL profile checks as well as additional custom validation. The voc4cat tool can be run locally but is also well suited for integration in CI pipelines. It was inspired by RDFLib/VocExcel. Parts of the VocExcel codebase were merged into this repository (see git history).
- voc4cat-template
- A GitHub project template for managing SKOS vocabularies using GitHub-based workflows, including automation by gh-actions.
- voc4cat
- A SKOS vocabulary for the catalysis disciplines that uses the voc4cat workflow for real work.
Command-line tool voc4cat
voc4cat was mainly developed to be used in gh-actions but it is also useful as a locally installed command line tool. It has the following features.
- Convert between SKOS-vocabularies in Excel/xlsx format and rdf-format (turtle) in both directions.
- Check/validate SKOS-vocabularies in rdf/turtle format with the vocpub SHACL-profile.
- Manage vocabulary metadata (title, description, creator, publisher, etc.) via configuration file.
- Extract provenance information from git history (created, updated).
- Allocate ID ranges to contributors and track their contributions.
- Check xlsx vocabulary files for errors or incorrect use of IDs (voc4cat uses pydantic for this validation).
- Generate documentation from SKOS/turtle vocabulary file using pyLODE.
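The ID-range feature in the list above can be pictured with a small sketch. This is not voc4cat's implementation: the `IdRange` class, the helper functions, the contributor names, and the numeric ranges are all hypothetical and only illustrate the idea of reserving a block of concept IDs per contributor and checking that blocks do not collide.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class IdRange:
    """Hypothetical inclusive range of concept IDs reserved for one contributor."""

    contributor: str
    first_id: int
    last_id: int

    def contains(self, concept_id: int) -> bool:
        # Both ends of the range are treated as inclusive in this sketch.
        return self.first_id <= concept_id <= self.last_id


def find_owner(ranges, concept_id):
    """Return the contributor whose range holds concept_id, or None."""
    for id_range in ranges:
        if id_range.contains(concept_id):
            return id_range.contributor
    return None


def find_overlaps(ranges):
    """Return all pairs of contributors whose ranges share at least one ID."""
    return [
        (a.contributor, b.contributor)
        for a, b in combinations(ranges, 2)
        if a.first_id <= b.last_id and b.first_id <= a.last_id
    ]
```

With ranges such as `IdRange("alice", 1, 100)` and `IdRange("bob", 101, 200)`, `find_owner` attributes ID 150 to `"bob"`, and `find_overlaps` reports no collisions. The real tool reads its ranges from the configuration file described under "Getting started".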
The RDF produced by the 1.x versions of voc4cat-tool is well aligned with the data model of Skosmos 3 (even deprecations are correctly visualized). We recommend Skosmos for providing an API to your vocabulary or if you want a richer user interface than the static HTML documentation.
Installation
voc4cat is platform independent and works on Windows, Linux, and macOS. It requires Python (3.10 or newer).
If you only want to use the command-line interface, we strongly suggest installing it with uv tool or pipx. Both simplify installing and managing Python command-line applications.
uv tool install voc4cat
or
pipx install voc4cat
To validate the successful installation, run
voc4cat --version
The available commands and options can be explored via the help system:
voc4cat --help
You can optionally install the "assistant", which uses sentence-transformers for concept similarity analysis. This adds over 100 MB to the download, so it is not part of the default installation. To include it, modify the command (shown here for uv tool) to
uv tool install "voc4cat[assistant]"
Alternatively, you can install voc4cat using pip like any other Python package.
To install voc4cat including all development tools, run pip install .[dev] from a local copy of the repository.
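When voc4cat is installed with pip into the same environment as your own scripts, the installed version can also be checked from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this example. Note that installations made with uv tool or pipx live in their own isolated environments, so this check would not see them.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="voc4cat"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None
```

A script could call `installed_version()` and stop early with a helpful message if the result is `None`.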
Getting started
See the Documentation for detailed guidance.
To create a new vocabulary, first set up a configuration file idranges.toml for your vocabulary (see example).
This file defines vocabulary metadata and ID ranges for contributors.
Then create an xlsx-template:
voc4cat template --config myvocab/idranges.toml --outdir myvocab/
This creates myvocab.xlsx (named after your vocabulary) with the structure for entering concepts.
Convert the vocabulary file from xlsx to SKOS/turtle format:
voc4cat convert --config myvocab/idranges.toml myvocab/myvocab.xlsx
A turtle file myvocab.ttl is created in the same directory.
The reverse is also possible. Create an xlsx file from a turtle vocabulary:
voc4cat convert --config myvocab/idranges.toml --outdir myvocab/ myvocab/myvocab.ttl
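For scripted use, for example in a CI job, the commands above can be wrapped in a few lines of Python. This is a sketch under stated assumptions, not part of voc4cat: the function names are hypothetical, only the sub-commands and options shown above are used, and it assumes that the `voc4cat` executable is on the PATH and signals failure with a non-zero exit code.

```python
import subprocess


def template_command(config, outdir):
    """Build the argument list for creating an xlsx template."""
    return ["voc4cat", "template", "--config", str(config), "--outdir", str(outdir)]


def convert_command(config, source, outdir=None):
    """Build the argument list for converting an xlsx or turtle file.

    The --outdir option is only added when an output directory is given.
    """
    cmd = ["voc4cat", "convert", "--config", str(config)]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    cmd.append(str(source))
    return cmd


def run(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

Calling `run(convert_command("myvocab/idranges.toml", "myvocab/myvocab.xlsx"))` would then perform the xlsx-to-turtle conversion shown above. Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.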
Migrating from older versions
Vocabularies created with voc4cat-tool v0.10.x or earlier (format "043") can be converted to the v1.0 format. See migrating to v1.0 for details.
Feedback and code contributions
We highly appreciate your feedback. Please create an issue on GitHub.
Before you contribute code, we suggest first creating an issue to get early feedback on your ideas before you spend too much time on them.
By contributing, you agree that your contributions are licensed under the project's BSD-3-Clause license.
Contributors
For details, see the Zenodo record.
A big thanks to our GitHub contributors:
[Figure: GitHub contributor avatars, made with contrib.rocks.]
Acknowledgement
This work was funded by the German Research Foundation (DFG) through the project "NFDI4Cat - NFDI for Catalysis-Related Sciences" (DFG project no. 441926934), within the National Research Data Infrastructure (NFDI) programme of the Joint Science Conference (GWK).
This project includes the vocpub SHACL profile, which is licensed under the Creative Commons Attribution 4.0 International License (CC-BY 4.0) and was created by Nicholas J. Car. A copy of the license can be found at: https://creativecommons.org/licenses/by/4.0/.