This tool generates RDF metadata from medical imaging repositories.

img2catalog: From the shelves to the spotlight

img2catalog extracts metadata from imaging data repositories, maps it to a model class, and exports it to a catalog. The tool is modular. It currently supports extracting metadata from an XNAT server and converting it to the Health-RI Core v2 metadata model, which is based on DCAT-AP v3 and Health-DCAT AP and is defined in Pydantic classes using SeMPyRO. The result can either be written to an RDF file or pushed to a FAIR Data Point (FDP).

Installation

img2catalog requires an installation of Python 3.8 or higher. It can be installed by running

pip install img2catalog

Usage

img2catalog consists of:

  • an input,
  • a mapping,
  • and an output.

A basic example:

img2catalog xnat --server https://xnat.bmia.nl map-xnat-hriv2 rdf

This example uses the input xnat, which connects to the server https://xnat.bmia.nl, and the mapping map-xnat-hriv2, which maps the extracted metadata to the Health-RI Core v2 metadata model. The output rdf then serializes the result to RDF and prints it to the terminal.

In this mapping the XNAT itself will be converted to a Catalog object, and the projects to Datasets.
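The input, mapping and output chain can also be driven from a script. The sketch below is only an illustration: it assembles the same command line as the basic example and, optionally, captures the RDF that the rdf output prints to the terminal. The helper names build_command and harvest_to_file are hypothetical and not part of img2catalog.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_command(input_stage: Sequence[str], mapping: str,
                  output_stage: Sequence[str]) -> List[str]:
    """Chain the three stages in the order the CLI expects: input, mapping, output."""
    return ["img2catalog", *input_stage, mapping, *output_stage]


def harvest_to_file(argv: Sequence[str], target: Path) -> None:
    """Run the command and store whatever it prints on stdout in `target`.

    Hypothetical helper: requires img2catalog to be installed and on PATH.
    """
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    target.write_text(result.stdout, encoding="utf-8")


argv = build_command(
    input_stage=["xnat", "--server", "https://xnat.bmia.nl"],
    mapping="map-xnat-hriv2",
    output_stage=["rdf"],
)
# Show the command as it would be typed in a shell.
print(shlex.join(argv))
```

Calling harvest_to_file(argv, Path("catalog.rdf")) would then run the harvest and save the output. The file name is arbitrary.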

Pushing to a FAIR Data Point (FDP)

Using img2catalog one can directly push the Datasets created from XNAT projects to an existing Catalog on an FDP. To do so, run the following command with the output fdp:

img2catalog xnat --server https://xnat.bmia.nl map-xnat-hriv2 fdp --fdp "https://fdp.healthdata.nl" -u "albert.einstein@example.com" -p "password" -c "https://fdp-acc.healthdata.nl/catalog/5400322c-273c-4f47-ae30-00e7c345b85d"

This adds the new Datasets to the Catalog. To update existing Datasets on the FAIR Data Point when rerunning img2catalog, a SPARQL query must first be run against the triple store (GraphDB or another) that holds the metadata of your FDP. To enable this, supply the SPARQL endpoint as an argument to the fdp output:

img2catalog xnat --server https://xnat.bmia.nl map-xnat-hriv2 fdp --fdp "https://fdp.healthdata.nl" -u "albert.einstein@example.com" -p "password" -c "https://fdp-acc.healthdata.nl/catalog/5400322c-273c-4f47-ae30-00e7c345b85d" -s "https://sparql-acc.healthdata.nl/repositories/fdp"

Configuration

A number of configuration options are available through the command line interface (CLI). To get an overview of these options, run img2catalog --help, and do the same for any subcommand, e.g., img2catalog xnat --help.

An example configuration file config.toml is supplied with this project. By default, img2catalog will use the configuration file ~/.img2catalog/config.toml, if it exists. If the file does not exist, a default configuration will be used.

Currently, it is not possible to gather all the information for the Health-RI v2 model from a regular XNAT project. Additional properties can be stored in XNAT using XNAT Custom Forms. A JSON definition for the form for the Health-RI v2.0.0 model can be found in ./ext/xnat_custom_forms/health-ri-v2-dataset.json. This form can be attached to a project and the information can be retrieved by supplying the form ID in the configuration, like so:

[xnat]
dataset_form_id = "48660455-b964-4aef-b293-fbc1fab96bc0"

The metadata can be supplemented by defining fallback values in the configuration file.

Environment Variables

The tool can also be configured using environment variables. Here are the environment variables that can be used:

  • XNAT_HOST: The XNAT server host.
  • XNATPY_HOST: Alternative environment variable for the XNAT server host.
  • XNAT_USER: The XNAT username.
  • XNAT_PASS: The XNAT password.
  • IMG2CATALOG_FDP: The FDP server.
  • IMG2CATALOG_FDP_USER: The FDP username.
  • IMG2CATALOG_FDP_PASS: The FDP password.
  • IMG2CATALOG_SPARQL_ENDPOINT: The SPARQL endpoint.

Command-line arguments take precedence over environment variables. Environment variables take precedence over .netrc login.
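Environment variables keep credentials out of the command line and the shell history. The sketch below prepares such an invocation from Python. It assumes that the fdp output reads the FDP address, credentials and SPARQL endpoint from the variables listed above when the corresponding flags are omitted. The helper build_fdp_push is hypothetical.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def build_fdp_push(
    xnat_server: str,
    catalog_uri: str,
    fdp_url: str,
    fdp_user: str,
    fdp_password: str,
    sparql_endpoint: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a push where the secrets travel in the environment."""
    env = dict(os.environ if base_env is None else base_env)
    env["IMG2CATALOG_FDP"] = fdp_url
    env["IMG2CATALOG_FDP_USER"] = fdp_user
    env["IMG2CATALOG_FDP_PASS"] = fdp_password
    if sparql_endpoint:
        env["IMG2CATALOG_SPARQL_ENDPOINT"] = sparql_endpoint
    # Assumption: --fdp, -u, -p and -s may be left out when the variables are set.
    argv = [
        "img2catalog", "xnat", "--server", xnat_server,
        "map-xnat-hriv2", "fdp", "-c", catalog_uri,
    ]
    return argv, env
```

The pair can then be passed to subprocess.run(argv, env=env, check=True). Because command-line arguments take precedence, any flag that is still given explicitly overrides the matching variable.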

Authentication

Authentication for XNAT can be done using, in order of precedence:

  • Command line arguments
  • Environment variables
  • .netrc file.

For more information regarding this, see the XNATpy documentation.
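The order of precedence can be illustrated with a small resolver. This is not the code img2catalog or XNATpy uses, only a sketch of the same ordering built on the standard netrc module. The function name and the returned source label are hypothetical.

```python
import netrc
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse


def resolve_xnat_credentials(
    host: str,
    cli_user: Optional[str] = None,
    cli_password: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    netrc_path: Optional[str] = None,
) -> Optional[Tuple[str, str, str]]:
    """Return (user, password, source), trying CLI, then environment, then .netrc."""
    environ = os.environ if environ is None else environ

    # 1. Command line arguments win.
    if cli_user and cli_password:
        return cli_user, cli_password, "cli"

    # 2. Environment variables come next.
    env_user, env_pass = environ.get("XNAT_USER"), environ.get("XNAT_PASS")
    if env_user and env_pass:
        return env_user, env_pass, "env"

    # 3. Fall back to the .netrc entry for this host (default file: ~/.netrc).
    hostname = urlparse(host).hostname or host
    try:
        entry = netrc.netrc(netrc_path).authenticators(hostname)
    except (FileNotFoundError, netrc.NetrcParseError):
        entry = None
    if entry:
        login, _account, password = entry
        return login, password, "netrc"
    return None
```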

Inclusion and exclusion of projects

By default, all public and protected projects are indexed. Since private projects are not shown on XNAT, they will also not be harvested to be represented in a public catalogue.

Projects can be included or excluded by specifying opt-in or opt-out keywords. If an opt-in keyword is given, only projects with that keyword are included. If an opt-out keyword is given, all projects except those with that keyword are included. If neither is supplied, all projects are included. If both are given, only the opt-in keyword is applied.
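The selection rule can be written down as a short function. The sketch below assumes each project is represented as a dict with an "id" and a list of "keywords". That representation and the name select_projects are hypothetical and chosen only to make the rule concrete.

```python
from typing import Dict, Iterable, List, Optional

Project = Dict[str, object]  # hypothetical shape: {"id": str, "keywords": list of str}


def select_projects(
    projects: Iterable[Project],
    optin: Optional[str] = None,
    optout: Optional[str] = None,
) -> List[Project]:
    """Apply the opt-in / opt-out rule; opt-in wins when both are given."""
    projects = list(projects)
    if optin:
        # Only projects that carry the opt-in keyword are kept.
        return [p for p in projects if optin in p["keywords"]]
    if optout:
        # Everything is kept except projects that carry the opt-out keyword.
        return [p for p in projects if optout not in p["keywords"]]
    # No keywords supplied: keep everything.
    return projects


projects = [
    {"id": "A", "keywords": ["catalog", "mri"]},
    {"id": "B", "keywords": ["hidden"]},
    {"id": "C", "keywords": []},
]
print([p["id"] for p in select_projects(projects, optin="catalog")])
print([p["id"] for p in select_projects(projects, optout="hidden")])
```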

Development

This project uses Hatch as a project manager. After cloning the repository, the development version can be run with hatch run img2catalog. Hatch creates the environment and installs the dependencies.

You can run the unit tests with hatch run test:test, or open a shell in the Python environment with hatch shell. Hatch uses whichever Python version is currently active. This project is compatible with Python 3.8 and up.

Pull requests are very welcome! As long as the output remains at least DCAT-AP v3 compliant, we are open to any additions.

Limitations

Currently, only the title, description, keywords, PI and investigators of each dataset are set, along with the title, description and publisher of the catalogue. Distributions, Dataset Series and other classes are not generated. The language of the fields is not set either.

Disclaimer

This project is co-funded by the European Union under Grant Agreement 101100633. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them.
