
RO-Crate metadata generator/parser

Project description


ro-crate-py

Python library to create/parse RO-Crate (Research Object Crate) metadata.

Supports specification: RO-Crate 1.1

Status: Alpha

Installing

You will need Python 3.6 or later (Recommended: 3.7).

This library is easiest to install using pip:

pip install rocrate

If you want to install manually from this code base, then try:

pip install .

...or, if you don't use pip:

python setup.py install
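
Before or after installing, it can help to confirm that the interpreter meets the stated minimum version and that the package is importable. The following is a minimal sketch using only the standard library; the helper name environment_report is hypothetical and not part of ro-crate-py:

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this README


def environment_report(version_info=sys.version_info):
    """Report whether the interpreter is recent enough and rocrate is importable."""
    return {
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # find_spec returns None when a top-level package cannot be found
        "rocrate_installed": importlib.util.find_spec("rocrate") is not None,
    }


print(environment_report())
```

If rocrate_installed is False after running pip install rocrate, pip probably installed the package into a different Python environment.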

General usage

The RO-Crate object

In general you will want to start by instantiating the ROCrate object. This can be a new one:

from rocrate.rocrate import ROCrate

crate = ROCrate() 

or an existing RO-Crate package can be loaded from a directory or zip file:

crate = ROCrate('/path/to/crate/')
crate = ROCrate('/path/to/crate/file.zip')
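
To see what "a directory or zip file" means in practice, the sketch below reads the metadata file from either kind of package using only the standard library. It does not use ro-crate-py, and the function name read_crate_metadata is hypothetical. It assumes the metadata file sits at the top level of the directory or archive:

```python
import json
import zipfile
from pathlib import Path

METADATA_NAME = "ro-crate-metadata.json"


def read_crate_metadata(source):
    """Return the parsed metadata of a crate stored as a directory or a zip."""
    source = Path(source)
    if source.is_dir():
        with open(source / METADATA_NAME, encoding="utf-8") as f:
            return json.load(f)
    if zipfile.is_zipfile(source):
        with zipfile.ZipFile(source) as zf:
            # Assumes the metadata file is stored at the archive's top level
            with zf.open(METADATA_NAME) as f:
                return json.load(f)
    raise ValueError(f"{source} is neither a directory nor a zip file")
```

This only inspects the raw JSON-LD. To work with entities as objects, load the package with ROCrate as shown above.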

In addition, the rocrate_api module provides higher-level functions for creating some predefined types of crates. For example, the following code creates a workflow RO-Crate containing a workflow template. This is a good starting point if you want to package a workflow template for registration at workflowhub.eu:

from rocrate import rocrate_api

wf_path = "test/test-data/test_galaxy_wf.ga"
files_list = ["test/test-data/test_file_galaxy.txt"]

# Create base package
wf_crate = rocrate_api.make_workflow_rocrate(workflow_path=wf_path, wf_type="Galaxy", include_files=files_list)

Regardless of how it was initialized, a ROCrate instance can then be manipulated to extend its content and metadata.

Data entities

Data entities can be added with:

# Adding a File entity
sample_file = '/path/to/sample_file.txt'
file_entity = crate.add_file(sample_file)

# Adding a File entity with a reference to an external (absolute) URI
remote_file = crate.add_file('https://github.com/ResearchObject/ro-crate-py/blob/master/test/test-data/test_galaxy_wf.ga', fetch_remote=False)

# Adding a Dataset
sample_dir = '/path/to/dir'
dataset_entity = crate.add_directory(sample_dir, 'relative/rocrate/path')

Contextual entities

Contextual entities provide the additional information needed to adequately describe a data entity, such as the people or organizations involved. The following example shows how to add a Person contextual entity to a crate:

from rocrate.model.person import Person

# Add authors info
crate.add(Person(crate, '#joe', {'name': 'Joe Bloggs'}))

# wf_crate example
publisher = Person(wf_crate, '001', {'name': 'Bert Verlinden'})
creator = Person(wf_crate, '002', {'name': 'Lee Ritenour'})
wf_crate.add(publisher, creator)

# These contextual entities can be assigned to other metadata properties:

wf_crate.publisher = publisher
wf_crate.creator = [creator, publisher]
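
In the metadata file, a contextual entity is a JSON-LD object, and other entities refer to it by its @id. The sketch below builds such objects by hand to show the general shape defined by the RO-Crate specification. It does not use ro-crate-py, the helper names are hypothetical, and the exact output the library writes may differ in detail:

```python
import json


def person_entity(identifier, name):
    """Build the JSON-LD object for a Person contextual entity."""
    return {"@id": identifier, "@type": "Person", "name": name}


def reference(entity):
    """Other entities point to a contextual entity through its @id only."""
    return {"@id": entity["@id"]}


joe = person_entity("#joe", "Joe Bloggs")

# The root data entity refers to the person instead of embedding it
root = {
    "@id": "./",
    "@type": "Dataset",
    "publisher": reference(joe),
    "creator": [reference(joe)],
}

print(json.dumps([root, joe], indent=2))
```

Keeping the person as a separate object means the same entity can be referenced from several properties without duplicating its description.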

Other metadata

Several root-level metadata fields are supported for the workflow RO-Crate:

wf_crate.license = 'MIT'
wf_crate.isBasedOn = "https://climate.usegalaxy.eu/u/annefou/w/workflow-constructed-from-history-climate-101"
wf_crate.name = 'Climate 101'
wf_crate.keywords = ['GTN', 'climate']
wf_crate.image = "climate_101_workflow.svg"
wf_crate.description = "The tutorial for this workflow can be found on Galaxy Training Network"
wf_crate.CreativeWorkStatus = "Stable"

Writing the RO-Crate file

To write the crate to a zip archive or to a plain directory, use one of the two write methods:

# Write to a zip file
out_path = "/home/test_user/crate"
crate.write_zip(out_path)

# Write to a directory
out_path = "/home/test_user/crate_base"
crate.write(out_path)

Command Line Interface

ro-crate-py includes a hierarchical command line interface: the rocrate tool. rocrate is the top-level command, while specific functionalities are provided via sub-commands. Currently, the tool can initialize a directory tree as an RO-Crate (rocrate init) and modify the metadata of an existing RO-Crate (rocrate add).

$ rocrate --help
Usage: rocrate [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --crate-dir PATH
  --help                Show this message and exit.

Commands:
  add
  init

Commands act on the current directory, unless the -c option is specified.

The rocrate init command explores a directory tree and generates an RO-Crate metadata file (ro-crate-metadata.json) listing all files and directories as File and Dataset entities, respectively. The metadata file is added (overwritten if present) to the directory at the top-level, turning it into an RO-Crate.
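
The traversal described above can be sketched with the standard library alone. This is an illustration of the idea under stated assumptions, not the tool's actual implementation: the function name sketch_init is hypothetical, every entity is listed under the root's hasPart, directory identifiers are given a trailing slash, and nothing is written to disk:

```python
from pathlib import Path

METADATA_NAME = "ro-crate-metadata.json"


def sketch_init(top):
    """Build (but do not write) a metadata dict describing the tree under top."""
    top = Path(top)
    descriptor = {
        "@id": METADATA_NAME,
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {"@id": "./", "@type": "Dataset", "hasPart": []}
    graph = [descriptor, root]
    for path in sorted(top.rglob("*")):
        rel = path.relative_to(top).as_posix()
        if rel == METADATA_NAME:
            continue  # the metadata file describes the crate; it is not a data entity
        if path.is_dir():
            entity = {"@id": rel + "/", "@type": "Dataset"}
        else:
            entity = {"@id": rel, "@type": "File"}
        root["hasPart"].append({"@id": entity["@id"]})
        graph.append(entity)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}
```

Calling sketch_init(".") in a directory tree returns a dict that could be compared with the ro-crate-metadata.json produced by rocrate init.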

The rocrate add command adds workflows and other entity types (currently testing-related metadata) to an RO-Crate. The entity type is specified via another sub-command level:

$ rocrate add --help
Usage: rocrate add [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  test-definition
  test-instance
  test-suite
  workflow

Note that data entities (e.g., workflows) must already be present in the directory tree: the effect of the command is to register them in the metadata file.

Example

# From the ro-crate-py repository root
cd test/test-data/ro-crate-galaxy-sortchangecase

This directory is already an RO-Crate. Delete the metadata file to get a plain directory tree:

rm ro-crate-metadata.json

Now the directory tree contains several files and directories, including a Galaxy workflow and a Planemo test file, but it's not an RO-Crate since there is no metadata file. Initialize the crate:

rocrate init

This creates an ro-crate-metadata.json file that lists files and directories rooted at the current directory. Note that the Galaxy workflow is listed as a plain File:

        {
            "@id": "sort-and-change-case.ga",
            "@type": "File"
        }

To register the workflow as a ComputationalWorkflow:

rocrate add workflow -l galaxy sort-and-change-case.ga

Now the workflow has a type of ["File", "SoftwareSourceCode", "ComputationalWorkflow"] and points to a ComputerLanguage entity that represents the Galaxy workflow language. Also, the workflow is listed as the crate's mainEntity (see https://about.workflowhub.eu/Workflow-RO-Crate).
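
One way to check the result is to look up the crate's mainEntity in the metadata graph. The sketch below does this on an in-memory dict shaped like a metadata file. The function name describe_main_entity is hypothetical, the "#galaxy" identifier is illustrative only, and the code assumes that the root entity has the identifier "./" and that the workflow links to its language through the programmingLanguage property:

```python
def describe_main_entity(metadata):
    """Return (id, types, language id) of the crate's mainEntity, or None."""
    by_id = {entity["@id"]: entity for entity in metadata["@graph"]}
    main = by_id["./"].get("mainEntity")
    if main is None:
        return None
    workflow = by_id[main["@id"]]
    types = workflow["@type"]
    if not isinstance(types, list):
        types = [types]  # a single type may be stored as a plain string
    language = workflow.get("programmingLanguage", {}).get("@id")
    return workflow["@id"], types, language


# Illustrative metadata; the language identifier is an assumption
sample = {
    "@graph": [
        {"@id": "./", "@type": "Dataset",
         "mainEntity": {"@id": "sort-and-change-case.ga"}},
        {"@id": "sort-and-change-case.ga",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
         "programmingLanguage": {"@id": "#galaxy"}},
        {"@id": "#galaxy", "@type": "ComputerLanguage", "name": "Galaxy"},
    ]
}

print(describe_main_entity(sample))
```

To run the same check on a real crate, parse ro-crate-metadata.json with json.load and pass the result to the function.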

To add workflow testing metadata to the crate:

rocrate add test-suite -i \#test1
rocrate add test-instance \#test1 http://example.com -r jobs -i \#test1_1
rocrate add test-definition \#test1 test/test1/sort-and-change-case-test.yml -e planemo -v '>=0.70'

License

  • Copyright 2019-2021 The University of Manchester, UK
  • Copyright 2021 Vlaams Instituut voor Biotechnologie (VIB), BE
  • Copyright 2021 Barcelona Supercomputing Center (BSC), ES
  • Copyright 2021 Center for Advanced Studies, Research and Development in Sardinia (CRS4), IT

Licensed under the Apache License, version 2.0 https://www.apache.org/licenses/LICENSE-2.0, see the file LICENSE.txt for details.

Cite as

DOI

The above DOI corresponds to the latest versioned release as published to Zenodo, where you will find all earlier releases. To cite ro-crate-py independent of version, use https://doi.org/10.5281/zenodo.3956493, which will always redirect to the latest release.

You may also be interested in the paper "Packaging research artefacts with RO-Crate", to appear in Data Science.

