Tag and push datasets from iRODS to a Dataverse installation
iRODS-Dataverse
This package programmatically creates a draft dataset publication in a configured Dataverse installation from data stored in iRODS. The final submission of the dataset takes place in the Dataverse installation itself, since additional steps may be required (e.g. submitting the dataset for review).
Prerequisites
- Be an iRODS user with data in an iRODS zone.
- Have a Dataverse account in one of the configured installations (currently Demo, RDR or RDR-pilot).
  - Sign up with an individual account.
  - Get the API Token, which is valid for a limited time (e.g. in Demo the API Token is valid for one year).
- Set up the virtual environment:
python -m venv venv
source venv/bin/activate
pip install irods2dataverse
When the process is finished, deactivate the virtual environment:
deactivate
User script
After installing the package in the virtual environment, start the process:
python -m irods2dataverse.userScript
This will trigger an interactive terminal that will take you through the following steps:
- Authenticate to iRODS. For KU Leuven users this happens automatically by reading your local irods_environment.json.
- Identify the data object(s) to send to Dataverse. There are two possibilities:
  - Tag the data objects with the metadata attribute dv.publication and the value initiated.
  - Provide the absolute path(s) of the data object(s) to be sent to Dataverse. The input is either the path of a single data object, /zone/home/collection/file, or a list of paths, ["/zone/home/collection/file_1", "/zone/home/collection/file_2"].
- Identify the target Dataverse installation. The script goes through the selected data object(s) and retrieves the metadata field dv.installation. If it is missing or not valid, you are asked to choose the installation from a selection.
- Authenticate to the Dataverse installation. The script will ask you to input your API Token.
- Gather the metadata needed to create a draft in the selected Dataverse installation. There are three possibilities:
  - (For ManGO users) Use a metadata schema: The schema can be used to add the metadata to any object of the list. One object suffices.
  - Provide the metadata via the CLI: The script asks you to provide the value for each required metadata field.
  - Fill in a JSON file and provide its path: Copy the metadata template of the selected Dataverse installation, e.g. the Demo template, and fill it in. Alternatively, create a shorter JSON file with the minimal metadata. For example, the text below shows the contents of the short JSON file, with metadata for the Demo installation:
{
  "author": {
    "authorAffiliation": "My university",
    "authorName": "Surname, Given Name"
  },
  "datasetContact": {
    "datasetContactEmail": "username@domain.edu",
    "datasetContactName": "Surname, Given Name"
  },
  "dsDescription": [
    {
      "dsDescriptionValue": "This is the first dataset I send from iRODS"
    }
  ],
  "subject": [
    "Demo Only"
  ],
  "title": "My dataset"
}
For RDR, the short JSON file would have, for example, the following contents:
{
  "access": {
    "accessRights": "open",
    "dateAvailable": "",
    "legitimateOptout": "other"
  },
  "author": [
    {
      "authorAffiliation": "My university",
      "authorName": "Surname, Given Name"
    }
  ],
  "datasetContact": [
    {
      "datasetContactEmail": "username@domain.edu",
      "datasetContactName": "Surname, Given Name"
    }
  ],
  "dsDescription": [
    {
      "dsDescriptionValue": "This is the first dataset I send from iRODS"
    }
  ],
  "keyword": [
    {
      "keywordValue": "required-keyword"
    }
  ],
  "technicalFormat": "json",
  "title": "My dataset"
}
To work with the short JSON file, copy the text above and adapt the values into a text file.
Note: For the RDR long template, when the access rights are open, omit the fields regarding available date and legitimate opt-out.
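As an alternative to editing the short JSON file by hand, it can be generated from Python. The following is a minimal sketch for the Demo installation, using only the standard library. The helper names (`missing_keys`, `write_short_metadata`) and the output file name are hypothetical, and treating every key of the Demo example as required is an assumption of this sketch, not a rule stated by the package.

```python
import json
from pathlib import Path

# Keys copied from the short Demo example; treating all of them as
# required is an assumption of this sketch, not a documented rule.
ASSUMED_REQUIRED_KEYS = ("author", "datasetContact", "dsDescription", "subject", "title")


def missing_keys(metadata, required=ASSUMED_REQUIRED_KEYS):
    """Return the keys from `required` that are absent or empty in `metadata`."""
    return [key for key in required if not metadata.get(key)]


def write_short_metadata(metadata, path):
    """Write `metadata` to `path` as indented UTF-8 JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


# Same structure and values as the Demo example in this README.
demo_metadata = {
    "author": {
        "authorAffiliation": "My university",
        "authorName": "Surname, Given Name",
    },
    "datasetContact": {
        "datasetContactEmail": "username@domain.edu",
        "datasetContactName": "Surname, Given Name",
    },
    "dsDescription": [
        {"dsDescriptionValue": "This is the first dataset I send from iRODS"}
    ],
    "subject": ["Demo Only"],
    "title": "My dataset",
}

if __name__ == "__main__":
    problems = missing_keys(demo_metadata)
    if problems:
        print("Missing or empty fields:", ", ".join(problems))
    else:
        # Hypothetical file name; pass this path to the script when it asks for the JSON file.
        print("Wrote", write_short_metadata(demo_metadata, "demo_metadata.json"))
```

This local check only catches empty or missing top-level keys before the file is handed to the script; the script's own validation step remains the authoritative one.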
- The script validates the metadata.
- The script deposits the draft with its metadata in the selected Dataverse installation. The data objects are uploaded directly to S3 without being downloaded first.
- The script updates the metadata of the data objects sent to Dataverse with the DOI provided by Dataverse.
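The tagging option from the steps above (attribute dv.publication with value initiated) can be applied before running the script. The sketch below shows one possible way to do it with python-irodsclient, which is assumed to be installed and is not necessarily how this package does it internally. The helper names `open_session` and `tag_for_publication` are hypothetical, the environment file location is the conventional default, and the values accepted for dv.installation are not stated here, so that argument is left optional.

```python
import os


def open_session(env_file="~/.irods/irods_environment.json"):
    """Open an iRODS session from a local environment file.

    Requires python-irodsclient; the import is done lazily so that
    tag_for_publication below has no hard dependency on it.
    """
    from irods.session import iRODSSession

    return iRODSSession(irods_env_file=os.path.expanduser(env_file))


def tag_for_publication(session, paths, installation=None):
    """Add the dv.publication tag (and optionally dv.installation) to each data object.

    Returns the list of paths that were tagged.
    """
    tagged = []
    for path in paths:
        obj = session.data_objects.get(path)
        obj.metadata.add("dv.publication", "initiated")
        if installation is not None:
            # The value the script expects here is an assumption of the caller.
            obj.metadata.add("dv.installation", installation)
        tagged.append(path)
    return tagged


# Example usage (needs a reachable iRODS zone and valid credentials):
#
# with open_session() as session:
#     tag_for_publication(session, ["/zone/home/collection/file_1"])
```

Passing the session in as an argument keeps the tagging logic independent of how authentication is set up at your site.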
Configuring another Dataverse installation
If you want to configure this script to work with other Dataverse installations, look at the custom classes or contact us.