Tag and push datasets from iRODS to a Dataverse installation

iRODS-Dataverse

This package programmatically creates a draft dataset in a configured Dataverse installation from data stored in iRODS. The final submission of the dataset takes place in the Dataverse installation itself, since additional steps may be required (e.g. submitting the dataset for review).

Prerequisites

  1. Be an iRODS user with data in an iRODS zone.

  2. Have a Dataverse account in one of the configured installations (currently Demo, RDR or RDR-pilot).

    • Sign up with an individual account.
    • Get the API Token, which is valid for a limited time (e.g. in Demo the API Token is valid for one year).
  3. Set up the virtual environment:

    python -m venv venv
    source venv/bin/activate
    pip install irods2dataverse
    

    When the process is finished, deactivate the virtual environment:

    deactivate
    

User script

After installing the package in the virtual environment, start the process:

python -m irods2dataverse.userScript

This starts an interactive terminal session that takes you through the following steps:

  1. Authenticate to iRODS. For KU Leuven users this happens automatically by reading your local irods_environment.json.

  2. Identify the data object(s) to send to Dataverse. There are two possibilities:

  • Tag the data objects with metadata attribute dv.publication and value initiated.
  • Provide the absolute path(s) of the data object(s) to be sent to Dataverse. The input paths refer either to a single data object /zone/home/collection/file, or a list of objects ["/zone/home/collection/file_1", "/zone/home/collection/file_2"].
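
The two accepted input forms (a single path or a list of paths) can be normalized into one list before further processing. The sketch below is only an assumption about how such input might be handled, not the package's actual code; `parse_object_paths` is a hypothetical helper name.

```python
import json


def parse_object_paths(raw: str) -> list[str]:
    """Return a list of absolute iRODS paths from user input.

    Accepts either a single path or a JSON-style list of paths.
    Hypothetical helper: not part of irods2dataverse.
    """
    text = raw.strip()
    if text.startswith("["):
        # Input looks like a list, e.g. ["/zone/home/collection/file_1", ...]
        paths = json.loads(text)
    else:
        # Input is treated as one path
        paths = [text]
    if not isinstance(paths, list) or not paths:
        raise ValueError("expected a path or a non-empty list of paths")
    for path in paths:
        if not isinstance(path, str) or not path.startswith("/"):
            raise ValueError(f"not an absolute iRODS path: {path!r}")
    return paths
```

Normalizing early means that the later steps only ever have to deal with a list, whichever form was typed in.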
  3. Identify the target Dataverse installation. The script goes through the selected data object(s) and retrieves the metadata field dv.installation. If the field is missing or its value is not valid, you are asked to choose the installation from a selection.

  4. Authenticate to the Dataverse installation. The script will ask you to input your API Token.

  5. Gather the metadata needed to create a draft in the selected Dataverse installation. There are three possibilities:

  • (For ManGO users) Use a metadata schema: Use the schema to add the metadata to any one of the selected objects; a single object suffices.

  • Provide the metadata via the CLI: The script asks you to provide a value for each required metadata field.

  • Fill in a JSON file and provide the path to it: Copy the metadata template of the selected Dataverse installation (e.g. the Demo template) and fill it in. Alternatively, create a shorter JSON file with the minimal metadata. For example, the text below shows the contents of a short JSON file with metadata for the Demo installation:

    {
        "author": {
            "authorAffiliation": "My university",
            "authorName": "Surname, Given Name"
        },
        "datasetContact": {
            "datasetContactEmail": "username@domain.edu",
            "datasetContactName": "Surname, Given Name"
        },
        "dsDescription": [
            {
                "dsDescriptionValue": "This is the first dataset I send from iRODS"
            }
        ],
        "subject": [
            "Demo Only"
        ],
        "title": "My dataset"
    }
    

    For RDR, the short JSON file would have, for example, the following contents:

    {
        "access": {
            "accessRights": "open",
            "dateAvailable": "",
            "legitimateOptout": "other"
        },
        "author": [
            {
                "authorAffiliation": "My university",
                "authorName": "Surname, Given Name"
            }
        ],
        "datasetContact": [
            {
                "datasetContactEmail": "username@domain.edu",
                "datasetContactName": "Surname, Given Name"
            }
        ],
        "dsDescription": [
            {
                "dsDescriptionValue": "This is the first dataset I send from iRODS"
            }
        ],
        "keyword": [
            {
                "keywordValue": "required-keyword"
            }
        ],
        "technicalFormat": "json",
        "title": "My dataset"
    }
    

    To work with the short JSON file, copy the text above into a text file and adapt the values.

    Note: For the RDR long template, when the access rights are open, omit the fields regarding available date and legitimate opt-out.
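
A short JSON file like the ones above can also be generated from Python instead of being typed by hand. The sketch below assumes that the top-level keys of the Demo example are the required ones; `REQUIRED_DEMO_FIELDS`, `missing_fields` and `write_short_metadata` are hypothetical names and are not part of irods2dataverse. The script's own validation may be stricter.

```python
import json
from pathlib import Path

# Assumption: the top-level keys of the Demo example are the required fields.
REQUIRED_DEMO_FIELDS = ("author", "datasetContact", "dsDescription", "subject", "title")


def missing_fields(metadata: dict, required=REQUIRED_DEMO_FIELDS) -> list[str]:
    """Return the required keys that are absent or empty in the metadata."""
    return [key for key in required if not metadata.get(key)]


def write_short_metadata(metadata: dict, path: str) -> Path:
    """Write the metadata to a JSON file after a basic completeness check."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    target = Path(path)
    target.write_text(json.dumps(metadata, indent=4, sort_keys=True), encoding="utf-8")
    return target


# Same values as the Demo example above.
demo_metadata = {
    "author": {
        "authorAffiliation": "My university",
        "authorName": "Surname, Given Name",
    },
    "datasetContact": {
        "datasetContactEmail": "username@domain.edu",
        "datasetContactName": "Surname, Given Name",
    },
    "dsDescription": [
        {"dsDescriptionValue": "This is the first dataset I send from iRODS"}
    ],
    "subject": ["Demo Only"],
    "title": "My dataset",
}
```

Calling `write_short_metadata(demo_metadata, "demo_metadata.json")` would then produce a file whose path can be given to the script when it asks for the metadata.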

  6. The script validates the metadata.

  7. The script deposits the draft with its metadata in the selected Dataverse installation. The data objects are uploaded directly to S3, without being downloaded first.

  8. The script updates the metadata of the data objects sent to Dataverse with the DOI provided by Dataverse.
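
For orientation, the deposit step corresponds to a "create dataset" call in the Dataverse native API, which creates the dataset as a draft and authenticates with the API Token in the `X-Dataverse-key` header. The sketch below only builds such a request with the standard library and does not send it. It is not the package's own code, which may well use a client library instead; `build_create_dataset_request` and its parameters are hypothetical names, and `dataset_json` is assumed to be the complete metadata in the format expected by the installation (for example a filled-in template).

```python
import json
import urllib.request


def build_create_dataset_request(
    base_url: str,
    collection_alias: str,
    api_token: str,
    dataset_json: dict,
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a draft dataset.

    Hypothetical helper: not part of irods2dataverse.
    """
    # Native API endpoint for creating a dataset in a collection
    url = f"{base_url.rstrip('/')}/api/dataverses/{collection_alias}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(dataset_json).encode("utf-8"),
        headers={
            "X-Dataverse-key": api_token,  # the API Token from the prerequisites
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. The base URL of the installation and the alias of the target collection are values you would need to supply yourself.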

Configuring another Dataverse installation

If you want to configure this script to work with other Dataverse installations, look at the custom classes or contact us.
