Skip to main content

iBridges plugin for interacting with Dataverse

Project description

iBridges Dataverse

Python package

This package provides a plugin to the iBridges CLI and GUI. The plugin allows to create a dataset on Dataverse and to uplaod iRODS data objects.

Dependencies

  • Python 3.9 or higher
  • HTTPX
  • PyDataverse
  • iBridges and iBridges GUI (ibridges, ibridgesgui)

All packages are installable with pip.

๐Ÿ’ก Highlights

  • Checksum checks upon upload to Dataverse
  • Mark files for upload as you browse and push to Dataverse later --> Git-like workflow
  • GUI and CLI receive the same information, so you can switch between them without loosing information

:warning: NOTE

  • The plugin only transfers files which are smaller than 9GB!

  • All data from iRODS will be downloaded to your local computer.

    The files will be downloaded one by one to a temporary folder. From there they are uploaded to Dataverse and are finally deleted from your computer. So you need at least 9GB of extra space. In case of errors in the upload to Dataverse, the file will not be deleted from them temporary folder so that files might accumulate on your computer.

Install the plugin

Install the package from pypi:

pip install ibridgesdvn

Install the CLI from the git repository:

pip install git+https://github.com/iBridges-for-iRODS/ibridges-plugin-dataverse.git

Install the GUI and CLI:

pip install git+https://github.com/iBridges-for-iRODS/ibridges-plugin-dataverse.git
pip install "ibridgesdvn[gui]"

This installs the python package ibridgesdvn.

When you start the iBridges CLI or shell you will see the following new commands:

ibridges -h

    dv-add-file         Mark one or more iRODS data objects to be uploaded to a Dataverse dataset.
    dv-setup            Print existing Dataverse configurations or create new ones.
    dv-cleanup          Cleanup all entries from the status, where the list of irods files is empty.
    dv-create-ds        Create a new dataset in a Dataverse collection.
    dv-init             Provide token and store for future use
    dv-push             Push all local changes to the dataverse collection.
    dv-rm-file          Remove one or more iRODS data objects from upload to a Dataverse dataset.
    dv-status           List all local changes to the dataset(s).
    dv-switch           Switch to another existing Dataverse configuration by providing a url or alias.

If you use the iBridges GUI you will be able to select a "Dataverse" view.

The Dataverse View

  • Step 1: Configure your Dataverse Connection You can add new configurations or delete old ones.
  • Step 2: Select a Dataverse collection to which files should be uploaded. If you do not have a dataset yet, use the "Create New Dataset" button and get a DOI to a newly created dataset.
  • Step 3: Select iRODS data objects from the right pane of the view. With the button "<<" they will be added to the list of data that will be uploaded. You can also delete one or several entries from the list.

When you are happy with the list, click "Upload to Dataverse". Now go to your Dataverse dataset in the browser and finish you data publication.

The Dataverse commands

Configuring a Dataverse instance

With the command ibridges dv-setup you can see all existing Dataverse URLs and you can create an alias for a Dataverse URL.

ibridges dv-setup dvnl-demo https://demo.dataverse.nl

To activate this environment you need to provide an API token:

ibridges dv-init dvnl-demo
Your Dataverse token for dvnl-demo :
  demo -> https://demo.dataverse.org
* dvnl-demo -> https://demo.dataverse.nl

Your token will not be shown, but stored for future use.

You see that there are two Dataverse URLs. You can switch between them with:

ibridges dv-switch https://demo.dataverse.org
* demo -> https://demo.dataverse.org
  dvnl-demo -> https://demo.dataverse.nl

In all of those commands you can use the URLs and the aliases interchangeably.

Note, that those commands are only available for the CLI and not for the shell. All other commands can be used from the shell.

Creating a dataset

To create a dataset, you will need a specific Dataverse dataset.json which you can provide like this:

ibridges shell
ibshell:research-christine> dv-create-ds UUscience --metajson ibridgescontrib/ibridgesdvn/dataset.json
Dataset with pid 'doi:10.80227/PDVNL/RZQRAK' created.

This creates a dataset on the Dataverse we selected with dv-init or dv-switch, it uses the Dataverse collection UUscience. Please adjust to your Dataverse instance. You will need the 10.80227/PDVNL/RZQRAK part of the pid to add files and to finally upload them to Dataverse, so please save it somewhere.

Alternatively, you can create some minimal metadata with the command:

ibshell:research-christine> dv-create-ds UUscience --metadata

This will open a small interactive questionnaire for the most basic metadata that Dtaaverse needs to create a Dataset.

Browsing files and adding it to Dataverse

After you created the new Dataset on Dataverse, you can now use the iBridges shell to browse and add files as you go along.

We implemented a git-like workflow. So the following command will only mark files to be uploaded to a specific Dataset on Dataverse.

Browse through collections in iRODS

ibshell:research-christine> ls my_books
/uu/home/research-christine/my_books:
  /uu/home/research-christine/my_books/AdventuresSherlockHolmes.txt
  /uu/home/research-christine/my_books/AliceAdventuresInWonderLand.txt
  /uu/home/research-christine/my_books/DonQuixote.txt
  /uu/home/research-christine/my_books/Dracula.txt

And select files as you go along:

ibshell:research-christine> dv-add-file 10.80227/PDVNL/RZQRAK irods:my_books/AdventuresSherlockHolmes.txt irods:my_books/DonQuixote.txt

In the iBridges CLI and shell you can use relative iRODS paths. All iRODS paths need to be prefixed with irods:.

You will receive a short summary of all files that you ever marked for upload. This summary also contains all Datavers instances you work on and all Datasets:

{'https://demo.dataverse.nl': {'add_file': [{'dataset': '10.80227/PDVNL/RZQRAK',
                                             'irods_paths': ['/uu/home/research-christine/my_books/AdventuresSherlockHolmes.txt',
                                                             '/uu/home/research-christine/my_books/DonQuixote.txt']}]}}

You can also remove iRODS paths again with:

ibshell:research-christine> dv-rm-file 10.80227/PDVNL/RZQRAK irods:my_books/DonQuixote.txt

And with dv-status you can check which files are marked for upload:

ibshell:research-christine> dv-status
{'https://demo.dataverse.nl': {'add_file': [{'dataset': '10.80227/PDVNL/RZQRAK',
                                             'irods_paths': ['/uu/home/research-christine/my_books/AdventuresSherlockHolmes.txt']}]}}

Upload data to dataset

Now that you are happy with the marked data, upload them to the dataset:

ibshell:research-christine> dv-push 10.80227/PDVNL/RZQRAK

Data stored in /Users/staig001/.dvn/data
/uu/home/research-christine/my_books/AdventuresSherlockHolmes.txt
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 580k/580k [00:00<00:00, 1.01MB/s]
Downloaded /uu/home/research-christine/my_books/AdventuresSherlockHolmes.txt --> /Users/staig001/.dvn/data/AdventuresSherlockHolmes.txt
{'pid': 'doi:dataset_id', 'filename': 'AdventuresSherlockHolmes.txt'}
Uploaded /Users/staig001/.dvn/data/AdventuresSherlockHolmes.txt --> 10.80227/PDVNL/RZQRAK

A successful download and upload will trigger that those files will be removed from the status.

ibshell:research-christine> dv-status
{'https://demo.dataverse.nl': {'add_file': [{'dataset': '10.80227/PDVNL/RZQRAK',
                                             'irods_paths': []}]}}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ibridgesdvn-1.0.4.tar.gz (767.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ibridgesdvn-1.0.4-py3-none-any.whl (57.9 kB view details)

Uploaded Python 3

File details

Details for the file ibridgesdvn-1.0.4.tar.gz.

File metadata

  • Download URL: ibridgesdvn-1.0.4.tar.gz
  • Upload date:
  • Size: 767.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for ibridgesdvn-1.0.4.tar.gz
Algorithm Hash digest
SHA256 696289d1afc78c24683c927b73aa4a7da7a5f0639576172448f53f8d2b7a804a
MD5 872944577c918234577db6c3d38ee85f
BLAKE2b-256 6841357e5a9a4f23a28f6aa4664dc4d4d4774dd0e19bda223b60fca91b07cedc

See more details on using hashes here.

File details

Details for the file ibridgesdvn-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: ibridgesdvn-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 57.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for ibridgesdvn-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 ca1b59550862c30743953916a119fca1486776a8fb83afbdc9e9ae508c656d26
MD5 aa10fc48a09f080fb9e995b4be17f7d3
BLAKE2b-256 b6ea6549d4ea5012cdc96d02052be83d33db87e3cf49e92714c74b3e351c7006

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page