Manage data bundled with bioinformatic software through Zenodo DOI integration

zenodo_backpack

ZenodoBackpack provides a robust, standardised and repeatable approach to distributing and using the backend databases that bioinformatic tools rely on. These databases are usually tool-specific and are often too large to be uploaded as data to software repositories (e.g. PyPI imposes a limit of ~50MB).

ZenodoBackpack data is uploaded to and downloaded from Zenodo, which means that each dataset is associated with a DOI. Before upload, the data is packaged in the Zenodo Backpack format: the payload data plus a CONTENTS.json file, compressed into a .tar.gz archive. The CONTENTS.json file includes md5sum values for each included file for robust verification.
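To make the format concrete, here is a minimal sketch of what building such an archive could involve, using only the Python standard library. The JSON key names (`version`, `md5sums`), the `payload_directory` folder name inside the archive and the function names are assumptions made for illustration; the real CONTENTS.json layout is defined by zenodo_backpack itself and may differ.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(payload_dir, data_version):
    """Map each file (path relative to payload_dir) to its MD5 digest.

    The key names used here are hypothetical, not the real CONTENTS.json schema.
    """
    payload_dir = Path(payload_dir)
    md5sums = {
        path.relative_to(payload_dir).as_posix(): md5_of_file(path)
        for path in sorted(payload_dir.rglob("*"))
        if path.is_file()
    }
    return {"version": data_version, "md5sums": md5sums}


def write_archive(payload_dir, archive_path, data_version):
    """Write a .tar.gz holding the payload plus a CONTENTS.json manifest."""
    manifest = build_manifest(payload_dir, data_version)
    manifest_bytes = json.dumps(manifest, indent=2).encode("utf-8")
    with tarfile.open(archive_path, "w:gz") as archive:
        # Add the manifest from memory, then the payload under an assumed folder name.
        info = tarfile.TarInfo("CONTENTS.json")
        info.size = len(manifest_bytes)
        archive.addfile(info, io.BytesIO(manifest_bytes))
        archive.add(payload_dir, arcname="payload_directory")
    return manifest
```

Recording a checksum per file, rather than one for the whole archive, means a corrupted or missing file can be identified individually after extraction.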

It provides two main methods, available through the zenodo_backpack script or as a software library:

create: turns a target directory into a zenodo_backpack-formatted .tar.gz archive with relevant checksum and version information, ready to be uploaded to Zenodo. A data version must be provided when doing so. Furthermore, when uploading the backpack to zenodo.org, the version specified on the website must match the one provided when the ZenodoBackpack was created. This allows version tracking and version validation of the data contained within the ZenodoBackpack.

download_and_extract: takes a DOI string, then downloads, extracts and verifies the corresponding zenodo_backpack archive from Zenodo.org into a target directory. This returns a ZenodoBackpack object that can be queried for information.
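The verification step mentioned above can be pictured with the following sketch. It is not the library's implementation: the manifest layout (`version` and `md5sums` keys), the `VerificationError` class and the `verify_payload` function are hypothetical names chosen to illustrate checking a data version and per-file md5sum values after extraction.

```python
import hashlib
from pathlib import Path


class VerificationError(Exception):
    """Raised when extracted data does not match its manifest."""


def verify_payload(payload_dir, manifest, expected_version=None):
    """Check the version and every per-file MD5 recorded in a manifest dict.

    `manifest` is assumed to look like
    {"version": "0.1", "md5sums": {"relative/path": "<hex digest>"}};
    these key names are hypothetical.
    """
    if expected_version is not None and manifest["version"] != expected_version:
        raise VerificationError(
            f"version mismatch: wanted {expected_version}, found {manifest['version']}"
        )
    payload_dir = Path(payload_dir)
    for relative_path, expected_md5 in manifest["md5sums"].items():
        file_path = payload_dir / relative_path
        if not file_path.is_file():
            raise VerificationError(f"missing file: {relative_path}")
        # read_bytes() loads the whole file; large databases would be hashed in chunks.
        actual_md5 = hashlib.md5(file_path.read_bytes()).hexdigest()
        if actual_md5 != expected_md5:
            raise VerificationError(f"checksum mismatch: {relative_path}")
    return True
```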

Usage

Command line

You can run zenodo_backpack as a stand-alone program, or import its classes and use them in source code.

On the command line, zenodo_backpack can create an archive to be uploaded to Zenodo:

zenodo_backpack create --input_directory <./INPUT_DIRECTORY> --data_version <VERSION> --output_file <./ARCHIVE.tar.gz>

NOTE: when entering metadata on Zenodo, the version specified MUST match the one supplied with --data_version.

An existing uploaded zenodo_backpack can be downloaded and unpacked as follows (add --bar if a graphical progress bar is desired):

zenodo_backpack download --doi <MY.DOI/111> --output_directory <OUTPUT_DIRECTORY> --bar

API Usage

You can also import zenodo_backpack as a module:

import zenodo_backpack

Backpacks can be created, downloaded, or acquired from a local store:

Create a backpack

Create a new backpack in .tar.gz format containing the payload data folder:

creator = zenodo_backpack.ZenodoBackpackCreator()
creator.create("/path/to/payload_directory", "path/to/archive.tar.gz", "0.1")

Download a backpack

Download a backpack from Zenodo, identified by its DOI. The version is optional; if it is not provided, the latest version is downloaded. If the target file already exists, the download resumes where possible rather than starting from the beginning:

backpack_downloader = zenodo_backpack.ZenodoBackpackDownloader()
backpack = backpack_downloader.download_and_extract('/path/to/download_directory', 'MY.DOI/111111', version='MY.VERSION')

Read a backpack that is already downloaded

Defined by a path:

backpack = zenodo_backpack.acquire(path='/path/to/zenodobackpack/', md5sum=True)

or by an environment variable:

backpack = zenodo_backpack.acquire(env_var_name='MY_PROGRAM_DB', version="1.5.2")

Working with a backpack

The ZenodoBackpack object returned by acquire and download_and_extract has instance methods for accessing the downloaded data. For example, it can return the path to the payload directory within the ZenodoBackpack, which contains all the payload data:

useful_data_path = zenodo_backpack.acquire(env_var_name='MyZenodoBackpack', version="1.5.2").payload_directory_string()
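A tool that depends on a backpack might combine the calls shown above, using a local copy when the environment variable is set and downloading otherwise. The following is a sketch under stated assumptions, not a pattern prescribed by the library: `locate_database` and its `backend` parameter are hypothetical, and only the `acquire`, `ZenodoBackpackDownloader().download_and_extract` and `payload_directory_string` calls come from the examples above.

```python
import os


def locate_database(env_var_name, doi, download_directory, version=None, backend=None):
    """Return the payload directory, preferring a local copy over a download.

    `backend` is the module providing `acquire` and `ZenodoBackpackDownloader`.
    It defaults to zenodo_backpack and is a parameter only so that the
    function can be exercised with a stand-in object (hypothetical design).
    """
    if backend is None:
        import zenodo_backpack as backend  # imported lazily, only when needed

    # Pass `version` only when the caller supplied one, since it is optional.
    version_kwargs = {} if version is None else {"version": version}

    if os.environ.get(env_var_name):
        # The environment variable is set: read the already-downloaded backpack.
        backpack = backend.acquire(env_var_name=env_var_name, **version_kwargs)
    else:
        # No local copy configured: fetch the backpack from Zenodo by DOI.
        backpack = backend.ZenodoBackpackDownloader().download_and_extract(
            download_directory, doi, **version_kwargs
        )
    return backpack.payload_directory_string()
```

A call might then look like `locate_database('MY_PROGRAM_DB', 'MY.DOI/111111', '/path/to/download_directory', version='1.5.2')`, where the variable name, DOI and paths are the placeholders used earlier.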

Installation

zenodo_backpack can be installed from PyPI:

pip install zenodo-backpack

The easiest way to install is using conda:

conda install -c conda-forge zenodo_backpack

Alternatively, you can git clone the repository and either run the bin/zenodo_backpack executable or install it with setuptools using

python setup.py install

zenodo_backpack relies on requests, and on tqdm to display an optional graphical progress bar.
