Big Data Bag Utilities

bdbag

Big Data Bag Utilities

The bdbag utilities are a collection of software programs for working with BagIt packages that conform to the BDBag and Bagit/RO profiles.

The bdbag profiles specify the use of the fetch.txt file, require serialization, and specify what manifests must be provided with a bdbag.
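As a rough illustration of the pieces these profiles refer to, the sketch below inspects a directory for the standard BagIt files: the `bagit.txt` declaration, the `data/` payload directory, any `manifest-<algorithm>.txt` files, and an optional `fetch.txt`. It uses only the Python 3 standard library and is not part of bdbag; the function name `describe_bag_layout` is hypothetical, and the check says nothing about whether the bag is actually valid.

```python
import glob
import os


def describe_bag_layout(bag_dir):
    """Report which standard BagIt files are present in bag_dir.

    This only looks at file names; it does not verify checksums
    or profile conformance.
    """
    manifests = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(bag_dir, "manifest-*.txt"))
    )
    return {
        "has_declaration": os.path.isfile(os.path.join(bag_dir, "bagit.txt")),
        "has_payload_dir": os.path.isdir(os.path.join(bag_dir, "data")),
        "has_fetch": os.path.isfile(os.path.join(bag_dir, "fetch.txt")),
        "manifests": manifests,
    }
```

For real validation, use the validation support that bdbag itself provides rather than a file-name check like this one.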

The bdbag utilities incorporate functions from various other Python-based bagit components (such as the Bagit-Python bag creation utility and the Bagit-Profiles-Validator utility) and wrap them in a single, easy-to-use software package with additional features.

Enhanced bag support includes:

  • Update-in-place functionality for existing bags.
  • Automatic archiving and extraction of bags using ZIP, TAR, and TGZ formats.
  • Automatic generation of file manifest entries and fetch.txt for remote files via configuration file.
  • Automatic file retrieval based on the contents of a bag's fetch.txt file with multiple protocol support. Transport handlers for http(s), ftp, s3, gs, and globus are provided, along with an extensibility mechanism for adding externally developed transports.
  • Built-in bagit-profile validation.
  • Built-in support for creation of bags with Bagit/RO profile compatibility.
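To make the fetch.txt item above more concrete: in the BagIt specification, each fetch.txt line holds a URL, a length in bytes (or `-` when unknown), and the file's path inside the bag, separated by whitespace. The following is a minimal standard-library sketch of reading that format. It is not bdbag's own parser, and the names `parse_fetch_line` and `parse_fetch` are hypothetical.

```python
def parse_fetch_line(line):
    """Split one fetch.txt line into (url, length, filename).

    The length is returned as an int, or None when the line uses "-".
    """
    url, length, filename = line.strip().split(None, 2)
    return url, (None if length == "-" else int(length)), filename


def parse_fetch(text):
    """Parse the full text of a fetch.txt file, skipping blank lines."""
    return [parse_fetch_line(line) for line in text.splitlines() if line.strip()]


# Example input; the URLs are placeholders, not real resources.
sample = (
    "https://example.org/a.bin 1024 data/a.bin\n"
    "ftp://example.org/b.bin - data/b.bin\n"
)
entries = parse_fetch(sample)
```

The scheme of each URL (`https`, `ftp`, and so on) is what would determine which transport handler retrieves the file.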

An experimental Graphical User Interface (GUI) for bdbag can be found here.

Technical Papers

"I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets" explains the motivation for BDBags and the related Minid construct, provides details on design and implementation, and gives examples of use.

"Reproducible big data science: A case study in continuous FAIRness" presents a data analysis use case in which BDBags and Minids are used to capture a transcription factor binding site analysis.

Dependencies

  • Python 2.7 is the minimum Python version required.
  • The code and dependencies are also compatible with Python 3, versions 3.5 through 3.9.

Installation

The latest bdbag release is available on PyPI and can be installed using pip:

pip install bdbag

Note that the above command will install bdbag with only the minimal dependencies required to run. If you wish to install bdbag with the extra fetch transport handler support provided by the boto (for AWS S3) and globus (for Globus Transfer) packages, use the following command:

pip install bdbag[boto,globus]
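
One way to check whether optional dependencies ended up in the current environment is to ask Python whether the modules can be located. The sketch below does this with the Python 3 standard library. The module names `boto3` and `globus_sdk` in the usage line are assumptions about what the extras pull in, and `missing_modules` is a hypothetical helper, not part of bdbag.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found.

    Only pass top-level names (no dots); nothing is imported.
    """
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Assumed module names for the boto and globus extras.
not_installed = missing_modules(["boto3", "globus_sdk"])
```

An empty list suggests the extras are importable; any names returned are modules that pip did not install into this environment.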

Installation from Source

You can use pip to install bdbag directly from GitHub:

sudo pip install git+https://github.com/fair-research/bdbag

or:

pip install --user git+https://github.com/fair-research/bdbag

You can also download the current bdbag source code from GitHub or, if you have git installed, clone the repository:

git clone https://github.com/fair-research/bdbag

From the root of the bdbag source code directory execute the following command:

sudo pip install .

or:

pip install --user .

To install the extra dependencies from a local source directory, use the following command:

pip install .[boto,globus]

Testing

The unit tests can be run by invoking the following command from the root of the bdbag source code directory:

python setup.py test

Usage

This software can be used from the command-line environment by running the bdbag script. For detailed usage instructions, see the CLI Guide.
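
If you need to drive the script from Python, a thin wrapper around `subprocess` may be enough. The sketch below builds the argument list and runs it. The `--validate` and `--archiver` options reflect my understanding of the CLI and should be checked against the CLI Guide or `bdbag --help`; the helper names `bdbag_command` and `run_bdbag` are hypothetical.

```python
import shutil
import subprocess


def bdbag_command(bag_path, validate=None, archiver=None):
    """Build an argument list for the bdbag script.

    validate and archiver are passed through as option values
    (for example "full" or "zip"); both are optional.
    """
    cmd = ["bdbag"]
    if validate:
        cmd += ["--validate", validate]
    if archiver:
        cmd += ["--archiver", archiver]
    cmd.append(bag_path)
    return cmd


def run_bdbag(bag_path, **options):
    """Run the bdbag script; raises if it is missing or exits non-zero."""
    if shutil.which("bdbag") is None:
        raise RuntimeError("the bdbag script was not found on PATH")
    return subprocess.run(bdbag_command(bag_path, **options), check=True)
```

Passing a list rather than a single shell string avoids quoting problems with bag paths that contain spaces.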

Configuration

Some components of the bdbag software can be configured via JSON-formatted configuration files. See the Configuration Guide for further details.

Application Programming Interface

It is also possible to use bdbag from within other Python programs via an API. See the API Guide for further details.
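
As a hedged sketch of what such use might look like, the function below creates a bag in place, validates it, and archives it through the `bdbag_api` module. The calls `make_bag`, `validate_bag`, and `archive_bag` are used as I understand them; confirm the exact signatures and options in the API Guide. The wrapper `bag_validate_archive` and its `api` parameter are hypothetical and exist only so that the flow can be exercised without bdbag installed.

```python
def bag_validate_archive(directory, archiver="zip", api=None):
    """Bag a directory in place, validate it fully, then archive it.

    Returns whatever the archive call returns (expected to be the
    path of the archive file).
    """
    if api is None:
        # Imported lazily so this module loads even without bdbag.
        from bdbag import bdbag_api as api
    api.make_bag(directory)
    api.validate_bag(directory, fast=False)
    return api.archive_bag(directory, archiver)
```

Validation problems are expected to surface as exceptions from the validation call, so a caller would normally wrap this in a `try` block.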

Utilities

A CLI utility module is provided for various ancillary tasks commonly involved with authoring bdbags. See the Utility Guide for further details.

Change Log

The change log is located here.

Download files

Source Distribution

bdbag-1.6.2.tar.gz (1.3 MB)

Uploaded Source

Built Distribution

bdbag-1.6.2-py2.py3-none-any.whl (70.1 kB)

Uploaded Python 2, Python 3

File details

Details for the file bdbag-1.6.2.tar.gz.

File metadata

  • Download URL: bdbag-1.6.2.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.7.7

File hashes

Hashes for bdbag-1.6.2.tar.gz
Algorithm Hash digest
SHA256 ed0d6cde78def32ffdc15200ac47c529df0d729367aba05b9bf0ef81a1ff94e6
MD5 a9bdac4ae1cc90114fbdce01ff1c015a
BLAKE2b-256 7485fc68a947978a0b078d819ddbe02e5a97ce1f7bd07ec5500fab30f7d4079e
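
A downloaded file can be checked against the SHA256 digest listed above with the standard library. The following is a minimal sketch: `sha256_of` is a hypothetical helper, and the file is assumed to be in the current directory under the name shown in the listing.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the listing above for bdbag-1.6.2.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "ed0d6cde78def32ffdc15200ac47c529df0d729367aba05b9bf0ef81a1ff94e6"
)
# Usage: sha256_of("bdbag-1.6.2.tar.gz") == EXPECTED_SDIST_SHA256
```

Reading in chunks keeps memory use flat regardless of file size.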

File details

Details for the file bdbag-1.6.2-py2.py3-none-any.whl.

File metadata

  • Download URL: bdbag-1.6.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 70.1 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.7.7

File hashes

Hashes for bdbag-1.6.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 2821748ebe98485edb7f4da2954e1dc1923d4cf9967e35954e882b00b34111d0
MD5 a230d8d089a38a54222a403ebdf87e83
BLAKE2b-256 10675c16db7c84a04db44dc5f826291b7b9945a29a42303e5b44a4072f253cea