Big Data Bag Utilities

Project description

bdbag

The bdbag utilities are a collection of software programs for working with BagIt packages that conform to the BDBag and BagIt/RO profiles.

The bdbag profiles specify the use of the fetch.txt file, require serialization, and specify what manifests must be provided with a bdbag.
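
Each line of a fetch.txt file, as defined by the BagIt specification (RFC 8493), holds a URL, a length in octets (or `-` when the length is unknown), and a payload path relative to the bag root. The following is a minimal sketch of reading that format; the `FetchEntry` and `parse_fetch_txt` names are hypothetical, and this is not bdbag's own parser.

```python
from typing import List, NamedTuple, Optional


class FetchEntry(NamedTuple):
    """One fetch.txt line (hypothetical helper type, not part of bdbag)."""
    url: str
    length: Optional[int]  # None when the length field is "-"
    path: str              # payload path relative to the bag root, e.g. data/a.csv


def parse_fetch_txt(text: str) -> List[FetchEntry]:
    """Parse fetch.txt content into FetchEntry records."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        # Split on the first two whitespace runs only, so paths containing spaces stay intact.
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            raise ValueError("malformed fetch.txt line %d: %r" % (lineno, line))
        url, length, path = parts
        entries.append(FetchEntry(url, None if length == "-" else int(length), path))
    return entries
```

For example, the line `https://example.org/b.bin - data/sub dir/b.bin` yields an entry with `length=None` and the path `data/sub dir/b.bin`. A retrieval tool would then download each URL into its path and check the result against the bag's manifests.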

These utilities combine various other components, such as the Bagit-Python bag creation utility and the Bagit-Profiles-Validator utility, into a single, easy-to-use software package.

Enhanced bag support includes:

  • Update-in-place functionality for existing bags.
  • Automatic archiving and extraction of bags using ZIP, TAR, and TGZ formats.
  • Automatic generation of remote file manifest entries and fetch.txt via a configuration file.
  • Automatic file retrieval based on the contents of a bag's fetch.txt file with multiple protocol support.
  • Built-in profile validation.
  • Built-in support for creation of bags with Bagit/RO profile compatibility.
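
To make the bag layout and the archiving step concrete, here is a minimal standard-library sketch (Python 3) of what a bag looks like on disk and how it can be serialized to ZIP, TAR, or TGZ. It assumes the payload files are already under `<bag>/data`, writes only `bagit.txt` and a SHA-256 manifest per RFC 8493, and uses `shutil.make_archive` for serialization. The function names are hypothetical; this is not how bdbag itself is implemented, and it omits bag-info.txt, tag manifests, and profile validation.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(bag_dir):
    """Write bagit.txt and manifest-sha256.txt for payload files already in bag_dir/data."""
    entries = []
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            full = os.path.join(root, name)
            # Manifest paths are relative to the bag root and use forward slashes.
            rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
            entries.append((rel, sha256_of(full)))
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8", newline="\n") as f:
        f.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w", encoding="utf-8", newline="\n") as f:
        for rel, checksum in sorted(entries):  # sorted by path for stable output
            f.write("%s  %s\n" % (checksum, rel))


def archive_bag(bag_dir, fmt="zip"):
    """Serialize the bag as 'zip', 'tar', or 'gztar' (TGZ); return the archive path."""
    bag_dir = os.path.abspath(bag_dir)
    parent, name = os.path.split(bag_dir)
    # Archive members are rooted at the bag directory's own name, e.g. mybag/data/...
    return shutil.make_archive(os.path.join(parent, name), fmt, root_dir=parent, base_dir=name)
```

Rooting the archive at the bag's directory name means extraction recreates a single top-level bag directory, which is the layout BagIt serialization expects.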

An experimental Graphical User Interface (GUI) for bdbag is also available.

Technical Papers

"I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets" explains the motivation for BDBags and the related Minid construct, provides details on design and implementation, and gives examples of use.

"Reproducible big data science: A case study in continuous FAIRness" presents a data analysis use case in which BDBags and Minids are used to capture a transcription factor binding site analysis.

Dependencies

  • Python 2.7 is the minimum Python version required.
  • The code and dependencies are also compatible with Python 3, versions 3.3 through 3.6.

Installation

The latest bdbag release is available on PyPI and can be installed using pip:

pip install bdbag

Installation from Source

Download the current bdbag source code from GitHub or, if you have git installed, clone the repository:

git clone https://github.com/fair-research/bdbag

From the root of the bdbag source code directory execute the following command:

python setup.py install --user

Note that if you want to make bdbag available to all users on the system, you should run the following command:

python setup.py install

If you are on a Unix-based system (including macOS), you should execute the above command as root or use sudo.

Testing

The unit tests can be run by invoking the following command from the root of the bdbag source code directory:

python setup.py test

Usage

This software can be used from the command-line environment by running the bdbag script. For detailed usage instructions, see the CLI guide.

Configuration

Some components of the bdbag software can be configured via JSON-formatted configuration files. See the Configuration guide for further details.

Application Programming Interface

It is also possible to use bdbag from within other Python programs via an API. See the API guide for further details.

Download files

Source Distribution

bdbag-1.3.0.tar.gz (36.7 kB)

Built Distribution

bdbag-1.3.0-py2.py3-none-any.whl (49.5 kB, Python 2 and Python 3)

File details

Details for the file bdbag-1.3.0.tar.gz.

File metadata

  • Download URL: bdbag-1.3.0.tar.gz
  • Size: 36.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for bdbag-1.3.0.tar.gz:

  • SHA256: 981ca84523b8476e3236fe77b3df2382ba78e771f14ce5d21ae5584c7551c76e
  • MD5: 5e71da7f75f5b828b50a46f8ec4bbcf6
  • BLAKE2b-256: 8639e0214dfad4b5882cc8bacb2c10b441a15f68301430074087127da0448cb3

See the pip documentation on hash-checking mode for more details on using hashes.
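
A downloaded file can also be checked against the published digests directly. The following is a minimal sketch using Python's `hashlib`; the `file_digest_matches` name is hypothetical, and it assumes the algorithm name is one that `hashlib.new` accepts, such as `"sha256"` or `"md5"`.

```python
import hashlib


def file_digest_matches(path, expected_hex, algorithm="sha256", chunk_size=1 << 20):
    """Return True if the file's digest equals the published hex digest."""
    digest = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Published digests are lowercase hex; normalize before comparing.
    return digest.hexdigest() == expected_hex.strip().lower()
```

For an intact download of the source distribution, `file_digest_matches("bdbag-1.3.0.tar.gz", "981ca84523b8476e3236fe77b3df2382ba78e771f14ce5d21ae5584c7551c76e")` should return `True`.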

File details

Details for the file bdbag-1.3.0-py2.py3-none-any.whl.

File hashes

Hashes for bdbag-1.3.0-py2.py3-none-any.whl:

  • SHA256: b9fcbe13934ee742715571a0e186c538a813ae2fc72876eeccda8c55a6b82180
  • MD5: 52958d1e7246ad0b93a151657aeedb14
  • BLAKE2b-256: 50d271398f39caac2519beda49a80c143595ab5cbab25b3a08847045b1c788dd