Big Data Bag Utilities
bdbag
The bdbag utilities are a collection of software programs for working with BagIt packages that conform to the BDBag and BagIt/RO profiles.
The bdbag profiles specify the use of the fetch.txt file, require serialization, and specify what manifests must be provided with a bdbag.
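As a rough illustration of what a fetch.txt file carries, the following minimal sketch parses fetch.txt lines using only the Python standard library. It assumes the general BagIt layout of one entry per line with three whitespace-separated fields (URL, length in bytes or "-" when unknown, and the file path inside the bag). The FetchEntry type, the parse_fetch_lines helper, and the example.org URLs are hypothetical names used for illustration and are not part of bdbag.

```python
from collections import namedtuple

# Hypothetical container for one fetch.txt entry (not a bdbag type).
FetchEntry = namedtuple("FetchEntry", ["url", "length", "filename"])


def parse_fetch_lines(lines):
    """Parse fetch.txt-style lines into FetchEntry tuples.

    Assumes each non-empty line has the form: URL LENGTH FILENAME,
    where LENGTH is a byte count or "-" when the size is unknown.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split into at most three fields so the path keeps any inner spaces.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError("malformed fetch.txt line: %r" % raw)
        url, length, filename = parts
        size = None if length == "-" else int(length)
        entries.append(FetchEntry(url, size, filename))
    return entries


# Illustrative input only; these URLs and paths are made up.
sample = [
    "https://example.org/files/a.csv 1024 data/a.csv\n",
    "https://example.org/files/b.bin - data/b.bin\n",
]
entries = parse_fetch_lines(sample)
```

Each parsed entry tells a retrieval tool where to download a payload file from and where to place it under the bag's data directory.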
These utilities combine other components, such as the Bagit-Python bag creation utility and the Bagit-Profiles-Validator utility, into a single, easy-to-use software package.
Enhanced bag support includes:
- Update-in-place functionality for existing bags.
- Automatic archiving and extraction of bags using ZIP, TAR, and TGZ formats.
- Automatic generation of remote file manifest entries and fetch.txt via a configuration file.
- Automatic file retrieval based on the contents of a bag's fetch.txt file with multiple protocol support.
- Built-in profile validation.
- Built-in support for creation of bags with Bagit/RO profile compatibility.
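To make the remote file feature in the list above more concrete, here is a minimal standard-library sketch of how a JSON description of remote files could be turned into fetch.txt lines and checksum manifest lines. This is not bdbag's implementation. The JSON field names (url, length, filename, sha256), the build_remote_entries helper, and the example.org URL are assumptions made for illustration; consult the Configuration Guide for the format bdbag actually expects.

```python
import hashlib
import json


def build_remote_entries(remote_manifest_json, alg="sha256"):
    """Return (fetch_lines, manifest_lines) for a JSON list of remote files.

    Assumed input: a JSON array of objects with "url", "length",
    "filename" (relative to the payload directory), and a checksum
    stored under the algorithm name, e.g. "sha256".
    """
    fetch_lines = []
    manifest_lines = []
    for entry in json.loads(remote_manifest_json):
        path = "data/" + entry["filename"]
        # fetch.txt style: URL, length, and path inside the bag
        fetch_lines.append("%s\t%s\t%s" % (entry["url"], entry["length"], path))
        # manifest style: checksum followed by the path inside the bag
        manifest_lines.append("%s  %s" % (entry[alg], path))
    return fetch_lines, manifest_lines


# Illustrative data only: the digest is computed from a dummy byte string.
digest = hashlib.sha256(b"example").hexdigest()
remote_json = json.dumps([
    {"url": "https://example.org/files/a.csv",
     "length": 1024,
     "filename": "a.csv",
     "sha256": digest},
])
fetch_lines, manifest_lines = build_remote_entries(remote_json)
```

The point of the sketch is that a remote file never has to be present locally: its checksum goes into a manifest and its location goes into fetch.txt, so the bag can be validated once the file is retrieved.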
An experimental Graphical User Interface (GUI) for bdbag can be found here.
Technical Papers
"I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets" explains the motivation for BDBags and the related Minid construct, provides details on design and implementation, and gives examples of use.
"Reproducible big data science: A case study in continuous FAIRness" presents a data analysis use case in which BDBags and Minids are used to capture a transcription factor binding site analysis.
Dependencies
- Python 2.7 is the minimum Python version required.
- The code and dependencies are also compatible with Python 3, versions 3.4 through 3.7.
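A script that depends on bdbag could guard against unlisted interpreters with a check like the following sketch. It encodes only the version range stated above (2.7, and 3.4 through 3.7); other versions may well work but are not listed here. The is_documented_python helper is a hypothetical name, not part of bdbag.

```python
import sys


def is_documented_python(version_info=None):
    """Return True if the interpreter version falls in the documented range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    # Documented range: Python 2.7, or Python 3.4 through 3.7.
    return major_minor == (2, 7) or (3, 4) <= major_minor <= (3, 7)
```

Calling is_documented_python() with no arguments checks the running interpreter; passing a tuple such as (3, 6) checks a specific version.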
Installation
The latest bdbag release is available on PyPI and can be installed using pip:
pip install bdbag
Note that the above command will install bdbag with only the minimal dependencies required to run. If you wish to install bdbag with the extra fetch transport handler support provided by the boto (for AWS S3) and globus (for Globus Transfer) packages, use the following command:
pip install bdbag[boto,globus]
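After installing, it can be useful to confirm whether the optional packages are importable. The sketch below uses importlib.util.find_spec, so it requires Python 3.4 or later. The module names boto3 and globus_sdk are assumptions about what the boto and globus extras install and are not confirmed by this document; has_module is a hypothetical helper.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


# Assumed module names for the optional extras; adjust if they differ.
optional_modules = {"boto": "boto3", "globus": "globus_sdk"}
available = {extra: has_module(mod) for extra, mod in optional_modules.items()}
```

The resulting available dictionary maps each extra name to True or False, which a script could use to decide whether to attempt S3 or Globus transfers.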
Installation from Source
You can use pip to install bdbag directly from GitHub:
sudo pip install git+https://github.com/fair-research/bdbag
or:
pip install --user git+https://github.com/fair-research/bdbag
You can also download the current bdbag source code from GitHub or, if you have git installed, clone the repository:
git clone https://github.com/fair-research/bdbag
From the root of the bdbag source code directory, execute the following command:
sudo pip install .
or:
pip install --user .
To install the extra dependencies from a local source directory, use the following command:
pip install .[boto,globus]
Testing
The unit tests can be run by invoking the following command from the root of the bdbag source code directory:
python setup.py test
Usage
This software can be used from the command line by running the bdbag script. For detailed usage instructions, see the CLI Guide.
Configuration
Some components of the bdbag software can be configured via JSON-formatted configuration files.
See the Configuration Guide for further details.
Application Programming Interface
It is also possible to use bdbag from within other Python programs via an API.
See the API Guide for further details.
Utilities
A CLI utility module is provided for various ancillary tasks commonly involved with authoring bdbags. See the Utility Guide for further details.
Change Log
The change log is located here.
Source Distribution
Built Distribution
File details
Details for the file bdbag-1.5.6.tar.gz.
File metadata
- Download URL: bdbag-1.5.6.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/3.5.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | ae057fec9640442ad039fe50b60afa1d71c0109f9163d182f4f4b27774bbc8b3
MD5 | 66a6878203aebc055cef6683322d2f73
BLAKE2b-256 | fcb00671bb0494a86c17a86035912b1062827173da5ed60aba792c7ee9a4d71a
File details
Details for the file bdbag-1.5.6-py2.py3-none-any.whl.
File metadata
- Download URL: bdbag-1.5.6-py2.py3-none-any.whl
- Upload date:
- Size: 66.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/3.5.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | aaab889f97acc75c46cd9329e1eb524faac613cc3861eee1b74b6fb12a291c3d
MD5 | 202499483c19fb3dd6f13abff11504b9
BLAKE2b-256 | 827017792d7bc71bb2d1378ef06f06a06d7332beae5301272fcc3ec1b1bbb3da