Python library and command line interface for creating and interacting with files in the BagIt-format
BagItUtils
This repository contains a Python library along with a command line interface for creating, interacting with, and validating files in the BagIt-format (v1.0). It implements most, but not all, of the specification (see Planned additions). The package consists of two major modules:
- bagit: basic support for the BagIt-spec, including parsing (meta-)data and validating structure as well as checksums
- validator: an implementation of the BagIt Profiles-project's (1.4) specification for extended Bag-validation; takes a modular approach for easy customization (see Validator customization)
Please refer to the Basic usage examples for a brief overview of the features.
Key features of this repository are:
- a modern, extendable, and easy to use API,
- a high test-coverage, and
- a command line interface.
Install
Install this package by entering
```bash
pip install bagit-utils
```
It is generally recommended to install into a virtual environment. Create and activate such an environment, for example, by entering
```bash
python3 -m venv venv
source venv/bin/activate
```
Basic usage examples
CLI
This package provides a command line interface (via the befehl-library) if installed with the extra-dependency "cli":
```bash
pip install bagit-utils[cli]
```
After installing, the CLI can be invoked with bagit.
The CLI provides options for the creation, inspection, modification, and validation of Bags.
You can also enable autocompletion for the current bash session with
```bash
eval "$(_BEFEHL_COMPLETION= bagit --generate-autocomplete)"
```
If you want to set up persistent autocompletion, generate the source file instead via
```bash
_BEFEHL_COMPLETION= bagit --generate-autocomplete
```
and place the contents of that script in your ~/.bash_autocomplete-file.
BagIt
Initialize an existing Bag with
```python
from pathlib import Path

from bagit_utils import Bag

bag = Bag(Path("path/to/bag"))
```
Access bag-metadata via properties
```python
print(bag.baginfo)
print(bag.manifests)
print(bag.tag_manifests)
```
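The `set_baginfo` example in this section passes one list of values per label, which suggests that `bag.baginfo` maps each bag-info.txt label to a list of values. To illustrate that data shape, here is a minimal stdlib-only parser for bag-info.txt following RFC 8493 (`Label: value` lines, indented continuation lines, repeatable labels). `parse_baginfo` is a hypothetical helper, not part of bagit_utils, and joining continuation lines with a single space is an assumption.

```python
def parse_baginfo(text: str) -> dict[str, list[str]]:
    """Parse bag-info.txt content into a label -> list-of-values mapping.

    Hypothetical sketch: "Label: value" elements, labels may repeat, and
    lines starting with whitespace continue the previous value (joined
    with a single space, which is an assumption of this sketch).
    """
    result: dict[str, list[str]] = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and last_label is not None:
            # continuation line: extend the most recent value
            result[last_label][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"Malformed bag-info line: {line!r}")
        last_label = label.strip()
        result.setdefault(last_label, []).append(value.strip())
    return result
```

For example, two `Contact-Name` lines end up as one label with two values, mirroring the list-per-label shape used by `set_baginfo`.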
Reload data after initialization
```python
bag = Bag(Path("path/to/bag"))
# .. some operation that changes bag-info.txt
bag.load_baginfo()
```
Update manifests (on disk) after changes to the bag-payload/tag-files occurred
```python
bag = Bag(Path("path/to/bag"))
# .. some operation that, e.g., adds/removes/changes files in data/ or meta/
bag.set_manifests()
bag.set_tag_manifests()
```
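For reference, a payload manifest (manifest-<algorithm>.txt) lists one `<checksum> <path>` line per payload file, with paths relative to the Bag's base directory (RFC 8493). The following stdlib sketch shows what such lines look like. `manifest_lines` is a hypothetical helper and not how `set_manifests` is implemented; it also skips the percent-encoding of line breaks in file names that the spec requires.

```python
import hashlib
from pathlib import Path


def manifest_lines(bag_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Build the lines of manifest-<algorithm>.txt for the payload in data/.

    Hypothetical sketch: each line is "<checksum>  <path>", where the path is
    relative to the bag's base directory and uses forward slashes.
    """
    lines = []
    for file in sorted((bag_dir / "data").rglob("*")):
        if not file.is_file():
            continue  # directories are not listed in manifests
        # reads the whole file at once; fine for a sketch, not for huge files
        digest = hashlib.new(algorithm, file.read_bytes()).hexdigest()
        lines.append(f"{digest}  {file.relative_to(bag_dir).as_posix()}")
    return lines
```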
Update bag-info after initialization
```python
bag = Bag(Path("path/to/bag"))
bag.set_baginfo(
    bag.baginfo | {"AdditionalField": ["value0", "value1"]}
)
```
Create bag from source
```python
bag = Bag.build_from(
    Path("path/to/source"),  # should contain the payload in a data/-directory
    Path("path/to/bag"),  # should be empty
    baginfo={
        "Source-Organization": ["My Organization"],
        # ... further bag-info fields
        "Payload-Oxum": [Bag.get_payload_oxum(Path("path/to/source"))],
        "Bagging-Date": [Bag.get_bagging_date()],
    },
    algorithms=["md5", "sha1"],
)
```
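The two helper values used above are defined by RFC 8493: Payload-Oxum has the form `OctetCount.StreamCount` (total byte count and number of payload files), and Bagging-Date uses the `YYYY-MM-DD` format. A minimal sketch of both, assuming the source directory holds its payload in `data/` as in the call above. `payload_oxum` and `bagging_date` are hypothetical stand-ins for `Bag.get_payload_oxum` and `Bag.get_bagging_date`, whose exact behavior may differ.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def payload_oxum(source: Path) -> str:
    """Return "OctetCount.StreamCount" for all files below source/data.

    Hypothetical sketch: only files inside data/ count as payload.
    """
    files = [f for f in (source / "data").rglob("*") if f.is_file()]
    octets = sum(f.stat().st_size for f in files)
    return f"{octets}.{len(files)}"


def bagging_date(today: Optional[date] = None) -> str:
    """Return a date as YYYY-MM-DD (defaults to the current date)."""
    return (today or date.today()).isoformat()
```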
If a Bag is created with the load-flag, or via Bag.build_from with the validate-flag, the contents of the directory or of the created Bag are validated against the BagIt-specification (1.0) (Bag format only, or Bag format and file checksums, respectively).
This validation can also be triggered manually by entering
```python
report = bag.validate()
```
The report that is returned contains an overall flag for validity and a list of detected issues. For more advanced validation, see also the following section on Profile-Validation.
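To illustrate what the checksum part of such a validation involves, here is a stdlib-only sketch. It is not the library's implementation: `verify_manifest` is hypothetical, returns a plain `(valid, issues)` tuple instead of a `ValidationReport`, and ignores tag manifests and percent-encoded file names.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir: Path, algorithm: str = "sha256") -> tuple[bool, list[str]]:
    """Check data/ against manifest-<algorithm>.txt; return (valid, issues)."""
    issues = []
    listed = set()
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        parts = line.split(maxsplit=1)  # "<checksum> <path>"
        if len(parts) != 2:
            issues.append(f"Malformed manifest line: {line!r}")
            continue
        expected, rel_path = parts
        listed.add(rel_path)
        file = bag_dir / rel_path
        if not file.is_file():
            issues.append(f"Missing file '{rel_path}'.")
        elif hashlib.new(algorithm, file.read_bytes()).hexdigest() != expected.lower():
            issues.append(f"Checksum mismatch for '{rel_path}'.")
    # conversely, every payload file must be listed in the manifest
    for file in (bag_dir / "data").rglob("*"):
        rel = file.relative_to(bag_dir).as_posix()
        if file.is_file() and rel not in listed:
            issues.append(f"File '{rel}' is not listed in the manifest.")
    return not issues, issues
```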
Customization
This section shows a simple example of how to extend the Bag class with custom loading- and validation-features.
Suppose Bags are expected to always contain a specific tag-file bag.json and the contents of this file should be available after instantiating a Bag.
To achieve this behavior, both loading and validation can be hooked into via the methods custom_load_hook, custom_validate_format_hook, and custom_validate_hook.
The updated loading for a corresponding CustomBag-class could then be defined as follows:
```python
from json import loads

from bagit_utils import Bag


class CustomBag(Bag):
    def custom_load_hook(self):
        self.bag_json = loads((self.path / "bag.json").read_bytes())
```
Similarly, the required validation can be implemented as:
```python
from bagit_utils.common import ValidationReport, Issue


class CustomBag(Bag):
    def custom_load_hook(self):
        ...

    def custom_validate_format_hook(self):
        report = ValidationReport(True, bag=self)
        if not (self.path / "bag.json").is_file():
            report.valid = False
            report.issues.append(
                Issue(
                    "error",
                    f"Missing file 'bag.json' in Bag at '{self.path}'.",
                    "bag.json",
                )
            )
        # additional validation steps
        # ...
        return report
```
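How such hooks plug into a base class can be illustrated with a small template-method sketch. `MiniBag` and `JsonBag` are hypothetical stand-ins, not bagit_utils classes, and the point at which the real Bag calls its hooks is an assumption here: the base class runs its own checks and then appends whatever issues the subclass hook reports.

```python
import json
from pathlib import Path


class MiniBag:
    """Hypothetical stand-in for a Bag base class (not bagit_utils.Bag)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.custom_load_hook()  # subclasses extend loading here

    def custom_load_hook(self) -> None:
        """No-op by default."""

    def custom_validate_format_hook(self) -> list[str]:
        """Return additional issues; none by default."""
        return []

    def validate_format(self) -> list[str]:
        issues = []
        if not (self.path / "bagit.txt").is_file():
            issues.append("Missing 'bagit.txt'.")
        # built-in checks first, then the subclass's additions
        return issues + self.custom_validate_format_hook()


class JsonBag(MiniBag):
    def custom_load_hook(self) -> None:
        file = self.path / "bag.json"
        self.bag_json = json.loads(file.read_bytes()) if file.is_file() else None

    def custom_validate_format_hook(self) -> list[str]:
        if not (self.path / "bag.json").is_file():
            return [f"Missing file 'bag.json' in Bag at '{self.path}'."]
        return []
```

The design keeps the base class closed for modification: subclasses only override the hooks, while the order of the built-in checks stays under the base class's control.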
BagIt-profile validation
The bagit_utils.validator-module consists of two classes that can be used for advanced Bag-validation and is based on the BagIt Profiles-project (1.4).
Their implementation takes a modular approach to simplify customization.
BagItProfileValidator: A customizable validator for BagIt-profiles themselves
In order to load and validate a JSON-profile, run for example
```python
from bagit_utils import BagItProfileValidator

profile = BagItProfileValidator.load_profile(
    profile_src="https://raw.githubusercontent.com/bagit-profiles/bagit-profiles/master/bagProfileFoo.json"
)
```
A ValueError will be raised if a problem is detected during validation of that profile.
BagValidator: A customizable validator for Bags based on BagIt-profiles
Using a BagIt-JSON-profile, the class BagValidator can be used to validate a Bag's contents (structure and metadata) in great detail.
To run validation on a Bag-instance using the previously loaded profile, simply enter
```python
from bagit_utils import BagValidator

report = BagValidator.validate_once(bag, profile=profile)
```
Just like the basic validation of the Bag, the response is a ValidationReport detailing validity and issues.
The validator can also be initialized only once and then be reused by instead writing
```python
validator = BagValidator(profile=profile)
report1 = validator.validate(bag1)
report2 = validator.validate(bag2)
# ...
```
Validator customization
This section shows a simple example of how to extend the BagItProfileValidator and BagValidator classes.
Suppose an extended BagIt-profile specification should be supported.
For simplicity, the following uses the example of a boolean profile-tag My-Tag, which is required in the profile and, if set to true, requires the Bag to include a tag-file my-tag.txt.
The updated BagItProfileValidator could then be defined as follows:
```python
class MyBagItProfileValidator(BagItProfileValidator):
    _ACCEPTED_PROPERTIES = BagItProfileValidator._ACCEPTED_PROPERTIES + ["My-Tag"]

    @classmethod
    def custom_validation_hook(cls, profile):
        if "My-Tag" not in profile:
            raise ValueError(cls._ERROR_PREFIX + "Missing required tag 'My-Tag'.")
        cls._handle_type_validation(bool, "My-Tag", profile["My-Tag"])
```
Similarly, the BagValidator also has a hook that can be used to implement the Bag-validation itself:
```python
from bagit_utils.common import Issue, ValidationReport


class MyBagValidator(BagValidator):
    _PROFILE_VALIDATOR = MyBagItProfileValidator

    @classmethod
    def custom_validation_hook(cls, bag, profile):
        result = ValidationReport(True)
        if profile["My-Tag"] and not (bag.path / "my-tag.txt").is_file():
            result.valid = False
            result.issues.append(
                Issue("error", "Bag must contain tag-file 'my-tag.txt'.", "My-Tag")
            )
        return result
```
With these definitions, a validation using the changed specification can be run via
```python
report = MyBagValidator.validate_once(bag, profile={"My-Tag": True})
```
The hooks available for these kinds of extensions are
- BagItProfileValidator._validate_baginfo_custom_item_hook
- BagItProfileValidator.custom_validation_hook
- BagValidator._validate_baginfo_custom_tags_hook
- BagValidator.custom_validation_hook

Please refer to the source code for more details on the arguments and the expected behavior and return values.
Extensions of/Deviations from specification
In some minor aspects, these validators deviate from the BagIt-profiles specification:
- items in the "Bag-Info"-section of the profile support the additional field "regex", which enables optional regex-matching (using the fullmatch-strategy)
- the "BagIt-Profile-Info"-section is not validated
- the "BagIt-Profile-Identifier"-tag in bag-info.txt is not validated
- the "Accept-BagIt-Version"-section can be omitted, in which case it is interpreted as having the value ["1.0"]
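The first and last deviations can be sketched as follows. `check_baginfo` and `accepted_versions` are hypothetical, stdlib-only helpers (not the library's BagValidator) that apply the profile fields "required", "repeatable", and "values" from the BagIt Profiles specification plus the "regex" extension via `re.fullmatch`. Defaulting "required" to false and "repeatable" to true follows the specification as understood here and should be treated as an assumption.

```python
import re


def check_baginfo(baginfo: dict[str, list[str]], profile: dict) -> list[str]:
    """Check bag-info values against a profile's "Bag-Info" section.

    Hypothetical sketch supporting "required", "repeatable", "values",
    and the "regex" extension (matched with re.fullmatch).
    """
    issues = []
    for label, rules in profile.get("Bag-Info", {}).items():
        values = baginfo.get(label, [])
        if not values:
            if rules.get("required", False):
                issues.append(f"Missing required tag '{label}'.")
            continue
        if not rules.get("repeatable", True) and len(values) > 1:
            issues.append(f"Tag '{label}' is not repeatable.")
        for value in values:
            if "values" in rules and value not in rules["values"]:
                issues.append(f"Value '{value}' of '{label}' is not allowed.")
            if "regex" in rules and re.fullmatch(rules["regex"], value) is None:
                issues.append(f"Value '{value}' of '{label}' does not match.")
    return issues


def accepted_versions(profile: dict) -> list[str]:
    """An omitted "Accept-BagIt-Version" is read as ["1.0"]."""
    return profile.get("Accept-BagIt-Version", ["1.0"])
```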
Planned additions
- support for fetch.txt (currently validation only)
- support for Bag-serialization
Tests
The project has a high test-coverage. To run the tests locally, first install the dependencies
```bash
pip install .
pip install -r dev-requirements.txt
```
and afterwards run pytest with
```bash
pytest -v -s tests
```
Download files
File details
Details for the file bagit_utils-1.2.2.tar.gz.
File metadata
- Download URL: bagit_utils-1.2.2.tar.gz
- Upload date:
- Size: 28.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fabc27fde8f0ef4d45cf34286b0eb8f5a60c2b0375b0e54965c53a5d7fd74e6e |
| MD5 | 98a3573dcd5b855f1f17960c49d58d3a |
| BLAKE2b-256 | ad0fb66892f643ebec6d0536f55f730b200b3e72b7c8949a2da515007047c3bf |
File details
Details for the file bagit_utils-1.2.2-py3-none-any.whl.
File metadata
- Download URL: bagit_utils-1.2.2-py3-none-any.whl
- Upload date:
- Size: 22.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bef859aee2900d53ea44f767c30f6111fc6dbd5d501f2ea2461474024c788bca |
| MD5 | 81a4e3073cb2a9d6caad549e3db13206 |
| BLAKE2b-256 | 2213a6519d39ad087076e2c895028c2621a011361df4c0b4256af4ecec6071af |