
Simple Python interface for creating and interacting with files in the BagIt format

Project description


BagItUtils

This repository contains a simple Python interface for creating, interacting with, and validating files in the BagIt format (v1.0). It implements most, but not all, of the specification (see Planned additions). The package consists of two major modules:

  • bagit: basic support for the BagIt specification, including parsing (meta)data and validating both structure and checksums
  • validator: an implementation of the BagIt Profiles specification (v1.4) for extended Bag validation; it takes a modular approach for easy customization (see details)

Please refer to the examples section for a brief overview of the features.

Basic usage examples

BagIt

Initialize an existing Bag with

from pathlib import Path
from bagit_utils import Bag

bag = Bag(Path("path/to/bag"))

Access bag-metadata via properties

print(bag.baginfo)
print(bag.manifests)
print(bag.tag_manifests)
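The examples in this README pass bag-info values as lists of strings (e.g. "Source-Organization": ["My Organization"]), so bag.baginfo presumably maps each label to a list of values. As a rough illustration of how bag-info.txt lines map onto that shape, here is a stdlib-only sketch following RFC 8493: repeated labels collect several values, and indented lines continue the previous value. The parse_baginfo helper is hypothetical and not part of bagit_utils; joining continuation lines with a single space is an assumption of this sketch.

```python
from collections import defaultdict


def parse_baginfo(text: str) -> dict[str, list[str]]:
    """Parse bag-info.txt content into a mapping of label -> list of values.

    Hypothetical helper (not part of bagit_utils). Repeated labels accumulate
    values; lines starting with a space or tab continue the previous value.
    """
    result: dict[str, list[str]] = defaultdict(list)
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t" and last_label is not None:
            # continuation line; joining with a single space is an assumption
            result[last_label][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed bag-info line: {line!r}")
        last_label = label.strip()
        result[last_label].append(value.strip())
    return dict(result)
```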

Reload data after initialization

bag = Bag(Path("path/to/bag"))

# .. some operation that changes bag-info.txt

bag.load_baginfo()

Update manifests (on disk) after the payload or tag files have changed

bag = Bag(Path("path/to/bag"))

# .. some operation that, e.g., adds/removes/changes files in data/ or meta/

bag.set_manifests()
bag.set_tag_manifests()
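For orientation, a payload manifest (manifest-<algorithm>.txt) lists one "<checksum> <path>" line per file below data/, with paths relative to the bag root. The following sketch shows roughly what regenerating such a manifest involves using only hashlib; manifest_lines and write_manifest are hypothetical helpers, not the bagit_utils implementation, and they skip details such as percent-encoding of special characters in file names and tag manifests.

```python
import hashlib
from pathlib import Path


def manifest_lines(bag_root: Path, algorithm: str = "sha256") -> list[str]:
    """Return '<checksum> <path>' lines for every payload file under bag_root/data.

    Hypothetical helper for illustration only.
    """
    lines = []
    for file in sorted((bag_root / "data").rglob("*")):
        if not file.is_file():
            continue
        digest = hashlib.new(algorithm)
        with file.open("rb") as f:
            # hash in chunks so large payload files are not read into memory at once
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        # paths are relative to the bag root and use forward slashes
        rel = file.relative_to(bag_root).as_posix()
        lines.append(f"{digest.hexdigest()} {rel}")
    return lines


def write_manifest(bag_root: Path, algorithm: str = "sha256") -> Path:
    """Write manifest-<algorithm>.txt into the bag root and return its path."""
    target = bag_root / f"manifest-{algorithm}.txt"
    content = "\n".join(manifest_lines(bag_root, algorithm)) + "\n"
    target.write_text(content, encoding="utf-8")
    return target
```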

Update bag-info after initialization

bag = Bag(Path("path/to/bag"))

bag.set_baginfo(
    bag.baginfo | {"AdditionalField": ["value0", "value1"]}
)
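Note that the | operator (Python 3.9+) creates a new dict in which the right-hand side wins for duplicate keys, so an existing "AdditionalField" entry would be replaced rather than extended. The plain-dict sketch below illustrates this; baginfo here is a hypothetical stand-in for bag.baginfo, assuming the label-to-list-of-values shape used throughout this README.

```python
# hypothetical stand-in for bag.baginfo (label -> list of values)
baginfo = {
    "Source-Organization": ["My Organization"],
    "AdditionalField": ["old"],
}

# `|` builds a new dict; for duplicate keys the right-hand value wins,
# so the existing list for "AdditionalField" is replaced, not extended
replaced = baginfo | {"AdditionalField": ["value0", "value1"]}

# to keep existing values, concatenate the lists explicitly
extended = baginfo | {
    "AdditionalField": baginfo.get("AdditionalField", []) + ["value0", "value1"]
}

print(replaced["AdditionalField"])  # ['value0', 'value1']
print(extended["AdditionalField"])  # ['old', 'value0', 'value1']
```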

Create bag from source

bag = Bag.build_from(
    Path("path/to/source"),  # should contain the payload in a data/ directory
    Path("path/to/bag"),  # target directory; should be empty
    baginfo={
        "Source-Organization": ["My Organization"],
        # ... further bag-info fields
        "Payload-Oxum": [Bag.get_payload_oxum(Path("path/to/source"))],
        "Bagging-Date": [Bag.get_bagging_date()],
    },
    algorithms=["md5", "sha1"],
)
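Payload-Oxum and Bagging-Date have simple definitions in the BagIt specification: the former is "<total octet count>.<number of payload files>", the latter a date in YYYY-MM-DD form. The sketch below computes both with the standard library to show what the values mean; payload_oxum and bagging_date are hypothetical helpers and not necessarily how Bag.get_payload_oxum and Bag.get_bagging_date work. In particular, this sketch assumes it receives the payload directory itself (e.g. path/to/source/data).

```python
from datetime import date
from pathlib import Path
from typing import Optional


def payload_oxum(payload_dir: Path) -> str:
    """Return '<total bytes>.<file count>' for all files below payload_dir.

    Hypothetical helper; assumes payload_dir is the payload (data/) directory.
    """
    files = [p for p in payload_dir.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return f"{total_bytes}.{len(files)}"


def bagging_date(today: Optional[date] = None) -> str:
    """Return a date in the YYYY-MM-DD form used for Bagging-Date (defaults to today)."""
    return (today or date.today()).isoformat()
```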

If a Bag is instantiated with the load flag, or created via Bag.build_from with the validate flag, the directory or the newly created Bag is validated against the BagIt specification (v1.0): with the load flag, the Bag format is checked; with the validate flag, both the Bag format and the file checksums are checked. This validation can also be triggered manually by calling

report = bag.validate()

The returned report contains an overall validity flag and a list of detected issues. For more advanced validation, see the following section on BagIt-profile validation.
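To make the idea of checksum validation and a report with a validity flag and issue list concrete, here is a stdlib-only sketch that checks a bag's data/ directory against one payload manifest. SketchReport and verify_manifest are hypothetical names chosen for this illustration; they are not the library's ValidationReport or Bag.validate, which per the description above also checks the Bag format.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchReport:
    """Hypothetical stand-in for a validation report: a flag plus a list of issues."""
    valid: bool = True
    issues: list[str] = field(default_factory=list)


def verify_manifest(bag_root: Path, algorithm: str = "sha256") -> SketchReport:
    """Compare payload files against manifest-<algorithm>.txt (hypothetical helper)."""
    report = SketchReport()
    manifest = bag_root / f"manifest-{algorithm}.txt"
    listed = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        listed.add(rel)
        file = bag_root / rel
        if not file.is_file():
            report.issues.append(f"missing payload file: {rel}")
            continue
        actual = hashlib.new(algorithm, file.read_bytes()).hexdigest()
        if actual != expected.lower():
            report.issues.append(f"checksum mismatch: {rel}")
    # every payload file must also be listed in the manifest
    for file in (bag_root / "data").rglob("*"):
        rel = file.relative_to(bag_root).as_posix()
        if file.is_file() and rel not in listed:
            report.issues.append(f"file not in manifest: {rel}")
    report.valid = not report.issues
    return report
```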

BagIt-profile validation

The bagit_utils.validator module provides two classes for advanced Bag validation, based on the BagIt Profiles specification (v1.4). Their implementation takes a modular approach to simplify customization.

BagItProfileValidator: A customizable validator for BagIt-profiles themselves

To load and validate a JSON profile, run, for example,

from bagit_utils import BagItProfileValidator

profile = BagItProfileValidator.load_profile(profile_src="https://raw.githubusercontent.com/bagit-profiles/bagit-profiles/master/bagProfileFoo.json")

A ValueError will be raised if a problem is detected during validation of that profile.

BagValidator: A customizable validator for Bags based on BagIt-profiles

Given a BagIt JSON profile, the BagValidator class can validate a Bag's contents (structure and metadata) in great detail. To validate a Bag instance against the previously loaded profile, enter

from bagit_utils import BagValidator

report = BagValidator.validate_once(bag, profile=profile)

Just like the basic validation of the Bag, the response is a ValidationReport detailing validity and issues.

Alternatively, the validator can be initialized once and then reused:

validator = BagValidator(profile=profile)
report1 = validator.validate(bag1)
report2 = validator.validate(bag2)
# ...

Validator customization

This section shows a simple example of how to extend the BagItProfileValidator and BagValidator classes.

Suppose an extended BagIt-profile specification should be supported. As a simple example, consider a boolean profile tag My-Tag that is required in the profile and, if set to true, requires the Bag to include a tag file my-tag.txt.

The updated BagItProfileValidator could then be defined as follows:

class MyBagItProfileValidator(BagItProfileValidator):
    _ACCEPTED_PROPERTIES = BagItProfileValidator._ACCEPTED_PROPERTIES + ["My-Tag"]
    @classmethod
    def custom_validation_hook(cls, profile):
        if "My-Tag" not in profile:
            raise ValueError(cls._ERROR_PREFIX + "Missing required tag 'My-Tag'.")
        cls._handle_type_validation(bool, "My-Tag", profile["My-Tag"])

The BagValidator provides a similar hook for implementing the Bag validation itself:

from bagit_utils.common import Issue, ValidationReport
class MyBagValidator(BagValidator):
    _PROFILE_VALIDATOR = MyBagItProfileValidator
    @classmethod
    def custom_validation_hook(cls, bag, profile):
        result = ValidationReport(True)
        if profile["My-Tag"] and not (bag.path / "my-tag.txt").is_file():
            result.valid = False
            result.issues.append(
                Issue("error", "Bag must contain tag-file 'my-tag.txt'.", "My-Tag")
            )
        return result

With these definitions, a validation against the extended specification can be run via

report = MyBagValidator.validate_once(bag, profile={"My-Tag": True})

The hooks available for these kinds of extensions are

  • BagItProfileValidator._validate_baginfo_custom_item_hook
  • BagItProfileValidator.custom_validation_hook
  • BagValidator._validate_baginfo_custom_tags_hook
  • BagValidator.custom_validation_hook

Please refer to the source code for more details on their arguments, expected behavior, and return values.

Extensions of/Deviations from specification

In some minor aspects, these validators deviate from the BagIt Profiles specification:

  • items in the "Bag-Info" section of the profile support the additional field "regex", which enables optional regex matching of values (using the fullmatch strategy, i.e. the entire value must match)
  • the "BagIt-Profile-Info" section is not validated
  • the "BagIt-Profile-Identifier" tag in bag-info.txt is not validated
  • the "Accept-BagIt-Version" section can be omitted, in which case it is interpreted as ["1.0"]
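The regex extension and the Accept-BagIt-Version default can be illustrated with plain Python. In this sketch, value_matches and accepted_versions are hypothetical helpers, not the validator's internals, and the "required" key in the example item only mimics a Bag-Info entry of a profile. Using re.fullmatch means a value with trailing extra characters is rejected, whereas re.match would accept any value that merely starts with a match.

```python
import re
from typing import Any


def value_matches(item: dict[str, Any], value: str) -> bool:
    """Check one bag-info value against a profile 'Bag-Info' item (hypothetical helper)."""
    pattern = item.get("regex")
    if pattern is None:
        return True  # no regex given: nothing to check
    # fullmatch: the whole value must match, not just a prefix
    return re.fullmatch(pattern, value) is not None


def accepted_versions(profile: dict[str, Any]) -> list[str]:
    """Treat an omitted 'Accept-BagIt-Version' as ['1.0']."""
    return profile.get("Accept-BagIt-Version", ["1.0"])


item = {"required": True, "regex": r"\d{4}-\d{2}-\d{2}"}
print(value_matches(item, "2024-01-31"))        # True
print(value_matches(item, "2024-01-31T12:00"))  # False (re.match would accept the prefix)
print(accepted_versions({}))                    # ['1.0']
```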

Planned additions

  • support for fetch.txt (currently validation only)
  • support for Bag-serialization

Tests

The project has high test coverage. To run the tests locally, first install the package and pytest

pip install .
pip install pytest

and afterwards run pytest with

pytest -v -s tests



Download files


Source Distribution

bagit_utils-1.0.0.tar.gz (22.9 kB)

Uploaded Source

Built Distribution


bagit_utils-1.0.0-py3-none-any.whl (17.4 kB)

Uploaded Python 3

File details

Details for the file bagit_utils-1.0.0.tar.gz.

File metadata

  • Download URL: bagit_utils-1.0.0.tar.gz
  • Upload date:
  • Size: 22.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for bagit_utils-1.0.0.tar.gz
Algorithm Hash digest
SHA256 b0cf04ebbf0a10f731529858e70304a9addaf62a9f2c9dacf5f5ea365a5fa644
MD5 bf7ef3ceaf6f98efdc831112992d30e0
BLAKE2b-256 baaf1468280f376732658102f6d8bf5a9ce78c2bc66b97b0ea7fd2f0ee8d42cf


File details

Details for the file bagit_utils-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: bagit_utils-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 17.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for bagit_utils-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2b90ff69b807674369e6fc76b55153d0a1f374ceb5bbe57dcc41e7f375c61972
MD5 4e53c0af764ec89a0e2f47ddb2ef0fcb
BLAKE2b-256 358fa006701afac0c8db98af4fd356f2febd00152f65acb1bc2969903d918acc

