bagit-python
============

|Build Status| |Coverage Status|

bagit is a Python library and command line utility for working with
`BagIt <http://purl.org/net/bagit>`__ style packages.

Installation
------------

``bagit.py`` is a single-file Python module that you can drop into your
project as needed, or you can install it globally with:

::

    pip install bagit

Python v2.7+ is required.

Command Line Usage
------------------

When you install bagit you should get a command-line program called
``bagit.py``, which you can use to turn an existing directory into a bag:

::

    bagit.py --contact-name 'John Kunze' /directory/to/bag

Finding Bagit on your system


The ``bagit.py`` program should be available in your normal command-line
window (Terminal on OS X, Command Prompt or Powershell on Windows,
etc.). If you are unsure where it was installed you can also request
that Python search for ``bagit`` as a Python module: simply replace
``bagit.py`` with ``python -m bagit``:

::

    python -m bagit --help

On some systems Python may have been installed as ``python3``, ``py``,
etc. – simply use the same name you use to start an interactive Python
shell:

::

    py -m bagit --help
    python3 -m bagit --help

Configuring BagIt
~~~~~~~~~~~~~~~~~

You can pass in key/value metadata for the bag using options like
``--contact-name`` above, which are persisted to ``bag-info.txt``. For a
complete list of ``bag-info.txt`` properties you can use as command line
arguments, see ``--help``.

Since calculating checksums can take a while when creating a bag, you
may want to calculate them in parallel if you are on a multicore
machine. You can do that with the ``--processes`` option:

::

    bagit.py --processes 4 /directory/to/bag
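
The idea behind ``--processes`` can be illustrated with the standard
library alone. The following is a minimal sketch, not bagit's own
implementation: it uses a thread pool where bagit uses separate
processes, and the helper names ``sha256_of`` and ``hash_tree`` are
hypothetical.

.. code:: python

    import hashlib
    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path


    def sha256_of(path):
        """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def hash_tree(root, workers=4):
        """Hash every file below root, spreading the work over a thread pool."""
        files = sorted(p for p in Path(root).rglob("*") if p.is_file())
        with ThreadPoolExecutor(max_workers=workers) as pool:
            digests = list(pool.map(sha256_of, files))
        # Key the result by POSIX-style path relative to root.
        return {p.relative_to(root).as_posix(): d for p, d in zip(files, digests)}

Each file is hashed independently of the others, which is why the work
divides cleanly among workers.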

To specify which checksum algorithm(s) to use when generating the
manifest, use the ``--md5``, ``--sha1``, ``--sha256`` and/or ``--sha512``
flags (MD5 is generated by default).

::

    bagit.py --sha1 /path/to/bag
    bagit.py --sha256 /path/to/bag
    bagit.py --sha512 /path/to/bag
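
Each selected algorithm yields its own manifest file, for example
``manifest-sha256.txt``, holding one line per payload file with the
digest followed by the file's path. The sketch below shows roughly what
those lines look like. It is an illustration rather than bagit's own
code, and the function name ``manifest_lines`` and the in-memory
``payload`` dictionary are hypothetical.

.. code:: python

    import hashlib


    def manifest_lines(payload, algorithm="sha256"):
        """Build manifest lines of the form '<digest>  <path>'.

        payload maps bag-relative paths (for example 'data/a.txt') to bytes.
        """
        lines = []
        for path in sorted(payload):
            digest = hashlib.new(algorithm, payload[path]).hexdigest()
            lines.append("%s  %s" % (digest, path))
        return lines


    payload = {"data/hello.txt": b"hello\n"}
    for algorithm in ("sha256", "sha512"):
        print("manifest-%s.txt" % algorithm)
        for line in manifest_lines(payload, algorithm):
            print("  " + line)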

If you would like to validate a bag you can use the ``--validate`` flag.

::

    bagit.py --validate /path/to/bag

For a quick check that examines only the structure of the bag and
compares its Payload-Oxum (total byte count and number of payload
files), rather than recomputing every checksum, add the ``--fast``
flag.

::

    bagit.py --validate --fast /path/to/bag
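
A Payload-Oxum is written as ``<byte count>.<file count>``. The
following minimal sketch shows how such a value could be computed and
compared using only the standard library. It is not bagit's own code,
and the names ``payload_oxum`` and ``quick_check`` are hypothetical.

.. code:: python

    import os


    def payload_oxum(data_dir):
        """Return '<total bytes>.<file count>' for every file below data_dir."""
        total_bytes = 0
        file_count = 0
        for dirpath, _dirnames, filenames in os.walk(data_dir):
            for name in filenames:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
                file_count += 1
        return "%d.%d" % (total_bytes, file_count)


    def quick_check(data_dir, recorded_oxum):
        """Compare a recorded Payload-Oxum string with the files on disk."""
        return payload_oxum(data_dir) == recorded_oxum

A check like this is cheap because it reads only file sizes. It cannot
detect a file whose contents changed while its size stayed the same,
which is what full checksum validation is for.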

And finally, if you'd like to parallelize validation to take advantage
of multiple CPUs you can:

::

    bagit.py --validate --processes 4 /path/to/bag

Using BagIt in your programs
----------------------------

You can also use BagIt programmatically in your own Python programs by
importing the ``bagit`` module.

Create
~~~~~~

To create a bag you would do this:

.. code:: python

    import bagit

    bag = bagit.make_bag('mydir', {'Contact-Name': 'John Kunze'})

``make_bag`` returns a Bag instance. If you have a bag already on disk
and would like to create a Bag instance for it, simply call the
constructor directly:

.. code:: python

    bag = bagit.Bag('/path/to/bag')

Update Bag Metadata
~~~~~~~~~~~~~~~~~~~

You can change the metadata persisted to ``bag-info.txt`` by using the
``info`` property on a ``Bag``.

.. code:: python

    # load the bag
    bag = bagit.Bag('/path/to/bag')

    # update bag info metadata
    bag.info['Internal-Sender-Description'] = 'Updated on 2014-06-28.'
    bag.info['Authors'] = ['John Kunze', 'Andy Boyko']
    bag.save()
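
In ``bag-info.txt`` each entry is a ``Label: value`` line, and a label
may appear more than once. The sketch below shows one plausible way a
dictionary like ``bag.info`` could be rendered, with a list value
producing one line per item. The function ``format_bag_info`` is
hypothetical and the alphabetical ordering is an assumption; this is not
bagit's own serializer.

.. code:: python

    def format_bag_info(info):
        """Render a metadata dict as 'Label: value' lines, one line per value."""
        lines = []
        for label in sorted(info):
            values = info[label]
            # Treat a single value as a one-item list.
            if not isinstance(values, list):
                values = [values]
            for value in values:
                lines.append("%s: %s" % (label, value))
        return "\n".join(lines) + "\n"


    print(format_bag_info({
        "Internal-Sender-Description": "Updated on 2014-06-28.",
        "Authors": ["John Kunze", "Andy Boyko"],
    }))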

Update Bag Manifests
~~~~~~~~~~~~~~~~~~~~

By default ``save`` will not update manifests. This guards against a
situation where a call to ``save`` to persist bag metadata accidentally
regenerates manifests for an invalid bag. If you have modified the
payload of a bag by adding, modifying or deleting files in the data
directory and wish to regenerate the manifests, set the ``manifests``
parameter to ``True`` when calling ``save``.

.. code:: python

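    import os
    import shutil

    import bagit

    # load the existing bag
    bag = bagit.Bag('/path/to/bag')

    # add a file
    shutil.copyfile('newfile', '/path/to/bag/data/newfile')

    # remove a file
    os.remove('/path/to/bag/data/file')

    # regenerate the manifests and persist the changes
    bag.save(manifests=True)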

The ``save`` method takes an optional ``processes`` parameter which
determines how many processes are used to regenerate the checksums. This
can be handy on multicore machines.

Validation
~~~~~~~~~~

If you would like to see if a bag is valid, use its ``is_valid`` method:

.. code:: python

    bag = bagit.Bag('/path/to/bag')
    if bag.is_valid():
        print("yay :)")
    else:
        print("boo :(")

If you'd like to get a detailed list of validation errors, execute the
``validate`` method and catch the ``BagValidationError`` exception. If
the bag's manifest was invalid (and it wasn't caught by the payload
oxum) the exception's ``details`` property will contain a list of
``ManifestError``\ s that you can inspect. Each ``ManifestError``
will be of type ``ChecksumMismatch``, ``FileMissing`` or
``UnexpectedFile``.

So for example if you want to print out checksums that failed to
validate you can do this:

.. code:: python

    bag = bagit.Bag("/path/to/bag")

    try:
        bag.validate()
    except bagit.BagValidationError as e:
        for d in e.details:
            if isinstance(d, bagit.ChecksumMismatch):
                print("expected %s to have %s checksum of %s but found %s" %
                      (d.path, d.algorithm, d.expected, d.found))
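
The three error types correspond to the three ways a payload can
disagree with its manifest. The standalone sketch below makes that
classification concrete using only the standard library. It is an
illustration, not bagit's validation code: ``compare_manifest`` is a
hypothetical helper, and it returns plain ``(kind, path)`` tuples
instead of ``ManifestError`` objects.

.. code:: python

    import hashlib
    import os


    def compare_manifest(bag_dir, expected, algorithm="sha256"):
        """Compare expected digests with the files found under bag_dir/data.

        expected maps bag-relative paths such as 'data/a.txt' to hex digests.
        Returns a sorted list of (kind, path) tuples, where kind is one of
        'mismatch', 'missing' or 'unexpected'.
        """
        found = {}
        data_dir = os.path.join(bag_dir, "data")
        for dirpath, _dirnames, filenames in os.walk(data_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
                with open(full, "rb") as handle:
                    found[rel] = hashlib.new(algorithm, handle.read()).hexdigest()

        problems = []
        for path in sorted(set(expected) | set(found)):
            if path not in found:
                problems.append(("missing", path))      # listed, not on disk
            elif path not in expected:
                problems.append(("unexpected", path))   # on disk, not listed
            elif expected[path] != found[path]:
                problems.append(("mismatch", path))     # digests differ
        return problems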

To iterate through a bag's manifest and retrieve checksums for the
payload files, use the bag's ``entries`` dictionary:

.. code:: python

    bag = bagit.Bag("/path/to/bag")

    # assumes the bag includes an MD5 manifest
    for path, fixity in bag.entries.items():
        print("path:%s md5:%s" % (path, fixity["md5"]))

Contributing to bagit-python development
----------------------------------------

::

    % git clone https://github.com/LibraryOfCongress/bagit-python.git
    % cd bagit-python
    # MAKE CHANGES
    % python test.py

Running the tests
~~~~~~~~~~~~~~~~~

You can quickly run the tests by having setuptools install dependencies:

::

    python setup.py test

Once your code is working, you can use
`Tox <https://tox.readthedocs.io/>`__ to run the tests with every
supported version of Python which you have installed on the local
system:

::

    tox

If you have Docker installed, you can run the tests under Linux inside a
container:

::

    % docker build -t bagit:latest . && docker run -it bagit:latest

Benchmarks
----------

If you'd like to see how increasing parallelization of bag creation on
your system affects the time to create a bag, try using the included
bench utility:

::

    % ./bench.py

License
-------

|cc0|

Note: By contributing to this project, you agree to license your work
under the same terms as those that govern this project's distribution.

.. |Coverage Status| image:: https://coveralls.io/repos/github/LibraryOfCongress/bagit-python/badge.svg?branch=master
   :target: https://coveralls.io/github/LibraryOfCongress/bagit-python?branch=master
.. |cc0| image:: http://i.creativecommons.org/p/zero/1.0/88x31.png
   :target: http://creativecommons.org/publicdomain/zero/1.0/
