Create and validate BagIt packages
bagit-python
============

|Build Status| |Coverage Status|
bagit is a Python library and command line utility for working with
`BagIt <http://purl.org/net/bagit>`__ style packages.

Installation
------------

bagit.py is a single-file Python module that you can drop into your project as needed, or you can install it globally with:

::

    pip install bagit

Python v2.7+ is required.
Command Line Usage
------------------

When you install bagit you should get a command-line program called ``bagit.py``, which you can use to turn an existing directory into a bag:

::

    bagit.py --contact-name 'John Kunze' /directory/to/bag

Finding Bagit on your system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``bagit.py`` program should be available in your normal command-line
window (Terminal on OS X, Command Prompt or Powershell on Windows,
etc.). If you are unsure where it was installed you can also request
that Python search for ``bagit`` as a Python module: simply replace
``bagit.py`` with ``python -m bagit``:
::

    python -m bagit --help

On some systems Python may have been installed as ``python3``, ``py``,
etc. – simply use the same name you use to start an interactive Python
shell:
::

    py -m bagit --help
    python3 -m bagit --help

Configuring BagIt
~~~~~~~~~~~~~~~~~
You can pass in key/value metadata for the bag using options like
``--contact-name`` above, which get persisted to the bag-info.txt. For a
complete list of bag-info.txt properties you can use as command-line
arguments, see ``--help``.

Since calculating checksums can take a while when creating a bag, you
may want to calculate them in parallel if you are on a multicore
machine. You can do that with the ``--processes`` option:
::

    bagit.py --processes 4 /directory/to/bag

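
As a rough illustration of why parallel checksumming helps, the sketch below hashes every file under a directory concurrently. It is not bagit's implementation: ``file_digest`` and ``digest_tree`` are hypothetical names, it needs Python 3, and it uses a thread pool rather than the separate processes that ``--processes`` refers to, purely to keep the example short.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash one file in chunks so large files are never held in memory whole."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_tree(root, algorithm="sha256", workers=4):
    """Return {path relative to root: hex digest} for every file under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    # Threads (not processes) are an assumption of this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda p: file_digest(p, algorithm), paths)
        return {os.path.relpath(p, root): d for p, d in zip(paths, digests)}
```

Calling ``digest_tree('/directory/to/bag/data', workers=4)`` would return one digest per payload file; the benefit over a single worker depends on file sizes and disk speed.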
To specify which checksum algorithm(s) to use when generating the
manifest, use the ``--md5``, ``--sha1``, ``--sha256`` and/or ``--sha512``
flags (MD5 is generated by default).

::

    bagit.py --sha1 /path/to/bag
    bagit.py --sha256 /path/to/bag
    bagit.py --sha512 /path/to/bag

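
To make the effect of the algorithm flags concrete, here is a minimal sketch of what a manifest for one algorithm contains: one line per payload file, holding the digest followed by the path relative to the bag directory. ``manifest_lines`` is a hypothetical helper, not part of bagit, and it reads each file whole, so it only suits small payloads.

```python
import hashlib
import os


def manifest_lines(bag_dir, algorithm="sha256"):
    """Build the lines of a manifest-<algorithm>.txt for files in bag_dir/data."""
    lines = []
    data_dir = os.path.join(bag_dir, "data")
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digest = hashlib.new(algorithm, f.read()).hexdigest()
            # Manifest paths are relative to the bag and use forward slashes.
            rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
            lines.append("%s  %s" % (digest, rel))
    # Sort by path so the output is stable across runs.
    return sorted(lines, key=lambda line: line.split("  ", 1)[1])
```

Passing ``"sha1"`` or ``"sha512"`` as ``algorithm`` changes only the digest column, which mirrors how each flag yields its own manifest file.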
To validate a bag, use the ``--validate`` flag:

::

    bagit.py --validate /path/to/bag

For a quick check that only examines the structure of the bag and
compares its payload-oxum (byte count and number of files), add the
``--fast`` flag:

::

    bagit.py --validate --fast /path/to/bag

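
The payload-oxum comparison can be pictured with a short sketch. The value has the form ``<byte count>.<file count>``, so it can be recomputed from the ``data`` directory without hashing anything. ``payload_oxum`` is a hypothetical helper for illustration, not bagit's own function.

```python
import os


def payload_oxum(data_dir):
    """Return 'total_bytes.file_count' for every file under data_dir."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
            file_count += 1
    return "%d.%d" % (total_bytes, file_count)
```

A fast check then amounts to comparing this string with the ``Payload-Oxum`` value recorded in bag-info.txt. A match does not prove the content is intact, since a file can change without changing size.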
Finally, to parallelize validation across multiple CPUs, combine
``--validate`` with ``--processes``:

::

    bagit.py --validate --processes 4 /path/to/bag

Using BagIt in your programs
----------------------------
You can also use BagIt programmatically in your own Python programs by
importing the ``bagit`` module.

Create
~~~~~~
To create a bag:

.. code:: python

    import bagit

    bag = bagit.make_bag('mydir', {'Contact-Name': 'John Kunze'})

``make_bag`` returns a Bag instance. If you have a bag already on disk
and would like to create a Bag instance for it, simply call the
constructor directly:
.. code:: python

    bag = bagit.Bag('/path/to/bag')

Update Bag Metadata
~~~~~~~~~~~~~~~~~~~
You can change the metadata persisted to the bag-info.txt by using the
``info`` property on a ``Bag``.
.. code:: python

    # load the bag
    bag = bagit.Bag('/path/to/bag')

    # update bag info metadata
    bag.info['Internal-Sender-Description'] = 'Updated on 2014-06-28.'
    bag.info['Authors'] = ['John Kunze', 'Andy Boyko']
    bag.save()

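
To see how a metadata dictionary like the one above could map onto bag-info.txt, the sketch below renders each entry as a ``Label: value`` line. It assumes a list value is written as one line per item with the label repeated, which the BagIt format permits; the library's actual ordering and line wrapping may differ. ``bag_info_lines`` is a hypothetical name.

```python
def bag_info_lines(info):
    """Render a metadata dict as 'Label: value' lines.

    A list value produces one line per item with the label repeated
    (an assumption of this sketch, not a statement about bagit's output).
    """
    lines = []
    for label in sorted(info):
        values = info[label]
        if not isinstance(values, list):
            values = [values]
        for value in values:
            lines.append("%s: %s" % (label, value))
    return lines
```

With the metadata from the example above, the two authors would appear as two separate ``Authors:`` lines.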
Update Bag Manifests
~~~~~~~~~~~~~~~~~~~~
By default ``save`` will not update manifests. This guards against a
situation where a call to ``save`` to persist bag metadata accidentally
regenerates manifests for an invalid bag. If you have modified the
payload of a bag by adding, modifying or deleting files in the data
directory and wish to regenerate the manifests, set the ``manifests``
parameter to ``True`` when calling ``save``.

.. code:: python

    import os
    import shutil

    # add a file
    shutil.copyfile('newfile', '/path/to/bag/data/newfile')

    # remove a file
    os.remove('/path/to/bag/data/file')

    # persist changes
    bag.save(manifests=True)

The ``save`` method takes an optional ``processes`` parameter that
determines how many processes are used to regenerate the checksums. This
can be handy on multicore machines.

Validation
~~~~~~~~~~
If you would like to see if a bag is valid, use its ``is_valid`` method:
.. code:: python

    bag = bagit.Bag('/path/to/bag')
    if bag.is_valid():
        print("yay :)")
    else:
        print("boo :(")

If you'd like to get a detailed list of validation errors, call the
``validate`` method and catch the ``BagValidationError`` exception. If
the bag's manifest was invalid (and it wasn't caught by the payload
oxum) the exception's ``details`` property will contain a list of
``ManifestError``\ s that you can inspect. Each ``ManifestError``
will be of type ``ChecksumMismatch``, ``FileMissing``, or
``UnexpectedFile``.

So for example if you want to print out checksums that failed to
validate you can do this:
.. code:: python

    bag = bagit.Bag("/path/to/bag")

    try:
        bag.validate()
    except bagit.BagValidationError as e:
        for d in e.details:
            if isinstance(d, bagit.ChecksumMismatch):
                print("expected %s to have %s checksum of %s but found %s" %
                      (d.path, d.algorithm, d.expected, d.found))

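
The three error types correspond to three ways a payload can disagree with its manifest. The standalone sketch below shows that classification using only the standard library; ``compare_payload`` is a hypothetical helper and does not reflect how bagit performs validation internally.

```python
import hashlib
import os


def compare_payload(bag_dir, expected, algorithm="sha256"):
    """Compare files under bag_dir/data with expected {path: digest} entries.

    Returns three sorted lists: (mismatched, missing, unexpected).
    """
    found = {}
    data_dir = os.path.join(bag_dir, "data")
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
            with open(full, "rb") as f:
                found[rel] = hashlib.new(algorithm, f.read()).hexdigest()

    # Listed in the manifest but absent on disk.
    missing = sorted(p for p in expected if p not in found)
    # Present on disk but not listed in the manifest.
    unexpected = sorted(p for p in found if p not in expected)
    # Present in both, but the digests disagree.
    mismatched = sorted(
        p for p in expected if p in found and found[p] != expected[p]
    )
    return mismatched, missing, unexpected
```

Here ``mismatched`` plays the role of ``ChecksumMismatch``, ``missing`` of ``FileMissing``, and ``unexpected`` of ``UnexpectedFile``.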
To iterate through a bag's manifest and retrieve checksums for the
payload files, use the bag's ``entries`` dictionary:

.. code:: python

    bag = bagit.Bag("/path/to/bag")

    for path, fixity in bag.entries.items():
        print("path:%s md5:%s" % (path, fixity["md5"]))

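
The shape used in the loop above, a mapping from path to a dictionary of per-algorithm digests, can be built from manifest text as in this sketch. ``parse_manifests`` is a hypothetical helper that assumes each manifest line is a digest, whitespace, then a path; it is meant to illustrate the data structure, not bagit's parser.

```python
def parse_manifests(manifests):
    """Merge {algorithm: manifest_text} into {path: {algorithm: digest}}."""
    entries = {}
    for algorithm, text in manifests.items():
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            # Split on the first run of whitespace; paths may contain spaces.
            digest, path = line.split(None, 1)
            entries.setdefault(path.strip(), {})[algorithm] = digest
    return entries
```

A path that appears in only one manifest ends up with only that algorithm's key, so looking up ``fixity["md5"]`` on such an entry would raise ``KeyError`` unless an MD5 manifest was read.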
Contributing to bagit-python development
----------------------------------------
::

    % git clone https://github.com/LibraryOfCongress/bagit-python.git
    % cd bagit-python
    # MAKE CHANGES
    % python test.py

Running the tests
~~~~~~~~~~~~~~~~~
You can quickly run the tests by having setuptools install dependencies:
::

    python setup.py test

Once your code is working, you can use
`Tox <https://tox.readthedocs.io/>`__ to run the tests with every
supported version of Python which you have installed on the local
system:
::

    tox

If you have Docker installed, you can run the tests under Linux inside a
container:
::

    % docker build -t bagit:latest . && docker run -it bagit:latest

Benchmarks
----------
If you'd like to see how increasing parallelization of bag creation on
your system affects the time to create a bag, try using the included
bench utility:

::

    % ./bench.py

License
-------
|cc0|
Note: By contributing to this project, you agree to license your work
under the same terms as those that govern this project's distribution.

.. |Coverage Status| image:: https://coveralls.io/repos/github/LibraryOfCongress/bagit-python/badge.svg?branch=master
   :target: https://coveralls.io/github/LibraryOfCongress/bagit-python?branch=master
.. |cc0| image:: http://i.creativecommons.org/p/zero/1.0/88x31.png
   :target: http://creativecommons.org/publicdomain/zero/1.0/