A simple utility to check and harvest metadata records from an OAI request when they meet the DLTN requirements
About
Tests whether records from an OAI-PMH feed meet the minimum requirements of the Digital Library of Tennessee (DLTN) and, optionally, harvests only the passing records to disk so that they can be added to Repox and included in the DPLA.
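To make the idea of "good" and "bad" records concrete, here is a minimal sketch that splits the records in an OAI-PMH response by whether they carry certain fields. The required fields (`dc:title` and `dc:rights`), the sample XML, and the helper names are assumptions made for illustration only; they are not the actual DLTN requirements and not how dltn_checker is implemented.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH responses and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up sample response: the second record has no dc:rights element.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A titled record</dc:title>
          <dc:rights>http://rightsstatements.org/vocab/NoC-US/1.0/</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>No rights statement here</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def has_text(record, path):
    """Return True if any element matching path has non-blank text."""
    return any((el.text or "").strip() for el in record.findall(path, NS))


def partition_records(xml_text, required=("dc:title", "dc:rights")):
    """Split record identifiers into (good, bad) lists.

    `required` is a hypothetical rule set, not the real DLTN requirements.
    """
    root = ET.fromstring(xml_text)
    good, bad = [], []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        ok = all(has_text(record, f".//{field}") for field in required)
        (good if ok else bad).append(identifier)
    return good, bad


good, bad = partition_records(SAMPLE)
print("good:", good)
print("bad:", bad)
```

In this sketch only the records in the good list would be candidates for harvesting to disk.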
Install
Running with Built-in Argument Parsing from a CLI
To run it this way, you need to clone the repository first. Building the environment with pipenv is also suggested.
$ git clone https://github.com/DigitalLibraryofTennessee/check_and_harvest
$ cd check_and_harvest
$ pipenv install
$ pipenv shell
Using OAIChecker from the dltnchecker module
If you’re cool 😎:
$ pipenv install dltn_checker
Otherwise:
$ pip install dltn_checker
Examples with the Built-in Argument Parser
Check for bad DC records in an entire OAI-PMH feed.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m oai_dc
Check and harvest good DC records from an entire OAI-PMH feed.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m oai_dc -H True
Check and harvest good xoai records from a specific set.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m xoai -s my_awesome_xoai_set -H True
Check and harvest good MODS records from an entire provider in Repox.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m MODS -p CrossroadstoFreedomr0 -H True
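For readers less familiar with OAI-PMH, the endpoint, metadata format, and set used in the examples above correspond to the parameters of a standard `ListRecords` request. The sketch below assumes that `-e`, `-m`, and `-s` map onto the base URL, `metadataPrefix`, and `set`; `build_list_records_url` is a hypothetical helper and not part of dltn_checker.

```python
from urllib.parse import urlencode


def build_list_records_url(endpoint, metadata_prefix, oai_set=None):
    """Build an OAI-PMH ListRecords URL (hypothetical helper, for illustration)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        # `set` is optional in OAI-PMH; leaving it out requests the whole feed.
        params["set"] = oai_set
    return f"{endpoint}?{urlencode(params)}"


url = build_list_records_url(
    "http://my-oai-endpoint:8080/OAIHandler", "xoai", "my_awesome_xoai_set"
)
print(url)
```

Opening the resulting URL in a browser is a quick way to look at the raw records before running a check.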
Examples using the OAIChecker Class from dltnchecker
Check whether a set contains any bad records.
from dltnchecker.harvest import OAIChecker
request = OAIChecker("https://dpla.lib.utk.edu/repox/OAIHandler", "crossroads_sanitation", "MODS")
request.list_records()
print(request.bad_records)
By default, this will try to download the good files to a directory called output. If you don’t want to download them, pass an additional parameter called harvest and set it to False.
from dltnchecker.harvest import OAIChecker
request = OAIChecker("https://dpla.lib.utk.edu/repox/OAIHandler", "crossroads_sanitation", "MODS", harvest=False)
request.list_records()
print(request.bad_records)
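The same pattern extends to checking several sets in one pass. The sketch below uses only the constructor arguments and the `list_records()` / `bad_records` members shown above; `check_sets` and the stand-in `FakeChecker` class are hypothetical and exist only so the example runs without network access.

```python
def check_sets(endpoint, sets, metadata_format, checker_factory):
    """Check each set without harvesting and collect bad_records per set.

    `checker_factory` is expected to behave like OAIChecker as used above.
    """
    results = {}
    for oai_set in sets:
        request = checker_factory(endpoint, oai_set, metadata_format, harvest=False)
        request.list_records()
        results[oai_set] = request.bad_records
    return results


class FakeChecker:
    """Stand-in for OAIChecker so the sketch runs offline (illustration only)."""

    def __init__(self, endpoint, oai_set, metadata_format, harvest=True):
        self.oai_set = oai_set
        self.bad_records = None

    def list_records(self):
        # Pretend that only the set named "broken_set" contains a bad record.
        self.bad_records = 1 if self.oai_set == "broken_set" else 0


report = check_sets(
    "https://dpla.lib.utk.edu/repox/OAIHandler",
    ["crossroads_sanitation", "broken_set"],
    "MODS",
    FakeChecker,
)
print(report)

# With the real class, the call would look like:
#   from dltnchecker.harvest import OAIChecker
#   check_sets(endpoint, sets, "MODS", OAIChecker)
```

Passing the class in as an argument keeps the loop independent of the network, which makes it easy to try out before pointing it at a live Repox endpoint.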
Hashes for dltn_checker-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bab11e305c1333dd4d665caf80c6dfae80fae8b15075325209c83f7c30ab9610
MD5 | b908383ec7683c3160c391d38802ad2f
BLAKE2b-256 | 4c7c67ce41a8040f6f3de221c35b8763f6d196fc8f050f19405e58e79e9b1c3e