A simple utility to check and harvest metadata records from an OAI request when they meet the DLTN requirements
About
Tests whether records from an OAI-PMH feed pass minimum requirements of DLTN and optionally harvests only the good records from a request to disk so that they can be added to Repox and included in the DPLA.
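The check-then-harvest idea can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the rule used here (a record is "good" when it has both a `dc:title` and a `dc:rights` element) is a hypothetical stand-in for the actual DLTN requirements, and `split_records` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def split_records(xml_text, required=("title", "rights")):
    """Split the records of a ListRecords response into good and bad identifiers.

    ASSUMPTION: `required` is a made-up rule for illustration only;
    the real DLTN requirements are not reproduced here.
    """
    root = ET.fromstring(xml_text)
    good, bad = [], []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        # Collect every required Dublin Core element the record lacks.
        missing = [
            name for name in required
            if record.find(f".//dc:{name}", NS) is None
        ]
        (bad if missing else good).append(identifier)
    return good, bad
```

Only the identifiers in the first list would be written to disk in a harvest; the second list is what a report of bad records would contain.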
Install
Running with Built-in Argument Parsing from a CLI
To run it this way, you need to clone the repository. Building the environment with pipenv is recommended.
$ git clone https://github.com/DigitalLibraryofTennessee/check_and_harvest
$ cd check_and_harvest
$ pipenv install
$ pipenv shell
Using OAIChecker from the dltnchecker module
If you use pipenv:
$ pipenv install dltn_checker
Otherwise:
$ pip install dltn_checker
Examples with the Built-in Argument Parser
Check for bad DC records in an entire OAI-PMH feed.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m oai_dc
Check and harvest good DC records from an entire OAI-PMH feed.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m oai_dc -H True
Check and harvest good xoai records from a specific set.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m xoai -s my_awesome_xoai_set -H True
Check and harvest good MODS records from an entire provider in Repox.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m MODS -p CrossroadstoFreedomr0 -H True
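The flags used above could be parsed along the following lines. This is a sketch based only on the short options shown in the examples (`-e`, `-m`, `-s`, `-p`, `-H`); the long option names, the defaults, and the helper `parse_harvest_flag` are assumptions and may not match the project's actual `run` script.

```python
import argparse


def build_parser():
    """Build a parser mirroring the short flags from the examples.

    ASSUMPTION: long names and defaults are hypothetical.
    """
    parser = argparse.ArgumentParser(
        description="Check and optionally harvest records from an OAI-PMH feed."
    )
    parser.add_argument("-e", "--endpoint", required=True,
                        help="OAI-PMH endpoint URL")
    parser.add_argument("-m", "--metadata_format", default="oai_dc",
                        help="metadata prefix, e.g. oai_dc, xoai, MODS")
    parser.add_argument("-s", "--set", dest="oai_set", default="",
                        help="restrict the request to one set")
    parser.add_argument("-p", "--provider", default="",
                        help="Repox provider to check")
    # The examples pass a value ("-H True"), so the flag is read as a string.
    parser.add_argument("-H", "--harvest", default="False",
                        help="write good records to disk when True")
    return parser


def parse_harvest_flag(value):
    """Turn the string given to -H into a boolean."""
    return value.strip().lower() in ("true", "1", "yes")
```

Reading `-H` as a string and converting it explicitly avoids a common argparse pitfall: with `type=bool`, any non-empty value, including `"False"`, would evaluate to `True`.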
Examples using the OAIChecker Class from dltnchecker
Check whether a set contains any bad records.
from dltnchecker.harvest import OAIChecker
request = OAIChecker("https://dpla.lib.utk.edu/repox/OAIHandler", "crossroads_sanitation", "MODS")
request.list_records()
print(request.bad_records)
By default, this will try to download the good files to a directory called output. To skip the download, pass the additional parameter harvest and set it to False.
from dltnchecker.harvest import OAIChecker
request = OAIChecker("https://dpla.lib.utk.edu/repox/OAIHandler", "crossroads_sanitation", "MODS", harvest=False)
request.list_records()
print(request.bad_records)
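For orientation, the three positional arguments above (endpoint, set, metadata prefix) correspond to the parameters of a standard OAI-PMH `ListRecords` request. The sketch below builds such a request URL following the OAI-PMH protocol; `list_records_url` is a hypothetical helper written for this example and is not part of dltnchecker, whose internals may differ.

```python
from urllib.parse import urlencode


def list_records_url(endpoint, metadata_prefix, oai_set=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    ASSUMPTION: hypothetical helper for illustration only.
    """
    if resumption_token:
        # Per OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if oai_set:
            params["set"] = oai_set
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(
        "https://dpla.lib.utk.edu/repox/OAIHandler",
        "MODS",
        "crossroads_sanitation",
    ))
```

Large sets are returned in pages; each response carries a resumption token that is passed back on its own to fetch the next page.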