
A simple utility to check and harvest metadata records from an OAI request when they meet the DLTN requirements


About

Tests whether records from an OAI-PMH feed meet the minimum requirements of the DLTN and, optionally, harvests only the good records from a request to disk so that they can be added to Repox and included in the DPLA.

Install

Running with Built-in Argument Parsing from a CLI

To run it this way, you need to clone the repository. Building it with pipenv is also suggested.

$ git clone https://github.com/DigitalLibraryofTennessee/check_and_harvest
$ cd check_and_harvest
$ pipenv install
$ pipenv shell

Using OAIChecker from the dltnchecker module

If you’re cool 😎:

$ pipenv install dltn_checker

Otherwise:

$ pip install dltn_checker

Examples with the Built-in Argument Parser

  1. Check for bad DC records in an entire OAI-PMH feed.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m oai_dc
  2. Check and harvest good DC records from an entire OAI-PMH feed.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m oai_dc -H True
  3. Check and harvest good xoai records from a specific set.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m xoai -s my_awesome_xoai_set -H True
  4. Check and harvest good MODS records from an entire provider in Repox.
$ python run -e http://my-oai-endpoint:8080/OAIHandler -m MODS -p CrossroadstoFreedomr0 -H True
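
The flags used in the examples above could be wired up roughly as follows. This is a minimal sketch, not the project's actual parser: only the short flags (-e, -m, -s, -p, -H) come from the examples. The long option names, help texts, defaults, and the str_to_bool helper are assumptions made for illustration.

```python
import argparse


def str_to_bool(value):
    """Hypothetical helper: turn the text after -H (e.g. "True") into a bool."""
    return value.strip().lower() in {"true", "1", "yes"}


def build_parser():
    """Build a parser for the short flags shown in the examples.

    The long names and defaults are guesses, not the project's real ones.
    """
    parser = argparse.ArgumentParser(
        description="Check OAI-PMH records and optionally harvest the good ones."
    )
    parser.add_argument("-e", "--endpoint", required=True,
                        help="URL of the OAI-PMH endpoint")
    parser.add_argument("-m", "--metadata_format", required=True,
                        help="metadata prefix, e.g. oai_dc, xoai, or MODS")
    parser.add_argument("-s", "--set", dest="oai_set", default=None,
                        help="limit the request to a single set")
    parser.add_argument("-p", "--provider", default=None,
                        help="limit the request to one provider in Repox")
    parser.add_argument("-H", "--harvest", type=str_to_bool, default=False,
                        help="write good records to disk when True")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(
    ["-e", "http://my-oai-endpoint:8080/OAIHandler", "-m", "xoai",
     "-s", "my_awesome_xoai_set", "-H", "True"]
)
print(args.endpoint, args.metadata_format, args.oai_set, args.harvest)
```

Because -H takes a value in the examples, the sketch parses it as text and converts it, rather than using a store_true switch.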

Examples using the OAIChecker Class from dltnchecker

Check a set to see if it contains any bad records.

from dltnchecker.harvest import OAIChecker
request = OAIChecker("https://dpla.lib.utk.edu/repox/OAIHandler", "crossroads_sanitation", "MODS")
request.list_records()
print(request.bad_records)

By default, this will try to download the good files to a directory called output. If you don’t want to download them, pass an additional parameter called harvest and set it to False.

from dltnchecker.harvest import OAIChecker
request = OAIChecker("https://dpla.lib.utk.edu/repox/OAIHandler", "crossroads_sanitation", "MODS", harvest=False)
request.list_records()
print(request.bad_records)
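
To run the same check over several sets, the calls shown above can be wrapped in a small loop. This is a sketch under stated assumptions: check_sets and its checker_factory parameter are hypothetical names introduced here, and the value of bad_records is passed through untouched because its type is not described above. Only the OAIChecker constructor arguments, list_records(), and bad_records come from the examples.

```python
def check_sets(endpoint, set_names, metadata_format, checker_factory=None):
    """Check each set without harvesting and collect its bad_records value.

    checker_factory is a hypothetical hook so the loop can be tried offline
    with a stand-in class; when omitted, OAIChecker from dltnchecker is used.
    """
    if checker_factory is None:
        # Imported lazily so this function can be defined without the package installed.
        from dltnchecker.harvest import OAIChecker
        checker_factory = OAIChecker

    results = {}
    for set_name in set_names:
        # Same call pattern as the examples above, with harvesting switched off.
        request = checker_factory(endpoint, set_name, metadata_format, harvest=False)
        request.list_records()
        results[set_name] = request.bad_records
    return results
```

A call such as check_sets("https://dpla.lib.utk.edu/repox/OAIHandler", ["crossroads_sanitation"], "MODS") would return a dictionary keyed by set name, provided the endpoint is reachable and dltn_checker is installed.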

