Highly configurable OAI harvester based on Sickle.

oaipmharvest

Description

oaipmharvest is a harvester for OAI-PMH written in Python and based on Sickle (for now). Its special focus lies on supporting advanced, non-standard use cases. If you just need the standard feature set, you might be better off with something more mature and better tested.

oaipmharvest connects to a given OAI endpoint and stores its responses in a given output folder. It lets you harvest incrementally from that endpoint or restrict the result set by date. In addition, it provides several features for dynamically constructing set specifiers from smaller parts.
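
To illustrate what a date-restricted request looks like at the protocol level, here is a minimal sketch that builds an OAI-PMH ListRecords URL using only the standard library. The request arguments (verb, metadataPrefix, from, until, set) come from the OAI-PMH specification. The function name build_list_records_url and the endpoint https://example.org/oai are hypothetical, and this is not how oaipmharvest itself is implemented.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_records_url(
    endpoint_url: str,
    metadata_prefix: str,
    from_date: Optional[str] = None,
    until_date: Optional[str] = None,
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # The optional arguments are only sent when a value was given.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    separator = "&" if "?" in endpoint_url else "?"
    return endpoint_url + separator + urlencode(params)


# Hypothetical endpoint; harvest everything changed since 2024-01-01.
print(build_list_records_url("https://example.org/oai", "oai_dc", from_date="2024-01-01"))
```

An incremental harvest would typically pass the date of the previous run as from_date, so that only newer records are requested.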

This is an alpha release. Use with caution.

Features

  • Configuration via TOML
  • Advanced configuration support for dynamic sets (e.g. those supported by BASE)

Installation

After cloning the git repository locally, set up a virtual environment and run

pip install oaipmharvest

Running

After installation, run the application with the CLI command oaipm_harvest. Its help text is available via oaipm_harvest -h.

usage: oaipm_harvest [-h] [--from FROM] [--until UNTIL] file

positional arguments:
  file                  Config file (TOML)

optional arguments:
  -h, --help            show this help message and exit
  --from FROM, -f FROM  Harvest only items that where published after the specified date
  --until UNTIL, -u UNTIL
                        Harvest only items that where published before the specified date
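
For readers who want to script against this interface, the following sketch reproduces the options shown above with argparse. It is an approximation based only on the help text, not the project's actual source. The destination names from_date and until_date are assumptions, as is the date format used in the example call.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Approximate the documented oaipm_harvest interface (sketch only)."""
    parser = argparse.ArgumentParser(prog="oaipm_harvest")
    parser.add_argument("file", help="Config file (TOML)")
    # "from" is a reserved word in Python, so the value is stored under an
    # explicit destination name (an assumption made for this sketch).
    parser.add_argument(
        "--from", "-f", dest="from_date", metavar="FROM",
        help="Harvest only items published after the specified date",
    )
    parser.add_argument(
        "--until", "-u", dest="until_date", metavar="UNTIL",
        help="Harvest only items published before the specified date",
    )
    return parser


# Equivalent to: oaipm_harvest --from 2024-01-01 conf/my-journal.conf
args = build_parser().parse_args(["--from", "2024-01-01", "conf/my-journal.conf"])
print(args.file, args.from_date, args.until_date)
```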

To harvest a specific OAI endpoint, you have to provide a config file. For the most basic use case, an example config file could be conf/my-journal.conf with the following content:

endpoint_url = "https://www.contributions-to-entomology.org/oai/"
metadata_prefixes = ["marcxml"]
out_dir = "./out_cte"
use_sets = false

where

endpoint_url is the OAI base URL you want to connect to.

metadata_prefixes is a list of formats you want to download. Each format is simply handed to the OAI interface, so whether it is supported depends on that interface.

out_dir is the directory where all the downloaded data will be stored. If the given folder(s) do not exist, they will be created.

use_sets is set to false in this basic example, so no sets are used.

Licence

All parts of this code are copyrighted by the University Library JCS, Frankfurt a. M. The project is made available under the Mozilla Public License 2.0.

Acknowledgement

This is a project created and maintained by the Specialised Information Service for Linguistics at the University Library J. C. Senckenberg and funded by the German Research Foundation (DFG; project identifier 326024153).
