Highly configurable oai-harvester based on sickle.

oaipmharvest

Description

oaipmharvest is a harvester for OAI-PMH written in Python and based on sickle (for now). Its special focus is on advanced, non-standard use cases and on endpoints that behave slightly out of the ordinary. If you just need the standard feature set, you might be better off with something more mature and better tested.

oaipmharvest will connect to a given OAI endpoint and, by default, store its responses in an output folder. It enables you to make incremental requests from the given OAI-endpoint or restrict the result set by a given date. In addition to that, it provides several features to dynamically construct set specifiers from smaller parts.
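To illustrate what a date-restricted request looks like at the protocol level, here is a minimal sketch that builds an OAI-PMH ListRecords URL with the standard from, until and set arguments. The helper build_list_records_url and the example.org URL are hypothetical and are not part of oaipmharvest; the harvester issues its requests through sickle and may do so differently.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix,
                           from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL (illustrative helper, not oaipmharvest API)."""
    # verb and metadataPrefix are required by OAI-PMH for ListRecords.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # from, until and set are optional selective-harvesting arguments.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint; replace it with a real OAI base URL.
url = build_list_records_url("https://example.org/oai", "marcxml",
                             from_date="2023-01-01")
print(url)
# https://example.org/oai?verb=ListRecords&metadataPrefix=marcxml&from=2023-01-01
```

An incremental harvest can be thought of as repeating such a request with from set to the date of the previous run.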

This is an alpha release. Use with caution.

Features

  • Configuration via TOML
  • Advanced configuration support for dynamic sets (e.g. those supported by BASE)

Installation

If you want to use oaipmharvest as a standalone application, installation via pipx is recommended.

pipx install oaipmharvest

Installation via other package managers is of course possible, too. This is especially recommended if oaipmharvest is to be used as a library.

pip install oaipmharvest

Running

To run the application after installation, call the CLI command oaipm_harvest. Its help text is available via oaipm_harvest -h.

usage: oaipm_harvest [-h] [--from FROM] [--until UNTIL] file

positional arguments:
  file                  Config file (TOML)

optional arguments:
  -h, --help            show this help message and exit
  --from FROM, -f FROM  Harvest only items that where published after the specified date
  --until UNTIL, -u UNTIL
                        Harvest only items that where published before the specified date
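The interface shown in the help text can be approximated with argparse. The following is only a sketch reconstructed from that help output, not the project's actual source. The attribute name from_date is an assumption (chosen because from is a reserved word in Python), and the date format used in the example call is likewise assumed.

```python
import argparse


def build_parser():
    """Approximate the oaipm_harvest CLI as described by its help text."""
    parser = argparse.ArgumentParser(prog="oaipm_harvest")
    parser.add_argument("file", help="Config file (TOML)")
    # 'from' is a Python keyword, so the value is stored under a different
    # attribute name (assumption: from_date).
    parser.add_argument("--from", "-f", dest="from_date", metavar="FROM",
                        help="Harvest only items published after this date")
    parser.add_argument("--until", "-u", metavar="UNTIL",
                        help="Harvest only items published before this date")
    return parser


# Example call; the YYYY-MM-DD date format is an assumption.
args = build_parser().parse_args(["--from", "2023-01-01", "conf/my-journal.conf"])
print(args.file, args.from_date, args.until)
# conf/my-journal.conf 2023-01-01 None
```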

To harvest a specific OAI-PMH endpoint, you have to provide a TOML config file. A config file for the most basic use case, saved for example as conf/my-journal.conf, could contain:

endpoint_url = "https://www.contributions-to-entomology.org/oai/"
metadata_prefixes = ["marcxml"]
out_dir = "./out_cte"
use_sets = false

where

endpoint_url is the OAI base URL you want to connect to.

metadata_prefixes is a list of formats you want to download. Each format is simply passed on to the OAI interface, so whether a given format is supported depends on the endpoint.

out_dir is the directory where all downloaded data will be stored. If the given folder(s) do not exist, they will be created.

use_sets is set to false in this basic example.

Licence

All parts of this code are copyrighted by the University Library JCS, Frankfurt a. M. The project is made available under the Mozilla Public License 2.0.

Acknowledgement

This is a project originally created by the Specialised Information Service for Linguistics at the University Library J. C. Senckenberg and funded by the German Research Foundation (DFG; project identifier 326024153).
