Highly configurable oai-harvester based on sickle.

oaipmharvest

Description

oaipmharvest is a harvester for OAI-PMH written in Python and based on sickle (for now). Its special focus lies on support for advanced, non-standard use cases. If you just need the standard feature set, you might be better off with something more mature and better tested.

oaipmharvest connects to a given OAI endpoint and stores its responses in a given output folder. It lets you make incremental requests to the endpoint or restrict the result set by date. In addition, it provides several features to dynamically construct set specifiers from smaller parts.
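To illustrate what a date-restricted request looks like at the protocol level, here is a minimal sketch that builds an OAI-PMH ListRecords URL. It does not show how oaipmharvest works internally; the helper name `build_list_records_url`, the placeholder endpoint `https://example.org/oai` and the sample date are assumptions made for this example. Only `verb`, `metadataPrefix`, `from`, `until` and `set` are taken from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def build_list_records_url(endpoint_url, metadata_prefix,
                           from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # 'from' and 'until' are the OAI-PMH arguments for harvesting by datestamp.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    # 'set' restricts the request to one set of the repository.
    if set_spec:
        params["set"] = set_spec
    return f"{endpoint_url}?{urlencode(params)}"


# Placeholder endpoint and date, chosen only for illustration.
url = build_list_records_url("https://example.org/oai", "marcxml",
                             from_date="2023-01-01")
print(url)
```

An incremental harvest would typically pass the date of the previous run as `from_date`, so that only records changed since then are requested.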

This is an alpha release. Use with caution.

Features

  • Configuration via TOML
  • Advanced configuration support for dynamic sets (e.g. those supported by BASE)
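As a rough illustration of building set specifiers from smaller parts, the sketch below combines one value per hierarchy level into colon-separated strings, since OAI-PMH uses the colon to separate levels of a set hierarchy. This is not the configuration mechanism of oaipmharvest; the function `build_set_specs` and the sample values are hypothetical.

```python
from itertools import product


def build_set_specs(*levels):
    """Combine one value per level into colon-separated set specifiers."""
    return [":".join(parts) for parts in product(*levels)]


# Hypothetical parts: one top-level set with two sub-levels.
specs = build_set_specs(["ddc"], ["400", "410"])
print(specs)
```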

Installation

After cloning the git repository locally, set up a virtual environment and run

pip install .

Running

After installation, run the application with the CLI command oaipm_harvest. Calling oaipm_harvest -h prints the following help text:

usage: oaipm_harvest [-h] [--from FROM] [--until UNTIL] file

positional arguments:
  file                  Config file (TOML)

optional arguments:
  -h, --help            show this help message and exit
  --from FROM, -f FROM  Harvest only items that where published after the specified date
  --until UNTIL, -u UNTIL
                        Harvest only items that where published before the specified date
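The interface above could be reproduced with argparse roughly as follows. This is a sketch of an equivalent parser, not the project's actual source; the attribute names `from_date` and `until_date` and the date format in the sample call are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line interface."""
    parser = argparse.ArgumentParser(prog="oaipm_harvest")
    parser.add_argument("file", help="Config file (TOML)")
    # 'from' is a reserved word in Python, so the value is stored
    # under a different attribute name.
    parser.add_argument("--from", "-f", dest="from_date", metavar="FROM",
                        help="Harvest only items published after this date")
    parser.add_argument("--until", "-u", dest="until_date", metavar="UNTIL",
                        help="Harvest only items published before this date")
    return parser


# Sample invocation; the date format is an assumption.
args = build_parser().parse_args(["--from", "2023-01-01", "conf/my-journal.conf"])
print(args.file, args.from_date, args.until_date)
```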

To harvest from a specific OAI endpoint, you have to provide a configuration file in TOML format. An example configuration file for the most basic use case could be conf/my-journal.conf, containing, for example:

endpoint_url = "https://www.contributions-to-entomology.org/oai/"
metadata_prefixes = ["marcxml"]
out_dir = "./out_cte"
use_sets = false

where

endpoint_url is the OAI-base-URL you want to connect to.

metadata_prefixes is a list of the formats you want to download. Each format is handed to the OAI interface as is, so whether a given format is supported depends on the OAI interface.

out_dir is the directory where all downloaded data will be stored. If the given folder or any of its parent folders do not exist, they will be created.

use_sets is set to false in this basic example, so the harvest is not restricted to sets.

Licence

All parts of this code are copyrighted by the University Library JCS, Frankfurt a. M. The project is made available under the Mozilla Public License 2.0.

Acknowledgement

This is a project created and maintained by the Specialised Information Service for Linguistics at the University Library J. C. Senckenberg and funded by the German Research Foundation (DFG; project identifier 326024153).
