A tool for working with archival description for public access.

description_harvester

A tool for working with archival description for public access. description_harvester reads archival description into a minimalist data model for public-facing archival description, converts it to the Arclight data model, and POSTs it into an Arclight Solr index using PySolr.

description_harvester is designed to be extensible and to harvest archival description from a number of sources. Currently the only available source harvests data from the ArchivesSpace API using ArchivesSnake. Modules for EAD2002 and other sources could be added in the future. It's also possible to add output modules that serialize description to EAD or other formats, in addition to or instead of sending description to an Arclight Solr instance. This opens up new possibilities for managing description using low-barrier formats and tools.

The main branch is designed to be a drop-in replacement for the Arclight Traject indexer, while the dao-indexing branch tries to fully index digital objects from digital repositories and other sources, including item-level metadata fields, embedded text, OCR text, and transcriptions.

This is still a bit drafty, as it has only been tested on ASpace v2.8.0 and needs better error handling. Validation is also very minimal, but there is potential to add detailed validation with jsonschema.

Installation

pip install description_harvester

First, you need to configure ArchivesSnake by creating a ~/.archivessnake.yml file with your API credentials, as detailed in the ArchivesSnake configuration docs.

Next, you also need a ~/.description_harvester.yml file that lists your Solr URL and the core you want to index to. These settings can also be overridden with command-line arguments.

solr_url: http://127.0.0.1:8983/solr
solr_core: blacklight-core
last_query: 0
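
As a rough illustration of how these settings fit together, the sketch below parses a flat key: value file like the one above and joins solr_url and solr_core into a single core URL. The parse_flat_config and core_url helpers are hypothetical and are not part of description_harvester, and the <solr_url>/<solr_core> layout is an assumption based on the usual Solr URL scheme. A real script would more likely use a YAML parser.

```python
def parse_flat_config(text):
    """Parse flat `key: value` lines (a tiny subset of YAML) into a dict of strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def core_url(config):
    """Join base URL and core name (assumed layout: <solr_url>/<solr_core>)."""
    return config["solr_url"].rstrip("/") + "/" + config["solr_core"]


# Same values as the example configuration above.
example = """\
solr_url: http://127.0.0.1:8983/solr
solr_core: blacklight-core
last_query: 0
"""

settings = parse_flat_config(example)
print(core_url(settings))
```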

Indexing from ArchivesSpace API to Arclight

Once description_harvester is set up, you can index from the ASpace API to Arclight using the harvest command.

Index by id_0

You can provide one or more IDs to index using a resource's id_0 field:

harvest --id ua807

harvest --id mss123 apap106

Index by URI

You can also use the integers from ASpace resource URIs, such as 263 for https://my.aspace.edu/resources/263:

harvest --uri 435

harvest --uri 1 755
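
If you have full resource URIs rather than bare integers, a small helper can pull out the number to pass to --uri. This is only a sketch: resource_number is a hypothetical name, not a function provided by description_harvester, and it assumes the integer is the last path segment, as in the example URI above.

```python
from urllib.parse import urlparse


def resource_number(uri):
    """Return the trailing integer of an ArchivesSpace resource URI."""
    path = urlparse(uri).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if not last_segment.isdigit():
        raise ValueError(f"no numeric resource ID in {uri!r}")
    return int(last_segment)


print(resource_number("https://my.aspace.edu/resources/263"))  # prints 263
```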

Indexing by modified time

Index collections modified in the past hour: harvest --hour

Index collections modified in the past day: harvest --today

Index collections modified since last run: harvest --new
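
To make the three time windows concrete, the sketch below computes a cutoff timestamp for each option. It is an illustration under assumptions, not the tool's actual implementation: the modified_since function is hypothetical, and it assumes that last_query in the configuration file holds the Unix timestamp of the previous run, which the documentation above does not state explicitly.

```python
import time

HOUR = 60 * 60
DAY = 24 * HOUR


def modified_since(mode, now=None, last_query=0):
    """Return a Unix-timestamp cutoff for 'hour', 'today', or 'new'."""
    now = int(time.time()) if now is None else int(now)
    if mode == "hour":
        return now - HOUR  # modified in the past hour
    if mode == "today":
        return now - DAY  # modified in the past day
    if mode == "new":
        # Assumption: last_query stores the time of the previous run.
        return int(last_query)
    raise ValueError(f"unknown mode: {mode!r}")


print(modified_since("hour", now=10_000))  # prints 6400
```

With last_query: 0, as in the example configuration, the "new" cutoff would be the start of the Unix epoch, so every collection would count as modified.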

Deleting collections

You can delete one or more collections using the --delete argument in addition to --id. This uses the Solr document ID, such as apap106 for https://my.arclight.edu/catalog/apap106.

harvest --id apap101 apap301 --delete

Use as a library

You can also use description_harvester in a script:

from description_harvester import harvest

harvest(["--id", "myid001"])
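
When scripting several operations, it may help to build the argument list programmatically. The sketch below uses only the --id and --delete flags documented above. build_args and harvest_ids are hypothetical helpers, and passing several IDs in one list is an assumption based on the command-line form harvest --id mss123 apap106.

```python
def build_args(ids, delete=False):
    """Build an argument list mirroring `harvest --id ... [--delete]`."""
    if not ids:
        raise ValueError("at least one ID is required")
    args = ["--id", *ids]
    if delete:
        args.append("--delete")
    return args


def harvest_ids(ids, delete=False):
    """Call description_harvester with the generated arguments."""
    # Imported lazily so build_args can be used without the package installed.
    from description_harvester import harvest

    harvest(build_args(ids, delete=delete))


print(build_args(["apap101", "apap301"], delete=True))
```

Calling harvest_ids(["mss123", "apap106"]) would then be the script equivalent of harvest --id mss123 apap106.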
