description_harvester
A tool for working with archival description for public access. description_harvester reads archival description into a minimalist data model for public-facing archival description and then converts it to the Arclight data model and POSTs it into an Arclight Solr index using PySolr.
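The flow described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the `Component` class, the `to_solr_doc` function, and the Solr field names are assumptions made for the example. The final POST is left as a comment because it needs PySolr and a running Solr instance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical minimal record for one level of description."""
    id: str
    title: str
    level: str
    children: List["Component"] = field(default_factory=list)


def to_solr_doc(component: Component, collection_id: str) -> dict:
    """Convert a record into a dict shaped like a Solr document.

    The field names are illustrative only, not a complete Arclight schema.
    """
    return {
        "id": component.id,
        "title_ssm": [component.title],
        "level_ssm": [component.level],
        "collection_id_ssi": collection_id,  # hypothetical field name
        "components": [to_solr_doc(c, collection_id) for c in component.children],
    }


collection = Component(
    id="apap106",
    title="Example collection",
    level="collection",
    children=[Component(id="apap106_1", title="Series 1", level="series")],
)
doc = to_solr_doc(collection, collection.id)

# With PySolr installed and Solr running, the document could then be sent with:
#   import pysolr
#   solr = pysolr.Solr("http://127.0.0.1:8983/solr/blacklight-core", always_commit=True)
#   solr.add([doc])
```

Keeping the intermediate model small means the conversion step is the only place that needs to know about Solr field naming.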
description_harvester is designed to be extensible and to harvest archival description from a number of sources. Currently the only available source harvests data from the ArchivesSpace API using ArchivesSnake. Modules for EAD2002 and other sources could be added in the future. It's also possible to add output modules that serialize description to EAD or other formats, in addition to or instead of sending description to an Arclight Solr instance. This opens up new possibilities for managing description using low-barrier formats and tools.
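One way to picture this kind of extensibility is a pair of small interfaces, one for sources and one for outputs. The sketch below is an assumption about how such modules could be shaped, not the package's real class layout: `Source`, `Output`, `InMemorySource`, `JsonOutput`, and `run_harvest` are all hypothetical names.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class Source(ABC):
    """Hypothetical base class for an input module."""

    @abstractmethod
    def read(self, identifiers: Iterable[str]) -> List[Dict]:
        """Return one record per identifier that can be found."""


class Output(ABC):
    """Hypothetical base class for an output module."""

    @abstractmethod
    def write(self, records: List[Dict]) -> str:
        """Serialize or send the records and return a result string."""


class InMemorySource(Source):
    """Stand-in for a real source such as the ArchivesSpace API."""

    def __init__(self, records: Dict[str, Dict]):
        self.records = records

    def read(self, identifiers: Iterable[str]) -> List[Dict]:
        # Unknown identifiers are skipped rather than raising.
        return [self.records[i] for i in identifiers if i in self.records]


class JsonOutput(Output):
    """Serializes records to JSON instead of posting them to Solr."""

    def write(self, records: List[Dict]) -> str:
        return json.dumps(records, indent=2, sort_keys=True)


def run_harvest(source: Source, output: Output, identifiers: Iterable[str]) -> str:
    """Read from any source and hand the records to any output."""
    return output.write(source.read(identifiers))


source = InMemorySource({"ua807": {"id": "ua807", "title": "Example records"}})
result = run_harvest(source, JsonOutput(), ["ua807", "missing"])
print(result)
```

With this shape, adding an EAD reader or an EAD writer would mean adding one subclass without touching the rest of the pipeline.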
The main branch is designed to be a drop-in replacement for the Arclight Traject indexer, while the dao-indexing branch tries to fully index digital objects from digital repositories and other sources, including item-level metadata fields, embedded text, OCR text, and transcriptions.
This is still a bit drafty, as it has only been tested on ASpace v2.8.0 and needs better error handling. Validation is also very minimal, but there is potential to add detailed validation with jsonschema.
Installation
pip install description_harvester
First, you need to configure ArchivesSnake by creating a ~/.archivessnake.yml file with your API credentials as detailed by the ArchivesSnake configuration docs.
Next, you also need a ~/.description_harvester.yml file that lists your Solr URL and the core you want to index to. These can also be overridden with args.
solr_url: http://127.0.0.1:8983/solr
solr_core: blacklight-core
last_query: 0
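To illustrate how file settings and command-line overrides can combine, here is a minimal sketch. It is not the package's own loader: the flag names `--solr-url` and `--core` are hypothetical, and the tiny `key: value` parser is a stand-in for a real YAML library so that the example runs with the standard library alone.

```python
import argparse
from typing import Dict, List


def parse_flat_config(text: str) -> Dict[str, str]:
    """Parse simple `key: value` lines (a stand-in for a YAML parser)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs keep their own colons.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def resolve_settings(config_text: str, argv: List[str]) -> Dict[str, str]:
    """Start from the config file, then let command-line values win."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--solr-url")  # hypothetical flag name
    parser.add_argument("--core")      # hypothetical flag name
    args = parser.parse_args(argv)

    settings = parse_flat_config(config_text)
    if args.solr_url:
        settings["solr_url"] = args.solr_url
    if args.core:
        settings["solr_core"] = args.core
    return settings


CONFIG = """solr_url: http://127.0.0.1:8983/solr
solr_core: blacklight-core
last_query: 0
"""

settings = resolve_settings(CONFIG, ["--core", "arclight-dev"])
core_url = settings["solr_url"].rstrip("/") + "/" + settings["solr_core"]
print(core_url)
```

Here the core from the command line replaces the one in the file, while the Solr URL falls back to the file value.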
Indexing from ArchivesSpace API to Arclight
Once description_harvester is set up, you can index from the ASpace API to Arclight using the harvest command.
Index by id_0
You can provide one or more IDs to index using a resource's id_0 field:
harvest --id ua807
harvest --id mss123 apap106
Index by URI
You can also use integers from ASpace URIs for resources, such as 263 for https://my.aspace.edu/resources/263:
harvest --uri 435
harvest --uri 1 755
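If you are scripting this, the integer can be pulled out of a resource URI before building the command. The helper below is a hypothetical convenience, not part of description_harvester, and the second URI is a made-up example of an API-style path.

```python
import re
import shlex


def resource_number(uri: str) -> int:
    """Return the trailing integer of a resource URI."""
    match = re.search(r"/resources/(\d+)/?$", uri)
    if match is None:
        raise ValueError(f"not a resource URI: {uri!r}")
    return int(match.group(1))


uris = [
    "https://my.aspace.edu/resources/263",
    "/repositories/2/resources/435",  # made-up API-style path
]

# Build the argument list for the harvest command shown above.
command = ["harvest", "--uri"] + [str(resource_number(u)) for u in uris]
print(shlex.join(command))
```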
Indexing by modified time
Index collections modified in the past hour: harvest --hour
Index collections modified in the past day: harvest --today
Index collections modified since the last run: harvest --new
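The three options differ only in the cutoff time they imply. The sketch below shows one way to compute that cutoff as a Unix timestamp. It assumes that last_query in the config file stores the time of the previous run in epoch seconds, which the source does not state, and the function name `modified_since` is hypothetical.

```python
import time
from typing import Optional

HOUR = 60 * 60
DAY = 24 * HOUR


def modified_since(mode: str, last_query: int = 0, now: Optional[int] = None) -> int:
    """Return the cutoff (epoch seconds) for a given time-based option."""
    current = int(time.time()) if now is None else now
    if mode == "hour":
        return current - HOUR
    if mode == "today":
        return current - DAY
    if mode == "new":
        # Assumption: last_query holds the epoch time of the previous run.
        return last_query
    raise ValueError(f"unknown mode: {mode!r}")


fixed_now = 1_700_000_000  # fixed value so the example is reproducible
hour_cutoff = modified_since("hour", now=fixed_now)
day_cutoff = modified_since("today", now=fixed_now)
new_cutoff = modified_since("new", last_query=1_699_000_000, now=fixed_now)
print(hour_cutoff, day_cutoff, new_cutoff)
```

With last_query left at 0, as in the sample config, the "new" cutoff is the start of the epoch, so a first run would match every collection.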
Deleting collections
You can delete one or more collections using the --delete argument in addition to --id. This uses the Solr document ID, such as apap106 for https://my.arclight.edu/catalog/apap106.
harvest --id apap101 apap301 --delete