Tool for bulk harvests of digitised newspaper articles from Trove
trove-newspaper-harvester
No installation required!
If you want to use the harvester without installing anything, just head over to the Trove Newspaper Harvester section in my GLAM Workbench.
Installation
pip install trove-newspaper-harvester
Use as a library
from trove_newspaper_harvester.core import prepare_query, Harvester
Generate a set of query parameters using `prepare_query`.
my_query = "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge"
my_api_key = "mYSecREtkEy"
my_query_params = prepare_query(query=my_query, key=my_api_key)
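The query passed to `prepare_query` is an ordinary Trove search URL, so it can also be assembled in code rather than pasted. The sketch below is a minimal illustration using only the standard library; `build_search_url` is a hypothetical helper, not part of the harvester, and it assumes only the `keyword` parameter shown in the example URL above.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the example query above.
TROVE_NEWSPAPER_SEARCH = "https://trove.nla.gov.au/search/category/newspapers"


def build_search_url(keyword: str) -> str:
    """Return a Trove newspaper search URL for a keyword (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    params = urlencode({"keyword": keyword}, quote_via=quote)
    return f"{TROVE_NEWSPAPER_SEARCH}?{params}"


my_query = build_search_url("wragge")
print(my_query)
```

The resulting string can be used as the `query` argument in the same way as a URL copied from the Trove web interface.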
Initialise the `Harvester` with your query parameters.
harvester = Harvester(query_params=my_query_params)
Start the harvest!
harvester.harvest()
If the harvest fails, just run `Harvester.harvest` again.
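The "run it again" advice can be wrapped in a small retry loop. The following is a sketch only: `run_with_retries` is a hypothetical helper, and catching every `Exception` is an assumption, since the exceptions a failed harvest raises are not listed here. A stand-in function simulates one failed attempt so the example runs without a Trove API key.

```python
import time
from typing import Callable


def run_with_retries(harvest: Callable[[], None], attempts: int = 3, delay: float = 0.0) -> int:
    """Call harvest() until it succeeds; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            harvest()
            return attempt
        except Exception as error:  # assumption: any exception means "try again"
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            print(f"Attempt {attempt} failed ({error}); trying again")
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


# Stand-in for harvester.harvest that fails once, then succeeds.
calls = {"count": 0}


def flaky_harvest() -> None:
    calls["count"] += 1
    if calls["count"] < 2:
        raise ConnectionError("simulated network failure")


attempts_used = run_with_retries(flaky_harvest)
print(attempts_used)
```

With a real harvester, the call would presumably look like `run_with_retries(harvester.harvest)`, using the `harvester` object created above.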
See the core module documentation for more options and examples.
Use as a command-line tool
Before you do any harvesting you need to get yourself a Trove API key.
There are three basic commands:
- start – start a new harvest
- restart – restart a stalled harvest
- report – view harvest details
Start a harvest
To start a new harvest you can just do:
troveharvester start "[Trove query]" [Trove API key]
The Trove query can be either a URL copied and pasted from a search in the Trove web interface, or a Trove API query URL constructed using something like the Trove API Console. Enclose the URL in double quotes.
See the CLI module documentation for more details.
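If you launch the command-line tool from a Python script, passing the arguments as a list avoids the shell quoting issue altogether. This is a sketch under that assumption: `build_start_command` is a hypothetical helper that only assembles the `troveharvester start` invocation shown above, and the command itself is not run here.

```python
import shlex


def build_start_command(query_url: str, api_key: str) -> list[str]:
    """Assemble the argument list for 'troveharvester start' (hypothetical helper)."""
    return ["troveharvester", "start", query_url, api_key]


command = build_start_command(
    "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge",
    "mYSecREtkEy",
)

# Show a shell-safe rendering of the command for logging or copy and paste.
print(shlex.join(command))

# To actually run it (requires the package to be installed and a valid key):
# import subprocess
# subprocess.run(command, check=True)
```

Because each argument is a separate list item, characters such as `?` and `&` in the query URL are passed through unchanged.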
Created by Tim Sherratt for the GLAM Workbench. Support this project by becoming a GitHub sponsor.
Hashes for trove-newspaper-harvester-0.6.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 8d4e11929524d322edd8fba4f82c237c163dd8e15780505e856e2bac16e44d76
MD5 | b07415c59f94090f6ebaeb21724c5f5a
BLAKE2b-256 | b3bbfcbe9ad13f55c498b958303c8f27b6f8b2b0116bed06c27a5b9ef63a1927
Hashes for trove_newspaper_harvester-0.6.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2d4f54242c15db42d6920e1511248da1c3f29dd67aff5afe4586f719aa2d612e
MD5 | b24f3800555777c4ea65699bf5d246a6
BLAKE2b-256 | 995ce876317fa8124fb1c3ce0385b34b720c837ff00f1384e31cf1d29e9708f7
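A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch: `sha256_matches` is a hypothetical helper, and the demonstration uses a temporary file with known contents rather than a real release file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals expected_hex (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Demonstration with a temporary file whose digest is computed on the spot.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sample.bin"
    sample.write_bytes(b"example")
    ok = sha256_matches(sample, hashlib.sha256(b"example").hexdigest())

print(ok)
```

For a real check, pass the path of the downloaded `.tar.gz` or `.whl` file and the matching SHA256 value from the table.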