Tool for bulk harvests of digitised newspaper articles from Trove
The Trove Newspaper (& Gazette) Harvester makes it easy to download large quantities of digitised articles from Trove’s newspapers and gazettes. Just give it a search from the Trove web interface, and the harvester will save the metadata of all the articles in a CSV (spreadsheet) file for further analysis. You can also save the full text of every article, as well as copies of the articles as JPG images, and even PDFs. While the web interface will only show you the first 2,000 results matching your search, the Newspaper Harvester will get everything.
No installation required!
If you want to use the harvester without installing anything, just head over to the Trove Newspaper Harvester section in my GLAM Workbench.
pip install trove-newspaper-harvester
Before you do any harvesting you need to get yourself a Trove API key.
Use as a library
from trove_newspaper_harvester.core import prepare_query, Harvester
Generate a set of query parameters using prepare_query:
my_query = "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge"
my_api_key = "mYSecREtkEy"
my_query_params = prepare_query(query=my_query)
Initialise the Harvester with your query parameters and API key.
harvester = Harvester(query_params=my_query_params, key=my_api_key)
Start the harvest!
If the harvest fails just run
See the core module documentation for more options and examples.
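Before handing a search URL to prepare_query, it can help to check which search parameters the URL actually carries. The sketch below uses only the Python standard library and does not reproduce what prepare_query does internally; the helper name describe_search_url is hypothetical and not part of the harvester.

```python
from urllib.parse import urlsplit, parse_qs


def describe_search_url(url):
    """Return the path and query-string parameters of a search URL.

    Hypothetical helper for illustration only; not part of the harvester.
    """
    parts = urlsplit(url)
    # parse_qs maps each parameter name to a list of its values
    return parts.path, parse_qs(parts.query)


path, params = describe_search_url(
    "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge"
)
print(path)    # /search/category/newspapers
print(params)  # {'keyword': ['wragge']}
```

If the keyword or category shown here is not what you expected, fix the search in the Trove web interface and copy the URL again before starting a harvest.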
Use as a command-line tool
There are three basic commands:
- start – start a new harvest
- restart – restart a stalled harvest
- report – view harvest details
Start a harvest
To start a new harvest, run:
troveharvester start "[Trove query]" [Trove API key]
The Trove query can be either a URL copied and pasted from a search in the Trove web interface, or a Trove API query URL constructed using something like the Trove API Console. Enclose the URL in double quotes so that the shell does not interpret characters such as ? and &.
See the CLI module documentation for more details.
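If you launch the command-line tool from a Python script, passing the arguments as a list avoids shell quoting problems with the query URL. This is a minimal sketch, assuming the troveharvester command is on your PATH; build_start_command is a hypothetical helper, not part of the harvester, and the key shown is the placeholder used earlier.

```python
import shlex


def build_start_command(query_url, api_key):
    """Build the argument list for `troveharvester start`.

    Hypothetical helper for illustration only; not part of the harvester.
    """
    return ["troveharvester", "start", query_url, api_key]


command = build_start_command(
    "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge",
    "mYSecREtkEy",  # placeholder, replace with your own Trove API key
)

# shlex.join shows a shell-safe version of the command, quoting the URL
# because it contains shell-special characters such as "?"
print(shlex.join(command))

# To actually run it (requires the harvester to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Because the arguments are passed as a list, no shell is involved and the URL needs no extra quoting. The printed form is what you would type in a terminal; shlex uses single quotes, which serve the same purpose as double quotes here in a POSIX shell.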