Tool to download Trove newspaper articles as images.

trove-newspaper-images

Background and alternatives

There’s no reliable way of downloading an image of a Trove newspaper article from the web interface. The image download option produces an HTML page with embedded images, and the article is often sliced into pieces to fit the page.

This package includes tools to download articles as complete JPEG images. If an article is printed across multiple newspaper pages, multiple images will be downloaded – one for each page. It’s intended for integration into other tools and processing workflows, or for people who like working on the command line.

If you just want to quickly download an article as an image without installing anything, you can use this web app in the GLAM Workbench. To download images of all articles returned by a search in Trove, you can also use the Trove Newspaper and Gazette Harvester.

See the documentation for more information.

Install

pip install trove-newspaper-images

Download articles as images

Use as a library

from trove_newspaper_images.articles import download_images

images = download_images('107024751')
print(images)
# ['nla.news-article107024751-11565831.jpg']
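For batch workflows, the call above can be wrapped in a small helper. The following is a minimal sketch, not part of the package: `download_many` and its `downloader` parameter are hypothetical names introduced for illustration. The only library behaviour it assumes is what the example above shows, namely that `download_images` takes an article identifier and returns a list of file names.

```python
from typing import Callable, Dict, Iterable, List


def download_many(
    article_ids: Iterable[str],
    downloader: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Call `downloader` once per article id and collect the file names.

    `downloader` is assumed to behave like `download_images` in the example
    above: it takes an article id and returns a list of file names.
    Articles that raise an error are recorded with an empty list, so one
    failure does not stop the whole batch.
    """
    results: Dict[str, List[str]] = {}
    for article_id in article_ids:
        try:
            results[article_id] = list(downloader(article_id))
        except Exception as error:  # deliberately broad: this is only a sketch
            print(f"Could not download {article_id}: {error}")
            results[article_id] = []
    return results


# Usage (needs the package installed and network access to Trove):
#
#   from trove_newspaper_images.articles import download_images
#   files = download_many(['107024751'], download_images)
```

Passing the downloader in as an argument keeps the helper independent of the package, so it can be tried out with a stand-in function before any real downloads are made.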

Use from the command line

Call trove_newspaper_images.download from the command line with an article identifier. Use the --output_dir parameter to specify a directory for the downloaded images. For example:

trove_newspaper_images.download 107024751 --output_dir images

Add the --masked parameter to try to remove content from neighbouring articles:

trove_newspaper_images.download 107024751 --masked
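If the command needs to be run from a script, it can be assembled and called with Python's standard `subprocess` module. This is a rough sketch under stated assumptions: `build_command` and `run_download` are hypothetical helpers, not part of the package. It uses only the `--output_dir` and `--masked` parameters described above, and it assumes the two can be combined in a single call, which the examples above do not show.

```python
import subprocess
from typing import List, Optional


def build_command(
    article_id: str,
    output_dir: Optional[str] = None,
    masked: bool = False,
) -> List[str]:
    """Build the argument list for trove_newspaper_images.download."""
    command = ["trove_newspaper_images.download", str(article_id)]
    if output_dir:
        # Directory for the downloaded images.
        command += ["--output_dir", output_dir]
    if masked:
        # Try to remove content from neighbouring articles.
        # Combining this with --output_dir is an assumption.
        command.append("--masked")
    return command


def run_download(
    article_id: str,
    output_dir: Optional[str] = None,
    masked: bool = False,
) -> int:
    """Run the command and return its exit code.

    Needs the package installed, so that the command is on the PATH.
    """
    completed = subprocess.run(
        build_command(article_id, output_dir, masked), check=False
    )
    return completed.returncode


# Usage:
#
#   exit_code = run_download('107024751', output_dir='images')
```

Building the command as a list, rather than a single string, avoids shell quoting problems if the output directory contains spaces.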

Created by Tim Sherratt (@wragge) for the GLAM Workbench.
