Download multiple pages from HathiTrust from the command line.

HathiTrust Downloader

Installing

Python 3 (OS Agnostic)

Check that you have Python 3 installed and available in your shell. The following command should print something like Python 3.12.5.

python -V

Windows users: Make sure to enable the "Add to PATH" option when installing Python.

Now you can install using pip with the following command.

pip install hathitrust-downloader

This allows you to interact with the downloader from the command line:

hathitrust-downloader --help

Windows Executable

Windows executables are also available. Download downloader.zip from the project's releases and extract it. Then open a shell, such as cmd or PowerShell, in the directory where you extracted the ZIP. The easiest way to do this is to open the folder in Explorer, hold Shift while right-clicking, and select the option "Open PowerShell Window Here".

The executable is bundled with Python and all other dependencies, so you do not need to install anything else. You can then run the tool like this:

.\downloader.exe --help

Usage

The help output explains how to use the tool:

usage: hathitrust-downloader [-h] [--name NAME] id start_page end_page

Book downloader for HathiTrust

positional arguments:
  id           The ID of the book, e.g 'mdp.39015027794331'.
  start_page   The page number of the first page to be downloaded.
  end_page     The last number of the last page to be downloaded (inclusive).

options:
  -h, --help   show this help message and exit
  --name NAME  The start of the filename. Defaults to using the id. This can also be used to change the path.

For example, the following command downloads pages 1 through 10 (inclusive) of the book with ID mdp.39015073487137 and names the output files my-book_page_<page_number>.pdf:

hathitrust-downloader mdp.39015073487137 1 10 --name my-book
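To check afterwards that every page arrived, it can help to list the filenames you expect. The sketch below is not part of the tool: `expected_filenames` is a hypothetical helper, and it assumes the page number appears in the filename without zero-padding, following the `<name>_page_<page_number>.pdf` pattern described above.

```python
def expected_filenames(name: str, start_page: int, end_page: int) -> list[str]:
    """List the filenames for an inclusive page range.

    Assumption: files are named <name>_page_<page_number>.pdf with no
    zero-padding of the page number.
    """
    return [f"{name}_page_{page}.pdf" for page in range(start_page, end_page + 1)]


if __name__ == "__main__":
    # Mirrors the example command: pages 1 through 10 with --name my-book.
    for filename in expected_filenames("my-book", 1, 10):
        print(filename)
```

Comparing this list against the contents of the output directory shows which pages, if any, are missing.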

Important: The ID of a book is part of the URL shown when you open the book in your browser. Below is an example URL showing where to find the ID:

https://babel.hathitrust.org/cgi/pt?id=mdp.39015073487137&seq=13
                                       ^^^^^^^^^^^^^^^^^^ This marks the ID of this book
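If you would rather not pick the ID out of the URL by hand, a few lines of Python can extract it. This is a minimal sketch using only the standard library. `extract_book_id` is a hypothetical helper, not something the tool provides, and it assumes the URL carries the ID in an `id` query parameter as in the example above.

```python
from urllib.parse import parse_qs, urlparse


def extract_book_id(url: str) -> str:
    """Return the value of the 'id' query parameter of a HathiTrust book URL."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError(f"No 'id' parameter found in URL: {url}")
    return ids[0]


if __name__ == "__main__":
    url = "https://babel.hathitrust.org/cgi/pt?id=mdp.39015073487137&seq=13"
    print(extract_book_id(url))
```

The printed value can be passed to hathitrust-downloader as the id argument.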

Troubleshooting

No Progress / Progress Bar is Stuck

Make sure that you can access books on HathiTrust by opening a book in your browser and checking that it loads. HathiTrust can require you to connect from an American IP address. In addition, it limits the number of pages you can download to 15 every 5 minutes. When you hit that limit, the tool automatically waits and resumes once the timeout has passed, so a progress bar that appears stuck may simply be waiting.
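To judge whether a stalled progress bar is just the rate limit at work, you can estimate how much waiting a page range implies. The sketch below is a rough calculation, not the tool's own logic. `estimated_wait_minutes` is a hypothetical helper, and it assumes the limit behaves like a fixed window of 15 pages per 5 minutes, ignoring the time spent on the downloads themselves.

```python
import math

PAGES_PER_WINDOW = 15  # limit stated above
WINDOW_MINUTES = 5     # limit stated above


def estimated_wait_minutes(start_page: int, end_page: int) -> int:
    """Estimate the minimum total waiting time for an inclusive page range.

    Assumption: each batch of 15 pages after the first costs one full
    5-minute wait. Actual behavior may differ.
    """
    pages = end_page - start_page + 1
    if pages <= 0:
        raise ValueError("end_page must not be smaller than start_page")
    windows = math.ceil(pages / PAGES_PER_WINDOW)
    return (windows - 1) * WINDOW_MINUTES


if __name__ == "__main__":
    print(estimated_wait_minutes(1, 10))   # fits in one window
    print(estimated_wait_minutes(1, 100))  # spans several windows
```

Under these assumptions, a 100-page download spends at least 30 minutes waiting, so long pauses on large ranges are expected.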

Developing

Clone the repository:

git clone https://github.com/Addono/HathiTrust-downloader.git
cd ./HathiTrust-downloader

(Optional) Create a virtual Python environment. This is recommended but not required.

Install the package with dependencies:

pip install .

Now you can run the tool with:

python -m hathitrustdownloader.cli --help
