Downloads multiple pages from HathiTrust from the CLI.
HathiTrust Downloader
Installing
Python 3 (OS Agnostic)
Check that you have Python 3 installed and available in your shell. The following command should return something like Python 3.8.5.
python -V
Windows users: Make sure to enable the "Add to PATH" option when installing Python.
Now you can install the tool using pip with the following command:
pip install hathitrust-downloader
This lets you interact with the downloader from the command line:
hathitrust-downloader --help
Windows Executable
Windows executables are also available. Download downloader.zip from the releases and extract it. Then open a shell, e.g. cmd or powershell, in the directory where the ZIP was extracted. The easiest way to do this is to open the folder in Explorer, right-click while holding Shift, and select the option "Open PowerShell Window Here".
The executable is bundled with Python and all other dependencies, so you do not need to have anything else installed. The tool can then be used like this:
.\downloader.exe --help
Usage
The help output gives instructions on how to use the tool:
usage: hathitrust-downloader [-h] [--name NAME] id start_page end_page
Book downloader from Hathitrust
positional arguments:
id The ID of the book, e.g 'mdp.39015027794331'.
start_page The page number of the first page to be downloaded.
end_page The last number of the last page to be downloaded (inclusive).
optional arguments:
-h, --help show this help message and exit
--name NAME The start of the filename. Defaults to using the id. This can
For example, the following command downloads pages 1 through 10 (inclusive) of the book with the ID mdp.39015027794331 and names the files my-book_page_<page_number>.pdf:
hathitrust-downloader mdp.39015027794331 1 10 --name my-book
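To check afterwards that every page arrived, it can help to list the file names the command above should produce. The following is a minimal sketch, not part of the tool: the helper `expected_filenames` is hypothetical, and it assumes the page number is written without zero padding, as the `my-book_page_<page_number>.pdf` pattern suggests.

```python
from typing import List


def expected_filenames(name: str, start_page: int, end_page: int) -> List[str]:
    """Return the file names for an inclusive page range.

    Hypothetical helper following the '<name>_page_<page_number>.pdf'
    pattern from the example; it is not part of hathitrust-downloader.
    """
    if start_page > end_page:
        raise ValueError("start_page must not be greater than end_page")
    # Assumption: page numbers are not zero-padded in the file name.
    return [f"{name}_page_{page}.pdf" for page in range(start_page, end_page + 1)]


files = expected_filenames("my-book", 1, 10)
print(len(files), files[0], files[-1])
```

Comparing this list against the contents of the download directory shows which pages, if any, are missing.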
The ID of the book can be found as part of the URL when opening the book in your browser.
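As an illustration, the ID can be pulled out of a copied browser URL with the Python standard library. This sketch assumes the ID appears as the `id` query parameter (for example `...?id=mdp.39015027794331&seq=7`); the URL layout and the helper name `book_id_from_url` are assumptions, not something the tool defines.

```python
from urllib.parse import parse_qs, urlparse


def book_id_from_url(url: str) -> str:
    """Extract the book ID from a HathiTrust reader URL.

    Assumption: the ID is carried in the 'id' query parameter.
    """
    query = parse_qs(urlparse(url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no 'id' parameter found in URL")
    # Some links may separate parameters with ';' instead of '&';
    # keep only the part before the first ';' to be safe.
    return ids[0].split(";", 1)[0]


# Example URL is illustrative only.
print(book_id_from_url("https://babel.hathitrust.org/cgi/pt?id=mdp.39015027794331&seq=7"))
```

The returned value can then be passed as the `id` argument of the command line tool.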
Troubleshooting
No Progress / Progress Bar is Stuck
Make sure that you can access books on HathiTrust. Try opening a book in your browser to see whether everything works. HathiTrust can require you to connect from an American IP address. In addition, it limits the number of pages you can download to 15 every 5 minutes. When you hit that limit, you do not need to do anything: the tool waits automatically and resumes once the timeout has passed.
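The limit of 15 pages every 5 minutes makes it possible to estimate roughly how long a larger download will pause. The sketch below is only a back-of-the-envelope calculation under the assumption that the limit resets exactly every 5 minutes and that download time itself is negligible; the function `estimated_wait_minutes` is hypothetical and says nothing about how the tool actually schedules its waits.

```python
import math


def estimated_wait_minutes(
    start_page: int,
    end_page: int,
    pages_per_window: int = 15,  # limit stated above
    window_minutes: int = 5,     # window stated above
) -> int:
    """Rough estimate of the total waiting time for an inclusive page range."""
    total_pages = end_page - start_page + 1
    if total_pages <= 0:
        raise ValueError("end_page must not be smaller than start_page")
    # The first batch downloads immediately; every further batch
    # is assumed to cost one full waiting window.
    batches = math.ceil(total_pages / pages_per_window)
    return (batches - 1) * window_minutes


print(estimated_wait_minutes(1, 10))   # fits in one batch
print(estimated_wait_minutes(1, 100))  # needs several batches
```

Under these assumptions, 10 pages need no waiting, while 100 pages take 7 batches and therefore about 30 minutes of waiting.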
Download files
Hashes for hathitrust-downloader-1.1.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 6a98310d40ce17eb35f52db94c094f5e11587984274e8126609ce3649b6074c9
MD5 | d6b84dfc09bebb7be0a2378811934d16
BLAKE2b-256 | 57adeac9fa9945236c1a36eec630ddff466f00e0c243532239c64271e93724b5
Hashes for hathitrust_downloader-1.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3f604858eb8873e8a8e8fad8a802d12c186affcf6ea00a227346739a9aaced9c
MD5 | f0c2e199dfc5aac5544daa5ac911c322
BLAKE2b-256 | adfaacf42b522fc829f7a3fc6001fb3af5bd9d39d93f85f559d699209edc3a99
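A manually downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `sha256_of_file` and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal SHA256 digest of the file at 'path'."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the archive was saved to the current directory):
# expected = "6a98310d40ce17eb35f52db94c094f5e11587984274e8126609ce3649b6074c9"
# print(sha256_of_file("hathitrust-downloader-1.1.1.tar.gz") == expected)
```

If the comparison is false, the file is incomplete or differs from the published release and should be downloaded again.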