Wayback Machine Saver
A Python tool for archiving web pages through the Internet Archive Wayback Machine.
Getting Started
Prerequisites
Installation
Installing with pipx is recommended, since it keeps this command-line tool in its own isolated environment.
pipx install wayback-machine-saver
Usage
Save pages
Save the URLs listed in the input file to the Internet Archive Wayback Machine.
wayback_machine_saver save-pages FILENAME
Argument
- FILENAME: path to a file containing the URLs to save
e.g.,
https://example.com
https://another-example.com
Options
- --deliminator TEXT [default: "\n"]
- --error-log-filename TEXT [default: save-pages-error-log-"timestamp".csv]
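The input file is just a list of URLs separated by the delimiter. As a minimal sketch, assuming the default "\n" delimiter, a file like this could be generated from Python. The helper name write_url_file is hypothetical and is not part of wayback-machine-saver.

```python
from pathlib import Path


def write_url_file(urls, path, delimiter="\n"):
    """Write URLs to `path`, separated by `delimiter`.

    Hypothetical helper for illustration only; it is not part of the tool.
    Blank entries are dropped and surrounding whitespace is stripped.
    """
    cleaned = [u.strip() for u in urls if u.strip()]
    Path(path).write_text(delimiter.join(cleaned), encoding="utf-8")
    return cleaned
```

Calling write_url_file(["https://example.com", "https://another-example.com"], "urls.txt") produces a file that can then be passed as FILENAME to save-pages.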
Get latest archive urls
After the URLs have been saved, the Internet Archive Wayback Machine snapshots each page and records a timestamp. You can reach the latest snapshot through http://web.archive.org/web/[Your URL], which redirects to http://web.archive.org/web/[timestamp]/[Your URL]. This command retrieves those redirected URLs.
wayback_machine_saver get-latest-archive-urls FILENAME
Argument
- FILENAME: path to a file containing the URLs to retrieve
e.g.,
https://example.com
https://another-example.com
Options
- --deliminator TEXT [default: "\n"]
- --output-filename TEXT [default: retrieved-urls-"timestamp".csv]
- --error-log-filename TEXT [default: get-url-error-log-"timestamp".csv]
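The URL scheme described above can be sketched with two small helpers: one builds the "latest snapshot" address for a page, the other splits a redirected address back into its timestamp and original URL. This is only an illustration of the URL pattern, not the tool's own code. The function names are hypothetical, and the regular expression assumes the timestamp is the usual 14-digit YYYYMMDDhhmmss form.

```python
import re

WAYBACK_PREFIX = "http://web.archive.org/web/"

# Assumption: the timestamp segment is exactly 14 digits (YYYYMMDDhhmmss).
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def latest_archive_url(url):
    """Return the address that redirects to the latest snapshot of `url`."""
    return WAYBACK_PREFIX + url


def parse_snapshot_url(snapshot_url):
    """Split a redirected snapshot URL into (timestamp, original_url).

    Returns None when the URL does not match the expected pattern.
    """
    match = SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

No network request is made here; following the redirect itself is what get-latest-archive-urls does for each URL in the input file.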
Configuration
Wayback Machine Saver can be configured through environment variables. Run export VARIABLE=VALUE before running the command to change its behavior.
- WAYBACK_MACHINE_SAVER_RETRY_TIMES
- times to retry (default: 3)
- HTTPX_TIMEOUT
- timeout for all GET operations (default: 10)
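For readers who want to mirror these settings in their own scripts, the following is a minimal sketch of reading the two variables with the defaults listed above. It is an assumption about how such values might be parsed, not the tool's actual implementation, and read_settings is a hypothetical name.

```python
import os


def read_settings(environ=None):
    """Read the two documented variables, falling back to their defaults.

    Hypothetical helper: the parsing (int for retries, float for the
    timeout) is an assumption, not taken from the tool's source.
    """
    env = os.environ if environ is None else environ
    return {
        "retry_times": int(env.get("WAYBACK_MACHINE_SAVER_RETRY_TIMES", "3")),
        "timeout": float(env.get("HTTPX_TIMEOUT", "10")),
    }
```

Passing a plain dictionary instead of relying on os.environ makes the helper easy to try out without changing the shell environment.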
Contributing
See Contributing
Authors
Wei Lee weilee.rx@gmail.com
Created from Lee-W/cookiecutter-python-template version 0.9.0