Wayback Machine Saver

Python tool for archiving web pages through Internet Archive Wayback Machine

Getting Started

Prerequisites

Installation

We recommend installing this command-line tool with a tool such as pipx.

pipx install wayback-machine-saver

Usage

Save pages

Save the URLs listed in the input file to the Internet Archive Wayback Machine.

wayback_machine_saver save-pages FILENAME

Argument

  • FILENAME: path to a file containing the URLs to save

e.g.,

https://example.com
https://another-example.com

Options

  • --deliminator TEXT [default: "\n"]
  • --error-log-filename TEXT [default: save-pages-error-log-"timestamp".csv]
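
As an illustration of the input format, the following Python sketch writes and re-reads a URL file with a configurable delimiter. The helper names `write_url_file` and `read_url_file` are hypothetical, and splitting the file contents on the raw delimiter string is an assumption about how `--deliminator` is applied, not confirmed behavior of the tool.

```python
from pathlib import Path


def write_url_file(path, urls, delimiter="\n"):
    """Write URLs to `path`, separated by `delimiter` (hypothetical helper)."""
    Path(path).write_text(delimiter.join(urls), encoding="utf-8")


def read_url_file(path, delimiter="\n"):
    """Split the file on `delimiter` and drop blank entries.

    Assumption: the tool splits the file contents on the delimiter string.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [part.strip() for part in text.split(delimiter) if part.strip()]
```

A file written with `write_url_file("urls.txt", urls, delimiter=",")` would then presumably be passed to the tool together with `--deliminator ","`.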

Get latest archive urls

After the URLs have been saved, the Internet Archive Wayback Machine takes a snapshot of each page and assigns it a timestamp. You can reach the latest snapshot at http://web.archive.org/web/[Your URL], which redirects to http://web.archive.org/web/[timestamp]/[Your URL]. This command retrieves those redirected URLs.
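
The URL shapes described above can be sketched in Python as follows. This is a minimal illustration that makes no network requests; the function names `lookup_url` and `split_snapshot_url` are hypothetical and are not part of the tool. It assumes the usual 14-digit Wayback Machine timestamp (YYYYMMDDhhmmss).

```python
import re

WAYBACK_PREFIX = "http://web.archive.org/web/"

# Assumption: the timestamp is 14 digits (YYYYMMDDhhmmss).
_SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def lookup_url(page_url):
    """Build the URL that redirects to the latest snapshot of `page_url`."""
    return WAYBACK_PREFIX + page_url


def split_snapshot_url(snapshot_url):
    """Return (timestamp, original_url), or None if the URL has no timestamp."""
    match = _SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Following the redirect itself requires an HTTP client; the sketch only covers building the lookup URL and splitting the redirected result.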

wayback_machine_saver get-latest-archive-urls FILENAME

Argument

  • FILENAME: path to a file containing the URLs to retrieve

e.g.,

https://example.com
https://another-example.com

Options

  • --deliminator TEXT [default: "\n"]
  • --output-filename TEXT [default: retrieved-urls-"timestamp".csv]
  • --error-log-filename TEXT [default: get-url-error-log-"timestamp".csv]
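
For scripted use, the command line can be assembled in Python before handing it to `subprocess.run`. The sketch below uses only the flags listed above; the helper name `build_get_latest_command` is hypothetical, and placing options before FILENAME is an assumption about what the tool's argument parser accepts.

```python
import shlex


def build_get_latest_command(filename, deliminator=None,
                             output_filename=None, error_log_filename=None):
    """Return the argv list for `get-latest-archive-urls` (hypothetical helper)."""
    argv = ["wayback_machine_saver", "get-latest-archive-urls"]
    # Only pass options that were set, so the tool's defaults apply otherwise.
    if deliminator is not None:
        argv += ["--deliminator", deliminator]
    if output_filename is not None:
        argv += ["--output-filename", output_filename]
    if error_log_filename is not None:
        argv += ["--error-log-filename", error_log_filename]
    argv.append(filename)
    return argv


def as_shell_string(argv):
    """Render argv as a shell-quoted string for logging or copy-paste."""
    return shlex.join(argv)
```

The returned list can be passed directly to `subprocess.run(argv, check=True)` once the tool is installed.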

Configuration

Wayback Machine Saver can be configured through environment variables. Run export VARIABLE=VALUE before running the tool to change its behavior.

  • WAYBACK_MACHINE_SAVER_RETRY_TIMES
    • number of times to retry (default: 3)
  • HTTPX_TIMEOUT
    • timeout for all GET operations (default: 10)
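
A minimal sketch of how settings like these are commonly read in Python, with the documented defaults as fallbacks. The function name `load_settings` is hypothetical, and parsing the values as `int` and `float` is an assumption; the tool's actual parsing may differ.

```python
import os


def load_settings(environ=None):
    """Read the two documented variables, falling back to their defaults."""
    env = os.environ if environ is None else environ
    return {
        # Assumption: retry count is an integer.
        "retry_times": int(env.get("WAYBACK_MACHINE_SAVER_RETRY_TIMES", "3")),
        # Assumption: timeout is numeric; the README does not state its unit.
        "timeout": float(env.get("HTTPX_TIMEOUT", "10")),
    }
```

Accepting an optional mapping instead of always reading `os.environ` keeps the function easy to test without touching the real environment.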

Contributing

See Contributing

Authors

Wei Lee weilee.rx@gmail.com

Created from Lee-W/cookiecutter-python-template version 0.9.0
