
A Python script to submit web pages to the Wayback Machine for archiving.

Project description

Wayback Machine Archiver (Archiver for short) is a command-line utility written in Python to back up GitHub Pages using the Internet Archive.

Installation

The best way to install Archiver is with pip:

pip install wayback-machine-archiver

This will give you access to the script simply by calling:

archiver --help

You can also clone this repository:

git clone https://github.com/agude/wayback-machine-archiver.git
cd wayback-machine-archiver
python ./wayback_machine_archiver/archiver.py --help

If you clone the repository, Archiver can be installed as a local application using the setup.py script:

git clone https://github.com/agude/wayback-machine-archiver.git
cd wayback-machine-archiver
./setup.py install

Like installing with pip, this gives you access to the script by calling archiver.

Usage

You can schedule a backup by specifying the URL of a web page, like so:

archiver https://alexgude.com

This will submit the main page of my blog, alexgude.com, to the Wayback Machine for archiving.
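Conceptually, a submission asks the Wayback Machine's Save Page Now service to capture the page. The sketch below assumes the public `https://web.archive.org/save/<url>` endpoint and uses the standard library's `urllib.request` instead of `requests`; `save_url_for` and `submit` are hypothetical helpers, not Archiver's actual code, and the sketch targets Python 3 only.

```python
import urllib.request

# Assumed Save Page Now endpoint; the page URL is appended verbatim.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_url_for(page_url):
    """Return the (assumed) Save Page Now URL for a page."""
    return SAVE_ENDPOINT + page_url.strip()


def submit(page_url, timeout=60):
    """Ask the Wayback Machine to capture page_url and return the HTTP status.

    Hypothetical helper: it performs a real network request when called.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(
        save_url_for(page_url),
        headers={"User-Agent": "archiver-sketch/0.1"},  # placeholder user agent
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only builds the URL; call submit() yourself to actually send a request.
    print(save_url_for("https://alexgude.com"))
```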

You can also archive all the URLs specified in a sitemap.xml (https://en.wikipedia.org/wiki/Sitemaps) as follows:

archiver --sitemaps https://alexgude.com/sitemap.xml

This will back up every page listed in the sitemap of my website, alexgude.com.
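A sitemap is an XML file whose `<url><loc>` entries hold the page URLs. As a rough sketch of how those URLs can be extracted (assuming a plain `urlset` sitemap in the sitemaps.org namespace, not a sitemap index file), the standard library's `xml.etree.ElementTree` is enough; `urls_from_sitemap` is a hypothetical helper, not part of Archiver.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> URLs listed in a urlset sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc/> elements
    ]


EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(urls_from_sitemap(EXAMPLE))
```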

You can back up multiple pages by specifying multiple URLs or sitemaps:

archiver https://radiokeysmusic.com --sitemaps https://charles.uno/sitemap.xml https://alexgude.com/sitemaps.xml
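If you want to drive archiver from another Python script, one option is to build the argument list and hand it to `subprocess`. The sketch below uses only the flags documented in this README (`--sitemaps` and `--archive-sitemap-also`) and assumes the `archiver` command is on your PATH; `build_archiver_command` and `run_archiver` are hypothetical names.

```python
import subprocess


def build_archiver_command(urls=(), sitemaps=(), archive_sitemap_also=False):
    """Assemble an archiver command line as a list of arguments.

    Plain URLs come first, followed by --sitemaps and its sitemap URLs,
    matching the multi-URL example in this README.
    """
    command = ["archiver", *urls]
    if sitemaps:
        command += ["--sitemaps", *sitemaps]
    if archive_sitemap_also:
        command.append("--archive-sitemap-also")
    return command


def run_archiver(command):
    """Run archiver; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    cmd = build_archiver_command(
        urls=["https://radiokeysmusic.com"],
        sitemaps=["https://charles.uno/sitemap.xml"],
    )
    print(" ".join(cmd))
```

Passing a list rather than a single shell string avoids quoting problems with URLs that contain characters such as `&`.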

You can also back up multiple URLs by writing them to a file (for example, urls.txt), one URL per line, and passing that file to archiver:

archiver --file urls.txt
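Reading a one-URL-per-line file is straightforward. A minimal sketch, assuming blank lines and surrounding whitespace should be ignored (the README only specifies "one URL per line"); `read_urls` and `read_url_file` are hypothetical helpers, not Archiver's own parser.

```python
def read_urls(lines):
    """Return URLs from an iterable of lines, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def read_url_file(path):
    """Read a text file containing one URL per line."""
    with open(path, encoding="utf-8") as handle:
        return read_urls(handle)
```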

Sitemaps often do not list themselves, so you can request that the sitemap itself also be backed up using the --archive-sitemap-also flag:

archiver --sitemaps https://alexgude.com/sitemaps.xml --archive-sitemap-also
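The idea behind --archive-sitemap-also can be sketched as appending the sitemap's own URL to the pages it lists. This is an illustration under assumptions, not Archiver's implementation: `pages_to_submit` is hypothetical, and dropping duplicates is an extra choice made here so a sitemap that does list itself is not submitted twice.

```python
def pages_to_submit(sitemap_url, page_urls, archive_sitemap_also=False):
    """Return the URLs to submit for one sitemap, without duplicates."""
    urls = list(page_urls)
    if archive_sitemap_also:
        urls.append(sitemap_url)
    # dict.fromkeys removes repeats while keeping first-seen order (Python 3.7+).
    return list(dict.fromkeys(urls))
```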

Archiver requires the requests library (https://github.com/kennethreitz/requests) by Kenneth Reitz. Archiver supports Python 2.7 and Python 3.4+.

Setting Up a Sitemap.xml for GitHub Pages

It is easy to automatically generate a sitemap for a GitHub Pages Jekyll site: simply use the jekyll-sitemap plugin (jekyll/jekyll-sitemap).

Setup instructions can be found on the plugin's page; they require changing just a single line of your site’s _config.yml.



Download files

Download the file for your platform.

Source Distribution

wayback-machine-archiver-1.5.1.tar.gz (4.8 kB)

Uploaded Source

Built Distribution

wayback_machine_archiver-1.5.1-py3-none-any.whl (6.0 kB)

Uploaded Python 3

File details

Details for the file wayback-machine-archiver-1.5.1.tar.gz.

File metadata

  • Download URL: wayback-machine-archiver-1.5.1.tar.gz
  • Upload date:
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.1

File hashes

Hashes for wayback-machine-archiver-1.5.1.tar.gz
Algorithm Hash digest
SHA256 52c41627ea5f02d42442eb5976af5478aa3711a696ec15a547a79b2827e73c12
MD5 fcf3ab9f6e3a146f12619e5f198d3a7d
BLAKE2b-256 5b6f436e7f34ee28adc499d9c263a4857f596695b2c441fdb9377cad9e682c02
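To check a downloaded file against the SHA256 digest listed above, hash it locally and compare. A minimal sketch using the standard library's `hashlib`; `sha256_hexdigest` and `verify_download` are hypothetical helper names, and the expected value is copied from the source distribution's SHA256 row.

```python
import hashlib
import io

# SHA256 digest listed for wayback-machine-archiver-1.5.1.tar.gz.
EXPECTED_SDIST_SHA256 = "52c41627ea5f02d42442eb5976af5478aa3711a696ec15a547a79b2827e73c12"


def sha256_hexdigest(fileobj, chunk_size=65536):
    """Hash a binary file object in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SDIST_SHA256):
    """Return True if the file at path matches the expected SHA256 digest."""
    with open(path, "rb") as handle:
        return sha256_hexdigest(handle) == expected.lower()


if __name__ == "__main__":
    # Sanity check with the standard SHA256 test vector for b"abc".
    print(sha256_hexdigest(io.BytesIO(b"abc")))
```

Usage: after downloading the tarball, `verify_download("wayback-machine-archiver-1.5.1.tar.gz")` should return True.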

File details

Details for the file wayback_machine_archiver-1.5.1-py3-none-any.whl.

File metadata

  • Download URL: wayback_machine_archiver-1.5.1-py3-none-any.whl
  • Upload date:
  • Size: 6.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.1

File hashes

Hashes for wayback_machine_archiver-1.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1a22dd9e83508f4f927290c991789c26ceb8a311f383a40aca3d93491b7bb819
MD5 ccefdcaa626b02f23be0d4d289830396
BLAKE2b-256 f390a47d838a3a7f00eedf8d2b5735d0a842eaef89dfa79b79f5003ee79651aa
