A Python script to submit web pages to the Wayback Machine for archiving.

Project description

Wayback Machine Archiver (Archiver for short) is a command-line utility written in Python to back up GitHub Pages using the Internet Archive.


The best way to install Archiver is with pip:

pip install wayback-machine-archiver

This will give you access to the script simply by calling:

archiver --help

You can also clone this repository:

git clone
cd wayback-machine-archiver
python ./wayback_machine_archiver/ --help

If you clone the repository, Archiver can be installed as a local application using the script:

git clone
cd wayback-machine-archiver
./ install

Like installing with pip, this gives you access to the script by calling archiver.

Archiver requires the requests library by Kenneth Reitz. Archiver supports Python 2.7 and Python 3.4+.


The simplest way to schedule a backup is by specifying the URL of a web page, like so:


This will submit the main page of my blog to the Wayback Machine for archiving.
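
For readers curious what a submission amounts to, here is a minimal sketch. It assumes the Wayback Machine's public https://web.archive.org/save/ endpoint. The helper name save_url is hypothetical, example.com is a placeholder rather than the author's blog, and this is not necessarily how Archiver builds its requests.

```python
# Assumption: the Wayback Machine's public "Save Page Now" endpoint.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_url(page_url):
    """Return the URL that asks the Wayback Machine to archive page_url.

    Hypothetical helper for illustration; not part of Archiver's API.
    """
    return SAVE_ENDPOINT + page_url.strip()


if __name__ == "__main__":
    # example.com is a placeholder URL.
    print(save_url("https://example.com/"))
```

Requesting the resulting URL (for example with the requests library) is what asks the archive to take a snapshot.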

You can also archive all the URLs specified in a sitemap.xml as follows:

archiver --sitemaps

This will back up every page listed in the sitemap of my website.

You can also pass a sitemap.xml file (requires the file:// prefix) to the archiver:

archiver --sitemaps file://sitemap.xml
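
To preview which pages a sitemap would contribute, the sketch below extracts every loc entry from a sitemap document using the standard library. It assumes the standard sitemaps.org namespace. The function name urls_from_sitemap and the sample document are hypothetical, and Archiver's own parsing may differ.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample document with placeholder URLs.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    for url in urls_from_sitemap(SAMPLE):
        print(url)
```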

You can back up multiple pages by specifying multiple URLs or sitemaps:

archiver --sitemaps

You can also back up multiple URLs by writing them to a file (for example, urls.txt), one URL per line, and passing that file to archiver:

archiver --file urls.txt
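
If the list of pages comes from another program, a file in the one-URL-per-line layout can be generated with a few lines of Python. This is a small sketch; the helper names format_url_file and write_url_file are hypothetical, and skipping blank entries is a choice made here rather than a documented requirement.

```python
def format_url_file(urls):
    """Return file contents with one URL per line, skipping blank entries."""
    cleaned = [url.strip() for url in urls if url.strip()]
    if not cleaned:
        return ""
    return "\n".join(cleaned) + "\n"


def write_url_file(path, urls):
    """Write the URLs to path in the one-URL-per-line layout."""
    with open(path, "w") as handle:
        handle.write(format_url_file(urls))
```

Calling write_url_file("urls.txt", pages) with a list of URLs produces a file that can then be passed with --file urls.txt.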

Sitemaps often exclude themselves, so you can request that the sitemap itself be backed up using the flag --archive-sitemap-also:

archiver --sitemaps --archive-sitemap-also
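
When Archiver is driven from another Python script, the argument list can be assembled programmatically. The sketch below uses only the flags documented here. The helper name build_command is hypothetical, and it assumes the archiver script is on your PATH.

```python
def build_command(urls=(), sitemaps=(), archive_sitemap_also=False):
    """Build an argument list for the archiver command-line script.

    Page URLs go first so they are not read as extra values for
    --sitemaps, which accepts one or more arguments.
    """
    command = ["archiver"]
    command.extend(urls)
    if sitemaps:
        command.append("--sitemaps")
        command.extend(sitemaps)
    if archive_sitemap_also:
        command.append("--archive-sitemap-also")
    return command
```

The resulting list can be handed to subprocess.check_call to run the backup.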


For a full list of command-line flags, Archiver has built-in help, displayed with archiver --help:

usage: archiver [-h] [--version] [--file FILE]
                [--sitemaps SITEMAPS [SITEMAPS ...]]
                [--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                [--log-to-file LOG_FILE] [--archive-sitemap-also]
                [--jobs JOBS] [--rate-limit-wait RATE_LIMIT_IN_SEC]
                [urls [urls ...]]

A script to backup a web pages with Internet Archive

positional arguments:
  urls                  the URLs of the pages to archive

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --file FILE           path to a file containing urls to save (one url per
                        line)
  --sitemaps SITEMAPS [SITEMAPS ...]
                        one or more URIs to sitemaps listing pages to archive;
                        local paths must be prefixed with 'file://'
  --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        set the logging level, defaults to WARNING
  --log-to-file LOG_FILE
                        redirect logs to a file
  --archive-sitemap-also
                        also submit the URL of the sitemap to be archived
  --jobs JOBS, -j JOBS  run this many concurrent URL submissions, defaults to
  --rate-limit-wait RATE_LIMIT_IN_SEC
                        number of seconds to wait between page requests to
                        avoid flooding the archive site, defaults to 5; also
                        used as the backoff factor for retries
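
The help text notes that the rate-limit wait doubles as the backoff factor for retries. As a rough illustration of what a backoff factor does, the sketch below computes delays under a common exponential scheme in which each retry waits twice as long as the previous one. The function name backoff_delays is hypothetical, and whether Archiver's retries follow exactly this schedule is an assumption.

```python
def backoff_delays(backoff_factor, retries):
    """Return the wait in seconds before each retry.

    Assumes a simple exponential schedule:
    backoff_factor * 2 ** attempt, with attempt counted from 0.
    """
    return [backoff_factor * (2 ** attempt) for attempt in range(retries)]


if __name__ == "__main__":
    # With the default wait of 5 seconds and four retries.
    print(backoff_delays(5, 4))
```

Under this assumed schedule, a larger --rate-limit-wait spaces out both ordinary requests and retries.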

Setting Up a Sitemap.xml for GitHub Pages

It is easy to automatically generate a sitemap for a GitHub Pages Jekyll site. Simply use jekyll/jekyll-sitemap.

Setup instructions can be found on the above site; they require changing just a single line of your site’s _config.yml.

Download files

Source Distribution

wayback-machine-archiver-1.9.1.tar.gz (6.9 kB)

Built Distribution

wayback_machine_archiver-1.9.1-py3-none-any.whl (7.1 kB)
