A Python script to submit web pages to the Wayback Machine for archiving.
Project description
Wayback Machine Archiver (Archiver for short) is a command-line utility written in Python to back up GitHub Pages sites using the Internet Archive.
Installation
The best way to install Archiver is with pip:
pip install wayback-machine-archiver
This will give you access to the script simply by calling:
archiver --help
You can also clone this repository:
git clone https://github.com/agude/wayback-machine-archiver.git
cd wayback-machine-archiver
python ./wayback_machine_archiver/archiver.py --help
If you clone the repository, Archiver can be installed as a local application using the setup.py script:
git clone https://github.com/agude/wayback-machine-archiver.git
cd wayback-machine-archiver
./setup.py install
Like installing with pip, this will give you access to the script by calling archiver.
Usage
You can schedule a backup by specifying the URL of a web page, like so:
archiver https://alexgude.com
This will submit the main page of my blog, alexgude.com, to the Wayback Machine for archiving.
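To make the submission step concrete, here is a minimal sketch of how a page can be handed to the Wayback Machine through its public Save Page Now URL (`https://web.archive.org/save/` followed by the page URL). The function names, the `User-Agent` string, and the timeout are assumptions made for illustration, and the sketch uses only the standard library; it is not necessarily how Archiver itself does this.

```python
from urllib.request import Request, urlopen

# Public Save Page Now endpoint; the page URL is appended to it.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def format_archive_url(page_url):
    """Return the Save Page Now URL that requests a capture of page_url."""
    return SAVE_ENDPOINT + page_url


def submit(page_url, timeout=30):
    """Ask the Wayback Machine to capture page_url and return the HTTP status.

    Hypothetical helper: the header value and timeout are illustrative.
    """
    request = Request(
        format_archive_url(page_url),
        headers={"User-Agent": "archiver-sketch"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `submit("https://alexgude.com")` would perform a real network request, so it is left out of the sketch; `format_archive_url` can be checked offline.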
You can also archive all the URLs specified in a sitemap.xml (https://en.wikipedia.org/wiki/Sitemaps) as follows:
archiver --sitemaps https://alexgude.com/sitemap.xml
This will back up every page listed in the sitemap of my website, alexgude.com.
You can back up multiple pages by specifying multiple URLs or sitemaps:
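As a rough illustration of what reading a sitemap involves, the sketch below pulls every `<loc>` entry out of a sitemap document using the namespace defined by the Sitemaps protocol. The function name and the example document are hypothetical, and Archiver's own parsing may differ.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Sitemaps protocol for <urlset>, <url>, and <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_page_urls(sitemap_xml):
    """Return the URLs found in the <loc> elements of a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text
    ]


# Made-up sitemap used only to exercise the function.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

page_urls = extract_page_urls(EXAMPLE_SITEMAP)
```

Each URL in `page_urls` could then be submitted for archiving one at a time.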
archiver https://radiokeysmusic.com --sitemaps https://charles.uno/sitemap.xml https://alexgude.com/sitemaps.xml
Sitemaps often exclude themselves, so you can request that the sitemap itself be backed up using the flag --archive-sitemap-also:
archiver --sitemaps https://alexgude.com/sitemaps.xml --archive-sitemap-also
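One way to picture how explicit URLs, sitemap contents, and the --archive-sitemap-also flag combine is the sketch below. The function `collect_targets` and its `sitemap_pages` argument (a mapping from each sitemap URL to the pages it lists) are hypothetical names chosen for illustration; the ordering and de-duplication are assumptions rather than documented behavior of Archiver.

```python
def collect_targets(urls, sitemap_pages, archive_sitemap_also=False):
    """Build the list of URLs to submit.

    urls: page URLs given directly on the command line.
    sitemap_pages: dict mapping each sitemap URL to the page URLs it lists.
    archive_sitemap_also: when True, the sitemap URLs are submitted as well.
    """
    targets = list(urls)
    for sitemap_url, pages in sitemap_pages.items():
        if archive_sitemap_also:
            targets.append(sitemap_url)
        targets.extend(pages)
    # dict.fromkeys drops duplicates while keeping first-seen order
    # (dict ordering is guaranteed from Python 3.7 onward).
    return list(dict.fromkeys(targets))
```

With the flag left off, only the pages listed inside each sitemap are added, which matches the observation that sitemaps often exclude themselves.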
Archiver requires the requests library (https://github.com/kennethreitz/requests) by Kenneth Reitz. Archiver supports Python 2.7 and Python 3.4+.
Setting Up a Sitemap.xml for GitHub Pages
It is easy to automatically generate a sitemap for a GitHub Pages Jekyll site: use the jekyll/jekyll-sitemap plugin.
Setup instructions can be found on that project's page; they require changing just a single line of your site's _config.yml.
Download files
Hashes for wayback-machine-archiver-1.3.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 8eceb8551b71557507d8857a6890f500854e7a9d1055209be81708dfb1708627
MD5 | b1d0f7f17d7f8dbe8969c38a55e3da1f
BLAKE2b-256 | 88f99dd60d9c7c49d74abdd6afb00ea21f19655577c43cb117396dffa252e0b8
Hashes for wayback_machine_archiver-1.3.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 73d00359ec6a8aa0a4c77e3f44be199739b8da9811f9637cfe9eaac4b2df1e64
MD5 | 8052c6aecef34e0732b1b67e1f57397c
BLAKE2b-256 | ac02f15936abab1d46b8e9196318607cf2cff7576c486ec89d304fe022e94f2a
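The published digests can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib` module; the helper names are hypothetical, and it assumes the distribution file has already been downloaded to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("wayback-machine-archiver-1.3.1.tar.gz", "8eceb8551b71557507d8857a6890f500854e7a9d1055209be81708dfb1708627")` should return True for an intact copy of the source distribution, assuming the file sits in the current directory under that name.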