A Python script to submit web pages to the Wayback Machine for archiving.
Project description
Wayback Machine Archiver (Archiver for short) is a command-line utility written in Python to back up GitHub Pages sites using the Internet Archive.
Installation
The best way to install Archiver is with pip:
pip install wayback-machine-archiver
This will give you access to the script simply by calling:
archiver --help
You can also clone this repository:
git clone https://github.com/agude/wayback-machine-archiver.git
cd wayback-machine-archiver
python ./wayback_machine_archiver/archiver.py --help
If you clone the repository, Archiver can be installed as a local application using the setup.py script:
git clone https://github.com/agude/wayback-machine-archiver.git
cd wayback-machine-archiver
./setup.py install
Like installing with pip, this gives you access to the script by calling archiver.
Usage
You can schedule a backup by specifying the URL of a web page, like so:
archiver https://alexgude.com
This will submit the main page of my blog, alexgude.com, to the Wayback Machine for archiving.
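Submitting a page comes down to an HTTP request to the Wayback Machine. The sketch below is a minimal Python 3 illustration. It assumes the public "Save Page Now" endpoint https://web.archive.org/save/ and uses the standard library's urllib instead of requests, so it runs without extra installs. The helper names build_save_url and submit are hypothetical and are not Archiver's own functions.

```python
import urllib.request

# Assumption: the Wayback Machine's public "Save Page Now" URL prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_url(page_url):
    """Return the Save Page Now URL for page_url (hypothetical helper)."""
    return SAVE_PREFIX + page_url


def submit(page_url, timeout=60):
    """Ask the Wayback Machine to capture page_url (hypothetical helper).

    Returns the HTTP status code of a successful response; urlopen raises
    urllib.error.HTTPError for error statuses. This makes a live network
    request, so it is not called below.
    """
    request = urllib.request.Request(
        build_save_url(page_url),
        headers={"User-Agent": "wayback-archiver-sketch"},  # made-up agent string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


print(build_save_url("https://alexgude.com"))
# prints https://web.archive.org/save/https://alexgude.com
```

Calling submit("https://alexgude.com") would perform the actual request. Keeping URL construction in its own function lets that part be checked without touching the network.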
You can also archive all the URLs listed in a sitemap.xml (https://en.wikipedia.org/wiki/Sitemaps) as follows:
archiver --sitemaps https://alexgude.com/sitemap.xml
This will back up every page listed in the sitemap of my website, alexgude.com.
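Under the sitemaps.org protocol, a sitemap is an XML file in which each page's address sits in a <loc> element inside a <url> entry. Below is a minimal sketch of extracting those addresses with Python's standard xml.etree.ElementTree module. The function name urls_from_sitemap is hypothetical, and fetching the file is left out. This is not necessarily how Archiver itself parses sitemaps, and the sample data is made up.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the page URLs found in the <url><loc> elements of a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc/> elements
    ]


# Made-up sample sitemap for illustration.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/about/</loc></url>"
    "</urlset>"
)
print(urls_from_sitemap(sample))
# prints ['https://example.com/', 'https://example.com/about/']
```

Sitemap index files, which use <sitemapindex> to point at other sitemaps, are not handled by this sketch.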
You can back up multiple pages by specifying multiple URLs or sitemaps:
archiver https://radiokeysmusic.com --sitemaps https://charles.uno/sitemap.xml https://alexgude.com/sitemap.xml
Sitemaps often exclude themselves, so you can request that the sitemap itself be backed up using the flag --archive-sitemap-also:
archiver --sitemaps https://alexgude.com/sitemap.xml --archive-sitemap-also
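The following hypothetical sketch shows how the flag combinations above could be parsed with Python's argparse and merged into a single, de-duplicated list of pages to submit. The flag names mirror the ones shown above. The parser, the collect_targets helper, and the sitemap_pages mapping are illustrations, not Archiver's actual code. The sitemap_pages mapping links each sitemap URL to the page URLs it lists and is assumed to be fetched elsewhere.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the command-line flags shown above."""
    parser = argparse.ArgumentParser(prog="archiver")
    parser.add_argument("urls", nargs="*", default=[], help="web pages to archive")
    parser.add_argument("--sitemaps", nargs="+", default=[],
                        help="sitemaps whose listed pages should be archived")
    parser.add_argument("--archive-sitemap-also", action="store_true",
                        help="also archive the sitemap files themselves")
    return parser


def collect_targets(args, sitemap_pages):
    """Merge direct URLs and sitemap pages into one sorted, de-duplicated list.

    sitemap_pages maps a sitemap URL to the page URLs it lists (assumed to
    have been fetched and parsed elsewhere).
    """
    targets = set(args.urls)
    for sitemap in args.sitemaps:
        targets.update(sitemap_pages.get(sitemap, []))
        if args.archive_sitemap_also:
            targets.add(sitemap)  # sitemaps rarely list themselves
    return sorted(targets)


args = build_parser().parse_args(
    ["--sitemaps", "https://example.com/sitemap.xml", "--archive-sitemap-also"]
)
pages = {"https://example.com/sitemap.xml": ["https://example.com/", "https://example.com/about/"]}
print(collect_targets(args, pages))
# prints ['https://example.com/', 'https://example.com/about/', 'https://example.com/sitemap.xml']
```

Using a set means a page given directly and also listed in a sitemap is submitted only once. Because --sitemaps accepts one or more values in this sketch, plain page URLs should come before it, as in the multi-site example above. Otherwise, they would be read as additional sitemaps.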
Archiver requires the requests library (https://github.com/kennethreitz/requests) by Kenneth Reitz. Archiver supports Python 2.7 and Python 3.4+.
Setting Up a Sitemap.xml for GitHub Pages
It is easy to automatically generate a sitemap for a GitHub Pages Jekyll site: simply use the jekyll/jekyll-sitemap plugin.
Setup instructions can be found on the plugin's project page; they require changing just a single line of your site’s _config.yml.
Download files
Source Distribution
Built Distribution
File details
Details for the file wayback-machine-archiver-1.3.2.tar.gz.
File metadata
- Download URL: wayback-machine-archiver-1.3.2.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 63c110d16bcca89a4a8a63c7b9c26edb2237a3fa28e791b4d69a05a58d2cbb34 |
| MD5 | 3793f19ba182c64b49f024334f06dd91 |
| BLAKE2b-256 | bd49682aa72f233537002728273df032b108d9b91ea4268db046bab4674851eb |
File details
Details for the file wayback_machine_archiver-1.3.2-py3-none-any.whl.
File metadata
- Download URL: wayback_machine_archiver-1.3.2-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a9f373d7461705163bc79f1dbbbab9e85d826f233168c4799e915888bf41bd0 |
| MD5 | 10a68174055e525fa6d816341b73fba3 |
| BLAKE2b-256 | 7e377526851781e04adf3ce89c97bcb5966f3e2e7532802ab939adefb87bdcb6 |