A simple Python wrapper for archive.org's "Save Page Now" capturing service
Installation
$ pip install savepagenow
Python Usage
Import it.
>>> import savepagenow
Capture a URL.
>>> archive_url = savepagenow.capture("http://www.example.com/")
See where it’s stored.
>>> print(archive_url)
https://web.archive.org/web/20161018203554/http://www.example.com/
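The returned address follows the Wayback Machine pattern https://web.archive.org/web/<timestamp>/<original URL>, where the 14-digit timestamp reads as YYYYMMDDhhmmss (20161018203554 is 2016-10-18 20:35:54). Below is a minimal sketch for pulling those pieces apart. It assumes that URL layout and treats the timestamp as UTC. parse_archive_url is a hypothetical helper, not part of savepagenow.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical helper, not part of savepagenow. Assumes the
# https://web.archive.org/web/<14-digit timestamp>/<original URL> layout
# seen in the example output above.
_ARCHIVE_URL = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_archive_url(archive_url: str) -> Tuple[datetime, str]:
    """Split a Wayback Machine snapshot URL into (capture time, original URL)."""
    match = _ARCHIVE_URL.match(archive_url)
    if match is None:
        raise ValueError("not a Wayback Machine snapshot URL: " + archive_url)
    stamp, original = match.groups()
    # The timestamp is YYYYMMDDhhmmss; treating it as UTC is an assumption.
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured_at, original


if __name__ == "__main__":
    when, original = parse_archive_url(
        "https://web.archive.org/web/20161018203554/http://www.example.com/"
    )
    print(when.isoformat(), original)
```

Matching on a regular expression keeps the helper strict: anything that does not look like a snapshot URL raises ValueError instead of returning a half-parsed result.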
If a URL has been cached recently, archive.org may return the address of that existing snapshot rather than conduct a new capture. When that happens, the capture method raises a CachedPage exception.
This is likely to happen if you request the same URL twice within a few seconds.
>>> savepagenow.capture("http://www.example.com/")
'https://web.archive.org/web/20161019062637/http://www.example.com/'
>>> savepagenow.capture("http://www.example.com/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "savepagenow/__init__.py", line 36, in capture
    archive_url
savepagenow.exceptions.CachedPage: archive.org returned a cached version of this page: https://web.archive.org/web/20161019062637/http://www.example.com/
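Handling the exception yourself is an ordinary try/except around capture. The sketch below takes the capture function and the exception class as parameters so the logic can be tried offline with stand-ins. In real code you would pass savepagenow.capture and savepagenow.exceptions.CachedPage. The retry with accept_cache=True is an assumption based on the command line's --accept-cache option, so check that your installed version's capture accepts it. capture_with_fallback, FakeCachedPage and fake_capture are hypothetical names.

```python
from typing import Callable, Tuple, Type


def capture_with_fallback(
    url: str,
    capture: Callable[..., str],
    cached_exc: Type[BaseException],
) -> Tuple[str, bool]:
    """Return (archive URL, True) for a fresh capture, (archive URL, False) for a cached one.

    capture and cached_exc would normally be savepagenow.capture and
    savepagenow.exceptions.CachedPage; they are parameters here so the
    logic can be exercised without network access.
    """
    try:
        return capture(url), True
    except cached_exc:
        # Assumption: capture() accepts accept_cache=True (mirroring the
        # CLI's --accept-cache flag) and then returns the cached snapshot URL.
        return capture(url, accept_cache=True), False


# Offline stand-ins (hypothetical) that mimic the behavior shown above.
class FakeCachedPage(Exception):
    pass


def fake_capture(url: str, accept_cache: bool = False) -> str:
    snapshot = "https://web.archive.org/web/20161019062637/" + url
    if not accept_cache:
        raise FakeCachedPage("archive.org returned a cached version of this page: " + snapshot)
    return snapshot


if __name__ == "__main__":
    print(capture_with_fallback("http://www.example.com/", fake_capture, FakeCachedPage))
```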
You can craft your code to catch that exception yourself, or use the built-in capture_or_cache method, which returns the URL provided by archive.org along with a boolean indicating whether it is a fresh capture (True) or from the cache (False).
>>> savepagenow.capture_or_cache("http://www.example.com/")
('https://web.archive.org/web/20161019062832/http://www.example.com/', True)
>>> savepagenow.capture_or_cache("http://www.example.com/")
('https://web.archive.org/web/20161019062832/http://www.example.com/', False)
There’s no accounting for taste, but you could unpack both values in a single line like so:
>>> url, captured = savepagenow.capture_or_cache("http://www.example.com/")
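Because the second value is a plain boolean, it is easy to branch on. Here is a small sketch that archives several URLs and groups the snapshot addresses by whether they were fresh or cached. archive_all is a hypothetical helper, and the function it calls is passed in so a stand-in can be used offline. In real code you would pass savepagenow.capture_or_cache.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def archive_all(
    urls: Iterable[str],
    capture_or_cache: Callable[[str], Tuple[str, bool]],
) -> Dict[str, List[str]]:
    """Archive each URL and group the resulting snapshot URLs by freshness."""
    results: Dict[str, List[str]] = {"fresh": [], "cached": []}
    for target in urls:
        url, captured = capture_or_cache(target)
        # captured is True for a new capture, False when archive.org reused a snapshot.
        results["fresh" if captured else "cached"].append(url)
    return results


if __name__ == "__main__":
    seen = set()

    def fake_capture_or_cache(target: str) -> Tuple[str, bool]:
        # Hypothetical stand-in: the first request for a URL is "fresh", repeats are "cached".
        fresh = target not in seen
        seen.add(target)
        return "https://web.archive.org/web/20161019062832/" + target, fresh

    print(archive_all(["http://www.example.com/", "http://www.example.com/"], fake_capture_or_cache))
```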
Command-line usage
The Python library is also installed as a command-line interface. You can run it from your terminal like so:
$ savepagenow http://www.example.com/
The command has the same options as the Python API, which you can learn about from its help output.
$ savepagenow --help
Usage: savepagenow [OPTIONS] URL
  Archives the provided URL using the archive.org Wayback Machine.
  Raises a CachedPage exception if archive.org declines to conduct a new
  capture and returns a previous snapshot instead.
Options:
  -ua, --user-agent TEXT  User-Agent header for the web request
  -c, --accept-cache      Accept and return cached URL
  --help                  Show this message and exit.
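To drive the command-line tool from another program, one option is to build the argument list from the options in the help output and run it with subprocess. This is a sketch. savepagenow_command and run_savepagenow are hypothetical helpers, and the code assumes the command prints the archive URL to standard output.

```python
import shlex
import subprocess
from typing import List, Optional


def savepagenow_command(
    url: str,
    user_agent: Optional[str] = None,
    accept_cache: bool = False,
) -> List[str]:
    """Build an argv list for the savepagenow CLI from the options listed in --help."""
    cmd = ["savepagenow", url]
    if user_agent is not None:
        cmd += ["--user-agent", user_agent]
    if accept_cache:
        cmd.append("--accept-cache")
    return cmd


def run_savepagenow(url: str, **options) -> str:
    """Run the CLI (needs savepagenow installed and network access) and return its stdout."""
    completed = subprocess.run(
        savepagenow_command(url, **options),
        check=True,  # raise CalledProcessError on a non-zero exit, e.g. when CachedPage is raised
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    # Show the command that would be run, without running it.
    print(shlex.join(savepagenow_command("http://www.example.com/", user_agent="my user agent here")))
```

Passing a list rather than a single shell string avoids quoting problems when the user agent contains spaces.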
Customizing the user agent
In an effort to be transparent and polite to the Internet Archive, all requests made by savepagenow carry a custom user agent that identifies itself as "savepagenow (https://github.com/pastpages/savepagenow)".
You can replace it with your own string through an optional argument: user_agent in Python, --user-agent on the command line.
Here’s how to do it in Python:
>>> savepagenow.capture("http://www.example.com/", user_agent="my user agent here")
And here’s how to do it from the command line:
$ savepagenow http://www.example.com/ --user-agent "my user agent here"
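A descriptive user agent, like the library's default, tells the Internet Archive who is making requests and how to reach them. Below is a sketch that builds one in the same "name (contact)" shape. polite_user_agent is a hypothetical helper, and the project name and contact URL in the example are placeholders.

```python
def polite_user_agent(project: str, contact: str, version: str = "") -> str:
    """Build a User-Agent in the default's shape: 'name (contact)' or 'name/version (contact)'."""
    name = f"{project}/{version}" if version else project
    return f"{name} ({contact})"


if __name__ == "__main__":
    ua = polite_user_agent("my-archiver", "https://example.org/contact", version="1.0")
    print(ua)
    # With the library installed, it would be passed along as shown above:
    # savepagenow.capture("http://www.example.com/", user_agent=ua)
```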
Download files
File details
Details for the file savepagenow-0.0.13.tar.gz.
File metadata
- Download URL: savepagenow-0.0.13.tar.gz
- Upload date:
- Size: 4.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 553f21518b1aff933aeb00bad5cfc35a2060d7c9e12a6cfe22ae3dc9591cfd40 |
| MD5 | 674e7e49e38909d154d574b1c9f4cdd1 |
| BLAKE2b-256 | 419855d43569372663b82c793e76e41e40c132cc566227931d492ebedab4e9a5 |
File details
Details for the file savepagenow-0.0.13-py2.py3-none-any.whl.
File metadata
- Download URL: savepagenow-0.0.13-py2.py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 445ace4f0c6fea4c74df26e4c91f7f5d04bffbc94e30d8593ff07f9b8b57e26b |
| MD5 | 363def67202f0911535349aa2f1c99d8 |
| BLAKE2b-256 | f6006ea5d1e65f79f9db3a74a47801cd08e7d11088f50e70e3f7810e71e444aa |
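The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's hashlib. The expected value is the SHA256 digest listed for savepagenow-0.0.13.tar.gz, and the path is wherever you saved the download. sha256_of and matches are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for savepagenow-0.0.13.tar.gz.
SDIST_SHA256 = "553f21518b1aff933aeb00bad5cfc35a2060d7c9e12a6cfe22ae3dc9591cfd40"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published value, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches(Path("savepagenow-0.0.13.tar.gz"), SDIST_SHA256) should be True for an intact download. pip can also enforce this itself in hash-checking mode: add --hash=sha256:<digest> for each file to the savepagenow==0.0.13 line of a requirements file and install with pip install --require-hashes -r requirements.txt.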