Python library and CLI for archiving URLs on popular services like Wayback Machine
pgark
Basically a fork of the great pastpages/savepagenow
How to use
Install with:
$ pip install pgark
To get the latest available snapshot for a URL:
$ pgark check whitehouse.gov
http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/
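The snapshot URL printed above embeds a 14-digit capture timestamp followed by the original address. As a rough sketch (not part of pgark; `parse_snapshot_url` is a hypothetical helper, and treating the timestamp as UTC is an assumption), the two parts could be pulled apart like this:

```python
import re
from datetime import datetime, timezone

# Matches plain snapshot URLs of the form
#   http(s)://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
# URLs with extra modifiers after the timestamp are deliberately not handled.
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(snapshot_url):
    """Return (capture datetime, original URL) for a Wayback snapshot URL."""
    match = SNAPSHOT_RE.match(snapshot_url.strip())
    if match is None:
        raise ValueError("not a recognized snapshot URL: %r" % snapshot_url)
    stamp, original = match.groups()
    # Assumption: the 14-digit timestamp is expressed in UTC.
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return captured_at, original
```

This only relies on the URL shape shown in the output above, so it works on a line copied from the terminal as well as on a value taken from the JSON response.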
To get the JSON response from the Wayback Machine API, pass in the -j/--json flag:
$ pgark check -j whitehouse.gov
{
  "archived_snapshots": {
    "closest": {
      "timestamp": "20200904180914",
      "status": "200",
      "available": true,
      "url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/"
    }
  },
  "url": "whitehouse.gov"
}
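The response above has the same shape as the one returned by the Wayback Machine availability endpoint, so it can also be fetched and read without the CLI. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it is an assumption (not confirmed by this README) that pgark queries this exact endpoint.

```python
import json
import urllib.parse
import urllib.request

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(target_url):
    """Build the query URL for a target such as 'whitehouse.gov'."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": target_url})


def fetch_availability(target_url, timeout=30):
    """Fetch and decode the availability JSON (performs a network request)."""
    request_url = build_availability_url(target_url)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return json.load(response)


def closest_snapshot_url(payload):
    """Return the closest snapshot URL, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")
```

Calling `closest_snapshot_url(fetch_availability("whitehouse.gov"))` should yield a snapshot URL like the one printed by `pgark check`, or `None` when the response contains no `closest` entry.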
To save a URL on the Wayback Machine:
$ pgark save whitehouse.gov
http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/
To get the JSON response with pgark-snapshot metadata and the Wayback
Machine API job status response, pass in the -j/--json flag:
$ pgark -j save whitehouse.gov
{
  "snapshot_status": "success",
  "snapshot_url": "http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/",
  ...
  "last_job_status": {
    "status": "success",
    "duration_sec": 10.638,
    "job_id": "443e89c2-fd3e-4d01-bd35-abfccc3a124a"
    ...
  }
}
See an example of the Wayback Machine's full JSON response in: examples/web.archive.org/job-save-success.json
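For scripting, the JSON output of the save command can be consumed from Python by shelling out to the CLI. This is only a sketch: it reuses the exact invocation shown above (`pgark -j save`), assumes the JSON is the only thing written to stdout, and reads just the `snapshot_status` and `snapshot_url` keys that appear in the sample. The function names are hypothetical and are not part of pgark's own library API.

```python
import json
import subprocess


def save_command(target_url):
    """Argument list mirroring the documented `pgark -j save <url>` call."""
    return ["pgark", "-j", "save", target_url]


def run_save(target_url):
    """Run the CLI and decode its JSON output (requires pgark on PATH)."""
    completed = subprocess.run(
        save_command(target_url),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: stdout contains the JSON document and nothing else.
    return json.loads(completed.stdout)


def snapshot_url_if_saved(result):
    """Return the snapshot URL when snapshot_status is 'success', else None."""
    if result.get("snapshot_status") != "success":
        return None
    return result.get("snapshot_url")
```

A caller would then write `snapshot_url_if_saved(run_save("whitehouse.gov"))`. Which values `snapshot_status` takes besides `"success"` is not documented here, so anything else is simply treated as "not saved".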
Project status
Just spitballing. Will probably just return to forking savepagenow and adding any changes/fixes.
See CHANGELOG for more details
Similar libraries, resources, and inspirations
- Wayback Machine official docs and stuff
- Other libraries and utilities:
- Other stuff: