Python library and CLI for archiving URLs on popular services like Wayback Machine
pgark
Basically a fork of the great pastpages/savepagenow
How to use
Install with:
$ pip install pgark
To get the latest available snapshot for a URL:
$ pgark check whitehouse.gov
http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/
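The snapshot URL printed above embeds a 14-digit capture timestamp (YYYYMMDDhhmmss, in UTC) followed by the original URL. A minimal sketch of pulling those two pieces apart with the standard library is below; `parse_snapshot_url` is a hypothetical helper written for illustration, not part of pgark, and it only handles the plain URL shape shown in this README.

```python
import re
from datetime import datetime, timezone

# Plain Wayback snapshot URLs have the shape:
#   http://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(snapshot_url):
    """Return (capture_time, original_url) for a plain Wayback snapshot URL.

    Hypothetical helper for illustration; not part of pgark.
    """
    match = SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a plain Wayback snapshot URL: {snapshot_url!r}")
    stamp, original_url = match.groups()
    # Wayback timestamps are expressed in UTC.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original_url


captured, original = parse_snapshot_url(
    "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/"
)
print(captured.isoformat())  # 2020-09-04T18:09:14+00:00
print(original)              # https://www.whitehouse.gov/
```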
To get the JSON response from the Wayback Machine API, pass in the -j/--json flag:
$ pgark check -j whitehouse.gov
{
  "archived_snapshots": {
    "closest": {
      "timestamp": "20200904180914",
      "status": "200",
      "available": true,
      "url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/"
    }
  },
  "url": "whitehouse.gov"
}
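The JSON above has the same shape as the response from the Wayback Machine availability endpoint (`https://archive.org/wayback/available?url=...`). As a rough sketch of consuming it from Python, the snippet below builds that query URL and extracts the closest snapshot from a parsed response. Both function names are hypothetical, and whether pgark calls this exact endpoint internally is an assumption; the sample data is copied from the output above.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target_url):
    """Build the availability query URL for target_url (hypothetical helper)."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': target_url})}"


def closest_snapshot_url(response):
    """Return the closest snapshot URL, or None if no snapshot is available."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"]


# Sample response copied from the `pgark check -j` output in this README.
sample = json.loads("""
{
  "archived_snapshots": {
    "closest": {
      "timestamp": "20200904180914",
      "status": "200",
      "available": true,
      "url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/"
    }
  },
  "url": "whitehouse.gov"
}
""")

print(availability_url("whitehouse.gov"))
print(closest_snapshot_url(sample))
```

Returning `None` when `closest` is missing keeps the "never archived" case distinct from a malformed response, which would surface as a `KeyError` instead.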
To save a URL on the Wayback Machine:
$ pgark save whitehouse.gov
http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/
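This README documents the CLI only, so one hedged way to script the `check` and `save` commands shown above from Python is to shell out with `subprocess`. The sketch below assumes `pgark` is installed and on `PATH`, and that it prints the snapshot URL to standard output as in the examples; `build_pgark_command` and `run_pgark` are hypothetical wrappers, not pgark's own Python API.

```python
import subprocess


def build_pgark_command(subcommand, url):
    """Return the argv list for a documented pgark subcommand."""
    # Only the two subcommands shown in this README are accepted.
    if subcommand not in ("check", "save"):
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    return ["pgark", subcommand, url]


def run_pgark(subcommand, url):
    """Run pgark and return its standard output with whitespace trimmed.

    Assumes the `pgark` executable is on PATH; raises
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        build_pgark_command(subcommand, url),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `run_pgark("save", "whitehouse.gov")` would then be expected to return the same snapshot URL the CLI prints.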
To get the JSON response with pgark-snapshot metadata and the Wayback Machine API job status response, pass in the -j/--json flag:
$ pgark -j save whitehouse.gov
{
  "snapshot_status": "success",
  "snapshot_url": "http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/",
  "...": "...",
  "last_job_status": {
    "status": "success",
    "duration_sec": 10.638,
    "job_id": "443e89c2-fd3e-4d01-bd35-abfccc3a124a",
    "...": "..."
  },
  "...": "...",
  "job_url": "http://web.archive.org/status/443e89c2-fd3e-4d01-bd35-abfccc3a124a"
}
See an example of the Wayback Machine's full JSON response in: examples/web.archive.org/job-save-success.json
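As an illustration of consuming the save response, the sketch below checks `snapshot_status` before trusting `snapshot_url`. It uses only the keys visible in the abbreviated output above; `snapshot_url_if_saved` is a hypothetical helper, and because `"success"` is the only status value shown in this README, every other value is treated as a failure by assumption.

```python
def snapshot_url_if_saved(result):
    """Return the snapshot URL from a parsed `pgark -j save` response.

    Hypothetical helper; assumes any snapshot_status other than
    "success" means the snapshot was not saved.
    """
    if result.get("snapshot_status") != "success":
        job = result.get("last_job_status", {})
        raise RuntimeError(
            f"snapshot not saved (job status: {job.get('status')!r}); "
            f"see {result.get('job_url')}"
        )
    return result["snapshot_url"]


# Sample built from the documented keys only; elided fields are left out.
sample_save = {
    "snapshot_status": "success",
    "snapshot_url": "http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/",
    "last_job_status": {
        "status": "success",
        "duration_sec": 10.638,
        "job_id": "443e89c2-fd3e-4d01-bd35-abfccc3a124a",
    },
    "job_url": "http://web.archive.org/status/443e89c2-fd3e-4d01-bd35-abfccc3a124a",
}

print(snapshot_url_if_saved(sample_save))
```

Including `job_url` in the error message gives a place to look up the full job status when a save does not succeed.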
Project status
Just spitballing. Will probably just return to forking savepagenow and adding any changes/fixes.
See CHANGELOG for more details
Similar libraries, resources, and inspirations
- Wayback Machine official docs and stuff
- Other libraries and utilities:
- Other stuff:
Development notes
To resync Pipfile.lock and setup.py:
$ pipenv lock --pre
$ pipenv-setup sync --dev
To run tests:
$ pytest