Python library and CLI for archiving URLs on popular services like Wayback Machine
pgark
Basically a fork of the great pastpages/savepagenow
How to use
Install with:
$ pip install pgark
The available subcommands are:
check Check if there is a snapshot of [URL] on the [-s/--service].
save Attempt to save a snapshot of [URL] using the [-s/--service].
(For now, only the Wayback Machine service is implemented, so the -s flag can be ignored.)
Saving a snapshot of a URL
$ pgark save whitehouse.gov
http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/
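The snapshot URL printed above appears to follow the usual Wayback Machine layout: a 14-digit timestamp (YYYYMMDDhhmmss, treated here as UTC) followed by the original URL. A minimal sketch for splitting it apart, assuming that layout holds. `parse_snapshot_url` is a hypothetical helper for illustration, not part of pgark:

```python
import re
from datetime import datetime, timezone

# Assumed layout: http(s)://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(snapshot_url):
    """Return (capture time as an aware UTC datetime, original URL)."""
    match = SNAPSHOT_RE.match(snapshot_url.strip())
    if match is None:
        raise ValueError(f"not a recognized snapshot URL: {snapshot_url!r}")
    timestamp, original_url = match.groups()
    # Assumption: the timestamp is in UTC.
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return captured_at, original_url


captured_at, original_url = parse_snapshot_url(
    "http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/"
)
print(captured_at.isoformat(), original_url)
```

This can be handy for logging when a capture was made without requesting the JSON output.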
To get the JSON response with pgark-snapshot metadata and the Wayback Machine API job status response, pass in the -j/--json flag:
$ pgark -j save whitehouse.gov
{
  "snapshot_url": "http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/",
  "...": "...",
  "server_payload": {
    "status": "success",
    "duration_sec": 10.638,
    "job_id": "443e89c2-fd3e-4d01-bd35-abfccc3a124a",
    "...": "..."
  }
}
See an example of the Wayback Machine's full JSON response in: examples/web.archive.org/job-save-success.json
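For scripting, the JSON output can be consumed from Python by calling the CLI as a subprocess. The sketch below is one possible approach and only relies on the command line and fields shown above. It assumes that with -j the command writes nothing but the JSON document to stdout. `run_save` and `summarize_save` are hypothetical helper names, not pgark functions:

```python
import json
import subprocess


def run_save(url):
    """Run `pgark -j save URL` and decode its output.

    Needs pgark on PATH and network access. Assumes stdout holds only JSON.
    """
    result = subprocess.run(
        ["pgark", "-j", "save", url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


def summarize_save(response):
    """Pull out the snapshot URL and job status, tolerating missing keys."""
    payload = response.get("server_payload") or {}
    return {
        "snapshot_url": response.get("snapshot_url"),
        "status": payload.get("status"),
        "job_id": payload.get("job_id"),
    }


# Sample built from the fields shown above; the elided fields are left out.
sample = json.loads("""
{
  "snapshot_url": "http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/",
  "server_payload": {
    "status": "success",
    "duration_sec": 10.638,
    "job_id": "443e89c2-fd3e-4d01-bd35-abfccc3a124a"
  }
}
""")
summary = summarize_save(sample)
print(summary["status"], summary["snapshot_url"])
```

With pgark installed, `summarize_save(run_save("whitehouse.gov"))` would produce the same kind of summary from a live capture.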
Checking if a URL is already snapshotted
To get the latest available snapshot for a given URL:
$ pgark check whitehouse.gov
http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/
To get the JSON response from the Wayback Machine API, pass in the -j/--json flag:
$ pgark check -j whitehouse.gov
{
  "snapshot_url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/",
  "server_payload": {
    "archived_snapshots": {
      "closest": {
        "timestamp": "20200904180914",
        "status": "200",
        "available": true,
        "url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/"
      }
    },
    "url": "whitehouse.gov"
  }
}
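When handling this response in a script, the case where no snapshot exists needs care. The sketch below assumes the payload mirrors the Wayback Machine availability API, which returns an empty "archived_snapshots" object when nothing is archived. Whether pgark passes that through unchanged is an assumption, and `closest_snapshot_url` is a hypothetical helper, not part of pgark:

```python
def closest_snapshot_url(response):
    """Return the closest snapshot's URL, or None if none is available."""
    payload = response.get("server_payload") or {}
    closest = (payload.get("archived_snapshots") or {}).get("closest")
    # Treat a missing entry or a false "available" flag as "no snapshot".
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


# Response shaped like the example above.
found = {
    "snapshot_url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/",
    "server_payload": {
        "archived_snapshots": {
            "closest": {
                "timestamp": "20200904180914",
                "status": "200",
                "available": True,
                "url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/",
            }
        },
        "url": "whitehouse.gov",
    },
}

# Assumed shape when the URL has never been archived.
missing = {"server_payload": {"archived_snapshots": {}, "url": "example.invalid"}}

print(closest_snapshot_url(found))
print(closest_snapshot_url(missing))
```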
Project status
Just spitballing. Will probably just return to forking savepagenow and adding any changes/fixes.
See CHANGELOG for more details
Similar libraries, resources, and inspirations
- Wayback Machine official docs and stuff
- Other libraries and utilities:
- Other stuff:
Development notes
To get set up:
$ make init
To run tests:
$ make test
To freeze Pipfile.lock and resync with setup.py:
$ make freeze