Python library and CLI for archiving URLs on popular services like Wayback Machine
pgark
Basically a fork of the great pastpages/savepagenow
How to use
Install with:
$ pip install pgark
The available subcommands are:
check Check if there is a snapshot of [URL] on the [-s/--service].
save Attempt to save a snapshot of [URL] using the [-s/--service].
(For now, only the Wayback Machine service is implemented, so the -s flag can be ignored.)
Saving a snapshot of a URL
$ pgark save whitehouse.gov
http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/
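The URL printed above follows the Wayback Machine's usual shape: a 14-digit capture timestamp followed by the original URL. As a rough sketch (not part of pgark; `split_snapshot_url` is a hypothetical helper name, and the timestamp is assumed to be UTC), it can be split apart like this:

```python
import re
from datetime import datetime, timezone

# Matches URLs shaped like the CLI output above:
#   http://web.archive.org/web/<14-digit timestamp>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_snapshot_url(snapshot_url):
    """Return (capture_time, original_url) for a Wayback Machine snapshot URL."""
    match = SNAPSHOT_RE.match(snapshot_url.strip())
    if match is None:
        raise ValueError(f"not a recognized snapshot URL: {snapshot_url!r}")
    stamp, original = match.groups()
    # Assumption: the timestamp is YYYYMMDDhhmmss in UTC.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


if __name__ == "__main__":
    captured, original = split_snapshot_url(
        "http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/"
    )
    print(captured.isoformat(), original)
```

This can be handy for checking how old a snapshot is before deciding whether to save a new one.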
To get the JSON response with pgark-snapshot metadata and the Wayback Machine API job status response, pass in the -j/--json flag:
$ pgark -j save whitehouse.gov
{
"snapshot_url": "http://web.archive.org/web/20200904230109/https://www.whitehouse.gov/",
"...": "...",
"server_payload": {
"status": "success",
"duration_sec": 10.638,
"job_id": "443e89c2-fd3e-4d01-bd35-abfccc3a124a",
"...": "..."
}
}
See an example of the Wayback Machine's full JSON response in: examples/web.archive.org/job-save-success.json
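For scripting, the JSON output can be consumed from another program. The sketch below shells out to the CLI exactly as invoked above and reads only the keys visible in the sample response. It assumes that `pgark` is on the PATH and that stdout contains nothing but the JSON document; `parse_save_output` and `save_snapshot` are hypothetical helper names, not part of pgark's own Python API.

```python
import json
import subprocess


def parse_save_output(raw):
    """Pull the snapshot URL and job fields out of `pgark -j save` JSON text."""
    data = json.loads(raw)
    payload = data.get("server_payload") or {}
    return {
        "snapshot_url": data.get("snapshot_url"),
        "status": payload.get("status"),
        "job_id": payload.get("job_id"),
        "duration_sec": payload.get("duration_sec"),
    }


def save_snapshot(url):
    """Run the CLI as shown above and return the parsed summary.

    Needs pgark installed and network access; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        ["pgark", "-j", "save", url],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_save_output(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to test the parsing against a saved response without touching the network.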
Checking if a URL is already snapshotted
To get the latest available snapshot for a given URL:
$ pgark check whitehouse.gov
http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/
To get the JSON response from the Wayback Machine API, pass in the -j/--json flag:
$ pgark check -j whitehouse.gov
{
"snapshot_url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/",
"server_payload": {
"archived_snapshots": {
"closest": {
"timestamp": "20200904180914",
"status": "200",
"available": true,
"url": "http://web.archive.org/web/20200904180914/https://www.whitehouse.gov/"
}
},
"url": "whitehouse.gov"
}
}
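A minimal sketch of reading the response above in Python, using only the keys shown in the sample. It assumes that a URL with no snapshot yields an empty or missing "closest" entry, which is not confirmed here for pgark's output; `closest_snapshot` is a hypothetical helper name.

```python
import json


def closest_snapshot(raw):
    """Return the closest available snapshot URL from `pgark check -j` JSON, or None."""
    data = json.loads(raw)
    payload = data.get("server_payload") or {}
    closest = (payload.get("archived_snapshots") or {}).get("closest")
    # Assumption: a missing/empty "closest" or available == false means no snapshot.
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")
```

Note that "status" in the sample is the string "200", not an integer, so compare it as text if it is needed.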
Project status
Just spitballing. Will probably just return to forking savepagenow and adding any changes/fixes.
See CHANGELOG for more details
Similar libraries, resources, and inspirations
- Wayback Machine official docs and stuff
- Other libraries and utilities:
- Other stuff:
Development notes
To get set up:
$ make init
To run tests:
$ make test
To freeze Pipfile.lock and resync with setup.py:
$ make freeze