A command line utility for scraping wayback snapshots from archive.org.
wayback_scraper
Scrapes URLs from archive.org (Wayback Machine snapshots).
Command-Line Interface
The usage information can be printed by running wayback-scraper -h.
usage: wayback-scraper [-h] -u URL [-l LIMIT] [-f Start TIMESTAMP]
                       [-t End TIMESTAMP] [-p DIRECTORY] [-n FILENAME] -o
                       Output type

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     url for the wayback scraper to scrape. [eg:
                        http://www.example.com]
  -l LIMIT, --limit LIMIT
                        limit the number of results to scrape
  -f Start TIMESTAMP, --from Start TIMESTAMP
                        start date, format - YYYYmmddHHMMSS. Example : -f 2011
  -t End TIMESTAMP, --to End TIMESTAMP
                        end date, format - YYYYmmddHHMMSS. Example : -t 201205
  -p DIRECTORY, --path DIRECTORY
                        Output file path
  -n FILENAME, --name FILENAME
                        filename for the output data
  -o Output type, --output Output type
                        output type
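The option list above maps naturally onto Python's argparse module. The following is a minimal sketch of a parser that accepts the same flags; it is a reconstruction from the help text only, not the package's actual source. The function name build_parser, the destination names from_ts and to_ts, and the choice to parse LIMIT as an integer are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the package's code)."""
    parser = argparse.ArgumentParser(prog="wayback-scraper")
    # -u and -o appear without brackets in the usage line, so they are required.
    parser.add_argument("-u", "--url", required=True,
                        help="url for the wayback scraper to scrape")
    # Assumption: the limit is an integer count of results.
    parser.add_argument("-l", "--limit", type=int,
                        help="limit the number of results to scrape")
    # "from" is a Python keyword, so the values are stored under other names.
    parser.add_argument("-f", "--from", dest="from_ts",
                        help="start date, format YYYYmmddHHMMSS (prefixes such as 2011 allowed)")
    parser.add_argument("-t", "--to", dest="to_ts",
                        help="end date, format YYYYmmddHHMMSS (prefixes such as 201205 allowed)")
    parser.add_argument("-p", "--path", help="output file path")
    parser.add_argument("-n", "--name", help="filename for the output data")
    parser.add_argument("-o", "--output", required=True, help="output type")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-u", "https://www.archive.org", "-l", "2",
     "-f", "2011", "-t", "2012", "-n", "output", "-o", "json"]
)
```

Timestamps are kept as strings here because the help text shows partial values such as 2011 and 201205 being accepted as-is.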
Examples
The usage is perhaps clearer with a concrete example:
wayback-scraper -u https://www.archive.org -l 2 -f 2011 -t 2012 -n output -o json
which produces:
{
    'archived_urls': [
        {
            'url': 'http://www.archive.org:80/',
            'date': '1997-01-26',
            'timestamp': '19970126045828'
        },
        {
            'url': 'http://www.archive.org:80/',
            'date': '1997-10-01',
            'timestamp': '19971011050034'
        }
    ]
}
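Each entry in the output carries a 14-digit timestamp in the YYYYmmddHHMMSS format. As a rough sketch of how such an entry might be post-processed, the snippet below derives the calendar date from the timestamp and builds the Wayback Machine replay URL for the snapshot. The helper names snapshot_date and replay_url are hypothetical and not part of wayback-scraper, and the record is assumed to be a dictionary with the url and timestamp keys shown above.

```python
from datetime import datetime

# Assumption: one entry of 'archived_urls', represented as a plain dict.
record = {
    "url": "http://www.archive.org:80/",
    "timestamp": "19970126045828",
}


def snapshot_date(timestamp: str) -> str:
    """Return the ISO calendar date encoded in a 14-digit Wayback timestamp."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").date().isoformat()


def replay_url(entry: dict) -> str:
    """Build the web.archive.org replay URL for one archived snapshot."""
    return f"https://web.archive.org/web/{entry['timestamp']}/{entry['url']}"


date = snapshot_date(record["timestamp"])
link = replay_url(record)
```

Opening the resulting link in a browser shows the page as it was captured at that timestamp.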
Source Distribution
wayback_scraper-1.1.tar.gz (2.9 kB)
Built Distribution
wayback_scraper-1.1-py3-none-any.whl (4.5 kB)
File details
Details for the file wayback_scraper-1.1.tar.gz.
File metadata
- Download URL: wayback_scraper-1.1.tar.gz
- Upload date:
- Size: 2.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6e50a621f0e053cc721532b3a9df28c928f36495437aee91f3ae4a55d54c17fb
MD5 | e0239ad81672d3bb76b9c6826784af65
BLAKE2b-256 | ebe97e04b8cd3d6f74433cf83dac711e0d354ae5d1d3faa65e7f4715f1437589
File details
Details for the file wayback_scraper-1.1-py3-none-any.whl.
File metadata
- Download URL: wayback_scraper-1.1-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | e40fb68d1ce696a133df490566bb57e45bf042655e301cf0a71f1a9b843b83b8
MD5 | 81f65d8a66a96b12cf082c395d4deda3
BLAKE2b-256 | 707654601c204562173b35a53e529c0e6c7fa0f3f6769d70e2a4da471f620f12