A command-line utility for scraping Wayback Machine snapshots from archive.org.
wayback_scraper
Scrapes archived URLs from archive.org (Wayback Machine snapshots).
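How the package queries archive.org is not documented here. The following is a minimal sketch that assumes the public Wayback Machine CDX server API is used to list snapshots. `build_cdx_query` is a hypothetical helper. Mapping the command-line limit, start and end options onto the CDX `limit`, `from` and `to` parameters is also an assumption. The sketch only builds the query URL and makes no network request.

```python
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint. Assumption: wayback_scraper's own
# internals are not shown, so this only illustrates the public API.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, limit=None, from_ts=None, to_ts=None):
    """Return a CDX query URL that lists snapshots of `url` as JSON rows."""
    params = {"url": url, "output": "json"}
    if limit is not None:
        params["limit"] = str(limit)  # maximum number of rows returned
    if from_ts is not None:
        params["from"] = from_ts  # YYYYmmddHHMMSS or a prefix such as "2011"
    if to_ts is not None:
        params["to"] = to_ts  # same format as from_ts
    return CDX_ENDPOINT + "?" + urlencode(params)


print(build_cdx_query("https://www.archive.org", limit=2, from_ts="2011", to_ts="2012"))
```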
Command-Line Interface
The usage information can be printed by running wayback-scraper -h.
```
usage: wayback-scraper [-h] -u URL [-l LIMIT] [-f Start TIMESTAMP]
                       [-t End TIMESTAMP] [-p DIRECTORY] [-n FILENAME] -o
                       Output type

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     url for the wayback scraper to scrape. [eg:
                        http://www.example.com
  -l LIMIT, --limit LIMIT
                        limit the number of results to scrape
  -f Start TIMESTAMP, --from Start TIMESTAMP
                        start date, format - YYYYmmddHHMMSS. Example : -f 2011
  -t End TIMESTAMP, --to End TIMESTAMP
                        end date, format - YYYYmmddHHMMSS. Example : -t 201205
  -p DIRECTORY, --path DIRECTORY
                        Output file path
  -n FILENAME, --name FILENAME
                        filename for the output data
  -o Output type, --output Output type
                        output type
```
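The help text above looks like output from Python's argparse module. The package source is not shown here, so the following is only a hedged, hypothetical re-creation of the documented interface. Option names and help strings follow the usage text. The `type=int` on `--limit` and the `from_ts`/`to_ts` destination names are assumptions. The non-default destinations are needed because `from` is a Python keyword.

```python
import argparse


def make_parser():
    # Hypothetical re-creation of the documented CLI; only the option names,
    # metavars and help strings come from the usage text.
    parser = argparse.ArgumentParser(prog="wayback-scraper")
    parser.add_argument("-u", "--url", required=True,
                        help="url for the wayback scraper to scrape")
    parser.add_argument("-l", "--limit", type=int,  # int type is assumed
                        help="limit the number of results to scrape")
    parser.add_argument("-f", "--from", dest="from_ts", metavar="Start TIMESTAMP",
                        help="start date, format - YYYYmmddHHMMSS")
    parser.add_argument("-t", "--to", dest="to_ts", metavar="End TIMESTAMP",
                        help="end date, format - YYYYmmddHHMMSS")
    parser.add_argument("-p", "--path", metavar="DIRECTORY",
                        help="Output file path")
    parser.add_argument("-n", "--name", metavar="FILENAME",
                        help="filename for the output data")
    parser.add_argument("-o", "--output", required=True, metavar="Output type",
                        help="output type")
    return parser


# Parse the same arguments as the example invocation in this README.
args = make_parser().parse_args(
    ["-u", "https://www.archive.org", "-l", "2", "-f", "2011", "-t", "2012",
     "-n", "output", "-o", "json"]
)
print(args.url, args.limit, args.from_ts, args.to_ts, args.name, args.output)
```

In this sketch, `-u` and `-o` are required, which matches the usage line where they appear without square brackets.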
Examples
The usage may be clearer with a concrete example:
```
wayback-scraper -u https://www.archive.org -l 2 -f 2011 -t 2012 -n output -o json
```
which produces:
```json
{
  "archived_urls": [
    {
      "url": "http://www.archive.org:80/",
      "date": "1997-01-26",
      "timestamp": "19970126045828"
    },
    {
      "url": "http://www.archive.org:80/",
      "date": "1997-10-11",
      "timestamp": "19971011050034"
    }
  ]
}
```
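Each entry in the example output pairs a snapshot's original URL with its 14-digit timestamp and a date derived from that timestamp. The following is a minimal sketch of producing that shape. It assumes the data comes from a CDX reply requested with `output=json` and `fl=timestamp,original`, which is not confirmed by the package. In such a reply the first row holds the field names. `timestamp_to_date` and `rows_to_archived_urls` are hypothetical helpers. The sample rows reuse the snapshot values from the example above rather than a live response.

```python
import json
from datetime import datetime


def timestamp_to_date(timestamp):
    """Convert a 14-digit Wayback timestamp (YYYYmmddHHMMSS) to YYYY-mm-dd."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").strftime("%Y-%m-%d")


def rows_to_archived_urls(rows):
    """Map CDX JSON rows (first row = field names) to the example's structure."""
    header, records = rows[0], rows[1:]
    ts_col = header.index("timestamp")
    url_col = header.index("original")
    return {
        "archived_urls": [
            {
                "url": record[url_col],
                "date": timestamp_to_date(record[ts_col]),
                "timestamp": record[ts_col],
            }
            for record in records
        ]
    }


# Rows shaped like a CDX reply requested with fl=timestamp,original,
# filled with the two snapshots shown in the example output.
sample_rows = [
    ["timestamp", "original"],
    ["19970126045828", "http://www.archive.org:80/"],
    ["19971011050034", "http://www.archive.org:80/"],
]
print(json.dumps(rows_to_archived_urls(sample_rows), indent=2))
```

The date is computed from the timestamp rather than stored separately, so the two fields cannot drift apart.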
Download files
File details
Details for the file wayback_scraper-1.2.tar.gz.
File metadata
- Download URL: wayback_scraper-1.2.tar.gz
- Upload date:
- Size: 2.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bcb697d9153c6357951be28840978d460c76aec80ebd521438476384b4f191e6 |
| MD5 | e79a6294d826b06d5fa0882594b181d6 |
| BLAKE2b-256 | 3e35b12f7886b064cc09ce522c239fe34ec08923f423e622658a28b5613cd4f2 |
File details
Details for the file wayback_scraper-1.2-py3-none-any.whl.
File metadata
- Download URL: wayback_scraper-1.2-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 09810552d864f52a9d13a0220575e25633b861d90bb22cc65b1a8325e1249ec9 |
| MD5 | 0c5a91e2775eb53b98af268446015b35 |
| BLAKE2b-256 | d6ec51ee1d29032711d58bdafe4b5f2fa38273f99db0a6be0e5063f989f4aa48 |
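A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard hashlib module. `sha256_of` and `verify` are illustrative helper names and are not part of the package.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA-256 digest matches the published one."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("wayback_scraper-1.2.tar.gz", "bcb697d9153c6357951be28840978d460c76aec80ebd521438476384b4f191e6")` returns True only if the local copy of the source distribution matches the published SHA256 digest.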