Throw all URIs in a page onto the Wayback Machine from the CLI.
wbsv
wbsv (stands for "WayBack machine SavepageNow") is a CLI tool for saving webpages on the Wayback Machine forever. It lets you save every URI found in a webpage to the Wayback Machine.
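To make "all URIs in a webpage" concrete, here is a minimal sketch of link collection using only the Python standard library. It is not taken from the wbsv source: the names LinkCollector and extract_uris are hypothetical, and the choice to follow href and src attributes, drop fragments, and keep only http(s) links is an assumption made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect raw href/src attribute values (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.raw_links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.raw_links.append(value)


def extract_uris(html, base_url):
    """Return the sorted, de-duplicated absolute http(s) URIs found in html."""
    collector = LinkCollector()
    collector.feed(html)
    uris = set()
    for link in collector.raw_links:
        # Resolve relative links against the page URL and drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        if absolute.startswith(("http://", "https://")):
            uris.add(absolute)
    return sorted(uris)


sample = (
    '<a href="/ct/">ct</a><img src="logo.png">'
    '<a href="mailto:a@b.c">mail</a><a href="#top">top</a>'
)
print(extract_uris(sample, "https://example.com/"))
```

In this sketch the mailto link is skipped and the "#top" anchor collapses to the page itself, so the sample yields three URIs.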
Try now
You can try this tool on Google Cloud Shell. (First, run: sudo python3 -m pip install -e .)
DEMO
Install
$ python -m pip install wbsv # Python 3.0+
Run & Examples
Help
$ wbsv -v
wbsv 0.1.9
$ wbsv -h
usage: wbsv [-h] [-v] [-r cnt] [-t] [-L lv] [url [url ...]]
CLI tool for save webpage on Wayback Machine forever.
Save webpage and one's all URI(s) on Wayback Machine.
positional arguments:
url Saving pages in order.
optional arguments:
-h, --help show this help message and exit
-v, --version Show version and exit
-r cnt, --retry cnt Set a retry limit on failed save.
-t, --only_target Save just target webpage(s).
-L lv, --level lv Set maximum recursion depth.
additional information:
If you don't give the URL,
interactive mode will be launched.
(To quit interactive mode,
type "end", "exit", "exit()",
"break", "bye", ":q" or "finish".)
Interactive mode
$ wbsv
[[Input a target url (ex: https://google.com)]]
>>> https://www.u.tsukuba.ac.jp
[+]Now: https://www.u.tsukuba.ac.jp
[+]60 URI(s) found.
[01/60]: <NOW> https://web.archive.org/web/20200412020015/https://www.u.tsukuba.ac.jp/password/
[02/60]: <FAIL> https://www.u.tsukuba.ac.jp/info_lit/tebiki.html
[03/60]: <NOW> https://web.archive.org/web/20200412020026/https://www.u.tsukuba.ac.jp/account/
...
[58/60]: <NOW> https://web.archive.org/web/20200412022608/https://www.u.tsukuba.ac.jp/phishing/
[59/60]: <FAIL> https://www.u.tsukuba.ac.jp/wordpress/wp-content/uploads/note_usingcomputerrooms.png
[60/60]: <NOW> https://web.archive.org/web/20200412022640/https://www.u.tsukuba.ac.jp/
[+]FIN!: https://www.u.tsukuba.ac.jp
[+]ALL: 60 SAVE: 57 FAIL: 3
[+]To exit, use CTRL+C or type 'end'
[[Input a target url (ex: https://google.com)]]
>>> exit
[+]End.
$
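The progress lines above follow a regular shape: a counter, a status in angle brackets, and a URI. The sketch below parses such lines so that a saved log can be tallied afterwards. It is an illustration, not part of wbsv: parse_line and the regular expressions are hypothetical, and it assumes that only the line shapes shown in this README occur and that the 14-digit segment in an archive URL is a YYYYMMDDhhmmss timestamp.

```python
import re
from collections import Counter
from datetime import datetime

# "[01/60]: <NOW> https://..." as shown in the sample output.
LINE_RE = re.compile(r"^\[(\d+)/(\d+)\]: <(\w+)> (\S+)$")
# Archive URLs embed a 14-digit timestamp followed by the original URI.
ARCHIVE_RE = re.compile(r"^https://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_line(line):
    """Parse one progress line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    index, total, status, uri = match.groups()
    record = {
        "index": int(index),
        "total": int(total),
        "status": status,
        "original": uri,
        "saved_at": None,
    }
    archived = ARCHIVE_RE.match(uri)
    if archived:
        record["saved_at"] = datetime.strptime(archived.group(1), "%Y%m%d%H%M%S")
        record["original"] = archived.group(2)
    return record


log = [
    "[01/60]: <NOW> https://web.archive.org/web/20200412020015/https://www.u.tsukuba.ac.jp/password/",
    "[02/60]: <FAIL> https://www.u.tsukuba.ac.jp/info_lit/tebiki.html",
    "[+]60 URI(s) found.",
]
records = [r for r in map(parse_line, log) if r is not None]
tally = Counter(r["status"] for r in records)
print(tally)
```

Lines that do not match the pattern, such as the "[+]" status messages, are simply ignored.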
From command-line arguments
$ wbsv https://tsumanne.net https://tsumanne.net/ct
[+]Now: https://tsumanne.net
[+]4 URI(s) found.
[1/4]: <NOW> https://web.archive.org/web/20200412022931/https://tsumanne.net/si/
[2/4]: <NOW> https://web.archive.org/web/20200412022935/https://tsumanne.net/
[3/4]: <NOW> https://web.archive.org/web/20200412022938/https://tsumanne.net/my/
[4/4]: <NOW> https://web.archive.org/web/20200412022949/https://tsumanne.net/ct/
[+]FIN!: https://tsumanne.net
[+]ALL: 4 SAVE: 4 FAIL: 0
[+]Now: https://tsumanne.net/ct
[+]3 URI(s) found.
[1/3]: <NOW> https://web.archive.org/web/20200412022958/https://tsumanne.net/
[2/3]: <NOW> https://web.archive.org/web/20200412023000/https://tsumanne.net/oa_login.php
[3/3]: <NOW> https://web.archive.org/web/20200412023012/https://tsumanne.net/ct/?cat=&of=25
[+]FIN!: https://tsumanne.net/ct
[+]ALL: 3 SAVE: 3 FAIL: 0
$
Search links recursively
$ wbsv https://programming-place.net/ppp/contents/c/index.html -L2
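As an illustration of what a maximum recursion depth can mean, the sketch below collects URIs breadth-first and stops following links once a page is max_depth hops from the start. It is not the wbsv implementation: collect_uris and get_links are hypothetical names, the link graph is an in-memory stand-in for real pages, and the reading that depth 1 covers only the links on the target page is an assumption.

```python
from collections import deque


def collect_uris(start, get_links, max_depth):
    """Return every URI reachable from start within max_depth link hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links any deeper than the limit
        for link in get_links(uri):
            if link not in seen:  # the seen set also guards against cycles
                seen.add(link)
                queue.append((link, depth + 1))
    return seen


# Tiny in-memory "site": page -> links found on that page.
site = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}


def get_links(uri):
    return site.get(uri, [])


print(sorted(collect_uris("a", get_links, 1)))
print(sorted(collect_uris("a", get_links, 2)))
```

With this sample graph, depth 1 reaches a, b and c, while depth 2 additionally reaches d.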
Increase the retry limit
$ wbsv https://tsumanne.net --retry 10
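A retry limit of this kind can be pictured as a bounded loop around the save call. The following is a minimal sketch under stated assumptions, not wbsv's code: save_with_retry and make_flaky are hypothetical, the limit is taken to count retries after the first attempt, and failures are modelled as OSError.

```python
import time


def save_with_retry(save, uri, retry_limit, delay_seconds=0.0):
    """Call save(uri) up to retry_limit + 1 times.

    Returns (result, attempts) on success, or (None, attempts) if every
    attempt raised OSError.
    """
    for attempt in range(1, retry_limit + 2):
        try:
            return save(uri), attempt
        except OSError:
            if attempt <= retry_limit:
                time.sleep(delay_seconds)  # pause only when another attempt follows
    return None, retry_limit + 1


def make_flaky(failures):
    """Build a fake save function that fails a fixed number of times first."""
    state = {"left": failures}

    def save(uri):
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("temporary failure")
        return "archived:" + uri

    return save


print(save_with_retry(make_flaky(2), "https://tsumanne.net", retry_limit=10))
print(save_with_retry(make_flaky(5), "https://tsumanne.net", retry_limit=1))
```

The first call succeeds on its third attempt; the second gives up after two attempts because the fake save keeps failing.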
VERSION
wbsv 0.1.9
LICENSE
MIT
Author
eggplants (haruna)