Wget-compatible web downloader and crawler.
Project description
Wpull is a Wget-compatible web downloader and crawler. It can be described as a Wget remake, clone, replacement, or alternative.
Features:
Written in Python: lightweight & robust
Familiar Wget options and behavior
Graceful stopping and resuming
Python & Lua scripting support
Modular, extensible, & asynchronous API
PhantomJS integration
Wpull is currently beta quality. Some features are not yet implemented, and the API is not considered stable.
Install
Requires:
Lunatic Python (bastibe version) (optional for Lua support)
PhantomJS (optional)
Once you install the requirements, install Wpull from PyPI using pip:
pip3 install wpull
For detailed installation instructions, please see http://wpull.readthedocs.org/en/master/install.html.
Run
To download the About page of Google.com:
wpull google.com/about
To archive a website:
wpull billy.blogsite.example --warc-file blogsite-billy \
    --no-check-certificate \
    --no-robots --user-agent "InconspiuousWebBrowser/1.0" \
    --wait 0.5 --random-wait --waitretry 600 \
    --page-requisites --recursive --level inf \
    --span-hosts --domains blogsitecdn.example,cloudspeeder.example \
    --hostnames billy.blogsite.example \
    --reject-regex "/login\.php" \
    --tries inf --retry-connrefused --retry-dns-error \
    --delete-after --database blogsite-billy.db \
    --quiet --output-file blogsite-billy.log
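When the archive command above is launched from a script, it can help to keep the options in a Python list so that no shell quoting is involved. The following is a minimal sketch under that assumption, not part of Wpull itself: the helper names `build_archive_command` and `run_archive` and their parameters are hypothetical, and only the options already shown above are used. It starts the `wpull` executable as a subprocess and does not touch Wpull's Python API.

```python
import shlex
import shutil
import subprocess


def build_archive_command(host, name, extra_domains):
    """Return the archive example as an argument list.

    Hypothetical helper for illustration only. `host` is the site to
    archive, `name` is the shared prefix for the WARC, database, and log
    files, and `extra_domains` are the additional domains to span.
    """
    return [
        "wpull", host,
        "--warc-file", name,
        "--no-check-certificate",
        "--no-robots", "--user-agent", "InconspiuousWebBrowser/1.0",
        "--wait", "0.5", "--random-wait", "--waitretry", "600",
        "--page-requisites", "--recursive", "--level", "inf",
        "--span-hosts", "--domains", ",".join(extra_domains),
        "--hostnames", host,
        "--reject-regex", r"/login\.php",
        "--tries", "inf", "--retry-connrefused", "--retry-dns-error",
        "--delete-after", "--database", name + ".db",
        "--quiet", "--output-file", name + ".log",
    ]


def run_archive(command):
    """Run the command if the wpull executable is on PATH (hypothetical helper)."""
    if shutil.which(command[0]) is None:
        raise RuntimeError("wpull executable not found on PATH")
    return subprocess.run(command, check=False).returncode


command = build_archive_command(
    "billy.blogsite.example",
    "blogsite-billy",
    ["blogsitecdn.example", "cloudspeeder.example"],
)

# Print a shell-quoted form of the command for inspection before running it.
print(" ".join(shlex.quote(arg) for arg in command))
```

Calling `run_archive(command)` would then start the crawl. Keeping the arguments as a list means values such as the reject regex are passed to Wpull exactly as written.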
To see all options:
wpull --help
Documentation
Documentation is located at http://wpull.readthedocs.org/.
Help
Need help? Please see our Help page, which contains frequently asked questions and support information.
The issue tracker is located at https://github.com/chfoo/wpull/issues.
Dev
Contributions and feedback are greatly appreciated.
Credits
Copyright 2013-2014 by Christopher Foo. Licensed under GPL v3.
This project contains third-party source code licensed under different terms:
backport
wpull.backport.argparse
wpull.backport.collections
wpull.backport.functools
wpull.backport.tempfile
wpull.backport.urlparse
wpull.thirdparty.robotexclusionrulesparser
We would like to acknowledge the authors of GNU Wget as Wpull uses algorithms from Wget.