a library for scraping things
scrapelib is a library for making requests to less-than-reliable websites, it is implemented (as of 0.7) as a wrapper around requests.
scrapelib originated as part of the Open States project to scrape the websites of all 50 state legislatures and as a result was therefore designed with features desirable when dealing with sites that have intermittent errors or require rate-limiting.
Advantages of using scrapelib over alternatives like httplib2 simply using requests as-is:
- All of the power of the suberb requests library.
- HTTP, HTTPS, and FTP requests via an identical API
- support for simple caching with pluggable cache backends
- request throttling
- configurable retries for non-permanent site failures
Written by James Turk <firstname.lastname@example.org>, thanks to Michael Stephens for initial urllib2/httplib2 version
See https://github.com/jamesturk/scrapelib/graphs/contributors for contributors.
- python 2.7, >=3.3
- requests >= 2.0 (earlier versions may work but aren’t tested)
import scrapelib s = scrapelib.Scraper(requests_per_minute=10) # Grab Google front page s.get('http://google.com') # Will be throttled to 10 HTTP requests per minute while True: s.get('http://example.com')
Release history Release notifications
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
|Filename, size & hash SHA256 hash help||File type||Python version||Upload date|
|scrapelib-1.1.1-py2.py3-none-any.whl (16.1 kB) Copy SHA256 hash SHA256||Wheel||py2.py3||Apr 16, 2018|
|scrapelib-1.1.1.tar.gz (14.1 kB) Copy SHA256 hash SHA256||Source||None||Apr 16, 2018|