a library for scraping things
Project description
scrapelib is a library for making requests to less-than-reliable websites. As of 0.7, it is implemented as a wrapper around requests.
scrapelib originated as part of the Open States project to scrape the websites of all 50 state legislatures, and was therefore designed with features useful for sites that have intermittent errors or require rate limiting.
Advantages of using scrapelib over alternatives like httplib2 or simply using requests as-is:
- all of the power of the superb requests library
- HTTP, HTTPS, and FTP requests via an identical API
- support for simple caching with pluggable cache backends
- request throttling
- configurable retries for non-permanent site failures
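To make the idea of configurable retries concrete, here is a minimal standard-library sketch of retrying a flaky fetch. It is an illustration only, not scrapelib's implementation: the helper `fetch_with_retries`, its parameter names, and the doubling wait between attempts are all assumptions chosen for the example.

```python
import time


def fetch_with_retries(fetch, url, retry_attempts=3, retry_wait_seconds=1.0,
                       sleep=time.sleep):
    """Call fetch(url), retrying on IOError up to retry_attempts times.

    Hypothetical helper for illustration. The wait doubles after each
    failed attempt (an assumed backoff policy).
    """
    wait = retry_wait_seconds
    for attempt in range(retry_attempts + 1):
        try:
            return fetch(url)
        except IOError:
            # Out of retries: let the last error propagate to the caller.
            if attempt == retry_attempts:
                raise
            sleep(wait)
            wait *= 2
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays. Only `IOError` is treated as a temporary failure here; a real scraper would also need to decide which HTTP status codes count as non-permanent.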
Written by James Turk <dev@jamesturk.net>, with thanks to Michael Stephens for the initial urllib2/httplib2 version.
See https://github.com/jamesturk/scrapelib/graphs/contributors for contributors.
Requirements
- Python 2.7 or >= 3.3
- requests >= 2.0 (earlier versions may work but aren’t tested)
Example Usage
Documentation: http://scrapelib.readthedocs.org/en/latest/
import scrapelib
s = scrapelib.Scraper(requests_per_minute=10)
# Grab Google front page
s.get('http://google.com')
# Will be throttled to 10 HTTP requests per minute
while True:
    s.get('http://example.com')
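A limit of 10 requests per minute works out to a minimum spacing of 60 / 10 = 6 seconds between requests. The sketch below shows one simple way such a throttle could work. It is not how scrapelib is implemented: the `Throttle` class and its injectable `clock` and `sleep` arguments are hypothetical names introduced for this example.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls to wait().

    Hypothetical illustration of per-minute rate limiting.
    """

    def __init__(self, requests_per_minute, clock=time.monotonic,
                 sleep=time.sleep):
        # Minimum number of seconds between two consecutive requests.
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least `interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before each request gives the spacing described above: the first call returns immediately, and each later call sleeps only for whatever part of the interval has not already elapsed.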
Download files
| Filename, size | File type | Python version |
|---|---|---|
| scrapelib-1.2.0-py2.py3-none-any.whl (16.0 kB) | Wheel | py2.py3 |
| scrapelib-1.2.0.tar.gz (14.0 kB) | Source | None |
Hashes for scrapelib-1.2.0-py2.py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1984cb3e1c72eab9130c2cb7c4b17dcff08992bd4c829d59e6d48fcf7c4f7b16 |
| MD5 | fcce2776457f78e7b7049ab9cdef579c |
| BLAKE2-256 | a085ca29e44748abe598daffe1a6dad8a175f3acff57fe09daab040ab0bf604a |