Project description
scrapelib is a library for making requests to less-than-reliable websites.
Source: https://github.com/jamesturk/scrapelib
Documentation: https://jamesturk.github.io/scrapelib/
Issues: https://github.com/jamesturk/scrapelib/issues
Features
scrapelib originated as part of the Open States project, which scrapes the websites of all 50 state legislatures, and was therefore designed with features for dealing with sites that have intermittent errors or require rate-limiting.
Advantages of using scrapelib over using requests as-is:
- HTTP(S) and FTP requests via an identical API
- support for simple caching with pluggable cache backends
- highly-configurable request throttling
- configurable retries for non-permanent site failures
- all of the power of the superb requests library
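Request throttling of the kind described above amounts to enforcing a minimum interval between successive requests. A minimal sketch of that idea in plain Python (the `Throttle` class and its details are illustrative, not scrapelib's actual internals):

```python
import time

class Throttle:
    """Enforce a minimum delay between successive calls,
    mirroring the idea behind a requests_per_minute limit."""

    def __init__(self, requests_per_minute):
        self.interval = 60.0 / requests_per_minute
        self.last_call = None

    def wait(self):
        # Sleep just long enough so calls are at least `interval` apart.
        now = time.monotonic()
        if self.last_call is not None:
            elapsed = now - self.last_call
            if elapsed < self.interval:
                time.sleep(self.interval - elapsed)
        self.last_call = time.monotonic()

throttle = Throttle(requests_per_minute=600)  # 0.1 s between calls
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # a real scraper would issue a request here
elapsed = time.monotonic() - start  # roughly 0.2 s for two enforced gaps
```

scrapelib applies the same kind of gate transparently inside its `get`/`post` methods, so calling code never has to sleep explicitly.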
Installation
scrapelib is on PyPI, and can be installed via any standard package management tool:
poetry add scrapelib
or:
pip install scrapelib
Example Usage
import scrapelib
s = scrapelib.Scraper(requests_per_minute=10)
# Grab Google front page
s.get('http://google.com')
# Will be throttled to 10 HTTP requests per minute
while True:
    s.get('http://example.com')
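The configurable retries mentioned above boil down to re-issuing a request after a growing delay when a transient failure occurs. A rough sketch of that pattern in plain Python (the function and parameter names here are illustrative, not scrapelib's implementation):

```python
import time

def fetch_with_retries(fetch, retry_attempts=3, retry_wait_seconds=0.01):
    """Call fetch(); on failure, retry with exponentially growing waits."""
    for attempt in range(retry_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retry_attempts:
                raise  # out of retries: surface the failure
            time.sleep(retry_wait_seconds * (2 ** attempt))

# A fake fetcher that fails twice before succeeding, standing in
# for a request to an unreliable site.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("intermittent failure")
    return "ok"

result = fetch_with_retries(flaky)  # succeeds on the third attempt
```

With scrapelib, this logic lives inside the `Scraper` itself, so a single `s.get(...)` call transparently retries transient failures.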
Hashes for scrapelib-2.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ae663b620ddc568736ac09a4162cf61921fe7c7d1d00e44e7bc2d0d98b3551f9
MD5 | 891bb9634ab01943e2bc926d36562491
BLAKE2b-256 | 1dd3210ae7068ebb9f7a7c7cc5cecb2bc741d50a6ffb4815998ad475faf8f1ba