Crawlera middleware for Scrapy
Project description
scrapy-crawlera provides easy use of Crawlera with Scrapy.
Installation
You can install scrapy-crawlera using pip:
pip install scrapy-crawlera
You can then enable the middleware in your settings.py:
DOWNLOADER_MIDDLEWARES = {
    ...
    'scrapy_crawlera.CrawleraMiddleware': 610
}
Credentials
There are two ways to specify credentials.
Through settings.py:
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'apikey'
Through spider attributes:
class MySpider:
    crawlera_enabled = True
    crawlera_apikey = 'apikey'
How to use it
To use Crawlera-specific features, specify the corresponding headers when making a request:
scrapy.Request(
    'http://example.com',
    headers={
        'X-Crawlera-Max-Retries': 1,
        ...
    },
)
Remember that you can also set default headers for all requests with the DEFAULT_REQUEST_HEADERS setting.
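For example, a header that should apply to every request can be declared once in settings.py. This is a minimal sketch; the header value below is illustrative, not a recommendation:

```python
# settings.py
# Apply a Crawlera header to all requests by default.
# The value '1' here is only an example.
DEFAULT_REQUEST_HEADERS = {
    'X-Crawlera-Max-Retries': '1',
}
```

Per-request headers passed to scrapy.Request still take precedence over these defaults.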
Changes
v1.2.1 (2016-10-17)
Fix release date in README.
v1.2.0 (2016-10-17)
Recommend middleware order to be 610 to run before RedirectMiddleware.
Change default download timeout to 190s (3 minutes 10 seconds) instead of 1800s (30 minutes).
Test and advertise Python 3 compatibility.
New crawlera/request and crawlera/request/method/* stats counts.
Clear Scrapy DNS cache for proxy URL in case of connection errors.
Distribute plugin as universal wheel.
Hashes for scrapy_crawlera-1.2.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 39f48cfd13f5c2463f916efea13920ba842a77e07efba40d3975e8c13a7c379b
MD5 | 0c50b23f1f411857ec6d25f5c03d4f6e
BLAKE2b-256 | 396dc6f630122fe502ec16d6b48e670d11061ea0e37b05f37e0045c0cd98d839