Crawlera middleware for Scrapy
Project description
scrapy-crawlera provides easy use of Crawlera with Scrapy.
Installation
You can install scrapy-crawlera using pip:
pip install scrapy-crawlera
You can then enable the middleware in your settings.py:
DOWNLOADER_MIDDLEWARES = {
    ...
    'scrapy_crawlera.CrawleraMiddleware': 610
}
Credentials
There are two ways to specify credentials.
Through settings.py:
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'apikey'
Through spider attributes:
class MySpider:
    crawlera_enabled = True
    crawlera_apikey = 'apikey'
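Which source wins when both are set is not spelled out above. The sketch below illustrates one plausible lookup order: the spider attribute first, falling back to the project setting. This precedence, the helper name resolve_crawlera_option, and the FakeSpider stand-in class are all assumptions for illustration, not the middleware's actual internals:

```python
def resolve_crawlera_option(spider, settings, attr, setting_key, default=None):
    # Assumed precedence: a spider attribute, if present, overrides the
    # project-wide setting; otherwise fall back to settings.py.
    value = getattr(spider, attr, None)
    if value is None:
        value = settings.get(setting_key, default)
    return value

class FakeSpider:
    # Stand-in for a scrapy.Spider subclass, for illustration only.
    crawlera_enabled = True
    crawlera_apikey = 'apikey'

settings = {'CRAWLERA_ENABLED': False, 'CRAWLERA_APIKEY': 'from-settings'}
enabled = resolve_crawlera_option(
    FakeSpider(), settings, 'crawlera_enabled', 'CRAWLERA_ENABLED')
```

Here the spider attribute (True) would win over the setting (False) under the assumed precedence.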
How to use it
You just need to specify the desired Crawlera headers when making a request, for example:
scrapy.Request(
    'http://example.com',
    headers={
        'X-Crawlera-Max-Retries': 1,
        ...
    },
)
Remember that you can also set which headers to use by default for all requests with the DEFAULT_REQUEST_HEADERS setting.
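For example, a settings.py fragment that applies a Crawlera header to every request by default (the header name comes from the request example above; the value is illustrative):

```python
# settings.py (fragment): headers applied to every outgoing request
DEFAULT_REQUEST_HEADERS = {
    'X-Crawlera-Max-Retries': 1,
}
```

Per-request headers passed to scrapy.Request can still override these defaults.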
Changes
v1.2.0 (YYYY-MM-DD)
Recommend middleware order to be 610 to run before RedirectMiddleware.
Change default download timeout to 190s (3 minutes 10 seconds) instead of 1800s (30 minutes).
Test and advertise Python 3 compatibility.
New crawlera/request and crawlera/request/method/* stats counts.
Clear Scrapy DNS cache for proxy URL in case of connection errors.
Distribute plugin as universal wheel.
Hashes for scrapy_crawlera-1.2.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 37dcddd8019a728e96a53986f3f088bf1df97e06cb5f3cac50a26a0cd91921a5
MD5 | 286456ce673c30d4c69612fba3762a98
BLAKE2b-256 | 25cacbf4b9abd9b601e0e1d8d56142ce5c0e6fdff503559924cc8a78e8acb387