JavaScript support and proxy rotation for Scrapy with ScrapingBee
Scrapy ScrapingBee Middleware
Integrate Scrapy with the ScrapingBee API to use headless browsers for JavaScript rendering and proxy rotation. You need to create an account on scrapingbee.com to get an API key.
Installation
pip install scrapy-scrapingbee
Configuration
Add your SCRAPINGBEE_API_KEY and the ScrapingBeeMiddleware to your project's settings.py. Don't forget to set CONCURRENT_REQUESTS according to your ScrapingBee plan.
SCRAPINGBEE_API_KEY = 'REPLACE-WITH-YOUR-API-KEY'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_scrapingbee.ScrapingBeeMiddleware': 725,
}
CONCURRENT_REQUESTS = 1
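If you would rather not commit the key to version control, one option is to read it from the environment in settings.py. The sketch below assumes an environment variable named SCRAPINGBEE_API_KEY; that name is a choice made for this example, not something the package requires, and the placeholder is used when the variable is unset.

```python
import os

# Assumption: the key is exported as an environment variable with this name.
SCRAPINGBEE_API_KEY = os.environ.get(
    'SCRAPINGBEE_API_KEY', 'REPLACE-WITH-YOUR-API-KEY'
)

DOWNLOADER_MIDDLEWARES = {
    'scrapy_scrapingbee.ScrapingBeeMiddleware': 725,
}

# Match this to the concurrency allowed by your ScrapingBee plan.
CONCURRENT_REQUESTS = 1
```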
Usage
Inherit your spiders from ScrapingBeeSpider and yield a ScrapingBeeRequest.
ScrapingBeeSpider overrides the default logger to hide your API key in the Scrapy logs.
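To illustrate the general idea of keeping a secret out of log output, here is a minimal sketch built on the standard logging module. It is not the package's implementation: MaskSecretFilter and mask are hypothetical names introduced for this example only.

```python
import logging


class MaskSecretFilter(logging.Filter):
    """Hypothetical filter that replaces a secret with a placeholder."""

    def __init__(self, secret, placeholder='***'):
        super().__init__()
        self.secret = secret
        self.placeholder = placeholder

    def filter(self, record):
        # Render the final message first so %-style arguments are masked too.
        message = record.getMessage()
        if self.secret and self.secret in message:
            record.msg = message.replace(self.secret, self.placeholder)
            record.args = None
        return True  # keep the record; only its text is altered


def mask(secret, text):
    """Run a single message through the filter and return the masked text."""
    record = logging.LogRecord(
        'demo', logging.INFO, 'demo.py', 0, text, None, None
    )
    MaskSecretFilter(secret).filter(record)
    return record.getMessage()
```

A filter like this can be attached to any logger or handler with addFilter, so every record passing through it has the key replaced before it is written.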
Below you can see an example from the spider in httpbin.py.
from scrapy_scrapingbee import ScrapingBeeSpider, ScrapingBeeRequest

JS_SNIPPET = 'window.scrollTo(0, document.body.scrollHeight);'


class HttpbinSpider(ScrapingBeeSpider):
    name = 'httpbin'
    start_urls = [
        'https://httpbin.org',
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield ScrapingBeeRequest(url, params={
                # 'render_js': False,
                # 'block_ads': True,
                # 'block_resources': False,
                # 'js_snippet': JS_SNIPPET,
                # 'premium_proxy': True,
                # 'country_code': 'fr',
                # 'return_page_source': True,
                # 'wait': 3000,
                # 'wait_for': '#swagger-ui',
            },
            headers={
                # 'Accept-Language': 'En-US',
            },
            cookies={
                # 'name_1': 'value_1',
            })

    def parse(self, response):
        ...
You can pass ScrapingBee parameters in the params argument of a ScrapingBeeRequest. Headers and cookies are passed as in a normal Scrapy Request. ScrapingBeeRequest converts all parameters, headers and cookies into the format expected by the ScrapingBee API.
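As a rough picture of what such formatting could involve, the sketch below folds a target URL, a params dict and a cookies dict into a single query string. It is not the package's code: build_api_url and to_query_value are hypothetical helpers, and the endpoint, the lowercase boolean rendering and the 'name=value;name=value' cookie string are assumptions made for illustration. Check the ScrapingBee documentation for the actual format.

```python
from urllib.parse import urlencode

# Assumption: endpoint shown for illustration only.
API_URL = 'https://app.scrapingbee.com/api/v1/'


def to_query_value(value):
    """Render Python values as query-string text (booleans in lowercase)."""
    if isinstance(value, bool):
        return 'true' if value else 'false'
    return str(value)


def build_api_url(api_key, target_url, params=None, cookies=None):
    """Hypothetical helper: combine the target URL, params and cookies."""
    query = {'api_key': api_key, 'url': target_url}
    for name, value in (params or {}).items():
        query[name] = to_query_value(value)
    if cookies:
        # Assumption: cookies travel as one 'name=value;name=value' string.
        query['cookies'] = ';'.join(f'{k}={v}' for k, v in cookies.items())
    return API_URL + '?' + urlencode(query)
```

For example, build_api_url('KEY', 'https://httpbin.org', {'render_js': False}) yields a URL whose query string carries the percent-encoded target URL alongside render_js=false.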
Examples
Add your API key to settings.py.
To run the examples, you need to clone this repository. In your terminal, go to examples/httpbin/httpbin and run the example spider with:
scrapy crawl httpbin