
JavaScript support and proxy rotation for Scrapy with ScrapingBee

Project description

Scrapy ScrapingBee Middleware


Integrate Scrapy with the ScrapingBee API to use headless browsers for JavaScript rendering and proxy rotation. You need to create an account on scrapingbee.com to get an API key.

Installation

pip install scrapy-scrapingbee

Configuration

Add your SCRAPINGBEE_API_KEY and the ScrapingBeeMiddleware to your project settings.py. Don't forget to set CONCURRENT_REQUESTS according to your ScrapingBee plan.

SCRAPINGBEE_API_KEY = 'REPLACE-WITH-YOUR-API-KEY'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_scrapingbee.ScrapingBeeMiddleware': 725,
}

CONCURRENT_REQUESTS = 1

Usage

Inherit your spiders from ScrapingBeeSpider and yield a ScrapingBeeRequest.

ScrapingBeeSpider overrides the default logger to hide your API key in the Scrapy logs.
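To illustrate the idea of hiding the key, here is a minimal sketch using a standard `logging.Filter` that redacts the API key from log messages. This is only an example of the technique, not the library's actual implementation; the class name `ApiKeyRedactingFilter` is made up for this sketch.

```python
import logging


class ApiKeyRedactingFilter(logging.Filter):
    """Replace the API key with a placeholder in every log record (illustrative)."""

    def __init__(self, api_key):
        super().__init__()
        self.api_key = api_key

    def filter(self, record):
        # Rewrite the message in place so downstream handlers never see the key.
        record.msg = str(record.msg).replace(self.api_key, '***')
        return True


logger = logging.getLogger('httpbin')
logger.addFilter(ApiKeyRedactingFilter('REPLACE-WITH-YOUR-API-KEY'))
```

With the filter attached, a message such as `Crawled https://app.scrapingbee.com/...?api_key=...` would be logged with the key masked as `***`.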

Below you can see an example from the spider in httpbin.py.

from scrapy_scrapingbee import ScrapingBeeSpider, ScrapingBeeRequest

JS_SNIPPET = 'window.scrollTo(0, document.body.scrollHeight);'


class HttpbinSpider(ScrapingBeeSpider):
    name = 'httpbin'
    start_urls = [
        'https://httpbin.org',
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield ScrapingBeeRequest(url, params={
                # 'render_js': False,
                # 'block_ads': True,
                # 'block_resources': False,
                # 'js_snippet': JS_SNIPPET,
                # 'premium_proxy': True,
                # 'country_code': 'fr',
                # 'return_page_source': True,
                # 'wait': 3000,
                # 'wait_for': '#swagger-ui',
            },
            headers={
                # 'Accept-Language': 'En-US',
            },
            cookies={
                # 'name_1': 'value_1',
            })

    def parse(self, response):
        ...

You can pass ScrapingBee parameters in the params argument of a ScrapingBeeRequest. Headers and cookies are passed as with a normal Scrapy Request. ScrapingBeeRequest converts all parameters, headers, and cookies to the format expected by the ScrapingBee API.
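To give a feel for that conversion, here is a rough sketch of how a params dict could be turned into a ScrapingBee API URL. The helper name `build_api_url` and the exact formatting rules (e.g. lowercasing booleans) are assumptions for illustration, not the middleware's actual code.

```python
from urllib.parse import urlencode

# Public ScrapingBee endpoint; the constant name is made up for this sketch.
SCRAPINGBEE_API_URL = 'https://app.scrapingbee.com/api/v1/'


def build_api_url(api_key, target_url, params=None):
    """Illustrative: fold request params into the API query string."""
    query = {'api_key': api_key, 'url': target_url}
    for key, value in (params or {}).items():
        # The API expects booleans as lowercase strings in the query string.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return SCRAPINGBEE_API_URL + '?' + urlencode(query)
```

For example, `build_api_url('KEY', 'https://httpbin.org', {'render_js': False, 'wait': 3000})` would produce a URL containing `render_js=false` and `wait=3000` alongside the target URL.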

Examples

Add your API key to settings.py.

To run the examples you need to clone this repository. In your terminal, go to examples/httpbin/httpbin and run the example spider with:

scrapy crawl httpbin
