A Scrapy middleware for accessing ZenRows Scraper API with minimal setup.

Scrapy_ZenRows Middleware

This is a Scrapy middleware that provides an interface to the ZenRows® Scraper API from within your Scrapy spiders. It gives you access to all the features of the ZenRows® Scraper API while using Scrapy.

Introduction

The ZenRows® Scraper API is an all-in-one toolkit designed to simplify and enhance the process of extracting data from websites. Whether you’re dealing with static or dynamic content, our API provides a range of features to meet your scraping needs efficiently.

With Premium Proxies, ZenRows gives you access to over 55 million residential IPs from 190+ countries, ensuring 99.9% uptime and highly reliable scraping sessions. Our system also handles advanced fingerprinting, header rotation, and IP management, enabling you to scrape even the most protected sites without needing to manually configure these elements.

ZenRows makes it easy to bypass complex anti-bot measures, handle JavaScript-heavy sites, and interact with web elements dynamically — all with the right features enabled.

Installation

pip install scrapy-zenrows

Usage

  • Sign up for free on ZenRows, open the Request Builder, and copy your ZenRows API key. Then set up the middleware as described in the following sections.

Global Settings

  • Add the ZenRows Scraper API middleware to your DOWNLOADER_MIDDLEWARES setting and specify your ZenRows API key:

settings.py

DOWNLOADER_MIDDLEWARES = {
    "scrapy_zenrows.ZenRowsMiddleware": 543,
}

# ZenRows API Key
ZENROWS_API_KEY = "<YOUR_ZENROWS_API_KEY>"
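
To keep the key out of version control, settings.py could read it from the environment instead of hard-coding it. The following is a minimal sketch, not part of the middleware: the helper name load_api_key and the environment variable name ZENROWS_API_KEY are assumptions chosen for illustration.

```python
import os


def load_api_key(env=os.environ):
    """Return the ZenRows API key from the environment, or "" if unset.

    Assumption: the key was exported as ZENROWS_API_KEY. The variable name
    is our own choice and is not mandated by the middleware.
    """
    return env.get("ZENROWS_API_KEY", "")


# In settings.py, the setting would then be assigned like this:
ZENROWS_API_KEY = load_api_key()
```

Passing the environment as an argument keeps the helper easy to check in isolation, for example with a plain dictionary.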

Set Global Premium Proxy and JS Rendering

The middleware does not use premium proxy or JS rendering by default: USE_ZENROWS_PREMIUM_PROXY and USE_ZENROWS_JS_RENDER both default to False. To turn on premium proxy and JS rendering globally, set both settings to True:

settings.py

# ...

USE_ZENROWS_PREMIUM_PROXY = True # to turn on premium proxy (False by default)
USE_ZENROWS_JS_RENDER = True # to turn on JS rendering (False by default)

Override Global Settings for Specific Requests

If you have multiple spiders and don't want premium proxy and JS rendering applied to all of them, you can enable these features for specific requests by yielding ZenRowsRequest from start_requests.

ZenRowsRequest accepts the URL, request params, and headers options.

For example, to set Premium Proxy and JS Rendering for a specific request:

# pip install scrapy-zenrows
import scrapy
from scrapy_zenrows import ZenRowsRequest

class YourSpider(scrapy.Spider):
    # ...

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                # overrides the global settings for this specific request
                params={
                    "js_render": "true",  # enable JavaScript rendering (if needed)
                    "premium_proxy": "true",  # use the premium proxy (if needed)
                },
            )

    def parse(self, response):
        # ...
        pass
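
One way to picture the relationship between the global settings and per-request params is as a merge in which the request has the last word. The sketch below is a conceptual model only and makes no claim about how the middleware is implemented internally; the function name effective_params and its arguments are hypothetical.

```python
def effective_params(use_premium_proxy, use_js_render, request_params=None):
    """Conceptual model: per-request params take precedence over globals.

    Hypothetical helper for illustration; not the middleware's own code.
    """
    params = {}
    # Start from the global switches (USE_ZENROWS_* in settings.py).
    if use_premium_proxy:
        params["premium_proxy"] = "true"
    if use_js_render:
        params["js_render"] = "true"
    # Anything set on the individual request replaces the global value.
    params.update(request_params or {})
    return params


# Globals are off, but this one request asks for JS rendering.
print(effective_params(False, False, {"js_render": "true"}))
```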

Other Request Parameters

In addition to js_render and premium_proxy, ZenRowsRequest accepts the other parameters supported by the ZenRows Scraper API:

# ...
class YourSpider(scrapy.Spider):
    # ...

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                params={
                    # ...
                    "proxy_country": "ca",  # use a proxy from a specific country
                    "js_instructions": '[{"wait": 500}]',  # pass JS instructions
                    "autoparse": "true",  # for supported websites
                    "outputs": "tables",  # extract specific data
                    "css_extractor": '{"links":"a @href","images":"img @src"}',
                },
            )

For more information and supported parameters, check out our Scraper API features.
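
The js_instructions and css_extractor values in the example above are JSON documents passed as strings, and hand-written JSON inside Python string literals is easy to get wrong. A minimal sketch of an alternative is to build the values as Python objects and serialize them with the standard json module. The variable names are illustrative; the resulting dictionary would be passed as the params argument of ZenRowsRequest.

```python
import json

# Describe the instructions and selectors as ordinary Python objects.
instructions = [{"wait": 500}]
selectors = {"links": "a @href", "images": "img @src"}

# json.dumps guarantees well-formed JSON (quoting, commas, brackets).
params = {
    "js_instructions": json.dumps(instructions),
    "css_extractor": json.dumps(selectors),
}

print(params["js_instructions"])
print(params["css_extractor"])
```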

Using Custom Headers

You must set the custom_headers parameter to true in your request to use customized headers. This tells ZenRows to include your custom headers while managing critical browser-based headers.

For example, the following adds the referer header:

# ...
class YourSpider(scrapy.Spider):
    # ...

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                # overrides the global settings for this specific request
                params={
                    "custom_headers": "true",  # to use custom headers
                },
                # add a referer header
                headers={
                    "Referer": "https://www.google.com/",
                },
            )

Adding the Cookies Header

Pass cookies as a separate request parameter (not inside the headers), just as Scrapy specifies. Note that custom_headers must also be set to true:

# ...
class YourSpider(scrapy.Spider):
    # ...

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                # overrides the global settings for this specific request
                params={
                    "custom_headers": "true",  # to use custom headers
                },
                cookies={
                    "currency": "USD",
                    "country": "UY",
                },
            )

Check our headers feature for more information on the accepted request headers and how to set them.
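
If your cookies start out as a raw Cookie header string (for example, copied from a browser's developer tools), they need to become a dictionary before they can be passed through the cookies argument shown above. The following is a minimal sketch using only the Python standard library; the helper name cookie_header_to_dict is hypothetical and not part of scrapy-zenrows.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header):
    """Convert a raw Cookie header string into a name -> value dict.

    Hypothetical helper for illustration; not provided by the middleware.
    """
    jar = SimpleCookie()
    jar.load(header)  # parses "name=value; name2=value2"
    return {name: morsel.value for name, morsel in jar.items()}


cookies = cookie_header_to_dict("currency=USD; country=UY")
print(cookies)
```

The resulting dictionary has the same shape as the cookies argument in the example above.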

Usage Examples

Here are example spiders demonstrating how to use the scrapy_zenrows middleware:

Examples directory: examples

👉🏼 Official scrapy-zenrows integration documentation
