
A Scrapy middleware for accessing ZenRows Scraper API with minimal setup.


Scrapy_ZenRows Middleware

This is a Scrapy middleware that provides an interface for interacting with the ZenRows® Scraper API in your Scrapy spiders. It lets you enjoy all the features of the ZenRows® Scraper API while using Scrapy.

Introduction

The ZenRows® Scraper API is an all-in-one toolkit designed to simplify and enhance the process of extracting data from websites. Whether you're dealing with static or dynamic content, our API provides a range of features to meet your scraping needs efficiently.

With Premium Proxies, ZenRows gives you access to over 55 million residential IPs from 190+ countries, ensuring 99.9% uptime and highly reliable scraping sessions. Our system also handles advanced fingerprinting, header rotation, and IP management, enabling you to scrape even the most protected sites without needing to manually configure these elements.

ZenRows makes it easy to bypass complex anti-bot measures, handle JavaScript-heavy sites, and interact with web elements dynamically — all with the right features enabled.

Installation

pip install scrapy-zenrows-middleware

Usage

  • Sign up for free on ZenRows to open the Request Builder, copy your ZenRows API key, and implement the middleware.

Global Settings

  • Add the ZenRows Scraper API middleware to your DOWNLOADER_MIDDLEWARES setting and specify your ZenRows API key:

settings.py

DOWNLOADER_MIDDLEWARES = {
    "scrapy_zenrows.ZenRowsMiddleware": 543,
}

# ZenRows API Key
ZENROWS_API_KEY = "<YOUR_ZENROWS_API_KEY>"

Set Global Premium Proxy and JS Rendering

The middleware does not use premium proxies or JS rendering by default: USE_ZENROWS_PREMIUM_PROXY and USE_ZENROWS_JS_RENDER are both False. To turn on premium proxies and JS rendering globally, set both parameters to True:

settings.py

# ...

USE_ZENROWS_PREMIUM_PROXY = True  # turn on premium proxies (False by default)
USE_ZENROWS_JS_RENDER = True  # turn on JS rendering (False by default)

Override Global Settings for Specific Requests

If you have multiple spiders and don't want to apply premium proxies and JS rendering globally, you can override the settings for specific requests by using ZenRowsRequest in start_requests.

ZenRowsRequest accepts the URL, request params, and headers options.

For example, to set Premium Proxy and JS Rendering for a specific request:

# pip install scrapy-zenrows-middleware
from scrapy_zenrows import ZenRowsRequest

class YourSpider(scrapy.Spider):
    # ...

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                # overrides the global settings for this specific request
                params={
                    "js_render": "true",  # enable JavaScript rendering (if needed)
                    "premium_proxy": "true",  # use the proxy (if needed)
                },
            )

    def parse(self, response):
        # ...

Other Request Parameters

In addition to js_render and premium_proxy, ZenRowsRequest accepts any other parameter supported by the ZenRows Scraper API:

# ...
class YourSpider(scrapy.Spider):
    # ...

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                params={
                    # ...,
                    "proxy_country": "ca",  # use a proxy from a specific country
                    "js_instructions": '[{"wait": 500}]',  # pass JS instructions
                    "autoparse": "true",  # auto-extract data for supported websites
                    "outputs": "tables",  # extract specific data types
                    "css_extractor": '{"links": "a @href", "images": "img @src"}',
                },
            )

For more information and supported parameters, check out our Scraper API features.

Using Custom Headers

You must set the custom_headers parameter to true in your request to use customized headers. This tells ZenRows to include your custom headers while managing critical browser-based headers.

For example, the following adds the referer header:

# ...
class YourSpider(scrapy.Spider):
    # ...

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                # overrides the global settings for this specific request
                params={
                    "custom_headers": "true",  # to use custom headers
                },
                # add a referer header
                headers={
                    "Referer": "https://www.google.com/",
                },
            )

Adding the Cookies Header

Pass cookies via the cookies request argument (separate from the headers), just as in standard Scrapy requests. Note that custom_headers must also be set to true.

# ...
class YourSpider(scrapy.Spider):
    # ...

    def start_requests(self):
        # use ZenRowsRequest for customization
        for url in self.start_urls:
            yield ZenRowsRequest(
                url=url,
                # overrides the global settings for this specific request
                params={
                    "custom_headers": "true",  # to use custom headers
                },
                cookies={
                    "currency": "USD",
                    "country": "UY",
                },
            )

Check our headers feature for more information on the accepted request headers and how to set them.


New in v1.1.0

Retry Middleware

New ZenRowsRetryMiddleware with exponential backoff for robust scraping:

settings.py

DOWNLOADER_MIDDLEWARES = {
    "scrapy_zenrows.ZenRowsRetryMiddleware": 550,  # Add retry middleware
    "scrapy_zenrows.ZenRowsMiddleware": 543,
}

# Retry settings (all optional)
ZENROWS_RETRY_ENABLED = True  # Enable/disable retry (default: True)
ZENROWS_MAX_RETRIES = 3  # Max retry attempts (default: 3)
ZENROWS_RETRY_BACKOFF = 1.0  # Backoff factor in seconds (default: 1.0)
ZENROWS_RETRY_STATUS_CODES = [429, 500, 502, 503, 504]  # Status codes to retry
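With exponential backoff, each retry waits longer than the previous one, scaled by ZENROWS_RETRY_BACKOFF. A rough sketch of the resulting delay schedule (the middleware's exact formula may differ, e.g. it may add jitter):

```python
def retry_delays(max_retries: int = 3, backoff: float = 1.0) -> list[float]:
    # Exponential backoff: each attempt waits twice as long as the previous one.
    return [backoff * (2 ** attempt) for attempt in range(max_retries)]

print(retry_delays())  # → [1.0, 2.0, 4.0]  (delays before retries 1, 2, and 3)
```

With the defaults above, a request that keeps failing is retried after roughly 1, 2, and 4 seconds before being given up.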

New Parameters

Session Persistence (session_id)

Maintain the same IP address across multiple requests (up to 10 minutes):

yield ZenRowsRequest(
    url="https://example.com/login",
    params={
        "session_id": 12345,  # Integer 1-99999
        "premium_proxy": "true",
    },
)

Wait for Element (wait_for)

Wait for a CSS selector to appear in DOM before returning (better than fixed wait):

yield ZenRowsRequest(
    url="https://spa-example.com",
    params={
        "js_render": "true",
        "wait_for": ".product-list",  # Wait for this element
    },
)

Block Resources (block_resources)

Speed up scraping by blocking unnecessary resources:

yield ZenRowsRequest(
    url="https://example.com",
    params={
        "js_render": "true",
        "block_resources": "image,media,font,stylesheet",
    },
)

Response Type (response_type)

Convert HTML to other formats for AI/LLM pipelines:

yield ZenRowsRequest(
    url="https://blog.example.com/article",
    params={
        "response_type": "markdown",  # or "plaintext", "pdf"
    },
)

Original Status (original_status)

Get the original HTTP status code from the target page:

yield ZenRowsRequest(
    url="https://example.com",
    params={
        "original_status": "true",
    },
)

Allowed Status Codes (allowed_status_codes)

Get content even from error pages:

yield ZenRowsRequest(
    url="https://example.com/may-404",
    params={
        "allowed_status_codes": "404,500,503",
    },
)
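Keep in mind that Scrapy itself discards non-2xx responses unless you allow them (for example via handle_httpstatus_list or HTTPERROR_ALLOWED_CODES). Once error pages reach your callback, you can branch on the status code; a minimal sketch, with purely illustrative routing logic:

```python
def route_by_status(status: int) -> str:
    # Decide what to do with a response based on its original status code.
    if status == 404:
        return "record-missing"  # page gone: log the URL and move on
    if status in (500, 503):
        return "requeue"         # transient server error: try again later
    return "parse"               # normal page: extract data as usual

print(route_by_status(404))  # → record-missing
```

A parse callback would call such a helper with response.status and only run its extraction logic on the "parse" branch.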

Parameter Reference

| Parameter | Type | Description |
| --- | --- | --- |
| js_render | bool | Enable JavaScript rendering |
| premium_proxy | bool | Use residential IPs |
| proxy_country | str | Country code (us, de, jp) |
| proxy_city | str | City name (requires premium_proxy) |
| session_id | int | Maintain IP (1-99999, 10 min) |
| wait | int | Wait milliseconds after load |
| wait_for | str | Wait for CSS selector |
| block_resources | str | Block resources (image,media,font) |
| js_instructions | str | JSON array of JS actions |
| autoparse | bool | Auto-extract structured data |
| css_extractor | str | CSS selectors in JSON |
| outputs | str | Data types: tables,emails |
| response_type | str | markdown, plaintext, pdf |
| json_response | bool | Capture XHR/Fetch as JSON |
| screenshot | bool | Capture screenshot |
| screenshot_fullpage | bool | Full page screenshot |
| screenshot_selector | str | Screenshot specific element |
| screenshot_format | str | png or jpeg |
| screenshot_quality | int | JPEG quality 1-100 |
| original_status | bool | Return target's HTTP status |
| allowed_status_codes | str | Get content from error pages |
| custom_headers | bool | Enable custom headers |
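These parameters can be combined in a single request. A sketch of a params dict mixing several of the features above (the values and selector are illustrative):

```python
# Illustrative combination of Scraper API parameters from the table above;
# pass this dict as ZenRowsRequest(url=..., params=params).
params = {
    "js_render": "true",     # features like wait_for and screenshot need JS rendering
    "premium_proxy": "true",
    "proxy_country": "us",
    "wait_for": ".content",  # wait for the main content to appear
    "screenshot": "true",
    "screenshot_format": "png",
}

print(sorted(params))
```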

Usage Examples

Here are example spiders demonstrating how to use the scrapy_zenrows middleware:

Examples directory: examples/

👉🏼 Official scrapy-zenrows integration documentation

👉🏼 Full Feature Roadmap
