# Scrapy with Puppeteer

Scrapy middleware to handle JavaScript pages using puppeteer.


This is an attempt to make Scrapy and Puppeteer work together to handle JavaScript-rendered pages.
The design is strongly inspired by the Scrapy Splash plugin.

**Scrapy and Puppeteer**

The main issue when running Scrapy and Puppeteer together is that Scrapy uses Twisted, whereas Pyppeteer (the Python port of puppeteer we are using) uses asyncio for asynchronous operations.

Luckily, we can use Twisted's asyncio reactor to make the two talk to each other.

That's why you **cannot** use the built-in `scrapy` command line (it installs the default reactor); you have to use the `scrapyp` one, provided by this module.

If you are running your spiders from a script, make sure you install the asyncio reactor before importing Scrapy or doing anything else:

    import asyncio
    from twisted.internet import asyncioreactor

    # Install the asyncio reactor before any Scrapy import.
    asyncioreactor.install(asyncio.get_event_loop())
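
A fuller version of such a script might look like the sketch below. It assumes a Scrapy version whose `CrawlerProcess` works with an already-installed reactor; `install_asyncio_reactor` and `run_spider` are hypothetical helper names, not part of this module. The Scrapy import is deferred into the function so that it cannot happen before the reactor is installed.

```python
import asyncio


def install_asyncio_reactor():
    """Install Twisted's asyncio reactor on a fresh event loop."""
    # Imported lazily so that loading this file has no side effects.
    from twisted.internet import asyncioreactor

    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    asyncioreactor.install(loop)


def run_spider(spider_cls, settings=None):
    """Hypothetical entry point: install the reactor, then run one spider."""
    install_asyncio_reactor()

    # Scrapy is imported only after the asyncio reactor is in place.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=settings or {})
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl finishes
```

Calling `run_spider(YourSpider)` from the script's entry point would then start the crawl, where `YourSpider` stands for your own spider class.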


## Installation

    $ pip install scrapy-puppeteer

## Configuration
Add the `PuppeteerMiddleware` to the downloader middlewares:

    DOWNLOADER_MIDDLEWARES = {'scrapy_puppeteer.PuppeteerMiddleware': 800}

## Usage
Use `scrapy_puppeteer.PuppeteerRequest` instead of the Scrapy built-in `Request`, as shown below:

    from scrapy_puppeteer import PuppeteerRequest

    def your_parse_method(self, response):
        # Your code...
        yield PuppeteerRequest('', self.parse_result)

The request will then be handled by puppeteer.

The `selector` response attribute works as usual (but contains the HTML processed by puppeteer).

    def parse_result(self, response):
        print(response.selector.xpath('//title/text()').get())

### Additional arguments
`scrapy_puppeteer.PuppeteerRequest` accepts 3 additional arguments:

#### `wait_until`

Will be passed to puppeteer's `waitUntil` parameter.
Defaults to `domcontentloaded`.
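
Puppeteer documents four values for `waitUntil`: `load`, `domcontentloaded`, `networkidle0` and `networkidle2`. A small guard can catch typos before the request is scheduled. The sketch below is a hypothetical helper (`check_wait_until` is not part of this module):

```python
# Values documented for puppeteer's `waitUntil` navigation option.
WAIT_UNTIL_EVENTS = ("load", "domcontentloaded", "networkidle0", "networkidle2")


def check_wait_until(value="domcontentloaded"):
    """Return `value` if it is a known waitUntil event, else raise ValueError.

    Hypothetical helper for illustration only.
    """
    if value not in WAIT_UNTIL_EVENTS:
        allowed = ", ".join(WAIT_UNTIL_EVENTS)
        raise ValueError(f"unknown wait_until {value!r}; expected one of: {allowed}")
    return value
```

The returned value could then be passed as `wait_until` to `PuppeteerRequest`.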

#### `wait_for`
Will be passed to puppeteer's `waitFor`.

#### `screenshot`
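When used, puppeteer will take a screenshot of the page, and the binary data of the captured .png will be added to the response `meta`:

    yield PuppeteerRequest(url, self.parse_result, screenshot=True)

    def parse_result(self, response):
        with open('image.png', 'wb') as image_file:
            # Assumes the PNG bytes are stored under the 'screenshot' meta key.
            image_file.write(response.meta['screenshot'])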
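
Before writing the screenshot to disk, it can be worth checking that the bytes really are a PNG, for instance to notice when the response carries something else. The following is a minimal sketch using only the standard PNG file signature; `is_png` and `save_screenshot` are hypothetical helpers, not part of this module.

```python
# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data):
    """Return True if `data` begins with the PNG file signature."""
    return isinstance(data, (bytes, bytearray)) and data.startswith(PNG_SIGNATURE)


def save_screenshot(data, path):
    """Write `data` to `path`, refusing anything that is not PNG data."""
    if not is_png(data):
        raise ValueError("screenshot data does not look like a PNG")
    with open(path, "wb") as image_file:
        image_file.write(data)
    return len(data)  # number of bytes written
```

In a callback, `save_screenshot` would receive the binary data taken from the response `meta` together with a target path such as `'image.png'`.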
