A Scrapy downloader middleware that integrates seamlessly with pyppeteer

This package adds pyppeteer support to Scrapy. It is also a module in GerapyPyppeteer.

On top of the original package, this version adds page.click support and username/password authentication for proxies.

# Import path assumed to mirror gerapy_pyppeteer
from aroay_pyppeteer import PyppeteerRequest

# Example 1: click an element and route the request through an authenticated proxy
def start_requests(self):
    for page in range(1, 2):
        yield PyppeteerRequest(self.base_url, callback=self.parse_index, dont_filter=True,
                               wait_for=".vjs-poster",
                               click="xpath",  # placeholder: XPath of the element to click
                               proxy="http://username:password@ip:port")

# Example 2: wait for an element, click a link, then wait for the resulting content
def start_requests(self):
    yield PyppeteerRequest(
        url="https://www.d2pass.com/search?k=%E7%B9%B0%E3%82%8A%E8%BF%94%E3%81%97%E6%BF%83%E5%8E%9A%E3%81%AA%E3%81%AE%E3%82%92%E6%AC%B2%E3%81%97%E3%81%A6%E3%82%84%E3%81%BE%E3%81%AA%E3%81%84%E7%BE%8E%E3%83%9C%E3%83%87%E3%82%A3%E3%83%95%E3%83%BC%E3%83%89%E3%83%AB",
        wait_for=".gridimg",
        click="//*[@id='portfolio']/li/div/p[5]/a",
        wait_for_next="#review-section",
        callback=self.parse)
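
The proxy argument in the first example packs the credentials into the URL. As a rough illustration of what that string contains, the sketch below splits such a URL into a server address and a username/password pair using only the standard library. The helper name split_proxy and the sample values are hypothetical, and this is not necessarily how the middleware handles the value internally.

```python
from urllib.parse import unquote, urlsplit


def split_proxy(proxy_url):
    """Split 'scheme://user:pass@host:port' into (server, credentials).

    Hypothetical helper for illustration only; not part of aroay_pyppeteer.
    """
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL needs a host and a port: {proxy_url!r}")
    server = f"{parts.scheme}://{parts.hostname}:{parts.port}"
    credentials = None
    if parts.username is not None:
        credentials = {
            # Percent-encoded characters (e.g. '@' written as %40) are decoded here.
            "username": unquote(parts.username),
            "password": unquote(parts.password or ""),
        }
    return server, credentials


server, credentials = split_proxy("http://user:p%40ss@127.0.0.1:8080")
print(server)       # http://127.0.0.1:8080
print(credentials)  # {'username': 'user', 'password': 'p@ss'}
```

With plain pyppeteer, values like these are typically passed as a --proxy-server launch argument and to page.authenticate(). Whether this package does exactly that is an assumption.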

ScrapyPyppeteer

A Scrapy downloader middleware that integrates seamlessly with pyppeteer

Handling the await error message

Add the following to your settings:

AROAY_ENABLE_REQUEST_INTERCEPTION = False

Installation

pip3 install daoke-pyppeteer

Then enable the downloader middleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'aroay_pyppeteer.downloadermiddlewares.PyppeteerMiddleware': 543,
}

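Configuration

CONCURRENT_REQUESTS = 3
AROAY_PYPPETEER_PRETEND = False  # Defaults to True; keep it enabled for sites that detect headless browsers or WebDriver
AROAY_PYPPETEER_HEADLESS = False  # Defaults to True
AROAY_PYPPETEER_DOWNLOAD_TIMEOUT = 30  # Page rendering timeout in seconds; defaults to 30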
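
To make the defaults in the comments above concrete, here is a minimal sketch of how explicit values in settings.py would override them. The DEFAULTS dict and the effective_settings helper are hypothetical and exist only for this illustration; the package's own lookup logic may differ.

```python
# Defaults as stated in the configuration comments; the dict itself is illustrative.
DEFAULTS = {
    "AROAY_PYPPETEER_PRETEND": True,
    "AROAY_PYPPETEER_HEADLESS": True,
    "AROAY_PYPPETEER_DOWNLOAD_TIMEOUT": 30,
}


def effective_settings(overrides):
    """Return the defaults updated with whatever the project sets explicitly."""
    return {**DEFAULTS, **overrides}


# A project that only turns off headless mode keeps the other defaults.
print(effective_settings({"AROAY_PYPPETEER_HEADLESS": False}))
```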

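Intercepting requests

To skip loading certain resource types, list them in settings.py:
AROAY_PYPPETEER_IGNORE_RESOURCE_TYPES = ['stylesheet', 'script']

All available resource types: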

  • document: the original HTML document
  • stylesheet: CSS files
  • script: JavaScript files
  • image: Images
  • media: Media files such as audio or video
  • font: Font files
  • texttrack: Text Track files
  • xhr: Ajax Requests
  • fetch: Fetch Requests
  • eventsource: Event Source
  • websocket: Websocket
  • manifest: Manifest files
  • other: Other files
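
The idea behind ignoring resource types can be sketched as a request handler that aborts any request whose type is in the ignore list and lets everything else continue. The names filter_request and FakeRequest are hypothetical; the handler only assumes an object shaped like a pyppeteer request (a resourceType attribute plus awaitable abort() and continue_() methods), and the package's actual handler may be written differently.

```python
import asyncio

# Example value taken from the setting shown earlier.
IGNORE_RESOURCE_TYPES = {"stylesheet", "script"}


async def filter_request(request, ignored=IGNORE_RESOURCE_TYPES):
    """Abort requests whose resource type is ignored; let the rest through."""
    if request.resourceType in ignored:
        await request.abort()
        return "aborted"
    await request.continue_()
    return "continued"


class FakeRequest:
    """Stand-in for a browser request so the sketch runs without Chromium."""

    def __init__(self, resource_type):
        self.resourceType = resource_type

    async def abort(self):
        pass

    async def continue_(self):
        pass


async def demo():
    types = ("document", "script", "image")
    return [await filter_request(FakeRequest(t)) for t in types]


print(asyncio.run(demo()))  # ['continued', 'aborted', 'continued']
```

With a real pyppeteer page, a handler like this would usually be registered after `await page.setRequestInterception(True)`, for example with `page.on('request', lambda req: asyncio.ensure_future(filter_request(req)))`.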
