A Scrapy downloader middleware that integrates seamlessly with pyppeteer
This package adds pyppeteer support to Scrapy; it is also a module of GerapyPyppeteer.
Compared with the original, this version adds `page.click` support and username/password authentication for proxies.
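```python
from aroay_pyppeteer import PyppeteerRequest  # assumed import path, based on the package name

def start_requests(self):
    for page in range(1, 2):
        yield PyppeteerRequest(self.base_url, callback=self.parse_index, dont_filter=True,
                               wait_for=".vjs-poster",  # CSS selector to wait for after loading
                               click="xpath",  # replace with the XPath of the element to click
                               proxy="http://username:password@ip:port")  # proxy with credentials
```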
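The `proxy` argument embeds the credentials in the URL. Chromium's `--proxy-server` flag does not accept credentials, so they have to be supplied separately, and pyppeteer exposes `page.authenticate()` for this. The following is a minimal sketch of splitting such a URL. It makes no assumptions about how aroay_pyppeteer does this internally, and `split_proxy` is a hypothetical helper:

```python
from urllib.parse import urlsplit


def split_proxy(proxy_url):
    """Split 'scheme://user:pass@host:port' into a Chromium --proxy-server
    argument (credentials stripped) and an optional credentials dict."""
    parts = urlsplit(proxy_url)
    host = parts.hostname
    # keep the port only if the URL has one
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    server_arg = f"--proxy-server={parts.scheme}://{netloc}"
    credentials = None
    if parts.username is not None:
        # dict shape accepted by pyppeteer's page.authenticate()
        credentials = {"username": parts.username, "password": parts.password or ""}
    return server_arg, credentials
```

With pyppeteer, `server_arg` would be passed as `launch(args=[server_arg])`. When credentials are present, `await page.authenticate(credentials)` would be called before `page.goto()`.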
```python
def start_requests(self):
    yield PyppeteerRequest(
        url="https://www.d2pass.com/search?k=%E7%B9%B0%E3%82%8A%E8%BF%94%E3%81%97%E6%BF%83%E5%8E%9A%E3%81%AA%E3%81%AE%E3%82%92%E6%AC%B2%E3%81%97%E3%81%A6%E3%82%84%E3%81%BE%E3%81%AA%E3%81%84%E7%BE%8E%E3%83%9C%E3%83%87%E3%82%A3%E3%83%95%E3%83%BC%E3%83%89%E3%83%AB",
        wait_for=".gridimg",  # wait for the search results to render
        click="//*[@id='portfolio']/li/div/p[5]/a",  # XPath of the link to click
        wait_for_next="#review-section",  # selector to wait for after the click
        callback=self.parse)
```
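The second example waits for `.gridimg`, clicks an element located by XPath, then waits for `#review-section`. The following is a minimal sketch of that sequence, written directly against pyppeteer's `Page.waitForSelector()`, `Page.xpath()`, `ElementHandle.click()` and `Page.content()`. It assumes, as in the examples, that `wait_for`/`wait_for_next` are CSS selectors and `click` is an XPath expression. `click_and_wait` is a hypothetical helper, not part of aroay_pyppeteer:

```python
async def click_and_wait(page, wait_for, click_xpath, wait_for_next, timeout_ms=30000):
    """Wait for `wait_for`, click the first node matching `click_xpath`,
    then wait for `wait_for_next` and return the rendered HTML."""
    # pyppeteer timeouts are in milliseconds
    await page.waitForSelector(wait_for, {"timeout": timeout_ms})
    handles = await page.xpath(click_xpath)
    if not handles:
        raise ValueError(f"no element matches XPath {click_xpath!r}")
    await handles[0].click()
    await page.waitForSelector(wait_for_next, {"timeout": timeout_ms})
    return await page.content()
```

For example, `html = await click_and_wait(page, ".gridimg", "//*[@id='portfolio']/li/div/p[5]/a", "#review-section")` could be run on a page already opened with `page.goto()`.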
ScrapyPyppeteer
A Scrapy downloader middleware that integrates seamlessly with pyppeteer
If you get a `handle await` error, add the following to settings.py:

```python
AROAY_ENABLE_REQUEST_INTERCEPTION = False
```
Installation

```shell
pip3 install daoke-pyppeteer
```

Then enable the middleware in settings.py:

```python
DOWNLOADER_MIDDLEWARES = {
    'aroay_pyppeteer.downloadermiddlewares.PyppeteerMiddleware': 543,
}
```
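Configuration

```python
CONCURRENT_REQUESTS = 3
AROAY_PYPPETEER_PRETEND = False  # defaults to True; some sites detect headless browsers or the webdriver flag, so enable it for those
AROAY_PYPPETEER_HEADLESS = False  # defaults to True
AROAY_PYPPETEER_DOWNLOAD_TIMEOUT = 30  # page render timeout in seconds; defaults to 30
```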
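These values eventually have to become pyppeteer launch and navigation options. The following is a hedged sketch of one plausible mapping, using the defaults stated above. It assumes a plain dict as input; Scrapy's `Settings` object also supports `.get(key, default)`. `launch_options_from_settings` is hypothetical. Translating `AROAY_PYPPETEER_PRETEND` into Chromium's `--disable-blink-features=AutomationControlled` flag is an assumption, not something this README states:

```python
def launch_options_from_settings(settings):
    """Map AROAY_PYPPETEER_* values to pyppeteer.launch() kwargs plus a
    navigation timeout in milliseconds (hypothetical mapping)."""
    headless = settings.get("AROAY_PYPPETEER_HEADLESS", True)
    pretend = settings.get("AROAY_PYPPETEER_PRETEND", True)
    timeout_s = settings.get("AROAY_PYPPETEER_DOWNLOAD_TIMEOUT", 30)
    args = []
    if pretend:
        # Assumption: hide the automation signal exposed via navigator.webdriver
        args.append("--disable-blink-features=AutomationControlled")
    launch_kwargs = {"headless": headless, "args": args}
    # pyppeteer expresses timeouts in milliseconds
    goto_timeout_ms = int(timeout_s * 1000)
    return launch_kwargs, goto_timeout_ms
```

The results would be used as `await pyppeteer.launch(**launch_kwargs)` and `await page.goto(url, {"timeout": goto_timeout_ms})`.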
Request interception

Requests for the listed resource types are skipped while the page renders (here, stylesheets and scripts):

```python
AROAY_PYPPETEER_IGNORE_RESOURCE_TYPES = ['stylesheet', 'script']
```

All available resource types:
- document: the original HTML document
- stylesheet: CSS files
- script: JavaScript files
- image: images
- media: media files such as audio or video
- font: font files
- texttrack: text track files
- xhr: Ajax (XMLHttpRequest) requests
- fetch: Fetch API requests
- eventsource: Event Source (server-sent events)
- websocket: WebSocket connections
- manifest: manifest files
- other: other files
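In pyppeteer, this kind of filtering is typically done by turning on request interception and aborting matching requests. The following is a minimal sketch of that technique using `page.setRequestInterception()`, `request.resourceType`, `request.abort()` and `request.continue_()`. It illustrates the idea and is not aroay_pyppeteer's actual code. `intercept` and `enable_interception` are hypothetical names:

```python
import asyncio

# mirrors AROAY_PYPPETEER_IGNORE_RESOURCE_TYPES above
IGNORE_RESOURCE_TYPES = ("stylesheet", "script")


async def intercept(request, ignored=IGNORE_RESOURCE_TYPES):
    """Abort requests whose resource type is ignored; let the rest through."""
    if request.resourceType in ignored:
        await request.abort()
    else:
        await request.continue_()


async def enable_interception(page, ignored=IGNORE_RESOURCE_TYPES):
    await page.setRequestInterception(True)
    # Schedule the coroutine explicitly so the handler runs regardless of
    # whether the installed pyee version awaits coroutine handlers itself
    page.on("request", lambda req: asyncio.ensure_future(intercept(req, ignored)))
```

Once interception is enabled, every request must be either aborted or continued. Otherwise the page stalls, which is why the handler always takes one of the two branches.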
Download files
Source Distribution
aroay_pyppeteer-1.4.tar.gz (24.6 kB)
Built Distribution
aroay_pyppeteer-1.4-py3-none-any.whl (25.0 kB)
File details
Details for the file aroay_pyppeteer-1.4.tar.gz.
File metadata
- Download URL: aroay_pyppeteer-1.4.tar.gz
- Upload date:
- Size: 24.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 761eff4bf1da9b2443abde54733f861996f12dd415396321f5c98b2887b2022d
MD5 | c26a400a4a7d2c1e4e99c76237c6c870
BLAKE2b-256 | 926a38d9711ab20612bae9c1238905292eedf0027362c0a83661949e5178e575
File details
Details for the file aroay_pyppeteer-1.4-py3-none-any.whl.
File metadata
- Download URL: aroay_pyppeteer-1.4-py3-none-any.whl
- Upload date:
- Size: 25.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 312cb3ad51bc4aa3a1c8812b5140808513973804946242c251d10cbfd763ad37
MD5 | 89b571798146e9b302611468d5f14954
BLAKE2b-256 | 8456f72dbd0164d6f1fc8224159470b714a43bfffc942e2bee54f904f67f94b8