A Scrapy downloader middleware that integrates seamlessly with pyppeteer
This package adds pyppeteer support to Scrapy; it is also a module of GerapyPyppeteer.
On top of the original, this version adds page.click support and username/password authentication for proxies:
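from aroay_pyppeteer import PyppeteerRequest  # import path assumed; adjust to the installed package layout

# Inside your Spider subclass:
def start_requests(self):
    for page in range(1, 2):
        yield PyppeteerRequest(self.base_url, callback=self.parse_index, dont_filter=True,
                               wait_for=".vjs-poster",              # wait until this element appears
                               click=".vjs-big-play-button",        # then click this element
                               proxy="http://username:password@ip:port")  # proxy with credentials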
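The proxy string above bundles the server and the credentials into one URL. Chromium's --proxy-server flag does not take credentials, so with pyppeteer the usual pattern is to pass only scheme, host and port to the browser and supply the username and password through page.authenticate. The sketch below shows that split using only the standard library; the helper names split_proxy and apply_proxy_auth are hypothetical, and whether aroay_pyppeteer itself percent-decodes credentials is not stated here, so treat the unquote step as an assumption.

```python
from urllib.parse import urlsplit, unquote


def split_proxy(proxy):
    """Hypothetical helper: split 'http://user:pass@host:port' into
    a --proxy-server value and a credentials dict (or None)."""
    parts = urlsplit(proxy)
    netloc = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    server = f"{parts.scheme}://{netloc}"
    credentials = None
    if parts.username is not None:
        # Percent-decode so passwords containing '@' or ':' can be written as %40 / %3A.
        credentials = {
            "username": unquote(parts.username),
            "password": unquote(parts.password or ""),
        }
    return server, credentials


async def apply_proxy_auth(page, credentials):
    """Answer the proxy's auth challenge on a pyppeteer Page (hypothetical helper)."""
    if credentials:
        await page.authenticate(credentials)
```

Usage: launch the browser with args=[f"--proxy-server={server}"], then call apply_proxy_auth on each new page before navigating.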
ScrapyPyppeteer
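Handling await errors
If an await-related error is raised, disable request interception by adding the following to settings.py: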
AROAY_ENABLE_REQUEST_INTERCEPTION = False
Installation
pip3 install daoke-pyppeteer
Then enable the middleware in settings.py:
DOWNLOADER_MIDDLEWARES = {
'aroay_pyppeteer.downloadermiddlewares.PyppeteerMiddleware': 543,
}
Configuration (settings.py)
CONCURRENT_REQUESTS = 3
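AROAY_PYPPETEER_PRETEND = False  # default True; some sites can detect headless mode or the webdriver driver, so enable this for them
AROAY_PYPPETEER_HEADLESS = False  # default True
AROAY_PYPPETEER_DOWNLOAD_TIMEOUT = 30  # page rendering timeout in seconds, default 30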
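A rough sketch of how these three settings could drive a pyppeteer render, assuming one browser per page. The names DEFAULTS, resolve, goto_timeout_ms, HIDE_WEBDRIVER_JS and render are hypothetical, and hiding navigator.webdriver through evaluateOnNewDocument is one common stealth technique, not necessarily what AROAY_PYPPETEER_PRETEND does internally. It also shows the unit conversion: the setting is in seconds, while pyppeteer's page.goto takes its timeout in milliseconds.

```python
# Hypothetical defaults mirroring the documented settings.
DEFAULTS = {
    "AROAY_PYPPETEER_PRETEND": True,
    "AROAY_PYPPETEER_HEADLESS": True,
    "AROAY_PYPPETEER_DOWNLOAD_TIMEOUT": 30,  # seconds
}

# One common way to hide the webdriver flag (assumption, not the package's code).
HIDE_WEBDRIVER_JS = "() => { Object.defineProperty(navigator, 'webdriver', { get: () => undefined }) }"


def resolve(settings):
    """Overlay the user's values for known keys on the documented defaults."""
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in settings.items() if k in DEFAULTS})
    return merged


def goto_timeout_ms(timeout_s):
    """Convert the seconds-based setting to the milliseconds page.goto expects."""
    return int(timeout_s * 1000)


async def render(url, settings):
    # Imported lazily so the helpers above work without pyppeteer installed.
    from pyppeteer import launch

    cfg = resolve(settings)
    browser = await launch(headless=cfg["AROAY_PYPPETEER_HEADLESS"])
    try:
        page = await browser.newPage()
        if cfg["AROAY_PYPPETEER_PRETEND"]:
            # Runs before any page script, so detection code sees webdriver as undefined.
            await page.evaluateOnNewDocument(HIDE_WEBDRIVER_JS)
        await page.goto(url, timeout=goto_timeout_ms(cfg["AROAY_PYPPETEER_DOWNLOAD_TIMEOUT"]))
        return await page.content()
    finally:
        await browser.close()
```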
Request interception (skip loading the listed resource types)
AROAY_PYPPETEER_IGNORE_RESOURCE_TYPES = ['stylesheet', 'script']
All available resource types:
- document: the original HTML document
- stylesheet: CSS files
- script: JavaScript files
- image: images
- media: media files such as audio or video
- font: font files
- texttrack: text track files
- xhr: Ajax requests
- fetch: Fetch requests
- eventsource: EventSource
- websocket: WebSocket
- manifest: manifest files
- other: other files
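Skipping resource types in pyppeteer relies on request interception: every request is paused, and the handler either aborts it or lets it continue. The sketch below uses real pyppeteer calls (page.setRequestInterception, request.resourceType, request.abort, request.continue_), but the helper names RESOURCE_TYPES, validate_ignore_types, should_abort and enable_filter are hypothetical and this is not the package's own code. Validating the list up front catches typos such as 'css' that would otherwise silently block nothing.

```python
import asyncio

# The resource types listed above.
RESOURCE_TYPES = frozenset({
    "document", "stylesheet", "script", "image", "media", "font",
    "texttrack", "xhr", "fetch", "eventsource", "websocket", "manifest", "other",
})


def validate_ignore_types(ignore):
    """Reject unknown names before any page is rendered."""
    unknown = set(ignore) - RESOURCE_TYPES
    if unknown:
        raise ValueError(f"unknown resource types: {sorted(unknown)}")
    return frozenset(ignore)


def should_abort(resource_type, ignore):
    """True if a request of this type should not be loaded."""
    return resource_type in ignore


async def enable_filter(page, ignore):
    """Attach an interception handler to a pyppeteer Page."""
    ignore = validate_ignore_types(ignore)
    await page.setRequestInterception(True)

    async def handle(request):
        if should_abort(request.resourceType, ignore):
            await request.abort()
        else:
            await request.continue_()

    # The handler is a coroutine, so wrap each call in a task.
    page.on("request", lambda request: asyncio.ensure_future(handle(request)))
```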
Download files
Source distribution: aroay_pyppeteer-1.2.tar.gz (23.8 kB)
Hashes for aroay_pyppeteer-1.2-py3-none-any.whl:
Algorithm | Hash digest
---|---
SHA256 | a78e0396f52e18acd1d07a32c1dc21a19bfdc35aadcc74927641227ea929f8ff
MD5 | ac6a93c38cf0ad47d2cf5e932c5b77b6
BLAKE2b-256 | 59764bc76f2d9707ccec7c86ebd6d13524b22f9699dc73d1c1638ab900547128