ruia_pyppeteer - A Ruia plugin for loading javascript - pyppeteer.

ruia-pyppeteer

A Ruia plugin for loading JavaScript-rendered pages with pyppeteer.

Note: Requires ruia >= 0.8.0.

Installation

pip install ruia_pyppeteer
# Or install the latest development version, which includes new features
pip install git+https://github.com/ruia-plugins/ruia-pyppeteer

Usage

ruia_pyppeteer uses pyppeteer to load pages that depend on JavaScript.

Note that the first time you use load_js, a recent version of Chromium (~100MB) is downloaded. This only happens once.

Load JavaScript

import asyncio

from ruia_pyppeteer import PyppeteerRequest as Request

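# load_js=True asks the request to render the page with pyppeteer (Chromium)
request = Request("https://www.jianshu.com/", load_js=True)
# Run the coroutine to completion and print the resulting response
response = asyncio.get_event_loop().run_until_complete(request.fetch())
print(response)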

Complete example

from ruia import AttrField, Item, TextField

from ruia_pyppeteer import PyppeteerSpider as Spider


class JianshuItem(Item):
    target_item = TextField(css_select="ul.list>li")
    author_name = TextField(css_select="a.name")
    author_url = AttrField(attr="href", css_select="a.name")

    async def clean_author_name(self, author_name):
        return author_name.strip()

    async def clean_author_url(self, author_url):
        return f"https://www.jianshu.com{author_url}"


class JianshuSpider(Spider):
    start_urls = ["https://www.jianshu.com/"]
    concurrency = 10

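    async def parse(self, response):
        # The spider fetches pages with PyppeteerRequest, so read the
        # rendered HTML from the pyppeteer page attached to the response.
        html = await response.page.content()
        async for item in JianshuItem.get_items(html=html):
            print(item)
        # Close the browser once the page has been processed.
        await response.browser.close()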


if __name__ == "__main__":
    JianshuSpider.start()
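
The clean_author_url hook in the example above builds an absolute URL by prefixing the site root with an f-string, which assumes every href is a root-relative path. A minimal sketch of a more defensive variant, using only the standard library, is shown below. The helper name absolute_author_url and the sample path /u/abc123 are hypothetical and are not part of ruia or ruia_pyppeteer.

```python
from urllib.parse import urljoin

# Site root taken from the example above.
BASE_URL = "https://www.jianshu.com"


def absolute_author_url(author_url: str, base_url: str = BASE_URL) -> str:
    """Resolve an href against the site root (hypothetical helper)."""
    # urljoin resolves relative paths against the base and returns
    # already-absolute URLs unchanged.
    return urljoin(base_url + "/", author_url)


# "/u/abc123" is a made-up path used only for illustration.
print(absolute_author_url("/u/abc123"))
print(absolute_author_url("https://www.jianshu.com/u/abc123"))
```

Inside the Item, the body of clean_author_url could return the result of such a helper instead of the f-string, so that an href that is already absolute does not get the site root prepended a second time.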

Enjoy it :)

Source Distribution

ruia_pyppeteer-0.0.8.tar.gz (4.1 kB)

Uploaded Source

File details

Details for the file ruia_pyppeteer-0.0.8.tar.gz.

File metadata

  • Download URL: ruia_pyppeteer-0.0.8.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.64.0 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.10

File hashes

Hashes for ruia_pyppeteer-0.0.8.tar.gz
Algorithm Hash digest
SHA256 da5bde9c557b157a55edb577dee9a17b46b298ae94aee37a88b11496b11b5c28
MD5 b75ec8bed7781df9fce0543a174d042e
BLAKE2b-256 156c884ade1c3125ac77d4a7e8d0c2abb64e0de40127b9882955fa92f9fa989f
