
asyncio driver + HTTP server for Chrome devtools protocol



Chrome browser as an HTTP service with a Splash-compatible HTTP API

Chromewhip is an easily deployable service that runs a headless Chrome process wrapped in an HTTP API. Inspired by the Splash project (https://github.com/scrapinghub/splash), we aim to provide a drop-in replacement for the Splash service by adhering to its documented API.

It is currently in early alpha and under heavy development. Please use the issue tracker to follow progress towards beta. For now, the milestone for beta can be summarised as implementing the entire Splash API.

Python 3.6 asyncio driver for the Chrome devtools protocol

Chromewhip communicates with the Chrome process through our own asyncio driver.

  • Typed Python bindings for the devtools protocol, produced by templated code generation, so your code editor can autocomplete them.

  • Events can be bound to concurrent commands, which is required to provide a robust HTTP service.
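To make the second bullet concrete, here is a minimal, stdlib-only sketch of how command responses and events can be correlated when several commands are in flight at once. It is not chromewhip's implementation and needs Python 3.7+ (`asyncio.run`, `asyncio.create_task`). It assumes the standard devtools message shapes: a command carries an integer `id` that its response echoes back, while an event carries `method` and `params` but no `id`. The `Dispatcher` class and the `navigate_and_wait`/`demo` coroutines are hypothetical, and the browser's messages are simulated with hand-written JSON strings.

```python
import asyncio
import itertools
import json


class Dispatcher:
    """Hypothetical sketch: route incoming devtools messages to waiting futures.

    Command responses are matched by the message `id`; awaited events are
    matched by event name plus the frameId reported in the command response.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self._acks = {}    # message id -> Future for that command's response
        self._events = {}  # (method, frameId) -> Future for the awaited event
        self.sent = []     # outgoing JSON; a real driver would write these to a websocket

    def send(self, method, params=None):
        """Queue a command and return a future that resolves with its result."""
        msg_id = next(self._ids)
        fut = asyncio.get_running_loop().create_future()
        self._acks[msg_id] = fut
        self.sent.append(json.dumps({"id": msg_id, "method": method, "params": params or {}}))
        return fut

    def expect_event(self, method, frame_id):
        """Return a future that resolves with the params of a matching event."""
        fut = asyncio.get_running_loop().create_future()
        self._events[(method, frame_id)] = fut
        return fut

    def feed(self, raw):
        """Handle one incoming JSON message from the browser."""
        msg = json.loads(raw)
        if "id" in msg:  # response to a command
            fut = self._acks.pop(msg["id"], None)
            if fut is not None and not fut.done():
                fut.set_result(msg.get("result", {}))
        else:  # event: unmatched ones are simply dropped in this sketch
            key = (msg.get("method"), msg.get("params", {}).get("frameId"))
            fut = self._events.pop(key, None)
            if fut is not None and not fut.done():
                fut.set_result(msg.get("params", {}))


async def navigate_and_wait(d, url):
    ack = await d.send("Page.navigate", {"url": url})
    # The frameId to wait for is only known once the command is acknowledged.
    # (A production driver would also buffer events that arrive early.)
    event = await d.expect_event("Page.frameStoppedLoading", ack["frameId"])
    return ack, event


async def demo():
    d = Dispatcher()
    task = asyncio.create_task(navigate_and_wait(d, "http://example.com"))
    await asyncio.sleep(0)  # let the task send its command
    d.feed('{"id": 1, "result": {"frameId": "F1"}}')
    await asyncio.sleep(0)  # let the task register for the event
    d.feed('{"method": "Page.frameStoppedLoading", "params": {"frameId": "F2"}}')  # other frame: ignored
    d.feed('{"method": "Page.frameStoppedLoading", "params": {"frameId": "F1"}}')
    return await task
```

Because each waiter is keyed either by message `id` or by `(event name, frameId)`, two navigations running concurrently in different frames never pick up each other's acknowledgement or event.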

Example usage:

import asyncio
import logging

from chromewhip import Chrome
from chromewhip.protocol import page, dom

# see logging from chromewhip
logging.basicConfig(level=logging.DEBUG)

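# Address of the devtools endpoint of an already running Chrome,
# e.g. one started with --remote-debugging-port=9222
HOST = '127.0.0.1'
PORT = 9222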

loop = asyncio.get_event_loop()
c = Chrome(host=HOST, port=PORT)

loop.run_until_complete(c.connect())

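# work with the first open tab
tab = c.tabs[0]

# subscribe to Page domain events so that FrameStoppedLoadingEvent is delivered
loop.run_until_complete(tab.enable_page_events())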

cmd = page.Page.navigate(url='http://nzherald.co.nz')

# send_command will return once a frameStoppedLoading event is received whose
# frameId matches the frameId in the returned command payload.
await_on_event_type = page.FrameStoppedLoadingEvent

result = loop.run_until_complete(tab.send_command(cmd, await_on_event_type=await_on_event_type))

# send_command always returns a dict with keys `ack` and `event`:
# `ack` contains the response payload of the command
# `event` contains the awaited event if `await_on_event_type` is provided
ack = result['ack']['result']
event = result['event']
assert ack['frameId'] == event.frameId

# emulate an 800x600 viewport (deviceScaleFactor=0 leaves the scale factor un-overridden)
cmd = page.Page.setDeviceMetricsOverride(width=800,
                                         height=600,
                                         deviceScaleFactor=0.0,
                                         mobile=False,
                                         fitWindow=False)

loop.run_until_complete(tab.send_command(cmd))

result = loop.run_until_complete(tab.send_command(dom.DOM.getDocument()))

dom_obj = result['ack']['result']['root']

# Python types are determined by the `types` fields in the JSON reference for
# the devtools protocol, and `send_command` converts payloads to them where possible.
assert isinstance(dom_obj, dom.Node)

print(dom_obj.nodeId)
print(dom_obj.nodeName)
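The comment above about `types` can be illustrated with a small stand-alone sketch. chromewhip produces its bindings by templated generation; this sketch swaps in runtime generation with `dataclasses.make_dataclass` (Python 3.7+) to show the same idea: read a type entry from the protocol's JSON reference and convert a raw payload into an instance of it. `NODE_SPEC` is a simplified, hand-trimmed excerpt of the `DOM.Node` entry, and `build_type`/`convert` are hypothetical helpers, not chromewhip functions.

```python
import dataclasses
from typing import Optional

# Simplified, hand-trimmed excerpt of the DOM.Node entry from the devtools
# protocol JSON (the real entry has more properties and uses "$ref": "NodeId"
# for nodeId; NodeId itself is an integer type).
NODE_SPEC = {
    "id": "Node",
    "type": "object",
    "properties": [
        {"name": "nodeId", "type": "integer"},
        {"name": "nodeType", "type": "integer"},
        {"name": "nodeName", "type": "string"},
        {"name": "childNodeCount", "type": "integer", "optional": True},
    ],
}

# Map protocol primitive types onto Python types; anything else falls back to object.
PRIMITIVES = {"integer": int, "number": float, "string": str, "boolean": bool}


def build_type(spec):
    """Create a dataclass from a protocol type entry (hypothetical helper)."""
    required, optional = [], []
    for prop in spec["properties"]:
        py_type = PRIMITIVES.get(prop.get("type"), object)
        if prop.get("optional"):
            # Optional properties default to None and must follow required ones.
            optional.append((prop["name"], Optional[py_type], dataclasses.field(default=None)))
        else:
            required.append((prop["name"], py_type))
    return dataclasses.make_dataclass(spec["id"], required + optional)


def convert(cls, payload):
    """Build an instance from a raw dict, ignoring keys the type does not declare."""
    names = {f.name for f in dataclasses.fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in names})


Node = build_type(NODE_SPEC)
root = convert(Node, {"nodeId": 1, "nodeType": 9, "nodeName": "#document",
                      "backendNodeId": 1, "childNodeCount": 2})
print(root.nodeId, root.nodeName)  # prints: 1 #document
```

Dropping undeclared keys such as `backendNodeId` keeps conversion working when the browser sends more fields than the trimmed spec knows about.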

Running

Deploying with Docker

docker run --init -it --rm --shm-size=1024m -p=127.0.0.1:8080:8080 --cap-add=SYS_ADMIN \
  chuckus/chromewhip
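Once the container is running, the endpoints listed under "Implemented HTTP API" can be called with any HTTP client. A minimal standard-library sketch, assuming the port mapping from the command above (127.0.0.1:8080); `render_url` and `fetch_html` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumes the container was started with -p=127.0.0.1:8080:8080 as shown above.
BASE = "http://127.0.0.1:8080"


def render_url(endpoint, **params):
    """Build e.g. http://127.0.0.1:8080/render.html?url=... with encoded params."""
    return "{}/{}?{}".format(BASE, endpoint.lstrip("/"), urlencode(params))


def fetch_html(url, timeout=30):
    """Ask the chromewhip service to render `url` and return the HTML it sends back."""
    with urlopen(render_url("render.html", url=url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, `fetch_html('http://nzherald.co.nz')` returns the markup produced by /render.html, provided the container is reachable. `urlencode` percent-encodes the target URL so its own `?` and `&` characters do not leak into the service's query string.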

Requirements for macOS 10.12+

  • Google Chrome Canary

Implemented HTTP API

/render.html

Query params:

  • url : string : required

    The URL to render.

  • js : string : optional

    JavaScript profile name.

  • js_source : string : optional

    JavaScript code to be executed in the page context.

  • viewport : string : optional

    View width and height (in pixels) of the browser viewport to render the web page. Format is “<width>x<height>”, e.g. 800x600. Default value is 1024x768.

    The ‘viewport’ parameter matters most for PNG and JPEG rendering; it is supported for all rendering endpoints because JavaScript code execution can depend on viewport size.
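A client may want to validate the `viewport` value before sending it. Below is a small sketch that parses the “<width>x<height>” format and falls back to the documented 1024x768 default; `parse_viewport` is a hypothetical helper, and how the server itself validates the value is not described here.

```python
import re

DEFAULT_VIEWPORT = (1024, 768)  # documented default
_VIEWPORT_RE = re.compile(r"(\d+)x(\d+)")


def parse_viewport(value=None):
    """Return (width, height) from a string like '800x600'.

    None or an empty string yields the default; anything that is not
    <width>x<height> with positive integers raises ValueError.
    """
    if not value:
        return DEFAULT_VIEWPORT
    match = _VIEWPORT_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError("viewport must look like <width>x<height>, got %r" % value)
    width, height = int(match.group(1)), int(match.group(2))
    if width <= 0 or height <= 0:
        raise ValueError("viewport dimensions must be positive")
    return width, height
```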

/render.png

Query params (in addition to those accepted by /render.html):

  • render_all : int : optional

    Possible values are 1 and 0. When render_all=1, the viewport is extended to include the whole web page (possibly very tall) before rendering.
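For example, a full-page screenshot could be requested and saved as sketched below. This uses only the standard library and assumes the service listens on 127.0.0.1:8080 as in the Docker example; `png_url`, `looks_like_png` and `save_png` are hypothetical names. The PNG signature check is only a guard against writing an error page to disk.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def png_url(url, render_all=True, viewport="1024x768",
            base="http://127.0.0.1:8080"):
    """Build a /render.png request; render_all=1 extends the viewport to the whole page."""
    params = {"url": url, "viewport": viewport, "render_all": int(bool(render_all))}
    return "{}/render.png?{}".format(base, urlencode(params))


def looks_like_png(data):
    """True if `data` starts with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE


def save_png(url, path, timeout=60):
    """Fetch a screenshot of `url` from the service and write it to `path`."""
    with urlopen(png_url(url), timeout=timeout) as resp:
        data = resp.read()
    if not looks_like_png(data):
        raise ValueError("response is not a PNG image")
    with open(path, "wb") as fh:
        fh.write(data)
```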

Why not just use Selenium?

  • chromewhip uses the devtools protocol instead of the JSON Wire Protocol. The devtools protocol is more flexible, especially when it comes to subscribing to granular events from the browser.

Bug reports and requests

Please file one using the GitHub issue tracker.

Contributing

Please :)

Implementation

Developed to run on Python 3.6, it uses aiohttp and asyncio to implement the asynchronous HTTP server that wraps Chrome.

Download files


Source Distribution

chromewhip-0.2.2.tar.gz (77.2 kB)

Uploaded Source

Built Distribution


chromewhip-0.2.2-py3-none-any.whl (107.8 kB)

Uploaded Python 3

File details

Details for the file chromewhip-0.2.2.tar.gz.

File metadata

  • Download URL: chromewhip-0.2.2.tar.gz
  • Size: 77.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for chromewhip-0.2.2.tar.gz
Algorithm Hash digest
SHA256 56446958b75cc6351d20da5df912b1e444a13c7e0184088d5e3312283e067ef7
MD5 615fcb0a1410527ab711366f68110e1e
BLAKE2b-256 91f6e6959cdb808b3fa0bdcfa42f26e4466c9ba213c80395fc635837c5a7cc90


File details

Details for the file chromewhip-0.2.2-py3-none-any.whl.


File hashes

Hashes for chromewhip-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 0b52994c1359027850eae28470a33d5b33cd88c08947b862479beb2135bb7e33
MD5 b3c18594eb3716e84e5e4a21b2b55149
BLAKE2b-256 77b3307d20431db22636cc5fd42660bd024340bdbb434c291e7fbff317e4d81e

