A toy for the Chrome DevTools Protocol. Read more: https://github.com/ClericPy/ichrome.

ichrome - v0.0.9

A toy for driving Chrome through the Chrome DevTools Protocol (CDP). Requires Python 3.6+; Python 2.x is not supported.

Install

pip install ichrome -U

Why?

  • pyppeteer and selenium are awesome, but I don't need that much...

  • A lightweight way to test CDP

Features

  • Chrome process daemon
  • Connect to existing chrome debug port
  • Operations on Tabs

Examples

Chrome daemon

import time

from ichrome import Chrome, ChromeDaemon

def main():
    with ChromeDaemon() as chromed:
        # run_forever means auto_restart
        chromed.run_forever(0)
        chrome = Chrome()
        tab = chrome.new_tab()
        time.sleep(3)
        tab.close()

if __name__ == "__main__":
    main()
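
The comment in the example says that `run_forever` means auto-restart. As a rough sketch of what that kind of supervision could look like using only the standard library, a parent process can poll its child and relaunch it when it exits. This is not ichrome's implementation; `supervise`, `max_restarts` and `poll_interval` are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=2, poll_interval=0.05):
    """Run cmd and relaunch it each time it exits, up to max_restarts times.

    Returns the total number of launches (hypothetical helper, not ichrome API).
    """
    launches = 0
    while True:
        proc = subprocess.Popen(cmd)
        launches += 1
        # Poll until the child exits. A real daemon would probably also
        # check that the debug port still answers.
        while proc.poll() is None:
            time.sleep(poll_interval)
        if launches > max_restarts:
            return launches
```

For example, `supervise([sys.executable, "-c", "pass"], max_restarts=2)` launches a short-lived Python child three times (one initial launch plus two restarts) and then gives up.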

Connect to existing debug port

from ichrome import Chrome

def main():
    chrome = Chrome()
    print(chrome.tabs)
    # [ChromeTab("6EC65C9051697342082642D6615ECDC0", "about:blank", "about:blank", port: 9222)]
    print(chrome.tabs[0])
    # Tab(about:blank)

if __name__ == "__main__":
    main()
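
For context, a Chrome instance started with `--remote-debugging-port` serves a JSON list of its targets over HTTP at `/json`, and a tab list like the one printed above can be derived from it. The sketch below shows that idea with the standard library only. `parse_tabs` and `list_tabs` are hypothetical helpers, not part of ichrome, and whether ichrome builds `chrome.tabs` this way is an assumption.

```python
import json
from urllib.request import urlopen


def parse_tabs(payload):
    """Return (id, title, url) for every target of type "page" in a /json payload."""
    targets = json.loads(payload)
    return [
        (t["id"], t.get("title", ""), t.get("url", ""))
        for t in targets
        if t.get("type") == "page"
    ]


def list_tabs(host="127.0.0.1", port=9222, timeout=3):
    """Fetch and parse the target list from a running Chrome debug port."""
    with urlopen("http://%s:%d/json" % (host, port), timeout=timeout) as resp:
        return parse_tabs(resp.read().decode("utf-8"))


# Made-up sample payload for illustration; a real one has more fields.
SAMPLE = json.dumps([
    {"id": "6EC65C9051697342082642D6615ECDC0", "type": "page",
     "title": "about:blank", "url": "about:blank"},
    {"id": "A1", "type": "background_page", "title": "ext",
     "url": "chrome-extension://x/"},
])
```

`parse_tabs(SAMPLE)` keeps only the `page` target; `list_tabs()` needs a Chrome instance listening on the debug port.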

Operations on Tab

from ichrome import Chrome

import time


def main():
    chrome = Chrome()
    print(chrome.tabs)
    # [ChromeTab("6EC65C9051697342082642D6615ECDC0", "about:blank", "about:blank", port: 9222)]
    tab = chrome.tabs[0]
    # navigate the tab to a new URL
    print(tab.set_url("http://p.3.cn/1", timeout=3))  # {"id":4,"result":{}}
    # reload page
    print(tab.reload())  # {"id":4,"result":{}}
    # Passing a URL to new_tab is not recommended; set_url accepts a timeout that stops loading:
    # tab = chrome.new_tab()
    # tab.set_url("http://p.3.cn", timeout=3)
    tab = chrome.new_tab("http://p.3.cn/new")
    time.sleep(1)
    print("404 Not Found" in tab.get_html("u8"))  # True
    print(tab.current_url)  # http://p.3.cn/new
    tab.close()


if __name__ == "__main__":
    main()
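
The example waits a fixed `time.sleep(1)` before reading the page. One possible alternative is to poll a condition until it holds or a deadline passes. The helper below is a generic sketch built on the standard library; `wait_until` is a hypothetical name and not an ichrome function.

```python
import time


def wait_until(predicate, timeout=3.0, interval=0.1):
    """Call predicate() repeatedly until it is truthy or timeout seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the tab from the example, this could be used as `wait_until(lambda: "404 Not Found" in tab.get_html("u8"), timeout=3)`.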

Advanced Usage (Crawling a special background request.)

"""
Test normal usage of ichrome.

1. Use a `with` context to launch the ChromeDaemon process.
2. Init Chrome to connect to the Chrome debugging server.
3. Tab ops:
  3.1 create a new tab
  3.2 go to a new URL with tab.set_url, which stops loading on timeout.
  3.3 get cookies for a URL
  3.4 inject the jQuery lib from a static URL.
  3.5 capture a background ajax request via Network events.
  3.6 click an element with tab.click and a CSS selector.
  3.7 use querySelectorAll to get the elements.
"""


def example():
    """Example for crawling a special background request."""
    import sys
    import os

    # use local ichrome module
    sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))
    os.chdir("..")  # to reuse the existing user data dir
    from ichrome import Chrome, Tab, ChromeDaemon, ichrome_logger as logger
    import re
    import json
    import time

    # Optionally change the default logger level, e.g. to INFO:
    # import logging
    # logger.setLevel(logging.INFO)
    # Launch the Chrome process with its daemon; the `with` block shuts both down on exit.
    with ChromeDaemon(host="127.0.0.1", port=9222) as chromed:
        # connect to Chrome DevTools
        chrome = Chrome(host="127.0.0.1", port=9222, timeout=3, retry=1)
        # create a new tab without a URL
        tab = chrome.new_tab()
        # navigate to bing.com; loading is stopped if it takes longer than 5 seconds
        tab.set_url(
            "https://www.bing.com/", referrer="https://www.github.com/", timeout=5
        )
        # get the cookies for a URL
        logger.info(tab.get_cookies("http://cn.bing.com"))
        # test inject_js; on success, the alert below shows the jQuery version (3.3.1)
        logger.info(
            tab.inject_js("https://cdn.staticfile.org/jquery/3.3.1/jquery.min.js")
        )
        logger.info(
            tab.js("alert('jQuery inject success:' + jQuery.fn.jquery)")
        )
        tab.js('alert("Now type `test` into the input box.")')
        # automatically accept the alert dialog
        tab.send("Page.handleJavaScriptDialog", accept=True)
        # enable the Network domain; otherwise no Network request/response events are received
        logger.info(tab.send("Network.enable"))
        # block here until the string "test" is typed into the input box:
        # the tab waits for a Network.responseReceived event that satisfies filter_function
        recv_string = tab.wait_event(
            "Network.responseReceived",
            filter_function=lambda r: re.search(r"&\w+=test", r or ""),
            wait_seconds=None,
        )
        # the matching "Network.responseReceived" event arrives as a string; parse it as JSON
        recv_string = json.loads(recv_string)
        # get the requestId to fetch its response body.
        request_id = recv_string["params"]["requestId"]
        logger.info("requestId: %s" % request_id)
        # send request for getResponseBody
        resp = tab.send("Network.getResponseBody", requestId=request_id, timeout=5)
        # resp now holds the response body result
        logger.info("getResponseBody success %s" % resp)
        # click the element matching the CSS selector #sb_form_go (the submit button)
        logger.info(tab.click("#sb_form_go"))
        # use querySelectorAll to get the elements
        for i in tab.querySelectorAll("#sc_hdu>li"):
            logger.info("%s %s %s", i, i.get("id"), i.text)
        chromed.run_forever()


if __name__ == "__main__":
    example()
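
The example above relies on two CDP conventions: commands are JSON objects with a unique `id`, a `method` and `params`, and events such as `Network.responseReceived` carry a `requestId` that can later be passed to `Network.getResponseBody`. The sketch below illustrates those message shapes with the standard library only. `build_command` and `matching_request_id` are hypothetical helpers, not ichrome functions, and unlike the `filter_function` above (which searches the raw event string) this version matches only the response URL.

```python
import itertools
import json
import re

# CDP requires each command to carry a unique integer id.
_ids = itertools.count(1)


def build_command(method, **params):
    """Serialize a CDP command as a JSON string."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def matching_request_id(raw_event, url_pattern):
    """Return the requestId of a Network.responseReceived event whose
    response URL matches url_pattern, otherwise None."""
    event = json.loads(raw_event)
    if event.get("method") != "Network.responseReceived":
        return None
    params = event.get("params", {})
    url = params.get("response", {}).get("url", "")
    return params.get("requestId") if re.search(url_pattern, url) else None


# Made-up event for illustration; real events contain many more fields.
SAMPLE_EVENT = json.dumps({
    "method": "Network.responseReceived",
    "params": {
        "requestId": "1000.42",
        "response": {"url": "https://www.bing.com/AS/Suggestions?pt=page.home&qry=test"},
    },
})
```

Here `matching_request_id(SAMPLE_EVENT, r"&\w+=test")` yields the request id, which `build_command("Network.getResponseBody", requestId=...)` turns into the follow-up command.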

Command Line Usage

λ python3 -m ichrome -s 9222
2018-11-27 23:01:59 DEBUG [ichrome] _base.py(329): kill chrome.exe --remote-debugging-port=9222
2018-11-27 23:02:00 DEBUG [ichrome] _base.py(329): kill chrome.exe --remote-debugging-port=9222

λ python3 -m ichrome -p 9222 --start_url "http://bing.com" --disable_image
2018-11-27 23:03:57 INFO  [ichrome] __main__.py(69): ChromeDaemon cmd args: {'daemon': True, 'block': True, 'chrome_path': '', 'host': 'localhost', 'port': 9222, 'headless': False, 'user_agent': '', 'proxy': '', 'user_data_dir': None, 'disable_image': True, 'start_url': 'http://bing.com', 'extra_config': '', 'max_deaths': 2, 'timeout': 2}
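
The logged argument dict hints at how such options could end up as Chrome command-line switches. The sketch below maps a few of them to switches that Chrome itself accepts (`--remote-debugging-port`, `--headless`, `--user-agent`, `--proxy-server`, `--user-data-dir`, `--blink-settings=imagesEnabled=false`). `build_chrome_args` is a hypothetical helper, and the mapping is an assumption rather than ichrome's actual behavior.

```python
def build_chrome_args(port=9222, headless=False, user_agent="", proxy="",
                      user_data_dir=None, disable_image=False, start_url=""):
    """Translate daemon-style options into a list of Chrome switches."""
    args = ["--remote-debugging-port=%d" % port]
    if headless:
        args.append("--headless")
    if user_agent:
        args.append("--user-agent=%s" % user_agent)
    if proxy:
        args.append("--proxy-server=%s" % proxy)
    if user_data_dir:
        args.append("--user-data-dir=%s" % user_data_dir)
    if disable_image:
        # Disables image loading through a Blink setting.
        args.append("--blink-settings=imagesEnabled=false")
    if start_url:
        # A trailing positional argument is opened as the start page.
        args.append(start_url)
    return args
```

Calling `build_chrome_args(port=9222, disable_image=True, start_url="http://bing.com")` mirrors the options used in the second command above.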

TODO

  • Concurrent support. (gevent, threading)

  • Add auto_restart on crash.

  • Automatically remove zombie tabs with a lifebook.

  • Add some useful examples.

  • Coroutine support (for asyncio).


