Chrome controller for Humans, based on the Chrome DevTools Protocol (CDP) and Python 3.7+. Read more: https://github.com/ClericPy/ichrome.
ichrome
Chrome controller for Humans, based on the Chrome DevTools Protocol (CDP) and Python 3.7+.
Why?
- Pyppeteer is awesome, but I don't need that much
  - the spelling of pyppeteer is confusing
  - event-driven programming is not always advisable
- Selenium is slow
  - webdrivers often come with memory leaks
- In desperate need of a stable toolkit to communicate with the Chrome browser
  - fast HTTP & websocket connections (based on aiohttp) for the asyncio environment
  - ichrome.debugger is a sync tool that depends on ichrome.async_utils
    - a choice for debugging interactively
Features
- Chrome process daemon
  - auto-restart
  - command-line usage support
  - async environment compatible
- Connect to an existing Chrome
- Operations on Tabs under a stable websocket
  - Packages the most commonly used functions
Install
Install from PyPI
pip install ichrome -U
Uninstall & Clear the user data dir
$ python3 -m ichrome --clean
$ pip uninstall ichrome
AsyncChrome feature list
- server: returns f"http://{self.host}:{self.port}", such as http://127.0.0.1:9222
- version: version info from /json/version, in a format like:
    {'Browser': 'Chrome/77.0.3865.90', 'Protocol-Version': '1.3', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36', 'V8-Version': '7.7.299.11', 'WebKit-Version': '537.36 (@58c425ba843df2918d9d4b409331972646c393dd)', 'webSocketDebuggerUrl': 'ws://127.0.0.1:9222/devtools/browser/b5fbd149-959b-4603-b209-cfd26d66bdc1'}
- connect / check / ok: check whether the browser is alive.
- get_tabs / tabs / get_tab: get the AsyncTab instances from /json.
- new_tab / activate_tab / close_tab / close_tabs: operate on tabs.
- close_browser: find the activated tab and send the Browser.close message to close the connected Chrome browser gracefully.
    await chrome.close_browser()
- kill: force-kill the Chrome process that uses self.port.
    await chrome.kill()
- connect_tabs: connect websockets for multiple tabs in one with context, and disconnect before exiting.
    tab0: AsyncTab = (await chrome.tabs)[0]
    tab1: AsyncTab = await chrome.new_tab()
    async with chrome.connect_tabs([tab0, tab1]):
        assert (await tab0.current_url) == 'about:blank'
        assert (await tab1.current_url) == 'about:blank'
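The server and version entries above can be illustrated without a running browser. This is a minimal sketch, not ichrome's own code: build_server_url and parse_version_info are hypothetical helper names, and SAMPLE_VERSION is a trimmed copy of the sample /json/version payload shown in the list.

```python
from urllib.parse import urlparse

# Trimmed copy of the sample /json/version payload from the feature list.
SAMPLE_VERSION = {
    "Browser": "Chrome/77.0.3865.90",
    "Protocol-Version": "1.3",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/b5fbd149-959b-4603-b209-cfd26d66bdc1",
}


def build_server_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Hypothetical helper mirroring the f-string documented for `server`."""
    return f"http://{host}:{port}"


def parse_version_info(info: dict) -> dict:
    """Hypothetical helper: pull a few convenient fields out of the payload."""
    product, _, browser_version = info["Browser"].partition("/")
    ws = urlparse(info["webSocketDebuggerUrl"])
    return {
        "product": product,
        "major": int(browser_version.split(".")[0]),
        "protocol": info["Protocol-Version"],
        "ws_host": ws.hostname,
        "ws_port": ws.port,
    }


summary = parse_version_info(SAMPLE_VERSION)
```

The webSocketDebuggerUrl host and port normally match the HTTP server address, which makes this a cheap sanity check before opening a websocket.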
AsyncTab feature list
- set_url / reload: navigate to a new URL. reload is equivalent to set_url(None).
- wait_event: listen for events with the given name, separate the wanted one from other same-name events with filter_function, and finally run callback_function with the result.
- wait_page_loading / wait_loading: wait for the Page.loadEventFired event, or stop loading on timeout. Different from wait_loading_finished.
- wait_response / wait_request: filter the Network.responseReceived / Network.requestWillBeSent events with filter_function, and return the request_dict, which can be used by get_response / get_response_body / get_request_post_data.
  WARNING: a fired requestWillBeSent event does not mean the response is ready; you should await tab.wait_request_loading(request_dict) or await tab.get_response(request_dict, wait_loading=True).
- wait_request_loading / wait_loading_finished: sometimes the request_dict has been obtained with wait_response while the ajax request is still fetching, so you need to wait for the Network.loadingFinished event.
- activate / activate_tab: activate the tab with a websocket / HTTP message.
- close / close_tab: close the tab with a websocket / HTTP message.
- add_js_onload: Page.addScriptToEvaluateOnNewDocument, which means this JavaScript code will run before the page is loaded.
- clear_browser_cache / clear_browser_cookies: Network.clearBrowserCache and Network.clearBrowserCookies.
- querySelectorAll: get the Tag instances, which contain the tagName, innerHTML, outerHTML, textContent, attributes attrs.
- click: click the element queried by the given CSS selector.
- refresh_tab_info: refresh the init attrs: url, title.
- current_html / current_title / current_url: get the current html / title / url with tab.js, or use the refresh_tab_info method and the init attrs.
- crash: Page.crash
- get_cookies / get_all_cookies / delete_cookies / set_cookie: some operations on page cookies.
- set_headers / set_ua: Network.setExtraHTTPHeaders and Network.setUserAgentOverride, used to update headers dynamically.
- close_browser: send the Browser.close message to close the Chrome browser gracefully.
- get_bounding_client_rect / get_element_clip: get_element_clip is an alias of get_bounding_client_rect; both methods get the rect of the element queried by a CSS selector.
- screenshot / screenshot_element: get the screenshot as base64-encoded image data. screenshot_element must be given a CSS selector to locate the element.
- get_page_size / get_screen_size: size of the current window or the whole screen.
- get_response: get the response body with the given request dict.
- js: run the given JS code and return the raw response from sending the Runtime.evaluate message.
- inject_js_url: inject a JS URL, as <script src="xxx/static/js/jquery.min.js"></script> does.
- get_value & get_variable: evaluate the given JS variable or expression and return the result.
    await tab.get_value('document.title')
    await tab.get_value("document.querySelector('title').innerText")
- keyboard_send: dispatch a key event with Input.dispatchKeyEvent.
- mouse_click: dispatch a click event at the given position.
- mouse_drag: dispatch a drag event at the given position and return the target x, y. The duration arg slows down the move speed.
- mouse_drag_rel: dispatch a drag event with the given offset and return the target x, y.
- mouse_drag_rel_chain: drag with offsets continuously.
    await tab.set_url('https://draw.yunser.com/')
    walker = await tab.mouse_drag_rel_chain(320, 145).move(50, 0, 0.2).move(
        0, 50, 0.2).move(-50, 0, 0.2).move(0, -50, 0.2)
    await walker.move(50 * 1.414, 50 * 1.414, 0.2)
- mouse_press / mouse_release / mouse_move / mouse_move_rel / mouse_move_rel_chain: similar to the drag features. These mouse features only dispatch events; they are not real mouse actions.
- history_back / history_forward / goto_history_relative / reset_history: go back / forward in history.
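The wait_event / wait_response entries in the list above describe a pattern: match events by name, narrow them down with filter_function, then hand the match to callback_function. The following is only a sketch of that pattern on top of a plain asyncio.Queue; wait_event_sketch and demo are hypothetical names, the events are hand-made dicts shaped like CDP Network.responseReceived events, and nothing here is ichrome's actual implementation.

```python
import asyncio
from typing import Callable, Optional


async def wait_event_sketch(
    queue: "asyncio.Queue",
    event_name: str,
    timeout: float,
    filter_function: Optional[Callable[[dict], bool]] = None,
    callback_function: Optional[Callable[[dict], object]] = None,
):
    """Illustrative only: return the first matching event, or None on timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return None
        try:
            event = await asyncio.wait_for(queue.get(), timeout=remaining)
        except asyncio.TimeoutError:
            return None
        if event.get("method") != event_name:
            continue  # an event with a different name
        if filter_function is not None and not filter_function(event):
            continue  # same name, but not the event we are looking for
        return callback_function(event) if callback_function else event


async def demo():
    queue: asyncio.Queue = asyncio.Queue()
    # Fake events; a real tab would receive these over its websocket.
    urls = ("https://example.com/a.css", "https://example.com/api/items")
    for index, url in enumerate(urls):
        await queue.put({
            "method": "Network.responseReceived",
            "params": {"requestId": str(index), "response": {"url": url}},
        })
    return await wait_event_sketch(
        queue,
        "Network.responseReceived",
        timeout=1,
        filter_function=lambda e: "/api/" in e["params"]["response"]["url"],
        callback_function=lambda e: e["params"]["response"]["url"],
    )
```

Calling asyncio.run(demo()) skips the stylesheet response and returns the URL of the API response, which is the role filter_function plays when several same-name events arrive.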
Examples
See the Classic Use Cases
Quick Start
- Start a new chrome daemon process with headless=False
    python -m ichrome
  or launch the chrome daemon in code
    async with AsyncChromeDaemon():
- Create the connection to an existing chrome browser
    async with AsyncChrome() as chrome:
- Operations on the tabs: new tab, wait loading, run javascript, get html, close tab
- Close the browser GRACEFULLY instead of killing the process
from ichrome import AsyncChromeDaemon, AsyncChrome
import asyncio


async def main():
    # If there is an existing daemon, such as `python -m ichrome`, the `async with AsyncChromeDaemon` context can be omitted.
    async with AsyncChromeDaemon():
        # connect to an opened chrome
        async with AsyncChrome() as chrome:
            tab = await chrome.new_tab(url="https://github.com/ClericPy")
            # async with tab() as tab:
            # and `as tab` can be omitted
            async with tab():
                await tab.wait_loading(2)
                await tab.js("document.write('<h1>Document updated.</h1>')")
                await asyncio.sleep(1)
                # await tab.js('alert("test ok")')
                print('output:', await tab.html)
                # output: <html><head></head><body><h1>Document updated.</h1></body></html>
                await tab.close()
            # close_browser gracefully, I have no more need of chrome instance
            await chrome.close_browser()


if __name__ == "__main__":
    asyncio.run(main())
Command Line Usage
Used for launching a Chrome daemon process. Unhandled args are treated as raw Chrome args and appended to the extra_config list.
Shutdown Chrome process with the given port
λ python3 -m ichrome -s 9222
2018-11-27 23:01:59 DEBUG [ichrome] base.py(329): kill chrome.exe --remote-debugging-port=9222
2018-11-27 23:02:00 DEBUG [ichrome] base.py(329): kill chrome.exe --remote-debugging-port=9222
Launch a Chrome daemon process
λ python3 -m ichrome -p 9222 --start_url "http://bing.com" --disable_image
2018-11-27 23:03:57 INFO [ichrome] __main__.py(69): ChromeDaemon cmd args: {'daemon': True, 'block': True, 'chrome_path': '', 'host': 'localhost', 'port': 9222, 'headless': False, 'user_agent': '', 'proxy': '', 'user_data_dir': None, 'disable_image': True, 'start_url': 'http://bing.com', 'extra_config': '', 'max_deaths': 1, 'timeout': 2}
Crawl the given URL, output the HTML DOM
λ python3 -m ichrome --crawl --headless --timeout=2 http://api.ipify.org/
<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">38.143.68.66</pre></body></html>
Details:
$ python3 -m ichrome --help
usage:
    All the unknown args will be appended to extra_config as chrome original args.

Demo:
    > python -m ichrome --host=127.0.0.1 --window-size=1212,1212 --incognito
    > ChromeDaemon cmd args: {'daemon': True, 'block': True, 'chrome_path': '', 'host': '127.0.0.1', 'port': 9222, 'headless': False, 'user_agent': '', 'proxy': '', 'user_data_dir': None, 'disable_image': False, 'start_url': 'about:blank', 'extra_config': ['--window-size=1212,1212', '--incognito'], 'max_deaths': 1, 'timeout': 2}

Other operations:
    1. kill local chrome process with given port:
        python -m ichrome -s 9222
    2. clear user_data_dir path (remove the folder and files):
        python -m ichrome --clear
        python -m ichrome --clean
    3. show ChromeDaemon.__doc__:
        python -m ichrome --doc
    4. crawl the URL, output the HTML DOM:
        python -m ichrome --crawl --headless --timeout=2 http://myip.ipip.net/

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         ichrome version info
  -c CHROME_PATH, --chrome_path CHROME_PATH
                        chrome executable file path, default to null for
                        automatic searching
  --host HOST           --remote-debugging-address, default to 127.0.0.1
  -p PORT, --port PORT  --remote-debugging-port, default to 9222
  --headless            --headless and --hide-scrollbars, default to False
  -s SHUTDOWN, --shutdown SHUTDOWN
                        shutdown the given port, only for local running chrome
  --user_agent USER_AGENT
                        --user-agen, default to 'Mozilla/5.0 (Windows NT 10.0;
                        WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
                        Chrome/70.0.3538.102 Safari/537.36'
  --proxy PROXY         --proxy-server, default to None
  --user_data_dir USER_DATA_DIR
                        user_data_dir to save the user data, default to
                        ~/ichrome_user_data
  --disable_image       disable image for loading performance, default to
                        False
  --start_url START_URL
                        start url while launching chrome, default to
                        about:blank
  --max_deaths MAX_DEATHS
                        max deaths in 5 secs, auto restart `max_deaths` times
                        if crash fast in 5 secs. default to 1 for without
                        auto-restart
  --timeout TIMEOUT     timeout to connect the remote server, default to 1 for
                        localhost
  --workers WORKERS     the number of worker processes with auto-increment
                        port, default to 1
  --proc_check_interval PROC_CHECK_INTERVAL
                        check chrome process alive every interval seconds
  --crawl               crawl the given URL, output the HTML DOM
  --clean, --clear      clean user_data_dir
  --doc                 show ChromeDaemon.__doc__
  --debug               set logger level to DEBUG
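The help text says unknown args are appended to extra_config as raw Chrome args. One common way to get that behaviour in Python is argparse's parse_known_args, sketched below. This is an assumption about the general technique, not a copy of ichrome's parser: parse_cli_sketch is a hypothetical name and only two of the documented options are declared.

```python
import argparse


def parse_cli_sketch(argv):
    """Illustrative parser: declares only --host and --port."""
    parser = argparse.ArgumentParser(prog="ichrome-sketch")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("-p", "--port", type=int, default=9222)
    # parse_known_args returns (namespace, leftover_args) instead of
    # exiting with an error on unrecognized options.
    args, unknown = parser.parse_known_args(argv)
    config = vars(args)
    # Leftover options are kept verbatim, like the raw Chrome args in the demo.
    config["extra_config"] = unknown
    return config


# Same arguments as the demo in the help output.
config = parse_cli_sketch(
    ["--host=127.0.0.1", "--window-size=1212,1212", "--incognito"]
)
```

With the demo arguments, the two Chrome flags end up in config["extra_config"] while --host is consumed by the parser.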
Interactive Debugging
λ python
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from ichrome.debugger import *
>>> tab = get_a_tab()
>>> tab.set_url('http://bing.com')
{'id': 4, 'result': {'frameId': 'DAC309349D270F07505C3DAB71084292', 'loaderId': '181418C22DB39654507D042627C22698'}}
>>> tab.click('#scpl0')
Tag(a)
>>> tab.js('document.getElementById("sb_form_q").value = "jordan"')
{'id': 16, 'result': {'result': {'type': 'string', 'value': 'jordan'}}}
>>> tab.click('#sb_form_go')
Tag(input)
>>> tab.history_back()
True
>>> tab.set_html('hello')
{'id': 17, 'result': {}}
>>> tab.set_ua('no UA')
INFO 2020-05-11 20:14:07 [ichrome] async_utils.py(790): [set_ua] <Tab(connected): 08F4AFF9B389B1D5880AF0C0988B6DD4> userAgent => no UA
{'id': 12, 'result': {}}
>>> tab.set_url('http://httpbin.org/user-agent')
{'id': 14, 'result': {'frameId': '08F4AFF9B389B1D5880AF0C0988B6DD4', 'loaderId': '15761B915F7AC36DC4687C1EED28195B'}}
>>> tab.html
'<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{\n "user-agent": "no UA"\n}\n</pre></body></html>'
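The dicts printed in the session above are raw CDP replies. As a rough sketch of what sits underneath tab.js, the snippet below builds a Runtime.evaluate command and unwraps the value from a reply. build_evaluate_message and extract_value are hypothetical helpers written for illustration; ichrome's own methods may handle more cases.

```python
import json
from itertools import count

# Each CDP command carries an incrementing id that the reply echoes back.
_ids = count(1)


def build_evaluate_message(expression: str) -> str:
    """Serialize a CDP Runtime.evaluate command (hypothetical helper)."""
    return json.dumps({
        "id": next(_ids),
        "method": "Runtime.evaluate",
        "params": {"expression": expression},
    })


def extract_value(raw: dict, default=None):
    """Walk result -> result -> value of a raw reply, tolerating missing keys."""
    return raw.get("result", {}).get("result", {}).get("value", default)


# Raw replies copied from the interactive session above.
js_reply = {'id': 16, 'result': {'result': {'type': 'string', 'value': 'jordan'}}}
empty_reply = {'id': 17, 'result': {}}

value = extract_value(js_reply)
missing = extract_value(empty_reply)
```

Replies such as the one from set_html carry an empty result, so the helper falls back to the default instead of raising KeyError.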
[Debugger] debug the features of async Chrome / Tab / Daemon.
Similar to sync usage, but methods come from the AsyncChrome / AsyncTab / AsyncDaemon
Test Code: examples_debug.py
Operating tabs with coroutines in the async environment
Running in a completely asynchronous environment is the stable choice.
Test Code: examples_async.py
[Archived] Simple Sync Usage
The sync utils will hardly be maintained and will get no new features.
Test Code: examples_sync.py
TODO
- Concurrent support. (gevent, threading, asyncio)
- Add auto_restart while crash.
- Auto remove the zombie tabs with a lifebook.
- Add some useful examples.
- Coroutine support (for asyncio).
- Standard test cases.
- HTTP apis server console [fastapi]. (maybe a new lib)
- Complete document.