A toolkit for controlling the Chrome browser with the [Chrome Devtools Protocol (CDP)](https://chromedevtools.github.io/devtools-protocol/). Supports Python 3.7+. Read more: https://github.com/ClericPy/ichrome.
ichrome
A connector for controlling the Chrome browser over the Chrome Devtools Protocol (CDP), for Python 3.7+.
Install
Install from PyPI
pip install ichrome -U
Uninstall & clear the user data
$ python3 -m ichrome --clean
$ pip uninstall ichrome
Why?
- pyppeteer / selenium are awesome, but I don't need that much.
  - The spelling of pyppeteer is confusing.
  - selenium is slow.
- Async communication with the Chrome remote debug port, the stable choice. [Recommended]
- A sync way to test CDP, which is not recommended for complex production environments. [Deprecated]
  - ichrome.debugger is a sync tool that depends on ichrome.async_utils, which may be a better choice.
Features
- Chrome process daemon
- Connect to existing chrome debug port
- Operations on Tabs
AsyncChrome feature list
- server
  return f"http://{self.host}:{self.port}", such as http://127.0.0.1:9222
- version
  version info from /json/version, format like:
  {'Browser': 'Chrome/77.0.3865.90', 'Protocol-Version': '1.3', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36', 'V8-Version': '7.7.299.11', 'WebKit-Version': '537.36 (@58c425ba843df2918d9d4b409331972646c393dd)', 'webSocketDebuggerUrl': 'ws://127.0.0.1:9222/devtools/browser/b5fbd149-959b-4603-b209-cfd26d66bdc1'}
- connect / check / ok
  check alive
- get_tabs / tabs / get_tab
  get the AsyncTab instance from /json.
- new_tab / activate_tab / close_tab / close_tabs
  operating tabs.
- close_browser
  find the activated tab and send Browser.close message, close the connected chrome browser gracefully.
      await chrome.close_browser()
- kill
  force kill the chrome process with self.port.
      await chrome.kill()
- connect_tabs
  connect websockets for multiple tabs in one with context, and disconnect before exiting.
      tab0: AsyncTab = (await chrome.tabs)[0]
      tab1: AsyncTab = await chrome.new_tab()
      async with chrome.connect_tabs([tab0, tab1]):
          assert (await tab0.current_url) == 'about:blank'
          assert (await tab1.current_url) == 'about:blank'
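The server and version entries come down to plain HTTP against the debug port. Here is a minimal sketch with only the standard library; it is not ichrome's own code. The helper names `server_url`, `browser_version` and `fetch_version` are hypothetical, and the sample payload is trimmed from the /json/version output shown in this section. `fetch_version` needs a running Chrome with remote debugging enabled, so it is defined but not called.

```python
import json
from urllib.request import urlopen


def server_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Build the base URL of the Chrome remote debugging HTTP server."""
    return f"http://{host}:{port}"


def browser_version(version_info: dict) -> str:
    """Extract the version number from the 'Browser' field, e.g. 'Chrome/77.0.3865.90'."""
    return version_info["Browser"].split("/", 1)[1]


def fetch_version(host: str = "127.0.0.1", port: int = 9222, timeout: float = 2.0) -> dict:
    """Read /json/version from a running Chrome (requires --remote-debugging-port)."""
    with urlopen(f"{server_url(host, port)}/json/version", timeout=timeout) as resp:
        return json.load(resp)


# Trimmed copy of the sample payload from this README (not fetched live).
SAMPLE_VERSION = {
    "Browser": "Chrome/77.0.3865.90",
    "Protocol-Version": "1.3",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/b5fbd149-959b-4603-b209-cfd26d66bdc1",
}

print(server_url())                     # http://127.0.0.1:9222
print(browser_version(SAMPLE_VERSION))  # 77.0.3865.90
```

With a daemon started by `python -m ichrome`, calling `fetch_version()` should return a dict shaped like the sample.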
AsyncTab feature list
- set_url / reload
  navigate to a new url. reload is equal to set_url(None).
- wait_event
  listen for events with the given name, separate them from other same-name events with filter_function, and finally run callback_function with the result.
- wait_page_loading / wait_loading
  wait for the Page.loadEventFired event, or stop loading on timeout. Different from wait_loading_finished.
- wait_response / wait_request
  filter the Network.responseReceived / Network.requestWillBeSent event by filter_function, and return the request_dict, which can be used by get_response / get_response_body / get_request_post_data. WARNING: a fired requestWillBeSent event does not mean the response is ready; you should await tab.wait_request_loading(request_dict) or await tab.get_response(request_dict, wait_loading=True).
- wait_request_loading / wait_loading_finished
  sometimes the request_dict is obtained with wait_response while the ajax request is still fetching, so you need to wait for the Network.loadingFinished event.
- activate / activate_tab
  activate the tab with a websocket / http message.
- close / close_tab
  close the tab with a websocket / http message.
- add_js_onload
  Page.addScriptToEvaluateOnNewDocument, which means this javascript code will run before the page is loaded.
- clear_browser_cache / clear_browser_cookies
  Network.clearBrowserCache and Network.clearBrowserCookies
- querySelectorAll
  get the tag instance, which contains the tagName, innerHTML, outerHTML, textContent, attributes attrs.
- click
  click the element queried by the given css selector.
- refresh_tab_info
  refresh the init attrs: url, title.
- current_html / current_title / current_url
  get the current html / title / url with tab.js, or use the refresh_tab_info method and the init attrs.
- crash
  Page.crash
- get_cookies / get_all_cookies / delete_cookies / set_cookie
  some page cookie operations.
- set_headers / set_ua
  Network.setExtraHTTPHeaders and Network.setUserAgentOverride, used to update headers dynamically.
- close_browser
  send the Browser.close message to close the chrome browser gracefully.
- get_bounding_client_rect / get_element_clip
  get_element_clip is an alias for get_bounding_client_rect; both methods get the rect of the element queried by a css selector.
- screenshot / screenshot_element
  get the screenshot as base64 encoded image data. screenshot_element should be given a css selector to locate the element.
- get_page_size / get_screen_size
  size of the current window or the whole screen.
- get_response
  get the response body with the given request dict.
- js
  run the given js code, and return the raw response from sending the Runtime.evaluate message.
- inject_js_url
  inject a js url, like <script src="xxx/static/js/jquery.min.js"></script> does.
- get_value & get_variable
  run the given js variable or expression, and return the result.
      await tab.get_value('document.title')
      await tab.get_value("document.querySelector('title').innerText")
- keyboard_send
  dispatch a key event with Input.dispatchKeyEvent
- mouse_click
  dispatch a click event on the given position
- mouse_drag
  dispatch a drag event on the given position, and return the target x, y. The duration arg slows down the move speed.
- mouse_drag_rel
  dispatch a drag event on the given offset, and return the target x, y.
- mouse_drag_rel_chain
  drag with offsets continuously.
      await tab.set_url('https://draw.yunser.com/')
      walker = await tab.mouse_drag_rel_chain(320, 145).move(50, 0, 0.2).move(0, 50, 0.2).move(-50, 0, 0.2).move(0, -50, 0.2)
      await walker.move(50 * 1.414, 50 * 1.414, 0.2)
- mouse_press / mouse_release / mouse_move / mouse_move_rel / mouse_move_rel_chain
  similar to the drag features. These mouse features only dispatch events; they are not real mouse actions.
- history_back / history_forward / goto_history_relative / reset_history
  back / forward history
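The wait_event / wait_response entries describe picking one event out of several with the same name. Here is a rough sketch of that filtering idea; it is not ichrome's implementation. `wait_event_sketch` is a hypothetical name, and the event stream is a hand-written list instead of a live websocket. The real methods are async and take a timeout, so returning None stands in for a timeout here. The events use the usual CDP shape of a "method" name plus "params".

```python
from typing import Any, Callable, Iterable, Optional

Event = dict  # a CDP event: {"method": "<Domain.eventName>", "params": {...}}


def wait_event_sketch(
    events: Iterable[Event],
    event_name: str,
    filter_function: Optional[Callable[[Event], bool]] = None,
    callback_function: Optional[Callable[[Event], Any]] = None,
) -> Any:
    """Return the first event named `event_name` that passes `filter_function`.

    If `callback_function` is given, its result is returned instead of the
    raw event. None is returned when nothing matches (stand-in for a timeout).
    """
    for event in events:
        if event.get("method") != event_name:
            continue  # different event name: ignore it
        if filter_function is not None and not filter_function(event):
            continue  # same name, but not the event we are waiting for
        return callback_function(event) if callback_function else event
    return None


# Hypothetical stream: two responses share the same event name.
stream = [
    {"method": "Network.requestWillBeSent", "params": {"requestId": "1"}},
    {"method": "Network.responseReceived",
     "params": {"requestId": "1", "response": {"url": "http://example.com/a.css"}}},
    {"method": "Network.responseReceived",
     "params": {"requestId": "2", "response": {"url": "http://example.com/api"}}},
]

request_id = wait_event_sketch(
    stream,
    "Network.responseReceived",
    filter_function=lambda e: e["params"]["response"]["url"].endswith("/api"),
    callback_function=lambda e: e["params"]["requestId"],
)
print(request_id)  # 2
```

A match on Network.responseReceived only says the response headers arrived. This is why the WARNING in the list recommends also waiting for Network.loadingFinished before reading the body.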
Examples
See the Classic Use Cases
Quick Start
- Start a new chrome daemon process with headless=False
  python -m ichrome
  or launch the chrome daemon in code
  async with AsyncChromeDaemon():
- Create the connection to an existing chrome browser
  async with AsyncChrome() as chrome:
- Operations on the tabs: new tab, wait loading, run javascript, get html, close tab
- Close the browser GRACEFULLY instead of killing the process
from ichrome import AsyncChromeDaemon, AsyncChrome
import asyncio


async def main():
    # If there is an existing daemon, such as `python -m ichrome`, the `async with AsyncChromeDaemon` context can be omitted.
    async with AsyncChromeDaemon():
        # connect to an opened chrome
        async with AsyncChrome() as chrome:
            tab = await chrome.new_tab(url="https://github.com/ClericPy")
            # async with tab() as tab:
            # and `as tab` can be omitted
            async with tab():
                await tab.wait_loading(2)
                await tab.js("document.write('<h1>Document updated.</h1>')")
                await asyncio.sleep(1)
                # await tab.js('alert("test ok")')
                print('output:', await tab.html)
                # output: <html><head></head><body><h1>Document updated.</h1></body></html>
                await tab.close()
            # close_browser gracefully, I have no more need of chrome instance
            await chrome.close_browser()


if __name__ == "__main__":
    asyncio.run(main())
Command Line Usage
Used for launching a chrome daemon process. Unhandled args are treated as raw chrome args and appended to the extra_config list.
λ python3 -m ichrome -s 9222
2018-11-27 23:01:59 DEBUG [ichrome] base.py(329): kill chrome.exe --remote-debugging-port=9222
2018-11-27 23:02:00 DEBUG [ichrome] base.py(329): kill chrome.exe --remote-debugging-port=9222
λ python3 -m ichrome -p 9222 --start_url "http://bing.com" --disable_image
2018-11-27 23:03:57 INFO [ichrome] __main__.py(69): ChromeDaemon cmd args: {'daemon': True, 'block': True, 'chrome_path': '', 'host': 'localhost', 'port': 9222, 'headless': False, 'user_agent': '', 'proxy': '', 'user_data_dir': None, 'disable_image': True, 'start_url': 'http://bing.com', 'extra_config': '', 'max_deaths': 1, 'timeout': 2}
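Passing unhandled args through to extra_config can be pictured with argparse's `parse_known_args`. The sketch below is only an illustration, assuming a small subset of the options listed in the help text. `parse_cli` is a hypothetical helper, not ichrome's actual parser.

```python
import argparse


def parse_cli(argv):
    """Parse known options; collect everything else as raw chrome args."""
    parser = argparse.ArgumentParser(prog="ichrome-sketch")  # hypothetical prog name
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("-p", "--port", type=int, default=9222)
    parser.add_argument("--start_url", default="about:blank")
    parser.add_argument("--disable_image", action="store_true")
    # parse_known_args returns (namespace, list_of_unrecognized_args)
    known, unknown = parser.parse_known_args(argv)
    config = vars(known)
    config["extra_config"] = unknown  # passed to chrome unchanged
    return config


config = parse_cli(["--host=127.0.0.1", "--window-size=1212,1212", "--incognito"])
print(config["extra_config"])  # ['--window-size=1212,1212', '--incognito']
```

Options the parser knows about (here --host) land in the config dict, while chrome-only flags such as --incognito are kept verbatim.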
Details:
$ python3 -m ichrome --help
usage:
All the unknown args will be appended to extra_config as chrome original args.
Demo:
> python -m ichrome --host=127.0.0.1 --window-size=1212,1212 --incognito
> ChromeDaemon cmd args: {'daemon': True, 'block': True, 'chrome_path': '', 'host': '127.0.0.1', 'port': 9222, 'headless':False, 'user_agent': '', 'proxy': '', 'user_data_dir': None, 'disable_image': False, 'start_url': 'about:blank', 'extra_config': ['--window-size=1212,1212', '--incognito'], 'max_deaths': 1, 'timeout': 2}
Other operations:
1. kill local chrome process with given port:
python -m ichrome -s 9222
2. clear user_data_dir path (remove the folder and files):
python -m ichrome --clear
python -m ichrome --clean
2. show ChromeDaemon.__doc__:
python -m ichrome --doc
optional arguments:
-h, --help show this help message and exit
-V, --version ichrome version info
-c CHROME_PATH, --chrome_path CHROME_PATH
chrome executable file path, default to null for
automatic searching
--host HOST --remote-debugging-address, default to 127.0.0.1
-p PORT, --port PORT --remote-debugging-port, default to 9222
--headless --headless and --hide-scrollbars, default to False
-s SHUTDOWN, --shutdown SHUTDOWN
shutdown the given port, only for local running chrome
--user_agent USER_AGENT
--user-agen, default to 'Mozilla/5.0 (Windows NT 10.0;
WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/70.0.3538.102 Safari/537.36'
--proxy PROXY --proxy-server, default to None
--user_data_dir USER_DATA_DIR
user_data_dir to save the user data, default to
~/ichrome_user_data
--disable_image disable image for loading performance, default to
False
--start_url START_URL
start url while launching chrome, default to
about:blank
--max_deaths MAX_DEATHS
max deaths in 5 secs, auto restart `max_deaths` times
if crash fast in 5 secs. default to 1 for without
auto-restart
--timeout TIMEOUT timeout to connect the remote server, default to 1 for
localhost
--workers WORKERS the number of worker processes with auto-increment
port, default to 1
--proc_check_interval PROC_CHECK_INTERVAL
check chrome process alive every interval seconds
--clean, --clear clean user_data_dir
--doc show ChromeDaemon.__doc__
--debug set logger level to DEBUG
Interactive Debugging
λ python
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from ichrome.debugger import *
>>> tab = get_a_tab()
>>> tab.set_url('http://bing.com')
{'id': 4, 'result': {'frameId': 'DAC309349D270F07505C3DAB71084292', 'loaderId': '181418C22DB39654507D042627C22698'}}
>>> tab.click('#scpl0')
Tag(a)
>>> tab.js('document.getElementById("sb_form_q").value = "jordan"')
{'id': 16, 'result': {'result': {'type': 'string', 'value': 'jordan'}}}
>>> tab.click('#sb_form_go')
Tag(input)
>>> tab.history_back()
True
>>> tab.set_html('hello')
{'id': 17, 'result': {}}
>>> tab.set_ua('no UA')
INFO 2020-05-11 20:14:07 [ichrome] async_utils.py(790): [set_ua] <Tab(connected): 08F4AFF9B389B1D5880AF0C0988B6DD4> userAgent => no UA
{'id': 12, 'result': {}}
>>> tab.set_url('http://httpbin.org/user-agent')
{'id': 14, 'result': {'frameId': '08F4AFF9B389B1D5880AF0C0988B6DD4', 'loaderId': '15761B915F7AC36DC4687C1EED28195B'}}
>>> tab.html
'<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{\n "user-agent": "no UA"\n}\n</pre></body></html>'
[Debugger] debug the features of async Chrome / Tab / Daemon.
Similar to sync usage, but methods come from the AsyncChrome / AsyncTab / AsyncDaemon
Test Code: examples_debug.py
Operating tabs with coroutines in the async environment
Running in a completely asynchronous environment is the stable choice.
Test Code: examples_async.py
[Archived] Simple Sync Usage
The sync utils will hardly be maintained and will get no new features.
Test Code: examples_sync.py
TODO
- Concurrent support. (gevent, threading, asyncio)
- Add auto_restart while crash.
- Auto remove the zombie tabs with a lifebook.
- Add some useful examples.
- Coroutine support (for asyncio).
- Standard test cases.
- HTTP apis server console [fastapi]. (maybe a new lib)
- Complete document.