A library to prevent detections caused by Runtime.enable.
Project description
PyppeteerProtect
PyppeteerProtect is an implementation of rebrowser-patches for pyppeteer, with the notable difference that it does not require you to modify your installation of pyppeteer. You simply call PyppeteerProtect on a target page and the patches are applied automatically.
PyppeteerProtect does not (at the moment) provide protection for running in headless mode, beyond setting the user agent to remove HeadlessChrome. For headless protection, look for an additional library to run alongside PyppeteerProtect, such as pyppeteer_stealth (though that library in particular only makes you more detectable to the major anti-bot solutions).
Install
$ pip install PyppeteerProtect
Usage
Import the library:
from PyppeteerProtect import PyppeteerProtect, SetSecureArguments;
Set default arguments for the Chrome executable that help you stay protected (sets --disable-blink-features=AutomationControlled and removes --enable-automation):
SetSecureArguments(); # should be called before pyppeteer.launch
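To make the effect of SetSecureArguments concrete, here is a minimal sketch of the same transformation applied to a plain list of flags. It is not PyppeteerProtect's actual implementation: secure_args and the sample flag list are hypothetical names used only for illustration.

```python
from typing import List

AUTOMATION_FLAG = "--enable-automation"
BLINK_FLAG = "--disable-blink-features=AutomationControlled"


def secure_args(default_args: List[str]) -> List[str]:
    """Return a copy of default_args without --enable-automation,
    with the AutomationControlled Blink flag appended once."""
    args = [arg for arg in default_args if arg != AUTOMATION_FLAG]
    if BLINK_FLAG not in args:
        args.append(BLINK_FLAG)
    return args


# Hypothetical default flags, for illustration only.
example_defaults = ["--disable-sync", "--enable-automation", "--mute-audio"]
print(secure_args(example_defaults))
```

The function returns a new list instead of mutating its input, so it can be called repeatedly without adding the Blink flag twice.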
Protect individual pages:
pageProtect = await PyppeteerProtect(page);
Switch between using the main and an isolated execution context:
await pageProtect.useMainWorld();
await pageProtect.useIsolatedWorld();
You can freely swap between the two contexts during an active session. For example, you might want to do something like this:
await pageProtect.useIsolatedWorld();
token = await page.evaluate("() => document.querySelector('input[type=\"hidden\"]#embedded-token').value"); # document.querySelector might have been hooked in the main world to block queries for #embedded-token
await pageProtect.useMainWorld();
data = await page.evaluate("(token) => window.get_some_data(token)", token);
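If you swap contexts often, the pattern above can be wrapped in a small helper so the switch back is never forgotten. This is only a sketch: isolated_world is a hypothetical helper, not part of PyppeteerProtect, and it assumes the main world is the context you want to return to afterwards. It relies only on the useIsolatedWorld and useMainWorld methods shown above.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def isolated_world(page_protect):
    """Switch to the isolated world for the duration of the block,
    then switch back to the main world, even if the block raises."""
    await page_protect.useIsolatedWorld()
    try:
        yield page_protect
    finally:
        await page_protect.useMainWorld()


# Usage inside a coroutine (pageProtect and page as in the snippets above):
#
#     async with isolated_world(pageProtect):
#         title = await page.evaluate("() => document.title")
```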
By default, PyppeteerProtect uses the execution context id of an isolated world. This is ideal for maximum security, as you don't have to worry about calling hooked global functions or accidentally leaking your presence through global variables. However, it makes the code of the target page inaccessible.
If you plan on using the main world execution context and nothing else, you can configure the PyppeteerProtect constructor to use it on creation like so:
pageProtect = await PyppeteerProtect(page, True);
If you have a particularly special use case and have trouble automatically obtaining an execution context id, you can tell PyppeteerProtect to wait until one is obtained. If you stick to basic Page.evaluate calls you shouldn't need this, as it gets called automatically:
await pageProtect.waitForExecutionContext();
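If you do call it yourself, you may want to bound the wait so a page that never yields a context cannot hang your script. The sketch below uses the standard asyncio.wait_for; wait_for_context and its 10 second default are hypothetical choices, not part of PyppeteerProtect.

```python
import asyncio


async def wait_for_context(page_protect, timeout_seconds: float = 10.0) -> bool:
    """Wait for an execution context, giving up after timeout_seconds.
    Returns True if the wait finished in time, False on timeout."""
    try:
        await asyncio.wait_for(
            page_protect.waitForExecutionContext(), timeout=timeout_seconds
        )
        return True
    except asyncio.TimeoutError:
        return False
```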
Example
import asyncio;
from pyppeteer import launch;
from PyppeteerProtect import PyppeteerProtect, SetSecureArguments;

SetSecureArguments(); # set --disable-blink-features=AutomationControlled and remove --enable-automation

loop = asyncio.new_event_loop();

async def main():
    browser = await launch(
        executablePath = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
        headless = False, # currently no protection for running headless
        defaultViewport = {"width": 1920, "height": 953},
        loop = loop
    );
    page = (await browser.pages())[0];
    pageProtect = await PyppeteerProtect(page);
    await page.goto("https://www.datadome.co");
    print(await page.evaluate("()=>'Test Output'"));
    await asyncio.sleep(5000); # asyncio.sleep takes seconds, so this keeps the browser open for about 83 minutes
    await browser.close();

loop.run_until_complete(main());
How does it work?
PyppeteerProtect works by calling Runtime.disable and hooking CDPSession.send to drop any Runtime.enable requests sent by the pyppeteer library. Runtime.enable is used to retrieve an execution context id, which is required for functions such as Page.evaluate and Page.querySelectorAll to work, but in doing so, it enables the scripts running on the target page to observe behavior that would indicate the browser is being controlled by automation software, like pyppeteer/puppeteer.
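The hooking idea can be illustrated with a short, offline sketch. It is not PyppeteerProtect's source: StandInSession is a hypothetical stand-in for a CDP session whose send(method, params) is awaitable, drop_runtime_enable is a made-up helper name, and returning an empty dict for the dropped request is an assumption made for the example.

```python
import asyncio


class StandInSession:
    """Hypothetical stand-in for a CDP session; records what reaches the browser."""

    def __init__(self):
        self.sent = []

    async def send(self, method, params=None):
        self.sent.append(method)
        return {"method": method}


def drop_runtime_enable(session):
    """Wrap session.send so Runtime.enable requests never reach the browser."""
    original_send = session.send

    async def patched_send(method, params=None):
        if method == "Runtime.enable":
            return {}  # assumption: pretend the call succeeded with an empty reply
        return await original_send(method, params)

    session.send = patched_send
    return session


async def demo():
    session = drop_runtime_enable(StandInSession())
    await session.send("Page.enable")
    await session.send("Runtime.enable")  # silently dropped by the hook
    await session.send("Network.enable")
    return session.sent
```

Running asyncio.run(demo()) shows that only the Page.enable and Network.enable requests were forwarded.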
PyppeteerProtect retrieves an execution context either by calling out to a binding (created with Runtime.addBinding and Runtime.bindingCalled, and called using Page.addScriptToEvaluateOnNewDocument and Runtime.evaluate in an isolated context), or by creating an isolated world (using Page.createIsolatedWorld).
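The isolated-world route can be sketched at the protocol level as follows. Page.createIsolatedWorld and Runtime.evaluate are real Chrome DevTools Protocol methods, but the helper names, the world name, and StubSession (which returns canned replies so the sketch runs without a browser) are assumptions for illustration, not PyppeteerProtect's code.

```python
import asyncio


async def create_isolated_context(session, frame_id, world_name="sketch_world"):
    """Ask the browser for a fresh isolated world and return its context id."""
    reply = await session.send(
        "Page.createIsolatedWorld",
        {"frameId": frame_id, "worldName": world_name},
    )
    return reply["executionContextId"]


async def evaluate_in_context(session, context_id, expression):
    """Evaluate an expression in a specific execution context."""
    reply = await session.send(
        "Runtime.evaluate",
        {"expression": expression, "contextId": context_id, "returnByValue": True},
    )
    return reply["result"].get("value")


class StubSession:
    """Hypothetical stand-in for a CDP session with canned replies."""

    def __init__(self):
        self.log = []

    async def send(self, method, params=None):
        self.log.append(method)
        if method == "Page.createIsolatedWorld":
            return {"executionContextId": 7}
        if method == "Runtime.evaluate":
            return {"result": {"type": "string", "value": "ok"}}
        return {}


async def demo():
    session = StubSession()
    context_id = await create_isolated_context(session, frame_id="frame-1")
    value = await evaluate_in_context(session, context_id, "'ok'")
    return context_id, value, session.log
```

Note that Runtime.enable never appears in the session log: the context id comes back directly in the Page.createIsolatedWorld reply.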
These patches are applied automatically on each navigation by listening to the request and response events of the page, and by hooking ExecutionContext.evaluateHandle.
Download files
File details
Details for the file pyppeteerprotect-1.0.1.tar.gz.
File metadata
- Download URL: pyppeteerprotect-1.0.1.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8530eacf71396a42ced6a63bafb9f90fff93fb9b43c4b297f2e58feb086228e3 |
| MD5 | ecddfb2a6448d6013d87d61813468f30 |
| BLAKE2b-256 | 04cd553ed02ce4c4902a67f806b98e9f53927ed3bade4a1355d4adc14e402e39 |
File details
Details for the file PyppeteerProtect-1.0.1-py3-none-any.whl.
File metadata
- Download URL: PyppeteerProtect-1.0.1-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb9788eab1cedd1d4a0a004510554704302c7daed23e7f57d2a2d39161658402 |
| MD5 | 7bc1fb2c5a1fc9c208942743583eb4c3 |
| BLAKE2b-256 | a150032204d404f7e2a398c2344f01b9bae4cba89440fab435a8db613c80781f |