
A library to prevent detections caused by Runtime.enable.


PyppeteerProtect

PyppeteerProtect is an implementation of rebrowser-patches for pyppeteer, with the notable difference that it does not require you to modify your installation of pyppeteer. You simply call PyppeteerProtect on a target page and the patches are applied automatically.

PyppeteerProtect does not currently protect against detection of headless mode, beyond setting the user agent to remove HeadlessChrome. For that, look for an additional library to run alongside PyppeteerProtect, such as pyppeteer_stealth (though that particular library only makes you more detectable to the major anti-bot solutions).

Install

$ pip install PyppeteerProtect

Usage

Import the library:

from PyppeteerProtect import PyppeteerProtect, SetSecureArguments

Set default arguments for the Chrome executable that help you stay protected (this sets --disable-blink-features=AutomationControlled and removes --enable-automation):

SetSecureArguments()  # should be called before pyppeteer.launch
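To make the effect of these two changes concrete, here is a minimal sketch of the same transformation applied to a plain list of launch arguments. The function name secure_arguments and the two constants are hypothetical and are not part of PyppeteerProtect; how SetSecureArguments is implemented internally is not shown here.

```python
from typing import List

# Flag names taken from the description above; the constant names are illustrative.
AUTOMATION_FLAG = "--enable-automation"
BLINK_FLAG = "--disable-blink-features=AutomationControlled"


def secure_arguments(default_args: List[str]) -> List[str]:
    """Return a copy of default_args without the automation flag and with the Blink flag."""
    # Drop --enable-automation wherever it appears.
    args = [arg for arg in default_args if arg != AUTOMATION_FLAG]
    # Add the Blink feature flag only once, so the function is safe to call repeatedly.
    if BLINK_FLAG not in args:
        args.append(BLINK_FLAG)
    return args
```

Because the function returns a new list and never adds the flag twice, calling it more than once gives the same result as calling it once.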

Protect individual pages:

pageProtect = await PyppeteerProtect(page)

Switch between using the main and an isolated execution context:

await pageProtect.useMainWorld()
await pageProtect.useIsolatedWorld()

You can switch freely between the two contexts during an active session. For example:

# document.querySelector might have been hooked in the main world to block queries for #embedded-token
await pageProtect.useIsolatedWorld()
token = await page.evaluate("() => document.querySelector('input[type=\"hidden\"]#embedded-token').value")
await pageProtect.useMainWorld()
data = await page.evaluate("(token) => window.get_some_data(token)", token)

By default, PyppeteerProtect uses the execution context id of an isolated world. This is the safest option, as you don't have to worry about calling hooked global functions or accidentally leaking your presence through global variables. However, it makes the code of the target page inaccessible.
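If most of your work happens in the isolated world and you only occasionally need the page's own code, one option is to wrap the switch in an async context manager so the isolated world is always restored afterwards. This is a sketch only: main_world and fetch_data are hypothetical helpers, not part of PyppeteerProtect, and they assume only the useMainWorld and useIsolatedWorld methods shown above.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def main_world(page_protect):
    """Switch to the main world for the duration of the block, then back to the isolated world."""
    await page_protect.useMainWorld()
    try:
        yield page_protect
    finally:
        # Always return to the isolated world, even if the body raised.
        await page_protect.useIsolatedWorld()


async def fetch_data(page, page_protect, token):
    """Call a page-defined function, which is only reachable from the main world."""
    async with main_world(page_protect):
        return await page.evaluate("(token) => window.get_some_data(token)", token)
```

The try/finally means an exception raised by page.evaluate does not leave the session stuck in the main world.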

If you plan on using the main world execution context and nothing else, you can configure the PyppeteerProtect constructor to use it on creation like so:

pageProtect = await PyppeteerProtect(page, True)

If you have an unusual use case and are having trouble obtaining an execution context id automatically, you can ask PyppeteerProtect to wait until one is obtained. If you stick to basic Page.evaluate calls you should not need this, as it is called automatically.

await pageProtect.waitForExecutionContext()
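If you do call it yourself, you may want to bound the wait so a page that never yields a context cannot hang your script. The sketch below uses the standard asyncio.wait_for for that. The helper name wait_for_context and the 10 second default are assumptions, and it relies only on waitForExecutionContext being awaitable, as in the call above.

```python
import asyncio


async def wait_for_context(page_protect, timeout: float = 10.0) -> bool:
    """Wait for an execution context; return False if none arrives within timeout seconds."""
    try:
        await asyncio.wait_for(page_protect.waitForExecutionContext(), timeout)
        return True
    except asyncio.TimeoutError:
        # No context was obtained in time; the caller decides whether to retry or give up.
        return False
```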

Example

import asyncio

from pyppeteer import launch
from PyppeteerProtect import PyppeteerProtect, SetSecureArguments

SetSecureArguments()  # set --disable-blink-features=AutomationControlled and remove --enable-automation

loop = asyncio.new_event_loop()

async def main():
    browser = await launch(
        executablePath="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
        headless=False,  # currently no protection for running headless
        defaultViewport={"width": 1920, "height": 953},
        loop=loop,
    )

    page = (await browser.pages())[0]
    pageProtect = await PyppeteerProtect(page)

    await page.goto("https://www.datadome.co")
    print(await page.evaluate("()=>'Test Output'"))

    # asyncio.sleep takes seconds, so this keeps the browser open for roughly 83 minutes
    await asyncio.sleep(5000)
    await browser.close()

loop.run_until_complete(main())

How does it work?

PyppeteerProtect works by calling Runtime.disable and hooking CDPSession.send to drop any Runtime.enable requests sent by the pyppeteer library. Runtime.enable is used to retrieve an execution context id, which is required for functions such as Page.evaluate and Page.querySelectorAll to work, but in doing so, it enables the scripts running on the target page to observe behavior that would indicate the browser is being controlled by automation software, like pyppeteer/puppeteer.
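The idea of hooking send to drop Runtime.enable can be illustrated with a small stand-in. This is not PyppeteerProtect's actual code: FakeSession is a hypothetical substitute for pyppeteer's CDPSession, and drop_runtime_enable is an illustrative name. The wrapper answers Runtime.enable with an empty result (the command has no return values in the DevTools protocol) and forwards everything else unchanged.

```python
BLOCKED_METHODS = {"Runtime.enable"}


def drop_runtime_enable(session):
    """Replace session.send with a wrapper that never forwards Runtime.enable."""
    original_send = session.send

    async def guarded_send(method, params=None):
        if method in BLOCKED_METHODS:
            # Pretend the command succeeded without sending it to the browser.
            return {}
        # Awaiting works whether the original returns a coroutine or a future.
        return await original_send(method, params)

    session.send = guarded_send
    return session


class FakeSession:
    """Hypothetical stand-in for a CDP session; records what would reach the browser."""

    def __init__(self):
        self.sent = []

    async def send(self, method, params=None):
        self.sent.append(method)
        return {"ok": True}
```

With the hook installed, a caller that sends Runtime.enable sees a normal-looking response, while the list of messages that would have reached the browser does not contain it.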

PyppeteerProtect retrieves an execution context either by calling out to a binding (created with Runtime.addBinding and Runtime.bindingCalled, and called using Page.addScriptToEvaluateOnNewDocument and Runtime.evaluate in an isolated context), or by creating an isolated world (using Page.createIsolatedWorld).
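The isolated-world route can be sketched in terms of raw DevTools protocol messages. Page.createIsolatedWorld returns an executionContextId, which can then be passed as contextId to Runtime.evaluate. The function names, the default world name, and the session object (anything with an awaitable send(method, params)) are assumptions made for illustration; this is not the library's own implementation, and the binding-based route for the main world is not shown.

```python
async def isolated_world_context_id(session, frame_id: str, world_name: str = "sketch_world") -> int:
    """Create an isolated world in the given frame and return its execution context id."""
    result = await session.send(
        "Page.createIsolatedWorld",
        {"frameId": frame_id, "worldName": world_name},
    )
    return result["executionContextId"]


async def evaluate_in_context(session, context_id: int, expression: str):
    """Evaluate an expression in a specific execution context and return its value."""
    response = await session.send(
        "Runtime.evaluate",
        {"expression": expression, "contextId": context_id, "returnByValue": True},
    )
    # The protocol wraps the value in a RemoteObject under the "result" key.
    return response["result"].get("value")
```

Neither call requires Runtime.enable to have been sent, which is what makes this route usable once that command is being dropped.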

These patches are applied automatically on each navigation by listening to the request and response events of the page, and by hooking ExecutionContext.evaluateHandle.
