A Python module to bypass Cloudflare's anti-bot page.

These details have not been verified by PyPI

Project links

Homepage

Project description

cloudscraper

A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Cloudflare changes their techniques periodically, so I will update this repo frequently.

This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks if the client supports JavaScript, though they may add additional techniques in the future.

Because Cloudflare continually changes and hardens its protection page, cloudscraper requires a JavaScript engine or interpreter to solve JavaScript challenges. This allows the script to easily impersonate a regular web browser without explicitly deobfuscating and parsing Cloudflare's JavaScript.

For reference, this is the default message Cloudflare uses for these sorts of pages:

Checking your browser before accessing website.com.

This process is automatic. Your browser will redirect to your requested content shortly.

Please allow up to 5 seconds...

Any script using cloudscraper will sleep for ~5 seconds on the first visit to any site with Cloudflare's anti-bot protection enabled, though no delay will occur after the first request.

Donations

If you feel like showing your love and/or appreciation for this project, then how about shouting me a coffee or beer :)

Installation

Simply run pip install cloudscraper. The PyPI package is at https://pypi.python.org/pypi/cloudscraper/

Alternatively, clone this repository and run python setup.py install.

Dependencies

Python 3.x
Requests >= 2.9.2
requests_toolbelt >= 0.9.1

python setup.py install will install the Python dependencies automatically. The JavaScript interpreters and/or engines you decide to use are the only things you need to install yourself, excluding js2py, which is part of the requirements.

JavaScript Interpreters and Engines

We support the following JavaScript interpreters/engines.

ChakraCore: Library binaries can also be located here.
js2py: >=0.67
native: Self-made native Python solver (default)
Node.js
V8: We use Sony's v8eval Python module.

Usage

The simplest way to use cloudscraper is by calling create_scraper().

import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper()  # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."

That's it...

Any requests made from this session object to websites protected by Cloudflare anti-bot will be handled automatically. Websites not using Cloudflare will be treated normally. You don't need to configure or call anything further, and you can effectively treat all websites as if they're not protected with anything.

You use cloudscraper exactly the same way you use Requests. CloudScraper works identically to a Requests Session object; instead of calling requests.get() or requests.post(), you call scraper.get() or scraper.post().

Consult Requests' documentation for more information.
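
Because the scraper exposes the same interface as a Requests Session, code written against that interface can accept either object. A minimal sketch, assuming a hypothetical helper named fetch_text (it is not part of cloudscraper) that relies only on get(), raise_for_status(), and .text:

```python
def fetch_text(session, url, timeout=30):
    """Fetch a page through any Requests-style session and return its body.

    `session` can be a requests.Session or a cloudscraper scraper, since
    both expose the same get() method. `fetch_text` is a hypothetical
    helper used here for illustration only.
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx status codes
    return response.text


# Usage (performs a real network request, so it is left commented out):
# import cloudscraper
# scraper = cloudscraper.create_scraper()
# html = fetch_text(scraper, "http://somesite.com")
```

Passing the session in as an argument keeps the helper independent of how the scraper was created, so the same function works for protected and unprotected sites.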

Options

Disable Cloudflare V1

Description

Set this option if you don't want cloudscraper to even attempt solving the (deprecated) Cloudflare v1 challenge.

Parameters

Parameter	Value	Default
disableCloudflareV1	(boolean)	False

Example

scraper = cloudscraper.create_scraper(disableCloudflareV1=True)

Brotli

Description

Brotli decompression support has been added, and it is enabled by default.

Parameters

Parameter	Value	Default
allow_brotli	(boolean)	True

Example

scraper = cloudscraper.create_scraper(allow_brotli=False)

Browser / User-Agent Filtering

Description

Control how and which User-Agent is "randomly" selected.

Parameters

Can be passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
browser	(string) `chrome` or `firefox`	None

Parameter	Value	Default
browser	(dict)

`browser` dict Parameters

Parameter	Value	Default
browser	(string) `chrome` or `firefox`	None
mobile	(boolean)	True
desktop	(boolean)	True
platform	(string) `'linux', 'windows', 'darwin', 'android', 'ios'`	None
custom	(string)	None

Example

scraper = cloudscraper.create_scraper(browser='chrome')

# will give you only mobile chrome User-Agents on Android
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'android',
        'desktop': False
    }
)

# will give you only desktop firefox User-Agents on Windows
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'firefox',
        'platform': 'windows',
        'mobile': False
    }
)

# 'custom' is also looked up as a user-agent string in browsers.json.
# If a match is found, the headers and cipherSuite from that "browser" are used;
# otherwise a generic set of headers and cipherSuite is used.
scraper = cloudscraper.create_scraper(
    browser={
        'custom': 'ScraperBot/1.0',
    }
)
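
The browser dict can also be assembled programmatically. The sketch below is one possible way to do that: browser_options is a hypothetical helper (not part of cloudscraper) that accepts only the values listed in the parameter table above. Rejecting the case where both mobile and desktop are False is an assumption of this sketch.

```python
# Values copied from the `browser` dict parameter table.
ALLOWED_BROWSERS = {"chrome", "firefox"}
ALLOWED_PLATFORMS = {"linux", "windows", "darwin", "android", "ios"}


def browser_options(browser=None, platform=None, mobile=True, desktop=True):
    """Build a `browser` dict, rejecting values the table does not list."""
    if browser is not None and browser not in ALLOWED_BROWSERS:
        raise ValueError("unsupported browser: {!r}".format(browser))
    if platform is not None and platform not in ALLOWED_PLATFORMS:
        raise ValueError("unsupported platform: {!r}".format(platform))
    if not (mobile or desktop):
        # Assumption of this sketch: disabling both leaves nothing to pick.
        raise ValueError("at least one of mobile/desktop must be True")
    options = {"mobile": mobile, "desktop": desktop}
    if browser is not None:
        options["browser"] = browser
    if platform is not None:
        options["platform"] = platform
    return options


# Mobile Chrome on Android only, as in the first dict example above.
android_chrome = browser_options("chrome", "android", desktop=False)
# scraper = cloudscraper.create_scraper(browser=android_chrome)
```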

Debug

Description

Prints out header and content information of the request for debugging.

Parameters

Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
debug	(boolean)	False

Example

scraper = cloudscraper.create_scraper(debug=True)

Delays

Description

The Cloudflare IUAM challenge requires the browser to wait ~5 seconds before submitting the challenge answer. Use this option if you would like to override that delay.

Parameters

Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
delay	(float)	extracted from IUAM page

Example

scraper = cloudscraper.create_scraper(delay=10)

Existing session

Description

If you already have an existing Requests session, you can pass it to the function create_scraper() to continue using that session.

Parameters

Parameter	Value	Default
sess	(requests.session)	None

Example

import requests

session = requests.session()
scraper = cloudscraper.create_scraper(sess=session)

Note

Unfortunately, not all of a Requests session's attributes are easily transferable, so if you run into problems with this, you should replace your initial session initialization call

From:

sess = requests.session()

To:

sess = cloudscraper.create_scraper()

JavaScript Engines and Interpreters

Description

cloudscraper currently supports the following JavaScript engines/interpreters:

ChakraCore
js2py
native: Self-made native Python solver (default)
Node.js
V8

Parameters

Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
interpreter	(string)	`native`

Example

scraper = cloudscraper.create_scraper(interpreter='nodejs')
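
If the interpreter is chosen at runtime, a small check can fall back to the default when Node.js is not installed. This is only a sketch: pick_interpreter is a hypothetical helper, and it assumes the 'nodejs' interpreter needs a node executable on the PATH.

```python
import shutil


def pick_interpreter(preferred="nodejs", fallback="native"):
    """Return `preferred`, or `fallback` when Node.js is requested but absent.

    Assumption: the 'nodejs' interpreter needs a `node` executable on PATH.
    """
    if preferred == "nodejs" and shutil.which("node") is None:
        return fallback
    return preferred


interpreter = pick_interpreter()
# scraper = cloudscraper.create_scraper(interpreter=interpreter)
```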

3rd Party Captcha Solvers

Description

cloudscraper currently supports the following 3rd party Captcha solvers, should you require them.

2captcha
anticaptcha
CapSolver
CapMonster Cloud
deathbycaptcha
9kw
return_response

Note

I am working on adding more 3rd party solvers. If you wish to have a service added that is not currently supported, please raise a support ticket on GitHub.

Required Parameters

Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().

Parameter	Value	Default
captcha	(dict)	None

2captcha

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `2captcha`	yes
api_key	(string)	yes
no_proxy	(boolean)	no	False

Note

If proxies are set, you can disable sending them to 2captcha by setting no_proxy to True.

Example

scraper = cloudscraper.create_scraper(
  captcha={
    'provider': '2captcha',
    'api_key': 'your_2captcha_api_key'
  }
)

anticaptcha

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `anticaptcha`	yes
api_key	(string)	yes
no_proxy	(boolean)	no	False

Note

If proxies are set, you can disable sending them to anticaptcha by setting no_proxy to True.

Example

scraper = cloudscraper.create_scraper(
  captcha={
    'provider': 'anticaptcha',
    'api_key': 'your_anticaptcha_api_key'
  }
)

CapSolver

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `capsolver`	yes
api_key	(string)	yes

Example

scraper = cloudscraper.create_scraper(
  captcha={
    'provider': 'capsolver',
    'api_key': 'your_capsolver_api_key'
  }
)

CapMonster Cloud

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `capmonster`	yes
clientKey	(string)	yes
no_proxy	(boolean)	no	False

Note

If proxies are set, you can disable sending them to CapMonster by setting no_proxy to True.

Example

scraper = cloudscraper.create_scraper(
  captcha={
    'provider': 'capmonster',
    'clientKey': 'your_capmonster_clientKey'
  }
)

deathbycaptcha

Required `captcha` Parameters

Parameter	Value	Required
provider	(string) `deathbycaptcha`	yes
username	(string)	yes
password	(string)	yes

Example

scraper = cloudscraper.create_scraper(
  captcha={
    'provider': 'deathbycaptcha',
    'username': 'your_deathbycaptcha_username',
    'password': 'your_deathbycaptcha_password',
  }
)

9kw

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `9kw`	yes
api_key	(string)	yes
maxtimeout	(int)	no	180

Example

scraper = cloudscraper.create_scraper(
  captcha={
    'provider': '9kw',
    'api_key': 'your_9kw_api_key',
    'maxtimeout': 300
  }
)

return_response

Use this if you want the requests response payload without solving the Captcha.

Required `captcha` Parameters

Parameter	Value	Required	Default
provider	(string) `return_response`	yes

Example

scraper = cloudscraper.create_scraper(
  captcha={'provider': 'return_response'}
)
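
Since each provider expects different keys, it can help to check a captcha dict before passing it on. The sketch below encodes the "Required" columns from the tables above. check_captcha_config is a hypothetical helper, not part of cloudscraper, and the 'capsolver' provider string follows the CapSolver example.

```python
# Required keys per provider, taken from the tables in this section.
REQUIRED_KEYS = {
    "2captcha": {"api_key"},
    "anticaptcha": {"api_key"},
    "capsolver": {"api_key"},
    "capmonster": {"clientKey"},
    "deathbycaptcha": {"username", "password"},
    "9kw": {"api_key"},
    "return_response": set(),
}


def check_captcha_config(config):
    """Raise ValueError if a `captcha` dict lacks a documented required key."""
    provider = config.get("provider")
    if provider not in REQUIRED_KEYS:
        raise ValueError("unknown provider: {!r}".format(provider))
    missing = REQUIRED_KEYS[provider] - set(config)
    if missing:
        raise ValueError(
            "missing keys for {}: {}".format(provider, sorted(missing))
        )
    return config


captcha = check_captcha_config(
    {"provider": "9kw", "api_key": "your_9kw_api_key", "maxtimeout": 300}
)
# scraper = cloudscraper.create_scraper(captcha=captcha)
```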

Integration

It's easy to integrate cloudscraper with other applications and tools. Cloudflare uses two cookies as tokens: one to verify you made it past their challenge page and one to track your session. To bypass the challenge page, simply include both of these cookies (with the appropriate user-agent) in all HTTP requests you make.

To retrieve just the cookies (as a dictionary), use cloudscraper.get_tokens(). To retrieve them as a full Cookie HTTP header, use cloudscraper.get_cookie_string().

get_tokens and get_cookie_string both accept Requests' usual keyword arguments (like get_tokens(url, proxies={"http": "socks5://localhost:9050"})).

Please read Requests' documentation on request arguments for more information.

User-Agent Handling

The two integration functions return a tuple of (cookie, user_agent_string).

You must use the same user-agent string for obtaining tokens and for making requests with those tokens, otherwise Cloudflare will flag you as a bot.

That means you have to pass the returned user_agent_string to whatever script, tool, or service you are passing the tokens to (e.g. curl, or a specialized scraping tool), and it must use that passed user-agent when it makes HTTP requests.
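
One way to keep the cookies and the user-agent together when handing them to another tool is to serialize them as a single document. A minimal sketch using only the standard library; export_clearance and load_clearance are hypothetical names, and the token values shown are placeholders.

```python
import json


def export_clearance(tokens, user_agent):
    """Serialize the cookies and their matching user-agent as one JSON string."""
    return json.dumps(
        {"cookies": tokens, "user_agent": user_agent}, sort_keys=True
    )


def load_clearance(payload):
    """Return (cookies, user_agent) from a string made by export_clearance."""
    data = json.loads(payload)
    return data["cookies"], data["user_agent"]


# Placeholder values; real ones would come from cloudscraper.get_tokens().
payload = export_clearance({"cf_clearance": "abc"}, "Some/User-Agent String")
cookies, user_agent = load_clearance(payload)
```

Storing both values in one payload makes it harder for the receiving tool to reuse the cookies with a different user-agent by accident.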

Integration examples

Remember, you must always use the same user-agent when retrieving or using these cookies. get_tokens returns a tuple of (cookie_dict, user_agent_string), and get_cookie_string returns a tuple of (cookie_string, user_agent_string).

Retrieving a cookie dict through a proxy

get_tokens is a convenience function for returning a Python dict containing Cloudflare's session cookies. For demonstration, we will configure this request to use a proxy. (Please note that if you request Cloudflare clearance tokens through a proxy, you must always use the same proxy when those tokens are passed to the server. Cloudflare requires that the challenge-solving IP and the visitor IP stay the same.)

If you do not wish to use a proxy, just don't pass the proxies keyword argument. These convenience functions support all of Requests' normal keyword arguments, like params, data, and headers.

import cloudscraper

proxies = {"http": "http://localhost:8080", "https": "http://localhost:8080"}
tokens, user_agent = cloudscraper.get_tokens("http://somesite.com", proxies=proxies)
print(tokens)
# => {
#     'cf_clearance': 'c8f913c707b818b47aa328d81cab57c349b1eee5-1426733163-3600',
#     '__cfduid': 'dd8ec03dfdbcb8c2ea63e920f1335c1001426733158'
# }
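
To replay the tokens from another HTTP client, that client has to send the same cookies and user-agent and go through the same proxy. The following sketch shows this with the standard library's urllib. opener_with_tokens is a hypothetical helper, and the token values are placeholders.

```python
import urllib.request


def opener_with_tokens(tokens, user_agent, proxies=None):
    """Build a urllib opener that replays the cookies, user-agent, and proxy."""
    handlers = []
    if proxies:
        # Route requests through the same proxy used to obtain the tokens.
        handlers.append(urllib.request.ProxyHandler(proxies))
    opener = urllib.request.build_opener(*handlers)
    cookie_header = "; ".join(
        "{}={}".format(name, value) for name, value in tokens.items()
    )
    # Replace urllib's default headers with the ones tied to the tokens.
    opener.addheaders = [("User-Agent", user_agent), ("Cookie", cookie_header)]
    return opener


proxies = {"http": "http://localhost:8080", "https": "http://localhost:8080"}
opener = opener_with_tokens(
    {"cf_clearance": "abc", "__cfduid": "def"},  # placeholder tokens
    "Some/User-Agent String",
    proxies=proxies,
)
# html = opener.open("http://somesite.com").read()  # real network request
```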

Retrieving a cookie string

get_cookie_string is a convenience function for returning the tokens as a string for use as a Cookie HTTP header value.

This is useful when crafting an HTTP request manually, or working with an external application or library that passes on raw cookie headers.

import cloudscraper

cookie_value, user_agent = cloudscraper.get_cookie_string('http://somesite.com')

print('GET / HTTP/1.1\nCookie: {}\nUser-Agent: {}\n'.format(cookie_value, user_agent))

# GET / HTTP/1.1
# Cookie: cf_clearance=c8f913c707b818b47aa328d81cab57c349b1eee5-1426733163-3600; __cfduid=dd8ec03dfdbcb8c2ea63e920f1335c1001426733158
# User-Agent: Some/User-Agent String
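
If a tool needs the individual cookies again, the header value can be split back into a dict with the standard library. A small sketch, where parse_cookie_header is a hypothetical helper name:

```python
from http.cookies import SimpleCookie


def parse_cookie_header(cookie_value):
    """Split a Cookie header value into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(cookie_value)
    return {name: morsel.value for name, morsel in jar.items()}


cookies = parse_cookie_header(
    "cf_clearance=c8f913c707b818b47aa328d81cab57c349b1eee5-1426733163-3600; "
    "__cfduid=dd8ec03dfdbcb8c2ea63e920f1335c1001426733158"
)
```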

curl example

Here is an example of integrating cloudscraper with curl. As you can see, all you have to do is pass the cookies and user-agent to curl.

import subprocess
import cloudscraper

# With get_tokens() cookie dict:

# tokens, user_agent = cloudscraper.get_tokens("http://somesite.com")
# cookie_arg = 'cf_clearance={}; __cfduid={}'.format(tokens['cf_clearance'], tokens['__cfduid'])

# With get_cookie_string() cookie header; recommended for curl and similar external applications:

cookie_arg, user_agent = cloudscraper.get_cookie_string('http://somesite.com')

# With a custom user-agent string you can optionally provide:

# ua = "Scraping Bot"
# cookie_arg, user_agent = cloudscraper.get_cookie_string("http://somesite.com", user_agent=ua)

result = subprocess.check_output(
    [
        'curl',
        '--cookie',
        cookie_arg,
        '-A',
        user_agent,
        'http://somesite.com'
    ]
)

Trimmed down version. Prints page contents of any site protected with Cloudflare, via curl.

Warning: shell=True can be dangerous to use with subprocess in real code.

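import shlex

url = "http://somesite.com"
cookie_arg, user_agent = cloudscraper.get_cookie_string(url)
# Quote each value: the cookie string and user-agent contain spaces and
# semicolons that the shell would otherwise split or interpret.
cmd = "curl --cookie {cookie_arg} -A {user_agent} {url}"
print(
    subprocess.check_output(
        cmd.format(
            cookie_arg=shlex.quote(cookie_arg),
            user_agent=shlex.quote(user_agent),
            url=shlex.quote(url)
        ),
        shell=True
    )
)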

Cryptography

Description

Control communication between client and server

Parameters

Can be passed as an argument to create_scraper().

Parameter	Value	Default
cipherSuite	(string)	None
ecdhCurve	(string)	prime256v1
server_hostname	(string)	None

Example

# Some servers require a more complex ECDH curve than the default "prime256v1".
# Changing it may resolve handshake failures.
scraper = cloudscraper.create_scraper(ecdhCurve='secp384r1')

# Manipulate server_hostname
scraper = cloudscraper.create_scraper(server_hostname='www.somesite.com')
scraper.get(
    'https://backend.hosting.com/',
    headers={'Host': 'www.somesite.com'}
)
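
For orientation, curve and cipher settings of this kind correspond to options on a standard-library ssl.SSLContext. The sketch below is not cloudscraper's implementation. make_tls_context is a hypothetical helper that only illustrates what choosing an ECDH curve and a cipher list looks like at the ssl level.

```python
import ssl


def make_tls_context(ecdh_curve="prime256v1", cipher_suite=None):
    """Create a client TLS context with a chosen ECDH curve and cipher list."""
    context = ssl.create_default_context()
    context.set_ecdh_curve(ecdh_curve)  # e.g. "prime256v1" or "secp384r1"
    if cipher_suite is not None:
        context.set_ciphers(cipher_suite)  # OpenSSL cipher list string
    return context


context = make_tls_context(ecdh_curve="secp384r1")
```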

Project details

Release history

Version	Released
1.2.71	Apr 25, 2023 (this version)
1.2.70	Apr 25, 2023
1.2.69	Feb 25, 2023
1.2.68	Jan 10, 2023
1.2.67	Jan 5, 2023
1.2.66	Nov 23, 2022
1.2.65	Nov 9, 2022
1.2.64	Aug 29, 2022
1.2.63	Aug 27, 2022
1.2.62	Aug 27, 2022
1.2.61	Aug 27, 2022
1.2.60	Mar 15, 2022
1.2.58	Apr 6, 2021
1.2.56	Jan 28, 2021
1.2.54	Jan 27, 2021
1.2.52	Jan 7, 2021
1.2.50	Dec 22, 2020
1.2.48	Sep 27, 2020
1.2.46	Jul 27, 2020
1.2.44	Jul 24, 2020
1.2.42	Jul 2, 2020
1.2.40	May 27, 2020
1.2.38	May 16, 2020
1.2.36	May 4, 2020
1.2.34	Apr 22, 2020
1.2.33	Apr 2, 2020
1.2.30	Mar 20, 2020
1.2.28	Mar 10, 2020
1.2.26	Mar 4, 2020
1.2.24	Feb 18, 2020
1.2.23	Feb 15, 2020
1.2.22	Feb 14, 2020
1.2.20	Jan 15, 2020
1.2.18	Dec 25, 2019
1.2.16	Dec 12, 2019
1.2.14	Nov 28, 2019
1.2.13	Nov 27, 2019
1.2.12	Nov 27, 2019
1.2.11	Nov 27, 2019
1.2.10	Nov 27, 2019
1.2.9	Nov 27, 2019
1.2.8	Nov 12, 2019
1.2.7	Nov 6, 2019
1.2.6	Oct 28, 2019
1.2.5	Oct 23, 2019
1.2.4	Oct 17, 2019
1.2.2	Oct 9, 2019
1.2.1	Oct 9, 2019
1.2.0	Oct 8, 2019
1.1.47	Oct 3, 2019
1.1.46	Sep 28, 2019
1.1.45	Sep 27, 2019
1.1.43	Sep 19, 2019
1.1.42	Sep 8, 2019
1.1.41	Sep 5, 2019
1.1.40	Aug 11, 2019
1.1.39	Jul 27, 2019
1.1.36	Jul 17, 2019
1.1.34	Jul 16, 2019
1.1.33	Jul 10, 2019
1.1.32	Jul 9, 2019
1.1.31	Jul 8, 2019
1.1.30	Jul 8, 2019
1.1.29	Jul 7, 2019
1.1.28	Jul 7, 2019
1.1.27	Jul 7, 2019
1.1.26	Jul 5, 2019
1.1.25	Jul 4, 2019
1.1.24	Jun 28, 2019
1.1.23	Jun 27, 2019
1.1.20	Jun 26, 2019
1.1.19	Jun 20, 2019
1.1.18	Jun 14, 2019
1.1.17	Jun 6, 2019
1.1.16	Jun 3, 2019
1.1.15	Jun 2, 2019
1.1.14	May 31, 2019
1.1.13	May 31, 2019

Download files

Source Distribution

cloudscraper-1.2.71.tar.gz (93.3 kB), uploaded Apr 25, 2023, Source

Built Distribution

cloudscraper-1.2.71-py2.py3-none-any.whl (99.7 kB), uploaded Apr 25, 2023, Python 2, Python 3

File details

Details for the file cloudscraper-1.2.71.tar.gz.

File metadata

Download URL: cloudscraper-1.2.71.tar.gz
Upload date: Apr 25, 2023
Size: 93.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.26.0 requests-toolbelt/0.9.1 urllib3/1.26.5 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/22.0.1 rfc3986/2.0.0 colorama/0.4.3 CPython/3.9.2

File hashes

Hashes for cloudscraper-1.2.71.tar.gz
Algorithm	Hash digest
SHA256	`429c6e8aa6916d5bad5c8a5eac50f3ea53c9ac22616f6cb21b18dcc71517d0d3`
MD5	`e90af53f2a5b8e4b633285054b0ddeaa`
BLAKE2b-256	`ac256d0481860583f44953bd791de0b7c4f6d7ead7223f8a17e776247b34a5b4`

File details

Details for the file cloudscraper-1.2.71-py2.py3-none-any.whl.

File metadata

Download URL: cloudscraper-1.2.71-py2.py3-none-any.whl
Upload date: Apr 25, 2023
Size: 99.7 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.26.0 requests-toolbelt/0.9.1 urllib3/1.26.5 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/22.0.1 rfc3986/2.0.0 colorama/0.4.3 CPython/3.9.2

File hashes

Hashes for cloudscraper-1.2.71-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`76f50ca529ed2279e220837befdec892626f9511708e200d48d5bb76ded679b0`
MD5	`b2fe568e6e401ff7bc6feaf33f82133f`
BLAKE2b-256	`8197fc88803a451029688dffd7eb446dc1b529657577aec13aceff1cc9628c5d`
