ScrapingBee Python SDK

Project description

ScrapingBee is a web scraping API that handles headless browsers and rotates proxies for you. The Python SDK makes it easier to interact with ScrapingBee's API.

Installation

You can install the ScrapingBee Python SDK with pip:

pip install scrapingbee

Usage

The ScrapingBee Python SDK is a wrapper around the requests library. ScrapingBee supports GET and POST requests.

Sign up for ScrapingBee to get your API key and some free credits to get started.

Making a GET request

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/', 
    params={
        # Block ads on the page you want to scrape	
        'block_ads': False,
        # Block images and CSS on the page you want to scrape	
        'block_resources': True,
        # Premium proxy geolocation
        'country_code': '',
        # Control the device the request will be sent from	
        'device': 'desktop',
        # Use some data extraction rules
        'extract_rules': {'title': 'h1'},
        # Use AI to extract data from the page
        'ai_extract_rules': {'product_name': 'The name of the product', 'price': 'The price in USD'},
        # Wrap response in JSON
        'json_response': False,
        # Interact with the webpage you want to scrape 
        'js_scenario': {
            "instructions": [
                {"wait_for": "#slow_button"},
                {"click": "#slow_button"},
                {"scroll_x": 1000},
                {"wait": 1000},
                {"scroll_x": 1000},
                {"wait": 1000},            
            ]
        },
        # Use premium proxies to bypass difficult-to-scrape websites (10-25 credits/request)
        'premium_proxy': False,
        # Execute JavaScript code with a Headless Browser (5 credits/request)
        'render_js': True,
        # Return the original HTML before the JavaScript rendering	
        'return_page_source': False,
        # Return page screenshot as a png image
        'screenshot': False,
        # Take a full page screenshot without the window limitation
        'screenshot_full_page': False,
        # Transparently return the same HTTP status code as the requested page
        'transparent_status_code': False,
        # Wait, in milliseconds, before returning the response
        'wait': 0,
        # Wait for a CSS selector before returning the response, e.g. ".title"
        'wait_for': '',
        # Set the browser window width in pixels
        'window_width': 1920,
        # Set the browser window height in pixels
        'window_height': 1080
    },
    headers={
        # Forward custom headers to the target website
        "key": "value"
    },
    cookies={
        # Forward custom cookies to the target website
        "name": "value"
    }
)
>>> response.text
'<!DOCTYPE html><html lang="en"><head>...'
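
When extract_rules or ai_extract_rules is the goal of the request, the API answers with JSON that maps each rule name to its extracted value instead of raw HTML. Below is a minimal sketch of reading such a response, assuming a request that only uses extract_rules (the returned keys mirror the rule names):

>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Only data extraction rules, so the response body is JSON
        'extract_rules': {'title': 'h1'},
    }
)
>>> response.json().get('title')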

ScrapingBee accepts various parameters to render JavaScript, run a custom JavaScript scenario, use a premium proxy from a specific geolocation, and more.

You can find all the supported parameters in ScrapingBee's documentation.

You can send custom cookies and headers like you would normally do with the requests library.
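
ScrapingBee also accepts POST requests through the client's post method. The snippet below is a rough sketch rather than a verified recipe: the target URL and form fields are placeholders, and it assumes post forwards a requests-style data argument alongside the usual ScrapingBee params.

>>> response = client.post(
    'https://example.com/submit',  # placeholder target URL
    params={
        # Any ScrapingBee parameter used with GET also works here
        'render_js': False,
    },
    # Form data forwarded to the target website, as with requests
    data={'field': 'value'}
)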

Screenshot

Here is a short example of how to retrieve and store a screenshot of the ScrapingBee blog at a mobile resolution.

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/', 
    params={
        # Take a screenshot
        'screenshot': True,
        # Specify that we need the full height
        'screenshot_full_page': True,
        # Specify a mobile width in pixel
        'window_width': 375
    }
)

>>> if response.ok:
        with open("./scrapingbee_mobile.png", "wb") as f:
            f.write(response.content)

Using ScrapingBee with Scrapy

Scrapy is the most popular Python web scraping framework. You can easily integrate ScrapingBee's API into a Scrapy project with the dedicated Scrapy middleware.
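
As a rough sketch (assuming the separate scrapy-scrapingbee package; the middleware path, setting name and ScrapingBeeRequest class below come from that package and may change), the integration amounts to enabling the middleware in settings.py and yielding ScrapingBee-aware requests from your spider:

# settings.py -- assumes the scrapy-scrapingbee package is installed
SCRAPINGBEE_API_KEY = 'REPLACE-WITH-YOUR-API-KEY'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_scrapingbee.ScrapingBeeMiddleware': 725,
}

# spider.py -- ScrapingBeeRequest carries ScrapingBee params per request
from scrapy_scrapingbee import ScrapingBeeSpider, ScrapingBeeRequest

class BlogSpider(ScrapingBeeSpider):
    name = 'blog'

    def start_requests(self):
        yield ScrapingBeeRequest(
            'https://www.scrapingbee.com/blog/',
            params={'render_js': True},
        )

    def parse(self, response):
        # Standard Scrapy parsing on the rendered HTML
        yield {'title': response.css('h1::text').get()}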

Retries

The client includes a retry mechanism for 5XX responses.

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/', 
    params={
        'render_js': True,
    },
    retries=5
)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapingbee-2.0.2.tar.gz (6.1 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

scrapingbee-2.0.2-py3-none-any.whl (5.2 kB)

File details

Details for the file scrapingbee-2.0.2.tar.gz.

File metadata

  • Download URL: scrapingbee-2.0.2.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scrapingbee-2.0.2.tar.gz

  • SHA256: 312a2f2beab03eb687e568c3688533c44e543104eea82f18a4183f9ce1042e0f
  • MD5: 2dcb0774a681071a0e137d5cf1b0dbdc
  • BLAKE2b-256: 8f7665afa7e4f025082a10d614e7a45d60c05025e160f3d28740221a68af8255

See more details on using hashes here.
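
For instance, a downloaded archive can be checked against the SHA256 digest listed above with a few lines of Python (a sketch that assumes the file sits in the current working directory):

>>> import hashlib
>>> with open('scrapingbee-2.0.2.tar.gz', 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
>>> digest == '312a2f2beab03eb687e568c3688533c44e543104eea82f18a4183f9ce1042e0f'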

File details

Details for the file scrapingbee-2.0.2-py3-none-any.whl.

File metadata

  • Download URL: scrapingbee-2.0.2-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scrapingbee-2.0.2-py3-none-any.whl

  • SHA256: 28d1c9b250917e45924d8a21cf7d55776c701d6d1c5283aabf3ef1911462a291
  • MD5: 550214ed8fe5a34aa7e4de097f13b99c
  • BLAKE2b-256: 14a0ce8c66d302517432c76a4edb52599f8827cb3c3a38e7332fe4db2aaca3a9

See more details on using hashes here.
