
GetScraping Python Client

This is the official Python client library for GetScraping.com, a powerful web scraping API service.

Installation

You can install the GetScraping client library using pip:

pip install getscraping

Usage

To use the GetScraping client, you'll need an API key from GetScraping.com. Once you have your API key, you can start using the client as follows:

from getscraping import GetScrapingClient, GetScrapingParams

client = GetScrapingClient('YOUR_API_KEY')

def scrape_website():
    result = client.scrape(GetScrapingParams(
        url='https://example.com',
        method='GET'
    ))
    html = result.text
    print(html)

scrape_website()

Features

The GetScraping client supports a wide range of features, including:

  • Basic web scraping
  • JavaScript rendering
  • Custom headers and cookies
  • Proxy support (ISP, residential, and mobile)
  • Retrying requests
  • Programmable browser actions
  • Parameter validation using Pydantic models
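
Because GetScrapingParams is a Pydantic model, invalid parameters fail fast before any request is sent. A minimal sketch, assuming the model rejects HTTP methods other than the documented 'GET' and 'POST':

from pydantic import ValidationError
from getscraping import GetScrapingParams

try:
    # 'DELETE' is not a documented method ('GET' or 'POST'), so we expect
    # validation to reject it before any network call is made.
    GetScrapingParams(url='https://example.com', method='DELETE')
except ValidationError as e:
    print(e)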

API Reference

GetScrapingClient

The main class for interacting with the GetScraping API.

client = GetScrapingClient(api_key: str)

scrape(params: GetScrapingParams)

The primary method for scraping websites.

result = client.scrape(params)

Scraping Parameters

The GetScrapingParams model supports the following options:

  • url (str): The URL to scrape (should include http:// or https://)
  • method (str): The HTTP method to use ('GET' or 'POST')
  • response_type (str): The expected response type (default: "text")
  • body (str, optional): The payload to include in a POST request
  • js_rendering_options (JavascriptRenderingOptions, optional): Options for JavaScript rendering
  • cookies (List[str], optional): List of cookies to include in the request
  • headers (Dict[str, str], optional): Custom headers to attach to the request
  • omit_default_headers (bool): If True, only use the headers you define (default: False)
  • use_isp_proxy (bool, optional): Set to True to route requests through ISP proxies
  • use_residential_proxy (bool, optional): Set to True to route requests through residential proxies
  • use_mobile_proxy (bool, optional): Set to True to route requests through mobile proxies
  • use_own_proxy (str, optional): URL of your own proxy server for this request
  • retry_config (RetryConfig, optional): Configuration for when and how to retry a request
  • timeout_millis (int): How long to wait for the request to complete in milliseconds (default: 30000)
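
A sketch combining several of these options in one POST request; the parameter names follow the list above, while the endpoint, payload, and header values are illustrative placeholders:

from getscraping import GetScrapingClient, GetScrapingParams

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com/api/search',  # placeholder endpoint
    method='POST',
    body='{"query": "example"}',  # payload sent with the POST request
    headers={'Content-Type': 'application/json'},
    omit_default_headers=False,  # keep the default headers alongside ours
    timeout_millis=60000  # wait up to 60 seconds for the request to complete
))

html = result.text
print(html)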

JavaScript Rendering Options (JavascriptRenderingOptions):

  • render_js (bool): Whether to render JavaScript or not
  • wait_millis (int, optional): The time in milliseconds to wait before returning the result
  • wait_for_request (str, optional): A URL (or regex matching the URL) that must be requested during page load before the result is returned
  • wait_for_selector (str, optional): A CSS or XPath selector that must be present on the page before the response is returned
  • intercept_request (InterceptRequestParams, optional): Configuration for intercepting a specific request
  • programmable_browser (ProgrammableBrowserOptions, optional): Configuration for the programmable browser
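
For example, to return only once a particular element has rendered, you can combine render_js with wait_for_selector. A sketch, with an illustrative placeholder selector:

from getscraping import GetScrapingClient, GetScrapingParams, JavascriptRenderingOptions

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    js_rendering_options=JavascriptRenderingOptions(
        render_js=True,
        wait_for_selector='#content'  # placeholder: wait until this element appears
    )
))

html = result.text
print(html)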

Retry Configuration (RetryConfig):

  • num_retries (int): How many times to retry unsuccessful requests
  • success_status_codes (List[int], optional): The status codes that mark a request as successful
  • success_selector (str, optional): A CSS selector that must be present in the response for the request to be considered successful
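
For example, success_selector can treat a response as unsuccessful (and retry) until a given element appears in the returned HTML. A sketch, with the selector as an illustrative placeholder:

from getscraping import GetScrapingClient, GetScrapingParams, RetryConfig

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    retry_config=RetryConfig(
        num_retries=3,
        success_selector='#results'  # placeholder: retry until this element is present
    )
))

html = result.text
print(html)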

For more detailed information on these parameters, please refer to the GetScraping documentation.

Examples

Basic Scraping

from getscraping import GetScrapingClient, GetScrapingParams

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET'
))

html = result.text
print(html)

Scraping with JavaScript Rendering

Render JavaScript to scrape dynamic sites. Note: rendering JS incurs an additional cost (5 requests).

from getscraping import GetScrapingClient, GetScrapingParams, JavascriptRenderingOptions

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    js_rendering_options=JavascriptRenderingOptions(
        render_js=True,
        wait_millis=5000
    )
))

html = result.text
print(html)

Using Various Proxies

Mobile proxies are typically the most effective at bypassing tough anti-bot measures, followed by residential, then ISP, and finally our default proxies.

We recommend starting with the default proxies and working your way up as necessary, since non-default proxies incur additional costs (1 request for default proxies, 5 for ISP, 25 for residential, and 50 for mobile).

from getscraping import GetScrapingClient, GetScrapingParams

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    use_residential_proxy=True
))

html = result.text
print(html)
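
You can also route a request through your own proxy with use_own_proxy, which takes the proxy URL as a string (the credentials and host below are placeholders):

from getscraping import GetScrapingClient, GetScrapingParams

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    use_own_proxy='http://user:password@proxy.example.com:8080'  # placeholder proxy URL
))

html = result.text
print(html)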

Retrying Requests

from getscraping import GetScrapingClient, GetScrapingParams, RetryConfig

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    retry_config=RetryConfig(
        num_retries=3,
        success_status_codes=[200]
    )
))

html = result.text
print(html)

Using Programmable Browser Actions

from getscraping import GetScrapingClient, GetScrapingParams, JavascriptRenderingOptions, ProgrammableBrowserOptions, ProgrammableBrowserAction

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    js_rendering_options=JavascriptRenderingOptions(
        render_js=True,
        programmable_browser=ProgrammableBrowserOptions(
            actions=[
                ProgrammableBrowserAction(
                    type='click',
                    selector='#submit-button'
                ),
                ProgrammableBrowserAction(
                    type='wait',
                    wait_millis=2000
                )
            ]
        )
    )
))

html = result.text
print(html)

Advanced Usage

For more advanced usage, including intercepting requests and other programmable browser actions, please refer to the GetScraping documentation.

Support

If you encounter any issues or have questions, please email us at support@getscraping.com or open an issue in the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.
