Python SDK for the GetScraping API
Reason this release was yanked:
deprecated, use 1.0.4
GetScraping Python Client
This is the official Python client library for GetScraping.com, a powerful web scraping API service.
Installation
You can install the GetScraping client library using pip:
pip install getscraping
Usage
To use the GetScraping client, you'll need an API key from GetScraping.com. Once you have your API key, you can start using the client as follows:
from getscraping import GetScrapingClient
from getscraping.models import GetScrapingParams

client = GetScrapingClient('YOUR_API_KEY')

def scrape_website():
    result = client.scrape(GetScrapingParams(
        url='https://example.com',
        method='GET'
    ))
    html = result.text
    print(html)

scrape_website()
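Hard-coding the key is fine for a quick test, but it is usually safer to read it from the environment. The following is a minimal sketch using only the standard library; the variable name `GETSCRAPING_API_KEY` and the helper `load_api_key` are assumptions made for illustration and are not part of the client library.

```python
import os


def load_api_key(env_var: str = "GETSCRAPING_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing.

    The environment variable name is a hypothetical choice, not something
    the GetScraping client looks up on its own.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The returned string can then be passed to GetScrapingClient in place of 'YOUR_API_KEY'.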
Features
The GetScraping client supports a wide range of features, including:
- Basic web scraping
- JavaScript rendering
- Custom headers and cookies
- Proxy support (ISP, residential, and mobile)
- Retrying requests
- Programmable browser actions
- Parameter validation using Pydantic models
API Reference
GetScrapingClient
The main class for interacting with the GetScraping API.
client = GetScrapingClient(api_key: str)
scrape(params: GetScrapingParams)
The primary method for scraping websites.
result = client.scrape(params)
Scraping Parameters
The GetScrapingParams model supports the following options:
- url (str): The URL to scrape (should include http:// or https://)
- method (str): The HTTP method to use ('GET' or 'POST')
- response_type (str): The expected response type (default: "text")
- body (str, optional): The payload to include in a POST request
- js_rendering_options (JavascriptRenderingOptions, optional): Options for JavaScript rendering
- cookies (List[str], optional): List of cookies to include in the request
- headers (Dict[str, str], optional): Custom headers to attach to the request
- omit_default_headers (bool): If True, only use the headers you define (default: False)
- use_isp_proxy (bool, optional): Set to True to route requests through ISP proxies
- use_residential_proxy (bool, optional): Set to True to route requests through residential proxies
- use_mobile_proxy (bool, optional): Set to True to route requests through mobile proxies
- use_own_proxy (str, optional): URL of your own proxy server for this request
- retry_config (RetryConfig, optional): Configuration for when and how to retry a request
- timeout_millis (int): How long to wait for the request to complete in milliseconds (default: 30000)
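Two of the constraints listed above (the URL scheme and the allowed HTTP methods) are easy to check before a request is ever built. The sketch below is a standalone pre-check written with the standard library only. The helper name `check_basic_params` is hypothetical, and this is not the Pydantic validation the client performs internally.

```python
from urllib.parse import urlparse

# The two methods the parameter list above documents.
ALLOWED_METHODS = {"GET", "POST"}


def check_basic_params(url: str, method: str = "GET") -> list:
    """Return a list of problems found; an empty list means both values look usable."""
    problems = []
    parsed = urlparse(url)
    # The url option should include http:// or https://.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url should start with http:// or https:// and include a host")
    if method not in ALLOWED_METHODS:
        problems.append("method should be 'GET' or 'POST'")
    return problems
```

Calling `check_basic_params('example.com', 'PUT')` reports both problems, which can be surfaced to the caller before any request quota is spent.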
JavaScript Rendering Options (JavascriptRenderingOptions):
- render_js (bool): Whether to render JavaScript or not
- wait_millis (int, optional): The time in milliseconds to wait before returning the result
- wait_for_request (str, optional): The URL (or regex matching the URL) that needs to be requested on page load
- wait_for_selector (str, optional): CSS or XPATH selector that needs to be present before returning the response
- intercept_request (InterceptRequestParams, optional): Configuration for intercepting a specific request
- programmable_browser (ProgrammableBrowserOptions, optional): Configuration for the programmable browser
Retry Configuration (RetryConfig):
- num_retries (int): How many times to retry unsuccessful requests
- success_status_codes (List[int], optional): The status codes that mark a request as successful
- success_selector (str, optional): A CSS selector that needs to be present for a request to be considered successful
For more detailed information on these parameters, please refer to the GetScraping documentation.
Examples
Basic Scraping
from getscraping import GetScrapingClient
from getscraping.models import GetScrapingParams

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET'
))
html = result.text
print(html)
Scraping with JavaScript Rendering
Render JavaScript to scrape dynamic sites. Note: rendering JavaScript incurs an additional cost (5 requests).
from getscraping import GetScrapingClient
from getscraping.models import GetScrapingParams, JavascriptRenderingOptions

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    js_rendering_options=JavascriptRenderingOptions(
        render_js=True,
        wait_millis=5000
    )
))
html = result.text
print(html)
Using Various Proxies
Typically the best proxy type for bypassing tough anti-bot measures is mobile, then residential, then ISP, and lastly our default proxies.
We recommend trying requests with the default to start and working your way up as necessary, as non-default proxies incur additional costs (costs are: 1 request for default proxies, 5 requests for ISP proxies, 25 for residential, and 50 for mobile).
from getscraping import GetScrapingClient
from getscraping.models import GetScrapingParams

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    use_residential_proxy=True
))
html = result.text
print(html)
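To make the trade-off behind "start with the default and work your way up" concrete, the sketch below tabulates the per-tier costs quoted above and adds up what a full escalation could cost. It assumes every attempt is billed at its tier's rate whether or not it succeeds, which the text above does not confirm. The helper names `proxy_kwargs` and `escalation_cost` are hypothetical and not part of the client library.

```python
# Request costs per proxy tier, as quoted in this section.
PROXY_COSTS = {"default": 1, "isp": 5, "residential": 25, "mobile": 50}

# Cheapest first, which is also the suggested order for escalating.
ESCALATION_ORDER = ["default", "isp", "residential", "mobile"]

# GetScrapingParams flag that selects each tier (None means no flag is set).
TIER_FLAGS = {
    "default": None,
    "isp": "use_isp_proxy",
    "residential": "use_residential_proxy",
    "mobile": "use_mobile_proxy",
}


def proxy_kwargs(tier: str) -> dict:
    """Return the keyword argument that selects the given tier, if any."""
    flag = TIER_FLAGS[tier]
    return {flag: True} if flag else {}


def escalation_cost(succeeded_on: str) -> int:
    """Sum the cost of trying each tier once, up to and including succeeded_on.

    Assumption: failed attempts are billed like successful ones.
    """
    last = ESCALATION_ORDER.index(succeeded_on)
    return sum(PROXY_COSTS[tier] for tier in ESCALATION_ORDER[: last + 1])
```

Under that billing assumption, escalating all the way to mobile costs 81 requests in total, against 50 for going straight to mobile. The dictionary from `proxy_kwargs` can be unpacked into GetScrapingParams alongside url and method.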
Retrying Requests
from getscraping import GetScrapingClient
from getscraping.models import GetScrapingParams, RetryConfig

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    retry_config=RetryConfig(
        num_retries=3,
        success_status_codes=[200]
    )
))
html = result.text
print(html)
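For readers who want a mental model of what num_retries and success_status_codes express, here is a purely local illustration. It is not how the GetScraping service implements retries. It assumes one initial attempt followed by up to num_retries further attempts, and the `fetch` callable and the name `fetch_with_retries` are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Tuple


def fetch_with_retries(
    fetch: Callable[[], Tuple[int, str]],
    num_retries: int = 3,
    success_status_codes: Iterable[int] = (200,),
) -> Tuple[int, str, int]:
    """Call fetch until it returns an accepted status or the retries run out.

    fetch is any zero-argument callable returning (status_code, body).
    Returns (status_code, body, attempts_made) for the last attempt.
    """
    accepted = set(success_status_codes)
    attempts = 0
    status, body = 0, ""
    # One initial attempt plus up to num_retries retries (an assumption).
    for _ in range(num_retries + 1):
        attempts += 1
        status, body = fetch()
        if status in accepted:
            break
    return status, body, attempts
```

A fake `fetch` that fails twice and then returns 200 stops after three attempts, while one that always returns 500 with num_retries=2 is also tried three times before giving up.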
Using Programmable Browser Actions
from getscraping import GetScrapingClient
from getscraping.models import (
    GetScrapingParams,
    JavascriptRenderingOptions,
    ProgrammableBrowserOptions,
    ProgrammableBrowserAction,
)

client = GetScrapingClient('YOUR_API_KEY')

result = client.scrape(GetScrapingParams(
    url='https://example.com',
    method='GET',
    js_rendering_options=JavascriptRenderingOptions(
        render_js=True,
        programmable_browser=ProgrammableBrowserOptions(
            actions=[
                ProgrammableBrowserAction(
                    type='click',
                    selector='#submit-button'
                ),
                ProgrammableBrowserAction(
                    type='wait',
                    wait_millis=2000
                )
            ]
        )
    )
))
html = result.text
print(html)
Advanced Usage
For more advanced usage, including intercepting requests and other programmable browser actions, please refer to the GetScraping documentation.
Support
If you encounter any issues or have questions, please send us a message at support@getscraping.com or open an issue in the GitHub repository.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Download files
Source Distribution
Built Distribution
File details
Details for the file getscraping-1.0.0.tar.gz.
File metadata
- Download URL: getscraping-1.0.0.tar.gz
- Upload date:
- Size: 41.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | c97dbb1c4e50e2a1affe8387221406d4a7211f2343eb700cba9bb177e682bbbe
MD5 | a7b1f59429f40ae4d654aa9c30d2c2b6
BLAKE2b-256 | a6aca03d2380c8a3583c8ce637286d504f6dabbf7fa5787e0ca477e976aa9509
File details
Details for the file getscraping-1.0.0-py3-none-any.whl.
File metadata
- Download URL: getscraping-1.0.0-py3-none-any.whl
- Upload date:
- Size: 29.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | be61c4665df8e198d3056b51ca21e575de626dbeeb3873236677824423b969e8
MD5 | 5dd9b32238c51000101fdb7d6f0513b4
BLAKE2b-256 | f73c99151e0d58a404dc239f0f483a981892d244c28909ce4f64fc6359a13393