Official Python SDK for the ScrAPI web scraping service.

ScrAPI SDK for Python

License: MIT

ScrAPI is a web scraping service for extracting data from websites, with support for proxies, browser rendering, and captcha solving.

Installation

pip install scrapi-sdk

Install optional HTML helpers:

pip install "scrapi-sdk[html]"

Quick start (sync)

from scrapi_sdk import ScrapeRequest, ScrapiClient

with ScrapiClient("YOUR_API_KEY") as client:
    response = client.scrape(ScrapeRequest("https://deventerprise.com"))
    print(response.content if response else "No response")

Quick start (async)

import asyncio
from scrapi_sdk import AsyncScrapiClient


async def main() -> None:
    async with AsyncScrapiClient("YOUR_API_KEY") as client:
        response = await client.scrape("https://deventerprise.com")
        print(response.content if response else "No response")


asyncio.run(main())
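
When scraping several URLs with the async client, it can help to cap how many requests run at once. The following is a minimal sketch of that pattern, not part of the SDK: scrape_many and fake_scrape are hypothetical names, and fake_scrape stands in for a coroutine such as client.scrape so the example runs without an API key.

```python
import asyncio
from typing import Awaitable, Callable, List


async def scrape_many(
    scrape: Callable[[str], Awaitable[str]],
    urls: List[str],
    limit: int = 5,
) -> List[str]:
    """Run `scrape` for every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(url: str) -> str:
        async with semaphore:
            return await scrape(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(url) for url in urls))


async def fake_scrape(url: str) -> str:
    # Hypothetical stand-in for an awaitable such as client.scrape(url).
    await asyncio.sleep(0)
    return f"content of {url}"


results = asyncio.run(
    scrape_many(fake_scrape, ["https://a.example", "https://b.example"], limit=2)
)
print(results)
```

In real use, the stand-in would be replaced by a small coroutine that awaits the client's scrape call inside the async with block shown above.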

Scrape request options

All options map to ScrAPI API fields while exposing Pythonic snake_case names.

Python field          Type                Description
url                   str                 URL to scrape. Relative inputs are normalized to https://....
response_format       ResponseFormat      Must be ResponseFormat.JSON when using this SDK client.
response_selector     str | None          CSS/XPath selector for response filtering.
cookies               dict[str, str]      Cookies sent to target request.
headers               dict[str, str]      Headers sent to target request.
request_method        str                 HTTP method override (default GET).
request_body_base64   str | None          Base64 request payload.
proxy_type            ProxyType           NONE, FREE, RESIDENTIAL, DATACENTER, TOR, CUSTOM.
proxy_country         str | None          Three-letter country code, e.g. USA.
proxy_city            str | None          City key (requires proxy_country).
custom_proxy_url      str | None          Custom proxy URL.
use_browser           bool                Enable browser mode.
solve_captchas        bool                Auto solve captchas (browser mode only).
include_screenshot    bool                Include screenshot URL in response (browser mode only).
include_pdf           bool                Include PDF URL in response (browser mode only).
include_video         bool                Include video URL in response (browser mode only).
accept_dialogs        bool                Accept browser dialogs/popups.
session_id            str | None          Reuse session context across calls.
callback_url          str | None          Webhook URL called when scrape completes.
browser_commands      BrowserCommandList  Ordered browser action commands.

Example:

from scrapi_sdk import ProxyType, ResponseFormat, ScrapeRequest

request = ScrapeRequest("https://deventerprise.com")
request.proxy_type = ProxyType.RESIDENTIAL
request.proxy_country = "USA"
request.use_browser = True
request.solve_captchas = True
request.include_screenshot = True
request.response_format = ResponseFormat.JSON
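
The request_body_base64 field expects a Base64 string. As a rough sketch, a JSON payload could be encoded with the standard library as shown below; encode_body is a hypothetical helper, not part of the SDK, and pairing it with a POST method is an assumption about typical usage.

```python
import base64
import json


def encode_body(payload: dict) -> str:
    """Serialize a dict to compact JSON and return it as a Base64 string."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


encoded = encode_body({"q": "laptops", "page": 1})
print(encoded)

# With the SDK installed, the value would presumably be assigned like this
# (assumption: the target endpoint expects a POST with a JSON body):
# request.request_method = "POST"
# request.request_body_base64 = encoded
```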

Browser commands

When use_browser=True, chain browser commands with BrowserCommandList:

from scrapi_sdk import ScrapeRequest

request = ScrapeRequest("https://www.roboform.com/filling-test-all-fields")
request.use_browser = True
request.accept_dialogs = True

# Parentheses allow the chain to span lines without backslash continuations.
(
    request.browser_commands
    .input("input[name='01___title']", "Mr")
    .input("input[name='02frstname']", "Werner")
    .input("input[name='04lastname']", "van Deventer")
    .select("select[name='40cc__type']", "Discover")
    .wait(3000)
    .wait_for("input[type='reset']")
    .click("input[type='reset']")
    .wait(1000)
    .scroll(1000)
    .evaluate("console.log('any valid code...')")
)

Scrape response data

ScrapeResponse includes all API response details.

# Assumes `client` is an open ScrapiClient, as in the quick start.
response = client.scrape("https://deventerprise.com")

if response:
    print(response.request_url)
    print(response.response_url)
    print(response.duration)
    print(response.attempts)
    print(response.credits_used)
    print(response.status_code)
    print(response.screenshot_url)
    print(response.pdf_url)
    print(response.video_url)
    print(response.content)
    print(response.content_hash)  # SHA1 of the UTF-16LE encoded content, for parity with the .NET SDK.

    for captcha_name, solved_count in response.captchas_solved.items():
        print(f"{captcha_name}: {solved_count}")

    for key, value in response.headers.items():
        print(f"{key}: {value}")

    for key, value in response.cookies.items():
        print(f"{key}: {value}")

    for message in response.error_messages or []:
        print(message)

If beautifulsoup4 is installed, response.html returns a parsed BeautifulSoup object.
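
The content_hash comment above describes a SHA1 digest over the UTF-16LE encoding of the content. A minimal sketch of reproducing such a digest locally follows; utf16le_sha1 is a hypothetical helper, and the lowercase hex output format is an assumption, so compare against a real response.content_hash before relying on it.

```python
import hashlib


def utf16le_sha1(content: str) -> str:
    """Return the SHA1 hex digest of `content` encoded as UTF-16LE (no BOM)."""
    return hashlib.sha1(content.encode("utf-16-le")).hexdigest()


digest = utf16le_sha1("<html></html>")
print(digest)

# Possible use with a real response (assumption about casing and format):
# changed = utf16le_sha1(previous_content).lower() != response.content_hash.lower()
```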

Scrape request defaults

ScrapeRequestDefaults applies defaults to every new ScrapeRequest.

from scrapi_sdk import ProxyType, ScrapeRequest, ScrapeRequestDefaults

ScrapeRequestDefaults.proxy_type = ProxyType.RESIDENTIAL
ScrapeRequestDefaults.use_browser = True
ScrapeRequestDefaults.solve_captchas = True
ScrapeRequestDefaults.headers["Sample"] = "Custom-Value"

request = ScrapeRequest("https://deventerprise.com")
request.proxy_type = ProxyType.TOR  # explicit override

assert request.proxy_type == ProxyType.TOR
assert request.use_browser is True
assert request.solve_captchas is True
assert request.headers["Sample"] == "Custom-Value"
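
To illustrate the general pattern of class-level defaults feeding new instances, here is a small stand-alone sketch. Defaults and Request are hypothetical stand-ins, not the SDK's classes, and the SDK's actual implementation may differ. The point of the sketch is that mutable defaults such as headers are copied per instance, so changing one request does not alter the shared defaults.

```python
from dataclasses import dataclass, field
from typing import Dict


class Defaults:
    """Hypothetical holder for process-wide defaults."""
    use_browser: bool = False
    headers: Dict[str, str] = {}


@dataclass
class Request:
    """Hypothetical request that reads Defaults at construction time."""
    url: str
    use_browser: bool = field(default_factory=lambda: Defaults.use_browser)
    # Copy the dict so per-request edits do not leak into Defaults.
    headers: Dict[str, str] = field(default_factory=lambda: dict(Defaults.headers))


Defaults.use_browser = True
Defaults.headers["Sample"] = "Custom-Value"

request = Request("https://example.com")
request.headers["Per-Request"] = "1"

print(request.use_browser, request.headers, Defaults.headers)
```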

Lookups

Credit balance

balance = client.get_credit_balance()
print(balance)

Supported countries

countries = client.get_supported_countries()
for country in countries:
    print(country.key, country.name, country.proxy_count)

Supported cities

cities = client.get_supported_cities("USA")
for city in cities:
    print(city.key, city.name, city.proxy_count)

Exceptions

Client and API errors are raised as ScrapiException, which carries the HTTP status code.

from scrapi_sdk import ScrapeRequest, ScrapiClient, ScrapiException

with ScrapiClient("YOUR_API_KEY") as client:
    try:
        response = client.scrape(ScrapeRequest("https://deventerprise.com"))
    except ScrapiException as ex:
        print(f"Error ({ex.status_code}): {ex}")
        raise
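
Since the exception exposes a status_code, transient failures can be retried selectively. The sketch below shows one way to do that with exponential backoff. call_with_retry, RETRYABLE_STATUS, and StandInError are hypothetical names introduced for illustration; the set of retryable status codes is an assumption, and StandInError merely imitates an exception with a status_code attribute so the example runs without the SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these status codes are worth retrying; adjust as needed.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_retry(
    fn: Callable[[], T],
    exc_type: type,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on retryable status codes with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exc_type as ex:
            status = getattr(ex, "status_code", None)
            # Re-raise non-retryable errors and the final failed attempt.
            if status not in RETRYABLE_STATUS or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


class StandInError(Exception):
    """Hypothetical stand-in for an exception that has a status_code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


calls = []


def flaky() -> str:
    # Fails twice with 503, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise StandInError(503)
    return "ok"


result = call_with_retry(flaky, StandInError, sleep=lambda _seconds: None)
print(result, len(calls))
```

With the SDK, fn would wrap the client.scrape call and exc_type would be ScrapiException.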

HTML helper utilities (optional)

Install optional dependency first:

pip install "scrapi-sdk[html]"

Helpers exported from scrapi_sdk:

  • numbers_only(text, include_decimal_points=False, trim=True)
  • html_with_no_script(html)
  • next_element(node)
  • is_visible(node, check_parent_nodes=True)

Example:

from scrapi_sdk import html_with_no_script, numbers_only

print(numbers_only("USD 1,299.95", include_decimal_points=True))
print(html_with_no_script("<p>safe</p><script>alert(1)</script>"))

Sample app

A runnable sample app is included at examples/basic_scrape/main.py.

It reads SCRAPI_API_KEY and scrapes https://deventerprise.com.
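
Reading the key from the environment keeps it out of source control. A minimal sketch of that step is shown below; read_api_key is a hypothetical helper and is not taken from the sample app, whose actual code may differ.

```python
import os
from typing import Mapping


def read_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from SCRAPI_API_KEY, failing early if it is unset."""
    key = env.get("SCRAPI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set SCRAPI_API_KEY before running the sample.")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(read_api_key({"SCRAPI_API_KEY": "example-key"}))
```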

Development

python -m venv .venv
. .venv/bin/activate  # Windows Git Bash: . .venv/Scripts/activate; PowerShell: .venv\Scripts\Activate.ps1
pip install -e ".[dev,html]"
pytest

Build and publish

Local build

python -m pip install --upgrade pip build twine
python -m build
python -m twine check dist/*

Upload to TestPyPI

# PowerShell
$env:TWINE_USERNAME="__token__"
$env:TWINE_PASSWORD="pypi-..."
python -m twine upload -r testpypi dist/*

Upload to PyPI

# PowerShell
$env:TWINE_USERNAME="__token__"
$env:TWINE_PASSWORD="pypi-..."
python -m twine upload dist/*
