Official Python SDK for the ScrAPI web scraping service.
ScrAPI SDK for Python
ScrAPI is a web scraping service for extracting data from websites.
- Website: https://scrapi.tech
- API docs: https://scrapi.tech/docs
- Source repository: https://github.com/DevEnterpriseSoftware/scrapi-sdk-python
Table of contents
- Installation
- Quick start (sync)
- Quick start (async)
- Scrape request options
- Browser commands
- Scrape response data
- Scrape request defaults
- Lookups
- Exceptions
- HTML helper utilities (optional)
- Sample app
- Development
- Build and publish
Installation
pip install scrapi-sdk
Install optional HTML helpers:
pip install "scrapi-sdk[html]"
Quick start (sync)
from scrapi_sdk import ScrapeRequest, ScrapiClient

with ScrapiClient("YOUR_API_KEY") as client:
    response = client.scrape(ScrapeRequest("https://deventerprise.com"))
    print(response.content if response else "No response")
Quick start (async)
import asyncio

from scrapi_sdk import AsyncScrapiClient


async def main() -> None:
    async with AsyncScrapiClient("YOUR_API_KEY") as client:
        response = await client.scrape("https://deventerprise.com")
        print(response.content if response else "No response")


asyncio.run(main())
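Because the async client's `scrape` call is awaited, several URLs can be fetched concurrently with `asyncio.gather`. The sketch below shows only the concurrency pattern: `fake_scrape` is a hypothetical stand-in so the snippet runs without an API key, and in real use you would await `client.scrape(url)` inside `async with AsyncScrapiClient(...)` as in the quick start above. Any concurrency limits imposed by the service are not covered here.

```python
import asyncio


async def fake_scrape(url: str) -> str:
    """Hypothetical stand-in for `await client.scrape(url)`."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>content of {url}</html>"


async def scrape_all(urls: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_scrape(url) for url in urls))


results = asyncio.run(scrape_all(["https://deventerprise.com", "https://scrapi.tech"]))
for content in results:
    print(content)
```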
Scrape request options
All options map to ScrAPI API fields while exposing Pythonic snake_case names.
| Python field | Type | Description |
|---|---|---|
| url | str | URL to scrape. Relative inputs are normalized to https://.... |
| response_format | ResponseFormat | Must be ResponseFormat.JSON when using this SDK client. |
| response_selector | str \| None | CSS/XPath selector for response filtering. |
| cookies | dict[str, str] | Cookies sent with the target request. |
| headers | dict[str, str] | Headers sent with the target request. |
| request_method | str | HTTP method override (default GET). |
| request_body_base64 | str \| None | Base64-encoded request payload. |
| proxy_type | ProxyType | NONE, FREE, RESIDENTIAL, DATACENTER, TOR, CUSTOM. |
| proxy_country | str \| None | Three-letter country code, e.g. USA. |
| proxy_city | str \| None | City key (requires proxy_country). |
| custom_proxy_url | str \| None | Custom proxy URL. |
| use_browser | bool | Enable browser mode. |
| solve_captchas | bool | Automatically solve captchas (browser mode only). |
| include_screenshot | bool | Include screenshot URL in response (browser mode only). |
| include_pdf | bool | Include PDF URL in response (browser mode only). |
| include_video | bool | Include video URL in response (browser mode only). |
| accept_dialogs | bool | Accept browser dialogs/popups. |
| session_id | str \| None | Reuse session context across calls. |
| callback_url | str \| None | Webhook URL called when the scrape completes. |
| browser_commands | BrowserCommandList | Ordered browser action commands. |
Example:
from scrapi_sdk import ProxyType, ResponseFormat, ScrapeRequest
request = ScrapeRequest("https://deventerprise.com")
request.proxy_type = ProxyType.RESIDENTIAL
request.proxy_country = "USA"
request.use_browser = True
request.solve_captchas = True
request.include_screenshot = True
request.response_format = ResponseFormat.JSON
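The `request_method` and `request_body_base64` fields indicate that a request body is supplied as a Base64 string. A minimal sketch of preparing one with the standard library follows; the JSON payload and the `Content-Type` header are illustrative assumptions, and the commented lines show where the values would go on a `ScrapeRequest`.

```python
import base64
import json

# Hypothetical payload for a POST request to the target site.
payload = {"query": "laptops", "page": 1}

# Serialize to JSON, then Base64-encode the UTF-8 bytes into an ASCII string.
body_bytes = json.dumps(payload).encode("utf-8")
body_base64 = base64.b64encode(body_bytes).decode("ascii")
print(body_base64)

# Assumed usage with the fields from the table above:
# request.request_method = "POST"
# request.request_body_base64 = body_base64
# request.headers["Content-Type"] = "application/json"
```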
Browser commands
When use_browser=True, chain browser commands with BrowserCommandList:
from scrapi_sdk import ScrapeRequest
request = ScrapeRequest("https://www.roboform.com/filling-test-all-fields")
request.use_browser = True
request.accept_dialogs = True
request.browser_commands \
    .input("input[name='01___title']", "Mr") \
    .input("input[name='02frstname']", "Werner") \
    .input("input[name='04lastname']", "van Deventer") \
    .select("select[name='40cc__type']", "Discover") \
    .wait(3000) \
    .wait_for("input[type='reset']") \
    .click("input[type='reset']") \
    .wait(1000) \
    .scroll(1000) \
    .evaluate("console.log('any valid code...')")
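The chain above works because each command method evidently returns the list itself. The sketch below illustrates that fluent pattern with a hypothetical `CommandListSketch` class (not the SDK's `BrowserCommandList`, whose internals and wire format are not described here). It also shows that wrapping a chain in parentheses avoids the trailing backslashes.

```python
class CommandListSketch:
    """Hypothetical stand-in that records commands in call order."""

    def __init__(self) -> None:
        self.commands = []  # list of (name, args) tuples

    def _add(self, name: str, *args) -> "CommandListSketch":
        self.commands.append((name, args))
        return self  # returning self is what makes chaining possible

    def input(self, selector: str, value: str) -> "CommandListSketch":
        return self._add("input", selector, value)

    def wait(self, duration: int) -> "CommandListSketch":
        return self._add("wait", duration)

    def click(self, selector: str) -> "CommandListSketch":
        return self._add("click", selector)


# Parentheses let the chain span lines without backslash continuations.
commands = (
    CommandListSketch()
    .input("input[name='02frstname']", "Werner")
    .wait(3000)
    .click("input[type='reset']")
)
print([name for name, _ in commands.commands])
```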
Scrape response data
ScrapeResponse includes all API response details.
response = client.scrape("https://deventerprise.com")
if response:
    print(response.request_url)
    print(response.response_url)
    print(response.duration)
    print(response.attempts)
    print(response.credits_used)
    print(response.status_code)
    print(response.screenshot_url)
    print(response.pdf_url)
    print(response.video_url)
    print(response.content)
    print(response.content_hash)  # SHA1 of UTF-16LE content to match .NET SDK parity.

    for captcha_name, solved_count in response.captchas_solved.items():
        print(f"{captcha_name}: {solved_count}")

    for key, value in response.headers.items():
        print(f"{key}: {value}")

    for key, value in response.cookies.items():
        print(f"{key}: {value}")

    for message in response.error_messages or []:
        print(message)
If beautifulsoup4 is installed, response.html returns a parsed BeautifulSoup object.
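The comment on `content_hash` in the example above describes it as a SHA1 over the UTF-16LE encoding of the content. A minimal sketch of recomputing such a hash locally is shown below. The lowercase hexadecimal output is an assumption, since the exact digest format returned by the SDK is not stated here, so compare case-insensitively or adjust as needed.

```python
import hashlib


def sha1_utf16le(content: str) -> str:
    """Return the SHA1 digest of `content` encoded as UTF-16LE (no BOM), as hex."""
    return hashlib.sha1(content.encode("utf-16-le")).hexdigest()


print(sha1_utf16le("<html></html>"))
```

Hashing the UTF-16LE bytes rather than UTF-8 gives a different digest for the same text, so the encoding step matters when comparing against hashes produced elsewhere.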
Scrape request defaults
ScrapeRequestDefaults applies defaults to every new ScrapeRequest.
from scrapi_sdk import ProxyType, ScrapeRequest, ScrapeRequestDefaults
ScrapeRequestDefaults.proxy_type = ProxyType.RESIDENTIAL
ScrapeRequestDefaults.use_browser = True
ScrapeRequestDefaults.solve_captchas = True
ScrapeRequestDefaults.headers["Sample"] = "Custom-Value"
request = ScrapeRequest("https://deventerprise.com")
request.proxy_type = ProxyType.TOR # explicit override
assert request.proxy_type == ProxyType.TOR
assert request.use_browser is True
assert request.solve_captchas is True
assert request.headers["Sample"] == "Custom-Value"
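One way such class-level defaults can work is for the request constructor to copy them at creation time. The following is an illustrative sketch with hypothetical `DefaultsSketch` and `RequestSketch` classes, not the SDK's implementation. It copies the `headers` dict so that editing one request does not leak back into the defaults.

```python
class DefaultsSketch:
    """Hypothetical class-level defaults, loosely mirroring ScrapeRequestDefaults."""

    use_browser: bool = False
    headers: dict[str, str] = {}


class RequestSketch:
    """Hypothetical request that snapshots the defaults when constructed."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.use_browser = DefaultsSketch.use_browser
        # Copy mutable defaults so per-request edits stay local.
        self.headers = dict(DefaultsSketch.headers)


DefaultsSketch.use_browser = True
DefaultsSketch.headers["Sample"] = "Custom-Value"

request = RequestSketch("https://deventerprise.com")
request.headers["Extra"] = "only-on-this-request"

print(request.use_browser, request.headers)
print(DefaultsSketch.headers)
```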
Lookups
Credit balance
balance = client.get_credit_balance()
print(balance)
Supported countries
countries = client.get_supported_countries()
for country in countries:
    print(country.key, country.name, country.proxy_count)
Supported cities
cities = client.get_supported_cities("USA")
for city in cities:
    print(city.key, city.name, city.proxy_count)
Exceptions
Any client/API errors are raised as ScrapiException with HTTP status code details.
from scrapi_sdk import ScrapeRequest, ScrapiClient, ScrapiException

with ScrapiClient("YOUR_API_KEY") as client:
    try:
        response = client.scrape(ScrapeRequest("https://deventerprise.com"))
    except ScrapiException as ex:
        print(f"Error ({ex.status_code}): {ex}")
        raise
HTML helper utilities (optional)
Install optional dependency first:
pip install "scrapi-sdk[html]"
Helpers exported from scrapi_sdk:
- numbers_only(text, include_decimal_points=False, trim=True)
- html_with_no_script(html)
- next_element(node)
- is_visible(node, check_parent_nodes=True)
Example:
from scrapi_sdk import html_with_no_script, numbers_only
print(numbers_only("USD 1,299.95", include_decimal_points=True))
print(html_with_no_script("<p>safe</p><script>alert(1)</script>"))
Sample app
A runnable sample app is included at examples/basic_scrape/main.py.
It reads SCRAPI_API_KEY and scrapes https://deventerprise.com.
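Reading the key from the environment keeps it out of source control. A minimal sketch of that step is below; the helper name `read_api_key` and the error message are illustrative assumptions, not taken from the sample app.

```python
import os
from collections.abc import Mapping


def read_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return SCRAPI_API_KEY from `env`, failing early if it is missing or blank."""
    api_key = env.get("SCRAPI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set the SCRAPI_API_KEY environment variable first.")
    return api_key


# Demonstrated with an explicit mapping so the snippet runs without a real key.
print(read_api_key({"SCRAPI_API_KEY": "YOUR_API_KEY"}))
```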
Development
python -m venv .venv
. .venv/Scripts/activate # Git Bash on Windows. PowerShell: .venv\Scripts\Activate.ps1. macOS/Linux: . .venv/bin/activate
pip install -e ".[dev,html]"
pytest
Build and publish
Local build
python -m pip install --upgrade pip build twine
python -m build
python -m twine check dist/*
Upload to TestPyPI
# PowerShell
$env:TWINE_USERNAME="__token__"
$env:TWINE_PASSWORD="pypi-..."
python -m twine upload -r testpypi dist/*
Upload to PyPI
# PowerShell
$env:TWINE_USERNAME="__token__"
$env:TWINE_PASSWORD="pypi-..."
python -m twine upload dist/*