webscraping_ai
Official Python client for the WebScraping.AI API.
Version 4.0 is a hard break from 3.x; see CHANGELOG.md for the migration notes. If you cannot update your call sites yet, stay on webscraping_ai==3.2.1.
Install
```bash
pip install webscraping_ai
```
Requires Python 3.9 or newer.
Quick start
```python
from webscraping_ai import Client

client = Client(api_key="YOUR_API_KEY")

# Page HTML
html = client.html("https://example.com")

# Visible text, optionally as a structured JSON response
text = client.text("https://example.com", text_format="json", return_links=True)

# CSS-selected HTML
heading = client.selected("https://example.com", selector="h1")
multiple = client.selected_multiple("https://example.com", selectors=["h1", "p"])

# LLM-powered helpers
answer = client.question("https://example.com", question="What is the page title?")
fields = client.fields(
    "https://example.com",
    fields={"title": "Main product title", "price": "Current product price"},
)

# Account quota
info = client.account()
```
The client is also a context manager, which closes the underlying connection pool on exit:
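```python
from webscraping_ai import Client

# The connection pool is closed when the with-block exits.
with Client(api_key="YOUR_API_KEY") as client:
    client.html("https://example.com")
```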
Async usage
AsyncClient mirrors Client but uses async def methods backed by
httpx.AsyncClient:
```python
import asyncio

from webscraping_ai import AsyncClient


async def main():
    async with AsyncClient(api_key="YOUR_API_KEY") as client:
        html = await client.html("https://example.com")
        print(html)


asyncio.run(main())
```
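Because AsyncClient methods are coroutines, several pages can be fetched concurrently. The sketch below is not part of the client: fetch_all is a hypothetical helper that takes any async callable (such as client.html) and caps the number of requests in flight with an asyncio.Semaphore, since the API answers with 429 when there are too many concurrent requests. The right limit depends on your plan and is an assumption here.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def fetch_all(
    fetch: Callable[[str], Awaitable[T]],
    urls: Iterable[str],
    limit: int = 5,
) -> List[T]:
    """Run fetch(url) for every URL with at most `limit` calls in flight.

    Hypothetical helper, not part of webscraping_ai. Results are returned
    in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> T:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await fetch(url)

    # gather preserves input order in its result list.
    return list(await asyncio.gather(*(bounded(url) for url in urls)))
```

Inside the async with AsyncClient(...) block you would call `await fetch_all(client.html, urls, limit=3)`; if one call raises (for example RateLimitError), gather propagates that exception.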
Error handling
Every non-2xx response is mapped to a typed exception (timeouts and transport failures get their own types too), so you can catch the specific failure you care about instead of parsing status codes:
```python
from webscraping_ai import (
    Client,
    AuthenticationError,
    RateLimitError,
    PaymentRequiredError,
    APITimeoutError,
    APIConnectionError,
)

client = Client(api_key="YOUR_API_KEY")

try:
    client.html("https://example.com")
except AuthenticationError:
    ...  # 403: wrong or missing API key
except PaymentRequiredError:
    ...  # 402: out of credits
except RateLimitError:
    ...  # 429: too many concurrent requests
except APITimeoutError:
    ...  # request did not complete in time
except APIConnectionError:
    ...  # transport-level failure
```
All exceptions inherit from WebScrapingAIError, so you can catch everything the client raises with a single except clause if you prefer. API errors expose the parsed error envelope (message, status, status_code, status_message, body, response_body).
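Some of these errors are transient. A 429 from RateLimitError usually clears once other requests finish, while retrying AuthenticationError or PaymentRequiredError will not help. The sketch below is a generic retry wrapper with exponential backoff; with_retries, its defaults (4 attempts, 1 s base delay, 10% jitter) and the injectable sleep parameter are illustrative assumptions, not part of the client.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying only on the given exception types.

    Hypothetical helper. Waits base_delay, 2*base_delay, 4*base_delay, ...
    (plus up to 10% random jitter) between attempts and re-raises the last
    error once all attempts are used. Other exceptions propagate at once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original exception
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

For example, `with_retries(lambda: client.html("https://example.com"), (RateLimitError,))` retries only on 429 and lets every other exception from the list above through immediately.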
Endpoint reference
| Method | HTTP route | Returns |
|---|---|---|
| client.html(...) | GET /html | str (page HTML) |
| client.text(...) | GET /text | str or dict (JSON) |
| client.selected(...) | GET /selected | str |
| client.selected_multiple(...) | GET /selected-multiple | list |
| client.question(...) | GET /ai/question | str |
| client.fields(...) | GET /ai/fields | dict (wrapped under result) |
| client.account() | GET /account | dict |
Every page-fetch method accepts the full set of API parameters as keyword
arguments: headers, timeout, js, js_timeout, wait_for, proxy,
country, custom_proxy, device, error_on_404, error_on_redirect,
js_script, plus the per-endpoint extras (return_script_result, format,
text_format, return_links, selector, selectors, question, fields).
See the API documentation for the full
parameter reference.
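Because every page-fetch method accepts the same keyword arguments, one way to keep shared settings in one place is a plain dict unpacked into each call. The helper name options and the default values (js=True, proxy="datacenter", country="us") are illustrative assumptions, not part of the client; check the API documentation for the values your plan accepts. Dropping None values is a choice of this sketch so that one call can fall back to the API's own default.

```python
from typing import Any, Dict

# Hypothetical shared defaults; the values are assumptions, not client defaults.
DEFAULT_OPTIONS: Dict[str, Any] = {"js": True, "proxy": "datacenter", "country": "us"}


def options(**overrides: Any) -> Dict[str, Any]:
    """Merge per-call overrides into DEFAULT_OPTIONS.

    Passing None for a key removes it from the result, so that parameter
    is not sent and the API's own default applies.
    """
    merged = {**DEFAULT_OPTIONS, **overrides}
    return {key: value for key, value in merged.items() if value is not None}
```

A call would then look like `client.text("https://example.com", **options(country="gb", js=None))`.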
API response-shape notes
Two endpoints return shapes that differ from the OpenAPI spec examples. The client returns the raw response unchanged, so:
- /ai/fields wraps the extracted fields under a result key: {"result": {"title": "...", "price": "..."}}.
- /selected-multiple returns list[list[str]], not a flat list[str].
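If you would rather work with the unwrapped shapes, small helpers can normalize the raw responses. The names below are hypothetical, not part of the client. group_by_selector also assumes the outer list holds one entry per selector in the order you passed them, which these notes do not state, so verify that against real responses before relying on it.

```python
from typing import Any, Dict, List


def unwrap_fields(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the extracted fields from a client.fields(...) response."""
    # /ai/fields nests the extracted values under a "result" key.
    fields = response.get("result")
    if not isinstance(fields, dict):
        raise ValueError(f"unexpected /ai/fields response shape: {response!r}")
    return fields


def flatten_selected(groups: List[List[str]]) -> List[str]:
    """Flatten a client.selected_multiple(...) response into one list."""
    return [html for group in groups for html in group]


def group_by_selector(selectors: List[str], groups: List[List[str]]) -> Dict[str, List[str]]:
    """Pair each selector with its inner list (assumes same order and length)."""
    if len(selectors) != len(groups):
        raise ValueError("expected one result list per selector")
    return dict(zip(selectors, groups))
```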
Development
```bash
mise install  # or use python 3.13 from any source
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
ruff check .
mypy src/webscraping_ai
```
License
MIT.