webscraping_ai

Official Python client for the WebScraping.AI API.

4.0 is a hard break from 3.x. See CHANGELOG.md for the migration notes. If you cannot update your call sites yet, stay on webscraping_ai==3.2.1.

Install

pip install webscraping_ai

Requires Python 3.9 or newer.

Quick start

from webscraping_ai import Client

client = Client(api_key="YOUR_API_KEY")

# Page HTML
html = client.html("https://example.com")

# Visible text, optionally as a structured JSON response
text = client.text("https://example.com", text_format="json", return_links=True)

# CSS-selected HTML
heading = client.selected("https://example.com", selector="h1")
multiple = client.selected_multiple("https://example.com", selectors=["h1", "p"])

# LLM-powered helpers
answer = client.question("https://example.com", question="What is the page title?")
fields = client.fields(
    "https://example.com",
    fields={"title": "Main product title", "price": "Current product price"},
)

# Account quota
info = client.account()

The client is also a context manager, which closes the underlying connection pool on exit:

with Client(api_key="...") as client:
    client.html("https://example.com")

Async usage

AsyncClient mirrors Client but uses async def methods backed by httpx.AsyncClient:

import asyncio
from webscraping_ai import AsyncClient

async def main():
    async with AsyncClient(api_key="YOUR_API_KEY") as client:
        html = await client.html("https://example.com")
        print(html)

asyncio.run(main())
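
Because the async methods are ordinary coroutines, several pages can be fetched concurrently. The following is a minimal sketch, not part of the library: fetch_all is a hypothetical helper, and the only thing it assumes about the client is an awaitable html(url) method as shown above. The limit of 5 is an arbitrary placeholder; pick a value that matches your plan's concurrency allowance.

```python
import asyncio
from typing import Any, Dict, Iterable


async def fetch_all(client: Any, urls: Iterable[str], limit: int = 5) -> Dict[str, str]:
    """Fetch several pages concurrently, with at most `limit` requests in flight.

    `client` is any object with an awaitable `html(url)` method (assumption:
    AsyncClient behaves this way, as in the example above).
    """
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url: str) -> tuple[str, str]:
        # The semaphore caps how many requests run at the same time.
        async with semaphore:
            return url, await client.html(url)

    # gather() preserves input order, so the dict keys follow `urls`.
    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)
```

With the real client this would be called inside the async with block, for example pages = await fetch_all(client, ["https://example.com", "https://example.org"]).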

Error handling

Every non-2xx response is mapped to a typed exception, so you can catch the specific situation you care about instead of parsing status codes:

from webscraping_ai import (
    Client,
    AuthenticationError,
    RateLimitError,
    PaymentRequiredError,
    APITimeoutError,
    APIConnectionError,
)

client = Client(api_key="YOUR_API_KEY")

try:
    client.html("https://example.com")
except AuthenticationError:
    ...  # 403 — wrong or missing API key
except PaymentRequiredError:
    ...  # 402 — out of credits
except RateLimitError:
    ...  # 429 — too many concurrent requests
except APITimeoutError:
    ...  # request did not complete in time
except APIConnectionError:
    ...  # transport-level failure

All exceptions inherit from WebScrapingAIError, so you can catch everything the client raises with a single except if you prefer. API errors expose the parsed error envelope (message, status, status_code, status_message, body, response_body).
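
Typed exceptions make it straightforward to retry only the failures that are likely to be transient. The sketch below is one possible approach and is not provided by the library: call_with_retry is a hypothetical helper, and the exception classes to retry on are passed in by the caller, so the helper itself has no dependency on the client.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff on the given exceptions.

    Any exception not listed in `retry_on` propagates immediately. After the
    last attempt the original exception is re-raised.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call site might look like call_with_retry(lambda: client.html("https://example.com"), retry_on=(RateLimitError, APITimeoutError)). Errors such as AuthenticationError or PaymentRequiredError are deliberately left out of retry_on in that example, since repeating the request is unlikely to change the outcome.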

Endpoint reference

Method                          HTTP route               Returns
client.html(...)                GET /html                str (page HTML)
client.text(...)                GET /text                str or dict (JSON)
client.selected(...)            GET /selected            str
client.selected_multiple(...)   GET /selected-multiple   list
client.question(...)            GET /ai/question         str
client.fields(...)              GET /ai/fields           dict (wrapped under result)
client.account()                GET /account             dict

Every page-fetch method accepts the full set of API parameters as keyword arguments: headers, timeout, js, js_timeout, wait_for, proxy, country, custom_proxy, device, error_on_404, error_on_redirect, js_script, plus the per-endpoint extras (return_script_result, format, text_format, return_links, selector, selectors, question, fields). See the API documentation for the full parameter reference.
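
Since these parameters are plain keyword arguments, a shared set of options can be kept in a dict and unpacked into each call. The sketch below illustrates that pattern only: RENDER_OPTIONS and fetch_selected are hypothetical names, and the option values are assumptions chosen for illustration, so check the API documentation for the values each parameter actually accepts.

```python
from typing import Any, Dict

# Assumed example values, not library defaults; verify them against the API docs.
RENDER_OPTIONS: Dict[str, Any] = {
    "js": True,             # render JavaScript before returning the page
    "wait_for": "h1",       # CSS selector to wait for (assumed semantics)
    "proxy": "datacenter",  # assumed proxy type name
    "country": "us",        # assumed country code
}


def fetch_selected(client: Any, url: str, selector: str, **overrides: Any) -> Any:
    """Call client.selected() with shared options; keyword overrides win."""
    options = {**RENDER_OPTIONS, **overrides}
    return client.selected(url, selector=selector, **options)
```

For example, fetch_selected(client, "https://example.com", "h1", country="de") would send the shared options with only country changed, leaving RENDER_OPTIONS itself untouched.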

API response-shape notes

Two endpoints return shapes that differ from the OpenAPI spec examples. The client returns the raw response unchanged, so:

  • /ai/fields wraps the extracted fields under a result key: {"result": {"title": "...", "price": "..."}}.
  • /selected-multiple returns list[list[str]], not a flat list[str].
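
The two shapes above can be normalized with a couple of small helpers. This is a sketch under stated assumptions rather than library code: unwrap_fields and group_by_selector are hypothetical names, and group_by_selector assumes the outer list holds one inner list per selector, in the same order the selectors were passed, which the notes above do not explicitly guarantee.

```python
from typing import Any, Dict, List, Mapping, Sequence


def unwrap_fields(response: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the extracted fields from an /ai/fields response.

    Expects the documented shape {"result": {...}}; raises KeyError otherwise.
    """
    return dict(response["result"])


def group_by_selector(
    selectors: Sequence[str], matches: Sequence[Sequence[str]]
) -> Dict[str, List[str]]:
    """Pair each selector with its list of matched HTML fragments.

    Assumption: matches[i] corresponds to selectors[i].
    """
    if len(selectors) != len(matches):
        raise ValueError("expected one list of matches per selector")
    return {selector: list(found) for selector, found in zip(selectors, matches)}
```

Used with the quick-start calls, unwrap_fields(fields) would give the plain title/price dict, and group_by_selector(["h1", "p"], multiple) would give a dict keyed by selector.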

Development

mise install                    # or use python 3.13 from any source
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
ruff check .
mypy src/webscraping_ai

License

MIT.
