Official Python SDK for the crawlbrulee web-scraping API.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

catalin-cwbl

These details have not been verified by PyPI

Project links

Homepage

Project description

crawlbrulee

The official Python SDK for the crawlbrulee web-scraping API.

Hand-written, fully typed (ships py.typed).
Sync and async clients (Crawlbrulee / AsyncCrawlbrulee).
One runtime dependency: httpx.
Python 3.10+.

Status: v0.1.0 (beta). The API surface is stabilizing — expect minor breaking changes between 0.x releases.

Install

pip install crawlbrulee
# or: uv add crawlbrulee

Quickstart

from crawlbrulee import Crawlbrulee, ScrapeExtract

client = Crawlbrulee(api_key="cble_…")
# or read CRAWLBRULEE_API_KEY from the environment:
client = Crawlbrulee.from_env()

page = client.scrape(
    url="https://example.com",
    extract=ScrapeExtract(markdown=True, links=True),
)

print(page.markdown)
print(len(page.links or []), "links found")

Async

The async client mirrors the sync one method-for-method:

import asyncio
from crawlbrulee import AsyncCrawlbrulee

async def main() -> None:
    async with AsyncCrawlbrulee.from_env() as client:
        page = await client.scrape(url="https://example.com")
        print(page.markdown)

asyncio.run(main())

Configuration

Option	Default	Description
`api_key`	—	Sent as `Authorization: Bearer …`. Required — or use `from_env()`.
`base_url`	`https://api.crawlbrulee.com`	Override the target host (local dev / staging). Trailing slashes stripped.
`timeout`	`None` (no timeout)	Per-request timeout in seconds. A per-call `timeout=` overrides it.

Crawlbrulee.from_env(**overrides) reads the key from CRAWLBRULEE_API_KEY and forwards any other option through.
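
To make the option handling concrete, here is a minimal sketch of the behaviour described above: the key is read from CRAWLBRULEE_API_KEY, other options are forwarded, and trailing slashes are stripped from base_url. ClientConfig is a hypothetical stand-in written for illustration, not a class the SDK exports, and raising RuntimeError when the variable is missing is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical stand-in for the client options; not part of the SDK."""

    api_key: str
    base_url: str = "https://api.crawlbrulee.com"
    timeout: Optional[float] = None  # None means no timeout

    def __post_init__(self) -> None:
        # Strip trailing slashes, as the Configuration table describes.
        object.__setattr__(self, "base_url", self.base_url.rstrip("/"))

    @classmethod
    def from_env(cls, **overrides: Any) -> "ClientConfig":
        # Read the key from the environment and forward every other option.
        api_key = os.environ.get("CRAWLBRULEE_API_KEY")
        if not api_key:
            # Assumption: the SDK's real behaviour for a missing key is not documented here.
            raise RuntimeError("CRAWLBRULEE_API_KEY is not set")
        return cls(api_key=api_key, **overrides)
```

With the variable set, `ClientConfig.from_env(base_url="http://localhost:8080/", timeout=5.0)` would point at a local server with a five-second default timeout.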

Both clients support context managers (with / async with) and expose close() / aclose() to release the connection pool.

Request inputs

Top-level request fields are plain keyword arguments. Nested structures are typed dataclasses (importable from crawlbrulee) — or plain dicts, if you prefer:

from crawlbrulee import ScrapeExtract, ScreenshotRequest

client.scrape(
    url="https://news.example.com/article-1",
    extract=ScrapeExtract(
        markdown=True,
        metadata=True,
        links=True,
        screenshot=ScreenshotRequest(type="full_page", device_mode="desktop"),
    ),
    require_js=True,
    proxy="advanced",
    exclude_selectors=["nav", "footer"],
    cache={"max_age": 3600},          # dataclass or dict, your call
    location={"country": "US"},
)

None-valued options are omitted from the request entirely, so the server's defaults apply.
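
The effect of omitting None-valued options can be sketched with the standard library alone. ExtractOptions and drop_none below are hypothetical names used for illustration; the SDK's own serializer is not shown in this README, and whether it also prunes None inside nested structures is an assumption here.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any, Optional


@dataclass
class ExtractOptions:
    """Hypothetical stand-in for a nested request dataclass."""

    markdown: Optional[bool] = None
    links: Optional[bool] = None


def drop_none(value: Any) -> Any:
    """Return a JSON-ready copy of `value` with None-valued keys removed."""
    if is_dataclass(value) and not isinstance(value, type):
        value = asdict(value)
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


payload = drop_none(
    {
        "url": "https://example.com",
        "extract": ExtractOptions(markdown=True),  # links stays None and is dropped
        "require_js": None,  # omitted, so the server default would apply
        "cache": {"max_age": 3600},
    }
)
print(payload)
# {'url': 'https://example.com', 'extract': {'markdown': True}, 'cache': {'max_age': 3600}}
```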

API reference

Every method returns a typed dataclass and accepts a per-call timeout= (seconds).

Scraping

Method	Description
`scrape(url, **opts)`	Scrape a URL synchronously; blocks until done.
`scrape_async(url, **opts)`	Submit a background job; returns `{ job_id }` immediately.
`get_scrape_status(job_id)`	Current job state — `pending` / `running` / `done` / `failed`.
`get_scrape_result(job_id)`	Result of a completed job (raises if not finished).
`wait_for_scrape(job_id, interval=2.0, timeout=300.0)`	Poll until terminal, then return the result.

job = client.scrape_async(url="https://example.com")
page = client.wait_for_scrape(job.job_id, interval=2.0, timeout=300.0)

wait_for_scrape raises a CrawlbruleeError with error_name="job_failed" if the job fails, or error_name="request_timeout" if the wait expires (timeout=0 waits forever).
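
The polling behaviour described above can be sketched as a plain loop. This is an illustration under stated assumptions, not the SDK's implementation: wait_until_terminal, get_status, sleep, and clock are hypothetical names, and the built-in TimeoutError stands in for the CrawlbruleeError the SDK raises.

```python
import time
from typing import Callable

TERMINAL_STATES = {"done", "failed"}


def wait_until_terminal(
    get_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `get_status` until it reports a terminal state; timeout=0 waits forever."""
    deadline = None if timeout == 0 else clock() + timeout
    while True:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        if deadline is not None and clock() >= deadline:
            # Stand-in for the SDK's error with error_name="request_timeout".
            raise TimeoutError("job did not reach a terminal state in time")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop testable without real delays; in ordinary use the defaults apply.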

Mapping

result = client.map(
    url="https://example.com",
    sitemap_only=False,
    types={"internal": True, "external": False},
    max_urls=5_000,
    page=1,
    limit=1_000,
)
print(len(result.links), "of", result.meta.pagination.total_pages, "pages")
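
Since map is paginated through page and limit, collecting every link means walking pages until total_pages is reached. The sketch below assumes pages are numbered from 1 (as in the call above) and reads only the links and meta.pagination.total_pages fields shown there. iter_all_links and fake_fetch are hypothetical helpers; fake_fetch stands in for a call such as `lambda page: client.map(url=..., page=page, limit=2)`.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator


def iter_all_links(fetch_page: Callable[[int], Any]) -> Iterator[Any]:
    """Yield links from page 1 through the last page reported by the response."""
    page = 1
    while True:
        result = fetch_page(page)
        yield from result.links
        if page >= result.meta.pagination.total_pages:
            break
        page += 1


# Fake paginated responses so the sketch runs without network access.
_FAKE_PAGES = {1: ["/a", "/b"], 2: ["/c", "/d"], 3: ["/e"]}


def fake_fetch(page: int) -> SimpleNamespace:
    pagination = SimpleNamespace(total_pages=len(_FAKE_PAGES))
    return SimpleNamespace(
        links=_FAKE_PAGES[page],
        meta=SimpleNamespace(pagination=pagination),
    )


all_links = list(iter_all_links(fake_fetch))
print(all_links)  # ['/a', '/b', '/c', '/d', '/e']
```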

Account

Method	Description
`usage()`	Current billing-cycle snapshot — credits, quota %, concurrency, reset time.
`whoami()`	Organization + token identity behind the API key.

Errors

Every failure raised by the SDK subclasses CrawlbruleeError:

Class	When
`AuthenticationError`	401 / 403 (missing, invalid, or unauthorized key).
`RateLimitError`	429. Exposes `retry_after_ms` and `limited_by` when provided.
`UsageAllocationError`	Plan limit hit. Exposes `reason` and `usage`.
`ValidationError`	Bad request (`invalid_url`, `url_too_long`, `blocked_url`, …).
`NotFoundError`	404 (e.g. unknown async `job_id`).
`TransportError`	Network failure, timeout, or non-JSON response.
`CrawlbruleeError`	Base class — any other API error. Always has `status`, `error_name`, `message`.

import time
from crawlbrulee import Crawlbrulee, RateLimitError, UsageAllocationError

client = Crawlbrulee.from_env()
try:
    client.scrape(url="https://example.com")
except RateLimitError as err:
    time.sleep((err.retry_after_ms or 1000) / 1000)
    # retry…
except UsageAllocationError as err:
    print("Plan limit hit:", err.reason, err.usage)
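
The "retry…" step above could be filled in with a bounded retry loop along these lines. This is a sketch, not SDK code: FakeRateLimitError is a hypothetical stand-in that exposes retry_after_ms the way RateLimitError is described to, and call_with_retry and max_attempts are illustrative names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FakeRateLimitError(Exception):
    """Hypothetical stand-in exposing retry_after_ms like RateLimitError."""

    def __init__(self, retry_after_ms: Optional[int] = None) -> None:
        super().__init__("rate limited")
        self.retry_after_ms = retry_after_ms


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, sleeping for the suggested delay after each rate limit."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except FakeRateLimitError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Fall back to one second when no delay was provided.
            sleep((err.retry_after_ms or 1000) / 1000)
    raise ValueError("max_attempts must be at least 1")
```

With the real SDK, the except clause would catch RateLimitError and call would be something like `lambda: client.scrape(url="https://example.com")`.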

For exhaustive branching, switch on err.error_name.

Notes on the wire format

The SDK mirrors the API's JSON shapes faithfully. The one exception: the async job status response uses camelCase on the wire (jobId, createdAt); the SDK exposes Pythonic job_id / created_at on AsyncJobStatusResponse.
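
The key translation can be pictured as a small camelCase to snake_case mapping. This sketch is not the SDK's code: to_snake_case and snake_case_keys are hypothetical helpers, and the field values are made up for illustration.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase key such as 'jobId' to 'job_id'."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def snake_case_keys(payload: dict) -> dict:
    """Return a copy of `payload` with top-level keys converted to snake_case."""
    return {to_snake_case(key): value for key, value in payload.items()}


wire = {"jobId": "job_123", "createdAt": "2026-05-29T12:00:00Z"}  # illustrative values
print(snake_case_keys(wire))
# {'job_id': 'job_123', 'created_at': '2026-05-29T12:00:00Z'}
```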

Development

uv sync                 # or: pip install -e ".[dev]"
ruff check . && ruff format --check .
pyright
pytest

The SDK keeps a single runtime dependency (httpx) on purpose — please keep it that way when contributing.

License

AGPL-3.0-only

Release history

0.1.1 (this version): May 29, 2026
0.1.0: May 28, 2026

Download files

Source Distribution

crawlbrulee-0.1.1.tar.gz (39.8 kB)

Uploaded May 29, 2026 Source

Built Distribution

crawlbrulee-0.1.1-py3-none-any.whl (36.2 kB)

Uploaded May 29, 2026 Python 3

File details

Details for the file crawlbrulee-0.1.1.tar.gz.

File metadata

Download URL: crawlbrulee-0.1.1.tar.gz
Upload date: May 29, 2026
Size: 39.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crawlbrulee-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`77f7115652afd63a787886f9a264b62a809d41eafcb6a5a209f642696218e555`
MD5	`6f4c904d96c5afea158ec05fc334bee1`
BLAKE2b-256	`dc5bcb2c2865596873c77024ec15b08790b8cd597bc1d159e863350efca249e8`
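
A downloaded file can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch; sha256_of and matches_sha256 are illustrative helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_sha256(Path("crawlbrulee-0.1.1.tar.gz"), "77f7115652afd63a787886f9a264b62a809d41eafcb6a5a209f642696218e555")` should return True for an intact copy of the source distribution.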

Provenance

The following attestation bundles were made for crawlbrulee-0.1.1.tar.gz:

Publisher: publish.yml on crawlbrulee/crawlbrulee-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crawlbrulee-0.1.1.tar.gz
- Subject digest: 77f7115652afd63a787886f9a264b62a809d41eafcb6a5a209f642696218e555
- Sigstore transparency entry: 1671551496
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: crawlbrulee/crawlbrulee-py@d307034a6ed8beb16f859e3fa659eb4e72a60dae
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/crawlbrulee
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d307034a6ed8beb16f859e3fa659eb4e72a60dae
- Trigger Event: push

File details

Details for the file crawlbrulee-0.1.1-py3-none-any.whl.

File metadata

Download URL: crawlbrulee-0.1.1-py3-none-any.whl
Upload date: May 29, 2026
Size: 36.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crawlbrulee-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b808324e849ee3fbc8c56680d8172b6287308c76abdf1c286bc48c532ad5e713`
MD5	`a659dd2c2a0018486267b739f6c51bb2`
BLAKE2b-256	`af1398bba877db02cada6a20800df3e35587c303c35f56ceeabe876fb723be55`

Provenance

The following attestation bundles were made for crawlbrulee-0.1.1-py3-none-any.whl:

Publisher: publish.yml on crawlbrulee/crawlbrulee-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crawlbrulee-0.1.1-py3-none-any.whl
- Subject digest: b808324e849ee3fbc8c56680d8172b6287308c76abdf1c286bc48c532ad5e713
- Sigstore transparency entry: 1671551532
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: crawlbrulee/crawlbrulee-py@d307034a6ed8beb16f859e3fa659eb4e72a60dae
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/crawlbrulee
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d307034a6ed8beb16f859e3fa659eb4e72a60dae
- Trigger Event: push
