Official Python SDK for the WebPeel web-scraping API

WebPeel Python SDK

Official Python client for the WebPeel API — open-source web scraping with headless browsers, AI extraction, crawling, and more.

Installation

pip install webpeel

Quick Start

from webpeel import WebPeel

# Initialise with an explicit key. Called without arguments, the client
# reads WEBPEEL_API_KEY and WEBPEEL_BASE_URL from the environment.
wp = WebPeel(api_key="wp-xxx")

# Scrape a URL → ScrapeResult
result = wp.scrape("https://example.com")
print(result.title)       # "Example Domain"
print(result.content)     # Markdown content
print(result.links)       # ["https://..."]

Authentication

from webpeel import WebPeel

# Option 1: explicit
wp = WebPeel(api_key="wp-xxx")

# Option 2: environment variables (recommended)
# export WEBPEEL_API_KEY=wp-xxx
# export WEBPEEL_BASE_URL=https://api.webpeel.dev   # optional, this is the default
wp = WebPeel()

Self-Hosted

Point the client at your own WebPeel server:

wp = WebPeel(api_key="wp-local", base_url="http://localhost:3000")

API Reference

`scrape(url, **options)` → `ScrapeResult`

Scrape a single URL and return its content.

# Simplest call: returns Markdown by default
result = wp.scrape("https://example.com")

# With options
result = wp.scrape(
    "https://example.com",
    format="html",          # "markdown" (default) | "text" | "html"
    render=True,            # Use headless browser for JS-heavy sites
    wait=2000,              # Wait 2 s after page load (requires render=True)
    actions=[               # Browser actions to perform before capture
        {"type": "click", "selector": ".cookie-banner .close"},
        {"type": "wait", "ms": 500},
    ],
    include_tags=["article"],   # Only include these CSS selectors
    exclude_tags=[".ads"],       # Strip these selectors
    images=True,                 # Include image URLs
)

print(result.content)     # str — scraped content
print(result.title)       # str — page title
print(result.url)         # str — final URL (after redirects)
print(result.metadata)    # dict — og tags, description, etc.
print(result.links)       # list[str] — hyperlinks
print(result.elapsed)     # int — ms taken

`search(query, **options)` → `SearchResponse`

Search the web (powered by DuckDuckGo by default).

resp = wp.search("python web scraping", max_results=5)

for r in resp.results:
    print(r.title, r.url, r.snippet)

# Also scrape each result page
resp = wp.search("best open-source tools", max_results=3, scrape_results=True)
for r in resp.results:
    print(r.content)   # Full markdown of each result page

`batch(urls, **options)` → `BatchJob`

Submit multiple URLs as a background scraping job, then poll batch_status for progress and results.

import time

# Submit the job
job = wp.batch(["https://a.com", "https://b.com", "https://c.com"])
print(job.id)  # "batch-abc123"

# Poll until done
while True:
    status = wp.batch_status(job.id)
    print(f"{status.completed}/{status.total} pages done")
    if status.status in ("completed", "failed"):
        break
    time.sleep(2)

# Access results
for page in (status.data or []):
    print(page.title, page.url)

`crawl(url, **options)` → `CrawlJob`

Crawl an entire domain as a background job, then check progress with crawl_status.

# Start the crawl
job = wp.crawl("https://docs.example.com", max_pages=50, max_depth=3)

# Check status
status = wp.crawl_status(job.id)
print(status.status)      # "processing" | "completed" | "failed"
print(status.completed)   # pages scraped so far
print(status.total)       # total pages discovered

# Results when done
for page in (status.data or []):
    print(page.url, page.title)
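
The crawl snippet above checks the status only once. A minimal sketch of polling until the job reaches a terminal state, with a timeout, might look like the following. `wait_for_job` is a hypothetical helper, not part of the SDK; it assumes only that the status callable (for example `wp.crawl_status` or `wp.batch_status`) returns an object with a `.status` attribute set to "completed" or "failed" when the job ends, as in the examples above.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch_status: Callable[[str], Any],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll fetch_status(job_id) until the job completes, fails, or times out.

    Hypothetical helper, not part of the WebPeel SDK.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        # Terminal states taken from the examples in this README.
        if status.status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status.status!r} after {timeout} s"
            )
        sleep(interval)
```

With a real client this would be called as `status = wait_for_job(wp.crawl_status, job.id)`. The `sleep` parameter exists only so the helper can be exercised in tests without waiting.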

`map(url, **options)` → `MapResult`

Discover all URLs on a domain without scraping content.

result = wp.map("https://example.com")
print(result.urls)   # ["https://example.com/", "https://example.com/about", ...]

# Filter URLs by keyword
result = wp.map("https://example.com", search="pricing", limit=100)
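
Once a URL list is available, it can be organised on the client side with the standard library alone. The sketch below groups URLs by their first path segment; `group_by_section` is a hypothetical helper, and the sample list stands in for `result.urls`.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by their first path segment ("" for the site root)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        section = path.split("/", 1)[0] if path else ""
        groups[section].append(url)
    return dict(groups)


# Stand-in for result.urls from wp.map(...)
urls = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/docs/install",
    "https://example.com/docs/usage",
]
sections = group_by_section(urls)
print(sorted(sections))  # ['', 'about', 'docs']
```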

`extract(urls, prompt, schema, **options)` → `ExtractResult`

Extract structured data from one or more URLs using an LLM.

result = wp.extract(
    ["https://shop.example.com/product"],
    prompt="Extract the product name, price, and availability",
    schema={
        "type": "object",
        "properties": {
            "name":         {"type": "string"},
            "price":        {"type": "string"},
            "availability": {"type": "string"},
        },
    },
    llm_api_key="sk-...",   # Bring your own key (BYOK); falls back to the server's OPENAI_API_KEY
    model="gpt-4o-mini",
)

print(result.data)        # {"name": "Widget Pro", "price": "$49", ...}
print(result.metadata)    # {"tokensUsed": {...}, "cost": 0.0002, ...}
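
LLM output does not always contain every requested field, so it can be worth checking the returned data against the schema before using it. The following is a minimal sketch under that assumption; `missing_fields` is a hypothetical helper, and the `data` dict stands in for `result.data`.

```python
def missing_fields(data: dict, schema: dict) -> list[str]:
    """Return schema property names that are absent from data or set to None."""
    return [name for name in schema.get("properties", {}) if data.get(name) is None]


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
        "availability": {"type": "string"},
    },
}

# Stand-in for result.data; a real call would return the LLM's output.
data = {"name": "Widget Pro", "price": "$49"}
print(missing_fields(data, schema))  # ['availability']
```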

`screenshot(url, **options)` → `bytes`

Take a screenshot and get back raw image bytes.

# Returns raw PNG bytes
png = wp.screenshot("https://example.com")
with open("screenshot.png", "wb") as f:
    f.write(png)

# Full-page JPEG
jpg = wp.screenshot(
    "https://example.com",
    full_page=True,
    width=1440,
    height=900,
    format="jpeg",
    quality=85,
    wait_for=1000,   # ms to wait after load
)

`research(query, **options)` → `ResearchResult`

Research a topic by combining search + scraping.

result = wp.research("best Python web scraping tools 2025", max_sources=5)

print(result.report)   # Markdown report with sourced content

for src in result.sources:
    print(src.title, src.url)
    print(src.content[:500])  # First 500 characters of the scraped content

Async Support

Every method has an async_* counterpart for use with asyncio:

import asyncio
from webpeel import WebPeel

wp = WebPeel(api_key="wp-xxx")

async def main():
    # All async equivalents
    result   = await wp.async_scrape("https://example.com")
    resp     = await wp.async_search("python scraping")
    job      = await wp.async_crawl("https://example.com", max_pages=10)
    status   = await wp.async_crawl_status(job.id)
    batch    = await wp.async_batch(["https://a.com", "https://b.com"])
    map_res  = await wp.async_map("https://example.com")
    extract  = await wp.async_extract(["https://example.com"], prompt="Get title")
    png      = await wp.async_screenshot("https://example.com")
    research = await wp.async_research("web scraping 2025")

asyncio.run(main())
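
The example above awaits each call in turn. To scrape many URLs concurrently while capping the number of requests in flight, one possible sketch uses `asyncio.Semaphore` with `asyncio.gather`. `scrape_many` is a hypothetical helper, not part of the SDK, and it assumes that a single client can safely serve concurrent `async_scrape` calls, which this README does not state.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def scrape_many(
    scrape: Callable[[str], Awaitable[Any]],
    urls: list[str],
    max_concurrency: int = 5,
) -> list[Any]:
    """Run scrape(url) for every URL, at most max_concurrency at a time.

    Results come back in the same order as urls. A failed call is returned
    as its exception object instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> Any:
        async with semaphore:
            return await scrape(url)

    return await asyncio.gather(
        *(bounded(url) for url in urls), return_exceptions=True
    )
```

With a real client this could be run as `results = asyncio.run(scrape_many(wp.async_scrape, urls))`, checking each item with `isinstance(item, Exception)` before use.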

Error Handling

All API errors raise WebPeelError:

from webpeel import WebPeel, WebPeelError

wp = WebPeel(api_key="wp-xxx")

try:
    result = wp.scrape("https://blocked-site.com")
except WebPeelError as e:
    print(e.status_code)   # e.g. 403
    print(e.error_code)    # e.g. "BLOCKED"
    print(str(e))          # Human-readable message
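
Transient failures such as rate limits or gateway errors are often worth retrying. The sketch below retries a call with exponential backoff based on the exception's `status_code` attribute. `with_retries` and the set of retryable status codes are assumptions for illustration; neither comes from the SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these HTTP status codes are worth retrying; adjust as needed.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def with_retries(
    call: Callable[[], T],
    error_type: type[Exception],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call call() and retry retryable errors with exponential backoff.

    Hypothetical helper, not part of the WebPeel SDK.
    """
    for attempt in range(attempts):
        try:
            return call()
        except error_type as exc:
            status = getattr(exc, "status_code", None)
            last_attempt = attempt == attempts - 1
            # Re-raise errors that are not retryable, or when out of attempts.
            if status not in RETRYABLE_STATUS or last_attempt:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With a real client this would be called as `result = with_retries(lambda: wp.scrape("https://example.com"), WebPeelError)`. A 403 such as the "BLOCKED" error above is re-raised immediately, since repeating the request is unlikely to help.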

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

License

MIT — see LICENSE for details.

Release history

0.13.0 (Mar 4, 2026)

0.12.0 (Mar 1, 2026), this version

0.1.0 (Feb 14, 2026)

Download files

Source Distribution

webpeel-0.12.0.tar.gz (14.0 kB)

Uploaded Mar 1, 2026 Source

Built Distribution

webpeel-0.12.0-py3-none-any.whl (12.4 kB)

Uploaded Mar 1, 2026 Python 3

File details

Details for the file webpeel-0.12.0.tar.gz.

File metadata

Download URL: webpeel-0.12.0.tar.gz
Upload date: Mar 1, 2026
Size: 14.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for webpeel-0.12.0.tar.gz
Algorithm	Hash digest
SHA256	`3ea1c54373baa72f0957ce57842978c3cd6a6a61cddb60d6deab067d343a9d55`
MD5	`8a49c3d012f470f5c5984047d6d4b98b`
BLAKE2b-256	`7157f8057665c299dc68c322157535246b88456f327051fa1ea3a1ae33a59481`

File details

Details for the file webpeel-0.12.0-py3-none-any.whl.

File metadata

Download URL: webpeel-0.12.0-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 12.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for webpeel-0.12.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cbb58edef585960aaf1b3b314c1a57f8ac0415e8483b99afde61f271a3bd71d2`
MD5	`bee3de43c46da0caee1ee84985112783`
BLAKE2b-256	`f75cbbaf93a8cd60e9015ae33055279043c0bf5c7d75988523789f935aa0bc50`
