Official Python SDK for the WebPeel web-scraping API

WebPeel Python SDK

Official Python client for the WebPeel API — open-source web scraping with headless browsers, AI extraction, crawling, and more.

Installation

pip install webpeel

Quick Start

from webpeel import WebPeel

# Initialise with an explicit key; if api_key is omitted, the client reads
# the WEBPEEL_API_KEY and WEBPEEL_BASE_URL environment variables
wp = WebPeel(api_key="wp-xxx")

# Scrape a URL → ScrapeResult
result = wp.scrape("https://example.com")
print(result.title)       # "Example Domain"
print(result.content)     # Markdown content
print(result.links)       # ["https://..."]

Authentication

from webpeel import WebPeel

# Option 1: explicit
wp = WebPeel(api_key="wp-xxx")

# Option 2: environment variables (recommended)
# export WEBPEEL_API_KEY=wp-xxx
# export WEBPEEL_BASE_URL=https://api.webpeel.dev   # optional, this is the default
wp = WebPeel()
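
The README does not say what WebPeel() does when WEBPEEL_API_KEY is unset, so a script may prefer to check the environment itself and fail early. The following is a minimal sketch under that assumption; load_webpeel_settings is a hypothetical helper, not part of the SDK, and the default base URL is the one quoted above.

```python
import os

DEFAULT_BASE_URL = "https://api.webpeel.dev"  # default quoted in the README


def load_webpeel_settings(env=None):
    """Return (api_key, base_url) read from the environment.

    Hypothetical helper, not part of the SDK. Raises RuntimeError when
    WEBPEEL_API_KEY is missing or blank.
    """
    env = os.environ if env is None else env
    api_key = env.get("WEBPEEL_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("WEBPEEL_API_KEY is not set")
    # Drop a trailing slash so later path joins do not produce "//".
    base_url = env.get("WEBPEEL_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return api_key, base_url
```

The returned pair can then be passed explicitly, for example WebPeel(api_key=api_key, base_url=base_url).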

Self-Hosted

Point the client at your own WebPeel server:

wp = WebPeel(api_key="wp-local", base_url="http://localhost:3000")

API Reference

scrape(url, **options) → ScrapeResult

Scrape a single URL and return its content.

result = wp.scrape("https://example.com")
# Basic options
result = wp.scrape(
    "https://example.com",
    format="html",          # "markdown" (default) | "text" | "html"
    render=True,            # Use headless browser for JS-heavy sites
    wait=2000,              # Wait 2 s after page load (requires render=True)
    actions=[               # Browser actions to perform before capture
        {"type": "click", "selector": ".cookie-banner .close"},
        {"type": "wait", "ms": 500},
    ],
    include_tags=["article"],   # Only include these CSS selectors
    exclude_tags=[".ads"],       # Strip these selectors
    images=True,                 # Include image URLs
)

print(result.content)     # str — scraped content
print(result.title)       # str — page title
print(result.url)         # str — final URL (after redirects)
print(result.metadata)    # dict — og tags, description, etc.
print(result.links)       # list[str] — hyperlinks
print(result.elapsed)     # int — ms taken
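
Because result.links is a plain list of strings, it can be post-processed with the standard library. As an illustration, the sketch below keeps only links on the same host as the scraped page. same_host_links is a hypothetical helper, not part of the SDK, and it assumes links may be relative, which the README does not state either way.

```python
from urllib.parse import urljoin, urlparse


def same_host_links(page_url, links):
    """Return links on the same host as page_url, de-duplicated in order.

    Hypothetical helper, not part of the SDK.
    """
    host = urlparse(page_url).netloc.lower()
    seen = set()
    kept = []
    for link in links:
        absolute = urljoin(page_url, link)  # resolves relative hrefs, if any
        if urlparse(absolute).netloc.lower() == host and absolute not in seen:
            seen.add(absolute)
            kept.append(absolute)
    return kept
```

A typical call would be same_host_links(result.url, result.links).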

search(query, **options) → SearchResponse

Search the web (powered by DuckDuckGo by default).

resp = wp.search("python web scraping", max_results=5)

for r in resp.results:
    print(r.title, r.url, r.snippet)

# Also scrape each result page
resp = wp.search("best open-source tools", max_results=3, scrape_results=True)
for r in resp.results:
    print(r.content)   # Full markdown of each result page

batch(urls, **options) → BatchJob

Submit multiple URLs for async scraping.

# Submit the job
job = wp.batch(["https://a.com", "https://b.com", "https://c.com"])
print(job.id)  # "batch-abc123"

# Poll until done
import time
while True:
    status = wp.batch_status(job.id)
    print(f"{status.completed}/{status.total} pages done")
    if status.status in ("completed", "failed"):
        break
    time.sleep(2)

# Access results
for page in (status.data or []):
    print(page.title, page.url)

crawl(url, **options) → CrawlJob

Crawl an entire domain (async job).

# Start the crawl
job = wp.crawl("https://docs.example.com", max_pages=50, max_depth=3)

# Check status
status = wp.crawl_status(job.id)
print(status.status)      # "processing" | "completed" | "failed"
print(status.completed)   # pages scraped so far
print(status.total)       # total pages discovered

# Results when done
for page in (status.data or []):
    print(page.url, page.title)
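
The crawl snippet checks the status once, but a crawl is an async job, so real code usually polls until it finishes. The sketch below is one way to do that with a timeout. wait_for_job is a hypothetical helper, not part of the SDK; it assumes only what the README shows, namely that the status object has a status attribute whose terminal values are "completed" and "failed".

```python
import time


def wait_for_job(get_status, job_id, interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_id) until the job is terminal or timeout elapses.

    Hypothetical helper, not part of the SDK. Returns the final status
    object; raises TimeoutError if the job is still running at the deadline.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status.status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status.status!r} after {timeout} s")
        sleep(interval)
```

For a crawl this would be called as status = wait_for_job(wp.crawl_status, job.id). Passing the status function as an argument keeps the helper independent of the client, so it can be tested with a stub.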

map(url, **options) → MapResult

Discover all URLs on a domain without scraping content.

result = wp.map("https://example.com")
print(result.urls)   # ["https://example.com/", "https://example.com/about", ...]

# Filter URLs by keyword
result = wp.map("https://example.com", search="pricing", limit=100)
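
Since result.urls is a flat list, a quick way to get an overview of a site is to group the URLs by their first path segment. The sketch below uses only the standard library; group_by_section is a hypothetical helper, not part of the SDK.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Group URLs by their first path segment ("" for the site root).

    Hypothetical helper, not part of the SDK.
    """
    sections = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        section = path.split("/", 1)[0] if path else ""
        sections[section].append(url)
    return dict(sections)
```

A typical call would be group_by_section(result.urls).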

extract(urls, prompt, schema, **options) → ExtractResult

Extract structured data from one or more URLs using an LLM.

result = wp.extract(
    ["https://shop.example.com/product"],
    prompt="Extract the product name, price, and availability",
    schema={
        "type": "object",
        "properties": {
            "name":         {"type": "string"},
            "price":        {"type": "string"},
            "availability": {"type": "string"},
        },
    },
    llm_api_key="sk-...",   # BYOK — falls back to server OPENAI_API_KEY
    model="gpt-4o-mini",
)

print(result.data)        # {"name": "Widget Pro", "price": "$49", ...}
print(result.metadata)    # {"tokensUsed": {...}, "cost": 0.0002, ...}
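
LLM output does not always fill every requested field, and the README does not say whether the SDK validates result.data against the schema. Under the assumption that it might not, the sketch below performs a simple presence check. missing_fields is a hypothetical helper, not part of the SDK, and it is not a full JSON Schema validator.

```python
def missing_fields(data, schema):
    """Return schema property names that are absent or empty in data, sorted.

    Hypothetical helper, not part of the SDK. Only top-level properties are
    checked; types and nested objects are ignored.
    """
    expected = schema.get("properties", {})
    return sorted(name for name in expected if data.get(name) in (None, ""))
```

With the schema shown above, missing_fields(result.data, schema) returns an empty list when name, price, and availability are all present.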

screenshot(url, **options) → bytes

Take a screenshot and get back raw image bytes.

# Returns raw PNG bytes
png = wp.screenshot("https://example.com")
with open("screenshot.png", "wb") as f:
    f.write(png)

# Full-page JPEG
jpg = wp.screenshot(
    "https://example.com",
    full_page=True,
    width=1440,
    height=900,
    format="jpeg",
    quality=85,
    wait_for=1000,   # ms to wait after load
)
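
Because screenshot() returns raw bytes in either PNG or JPEG format, it can be useful to confirm the format before choosing a file extension. The sketch below inspects the standard magic bytes of the two formats; detect_image_format is a hypothetical helper, not part of the SDK.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature
JPEG_PREFIX = b"\xff\xd8\xff"         # JPEG start-of-image marker plus next marker byte


def detect_image_format(data):
    """Return "png", "jpeg", or None based on the leading bytes of data.

    Hypothetical helper, not part of the SDK.
    """
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_PREFIX):
        return "jpeg"
    return None
```

For example, detect_image_format(png) can be checked before writing the bytes to screenshot.png.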

research(query, **options) → ResearchResult

Research a topic by combining search + scraping.

result = wp.research("best Python web scraping tools 2025", max_sources=5)

print(result.report)   # Markdown report with sourced content

for src in result.sources:
    print(src.title, src.url)
    print(src.content[:500])  # First 500 characters of the scraped content

Async Support

Every method has an async_* counterpart for use with asyncio:

import asyncio
from webpeel import WebPeel

wp = WebPeel(api_key="wp-xxx")

async def main():
    # All async equivalents
    result   = await wp.async_scrape("https://example.com")
    resp     = await wp.async_search("python scraping")
    job      = await wp.async_crawl("https://example.com", max_pages=10)
    status   = await wp.async_crawl_status(job.id)
    batch    = await wp.async_batch(["https://a.com", "https://b.com"])
    map_res  = await wp.async_map("https://example.com")
    extract  = await wp.async_extract(["https://example.com"], prompt="Get title")
    png      = await wp.async_screenshot("https://example.com")
    research = await wp.async_research("web scraping 2025")

asyncio.run(main())
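
The example above awaits each call in turn. The main benefit of the async methods is running several requests concurrently, which the sketch below does with asyncio.gather and a semaphore to cap the number of requests in flight. scrape_many is a hypothetical helper, not part of the SDK, and the README does not state whether one client instance is safe to share across concurrent calls, so treat that as an assumption.

```python
import asyncio


async def scrape_many(scrape, urls, limit=5):
    """Run scrape(url) for every URL with at most `limit` calls in flight.

    Hypothetical helper, not part of the SDK. Returns a list aligned with
    urls; a failed call yields its exception object instead of a result.
    """
    semaphore = asyncio.Semaphore(limit)

    async def one(url):
        async with semaphore:
            return await scrape(url)

    return await asyncio.gather(*(one(u) for u in urls), return_exceptions=True)
```

It would be called as results = asyncio.run(scrape_many(wp.async_scrape, urls)). Returning exceptions in place rather than raising lets one blocked page fail without discarding the other results.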

Error Handling

All API errors raise WebPeelError:

from webpeel import WebPeel, WebPeelError

wp = WebPeel(api_key="wp-xxx")

try:
    result = wp.scrape("https://blocked-site.com")
except WebPeelError as e:
    print(e.status_code)   # 403
    print(e.error_code)    # "BLOCKED"
    print(str(e))          # Human-readable message
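
Some failures are transient and worth retrying, while others (such as the 403 above) are not. The sketch below retries a call with exponential backoff and leaves the decision about which errors are retryable to the caller. call_with_retry is a hypothetical helper, not part of the SDK, and the README does not list which status codes the API returns for transient failures.

```python
import time


def call_with_retry(fn, retryable, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); retry with exponential backoff while retryable(exc) is true.

    Hypothetical helper, not part of the SDK. Re-raises the last exception
    once the attempts are used up, and re-raises immediately for errors the
    caller does not consider retryable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if last_attempt or not retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
```

One possible use is call_with_retry(lambda: wp.scrape(url), retryable=lambda e: isinstance(e, WebPeelError) and e.status_code in (429, 503)), where treating 429 and 503 as transient is an assumption rather than documented behaviour.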

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

License

MIT — see LICENSE for details.

