Fast web fetcher for AI agents — smart extraction, stealth mode, structured data

WebPeel Python SDK

Zero dependencies. Pure Python 3.8+ stdlib.

Installation

pip install webpeel

Quick Start

Basic Scraping

from webpeel import WebPeel

client = WebPeel()

# Scrape a URL and get clean markdown
result = client.scrape("https://example.com")
print(result.title)
print(result.content)  # Clean markdown content
print(result.metadata)  # Structured metadata

Search the Web

# Search via DuckDuckGo
results = client.search("python web scraping")

for item in results.data.get("web", []):
    print(f"{item['title']}: {item['url']}")
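The loop above assumes every entry under `data["web"]` has both a `title` and a `url`. A slightly more defensive variant skips incomplete entries instead of raising `KeyError`. It is sketched here as a plain function over a dict shaped like `results.data`. That shape is inferred from the loop above, the sample payload is made up, and `web_hits` is a hypothetical helper, not part of the SDK:

```python
from typing import Any, Dict, List, Tuple


def web_hits(data: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (title, url) pairs from a search payload shaped like results.data."""
    hits: List[Tuple[str, str]] = []
    for item in data.get("web", []):
        # Skip entries missing either field instead of raising KeyError.
        title, url = item.get("title"), item.get("url")
        if title and url:
            hits.append((title, url))
    return hits


# Made-up payload for illustration; the real response may carry more fields.
sample = {
    "web": [
        {"title": "Example Domain", "url": "https://example.com"},
        {"url": "https://untitled.example"},
    ]
}
print(web_hits(sample))  # [('Example Domain', 'https://example.com')]
```

With the SDK, this would be called as `web_hits(results.data)`.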

JavaScript-Heavy Sites

# Use browser rendering for SPAs and JS-heavy sites
result = client.scrape(
    "https://twitter.com/elonmusk",
    render=True,  # Enable browser mode
    wait=2000,    # Wait 2s for JS to load
)

Stealth Mode (Bypass Bot Detection)

# Bypass Cloudflare, reCAPTCHA, and anti-bot systems
result = client.scrape(
    "https://protected-site.com",
    stealth=True,  # Enable stealth mode
)

Structured Data Extraction

# Extract specific data using CSS selectors
result = client.scrape(
    "https://amazon.com/product/...",
    extract={
        "selectors": {
            "title": "h1#title",
            "price": "span.price",
            "rating": ".review-rating",
        }
    }
)

print(result.extracted)
# {"title": "Product Name", "price": "$29.99", "rating": "4.5"}
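As the sample output above shows (`"$29.99"`, `"4.5"`), `result.extracted` holds the selected text as strings. If you need numbers, a small post-processing step helps. The sketch below assumes US-style formatting (`,` for thousands, `.` for decimals). The helper names are hypothetical and not part of the SDK:

```python
import re
from decimal import Decimal
from typing import Dict, Optional, Union

_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text: str) -> Optional[Decimal]:
    """Pull the first number out of a price string such as "$1,299.99"."""
    # Assumes "," is a thousands separator, so it is removed before matching.
    match = _NUMBER.search(text.replace(",", ""))
    return Decimal(match.group()) if match else None


def parse_rating(text: str) -> Optional[float]:
    """Pull the first number out of a rating string such as "4.5 out of 5"."""
    match = _NUMBER.search(text)
    return float(match.group()) if match else None


def normalize(extracted: Dict[str, str]) -> Dict[str, Union[str, Decimal, float, None]]:
    """Copy the extracted fields, converting "price" and "rating" when present."""
    out: Dict[str, Union[str, Decimal, float, None]] = dict(extracted)
    if "price" in out:
        out["price"] = parse_price(extracted["price"])
    if "rating" in out:
        out["rating"] = parse_rating(extracted["rating"])
    return out


print(normalize({"title": "Product Name", "price": "$29.99", "rating": "4.5"}))
```

Using `Decimal` for prices avoids binary floating-point rounding when you later sum or compare amounts.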

Crawl a Website

# Start an async crawl job (requires API key)
client = WebPeel(api_key="your-api-key")

job = client.crawl(
    "https://docs.example.com",
    limit=100,
    max_depth=3,
)

print(job.id)  # Job ID for tracking

# Check status later
status = client.get_job(job.id)
print(status["status"])  # pending, running, completed, failed
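Rather than checking once, you will usually poll until the job reaches a terminal state. The sketch below assumes only what is shown above: `get_job` returns a dict whose `"status"` is one of `pending`, `running`, `completed`, or `failed`. The `wait_for_job` helper and its delay defaults are hypothetical, not values recommended by WebPeel:

```python
import time
from typing import Any, Callable, Dict

# Terminal states taken from the status list above.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_job(job_id) with doubling delays until the job finishes or timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_job(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status.get('status')!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # back off, capped at max_interval
```

With the client above, this would be `final = wait_for_job(client.get_job, job.id)`. The `sleep` parameter is injectable so the loop can be tested without actually waiting.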

Map a Domain

# Discover all URLs on a domain
result = client.map("https://example.com")

print(f"Found {result.total} URLs")
for url in result.urls[:10]:
    print(url)
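`map()` can return many URLs. If you only care about one section of a site, a client-side filter is enough. This sketch assumes `result.urls` is a list of absolute URL strings, as the loop above suggests. `urls_under` is a hypothetical helper, not an SDK method:

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def urls_under(urls: Iterable[str], root: str) -> List[str]:
    """Keep unique URLs on root's host whose path is root's path or below it."""
    base = urlsplit(root)
    base_path = base.path.rstrip("/")
    prefix = base_path + "/"
    seen = set()
    kept: List[str] = []
    for url in urls:
        parts = urlsplit(url)
        if parts.netloc.lower() != base.netloc.lower():
            continue  # different host
        path = parts.path or "/"
        if path != base_path and not path.startswith(prefix):
            continue  # outside the section, e.g. /docsx when root is /docs
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

For example: `docs_pages = urls_under(result.urls, "https://example.com/docs")`. Comparing against `prefix` with a trailing slash is what keeps `/docsx` out when the root is `/docs`.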

Batch Scraping

# Scrape multiple URLs in batch (requires API key)
client = WebPeel(api_key="your-api-key")

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

job = client.batch_scrape(urls, max_tokens=5000)
print(job.id)
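Duplicate URLs in a batch waste fetches. The sketch below normalizes and de-duplicates a URL list before submission, then splits it into chunks. The README does not state a per-batch size limit, so the chunk size is purely an assumption you would tune. `normalize_url` and `batches` are hypothetical helpers:

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, default an empty path to "/", and drop the #fragment."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def batches(urls: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield de-duplicated URLs in chunks of at most `size`, preserving first-seen order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    unique = list(dict.fromkeys(normalize_url(u) for u in urls))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```

With the client above, you could write `jobs = [client.batch_scrape(chunk, max_tokens=5000) for chunk in batches(urls, 50)]`, where 50 is an arbitrary illustrative size. Fragments are dropped because browsers do not send them to the server, so `page1` and `page1#top` fetch the same document.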

API Reference

WebPeel Class

WebPeel(
    api_key: Optional[str] = None,
    base_url: str = "https://api.webpeel.dev",
    timeout: int = 30,
)
  • api_key: API key for authentication (optional for free tier)
  • base_url: Base URL for the WebPeel API
  • timeout: Request timeout in seconds

Methods

scrape(url, **options) -> ScrapeResult

Scrape a URL and extract content.

Options:

  • formats: Output formats (default: ["markdown"])
  • max_tokens: Maximum token count for output
  • render: Use headless browser (default: False)
  • stealth: Bypass bot detection (default: False)
  • wait: Wait time in ms after page load
  • extract: Structured data extraction config
  • headers: Custom HTTP headers

search(query, limit=5) -> SearchResult

Search the web via DuckDuckGo.

crawl(url, limit=50, max_depth=3) -> CrawlResult

Start an async crawl job (requires API key).
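The crawl itself runs server-side, and this README does not define exactly how `limit` and `max_depth` are counted. Under one common interpretation, the start URL is depth 0 and `limit` caps the total number of pages including the start URL. With those assumptions, the two bounds behave like a breadth-first traversal. The sketch below runs over a made-up in-memory link graph. `bounded_crawl` and `links_of` are hypothetical:

```python
from collections import deque
from typing import Callable, Iterable, List


def bounded_crawl(
    start: str,
    links_of: Callable[[str], Iterable[str]],
    limit: int = 50,
    max_depth: int = 3,
) -> List[str]:
    """Breadth-first discovery that stops at `limit` pages or `max_depth` hops from start."""
    discovered = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(discovered) < limit:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond max_depth hops
        for link in links_of(url):
            if link in seen:
                continue
            seen.add(link)
            discovered.append(link)
            queue.append((link, depth + 1))
            if len(discovered) >= limit:
                break
    return discovered


site = {"/": ["/a", "/b"], "/a": ["/a/1", "/b"], "/b": ["/b/1"], "/a/1": ["/a/1/x"]}
print(bounded_crawl("/", lambda u: site.get(u, []), max_depth=2))  # ['/', '/a', '/b', '/a/1', '/b/1']
```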

map(url) -> MapResult

Discover all URLs on a domain.

batch_scrape(urls, **options) -> BatchResult

Batch scrape multiple URLs (requires API key).

get_job(job_id) -> Dict

Check status of an async job.

WebPeel vs Firecrawl

| Feature | WebPeel | Firecrawl |
|---|---|---|
| Pricing | $0 local / $9-$29 cloud | $16-$333/mo |
| Free Tier | 125 fetches/week | 500 credits one-time |
| License | AGPL-3.0 | AGPL-3.0 |
| Python SDK Deps | Zero (pure stdlib) | httpx, pydantic |
| Smart Escalation | ✅ Auto HTTP→Browser→Stealth | Manual mode selection |
| Token Budget | --max-tokens | |
| Quality Scoring | ✅ 0-1 per response | |
| Local CLI | ✅ Free, unlimited | Requires API key |
| LangChain | | |
| LlamaIndex | | |

WebPeel is the free, fast, open-source alternative to Firecrawl.
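The table credits WebPeel with automatic escalation from plain HTTP to a browser to stealth mode, plus a 0-1 quality score per response. The sketch below shows that general pattern with stub fetchers. It is not WebPeel's implementation, and the 0.5 threshold and all names are assumptions:

```python
from typing import Callable, Optional, Sequence, Tuple

# Each tier fetches a URL and returns (content, quality score in [0, 1]), or raises.
Fetcher = Callable[[str], Tuple[str, float]]


def fetch_with_escalation(
    url: str,
    tiers: Sequence[Tuple[str, Fetcher]],
    min_quality: float = 0.5,
) -> Tuple[str, str, float]:
    """Try tiers in order (cheapest first); return (tier, content, quality) for the first good result."""
    best: Optional[Tuple[str, str, float]] = None
    last_error: Optional[BaseException] = None
    for name, fetch in tiers:
        try:
            content, quality = fetch(url)
        except Exception as exc:  # a failed tier is a reason to escalate, not to stop
            last_error = exc
            continue
        if quality >= min_quality:
            return name, content, quality
        if best is None or quality > best[2]:
            best = (name, content, quality)  # remember the best weak result as a fallback
    if best is None:
        raise RuntimeError(f"every tier failed for {url}") from last_error
    return best
```

Ordering tiers from cheapest to most expensive means most pages never pay for a browser. The quality threshold is what turns an empty JavaScript shell into a reason to escalate.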

Authentication

Free tier: No API key needed. Anonymous usage with rate limits.

Paid tier: Get an API key at webpeel.dev.

client = WebPeel(api_key="wp_...")
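To keep keys out of source code, you can read the key from the environment and fall back to the anonymous free tier when it is absent. `WEBPEEL_API_KEY` is a hypothetical variable name chosen here. This README does not say whether the SDK reads any environment variable itself:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str] = os.environ, var: str = "WEBPEEL_API_KEY"
) -> Optional[str]:
    """Return the key from env[var], or None (free tier) if it is unset or blank."""
    key = env.get(var, "").strip()
    return key or None
```

Usage: `client = WebPeel(api_key=resolve_api_key())`. Passing `None` matches the constructor's default, so an unset variable means anonymous, rate-limited usage.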

Error Handling

# Note: this import shadows Python's built-in TimeoutError within this module.
from webpeel import WebPeel, WebPeelError, RateLimitError, TimeoutError

client = WebPeel()

try:
    result = client.scrape("https://example.com")
except RateLimitError:
    print("Rate limit exceeded. Upgrade or wait.")
except TimeoutError:
    print("Request timeout. Try again.")
except WebPeelError as e:
    print(f"Error: {e}")
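For rate limits and timeouts, retrying after a pause is often better than failing outright. Below is a minimal sketch of exponential backoff with full jitter. You pass in the exception classes to retry on, for example `RateLimitError` and the SDK's `TimeoutError` from the import above. `with_retries` is a hypothetical helper, not an SDK function:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exceptions with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait a random time in [0, base_delay * 2**attempt] seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")  # only here to satisfy type checkers
```

Usage: `result = with_retries(lambda: client.scrape("https://example.com"), retry_on=(RateLimitError, TimeoutError))`. Any other `WebPeelError` propagates immediately. The jitter spreads out retries from many clients that hit the limit at the same moment.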

License

AGPL-3.0 © Jake Liu

Download files

Download the file for your platform.

Source Distribution

webpeel-0.13.0.tar.gz (22.8 kB)

Uploaded Source

Built Distribution

webpeel-0.13.0-py3-none-any.whl (21.1 kB)

Uploaded Python 3

File details

Details for the file webpeel-0.13.0.tar.gz.

File metadata

  • Download URL: webpeel-0.13.0.tar.gz
  • Upload date:
  • Size: 22.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for webpeel-0.13.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a7ecd43cd84b63497b50c65055d7ed88dcb8376c463a499d317427ba00b77f9 |
| MD5 | 986f4ae6b455962b45951336b892a0d2 |
| BLAKE2b-256 | 987ac437b88466121dd2fbce27688683bd95641e49e68374cd879784de26822f |

File details

Details for the file webpeel-0.13.0-py3-none-any.whl.

File metadata

  • Download URL: webpeel-0.13.0-py3-none-any.whl
  • Upload date:
  • Size: 21.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for webpeel-0.13.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6dae9d4760565573a352dc025ef3cd6a466b81613de7e2b551928b29d8a1c172 |
| MD5 | 398fafcd019f1cb3fe72643697a383b5 |
| BLAKE2b-256 | 07ae933bd75660d2b797c5847c7c74b51ba2d34fb9ea7e8c942d60da4406d488 |
