Fast web fetcher for AI agents — smart extraction, stealth mode, structured data
WebPeel Python SDK
Fast web fetcher for AI agents — smart extraction, stealth mode, structured data.
Zero dependencies. Pure Python 3.8+ stdlib.
Installation
pip install webpeel
Quick Start
Basic Scraping
from webpeel import WebPeel
client = WebPeel()
# Scrape a URL and get clean markdown
result = client.scrape("https://example.com")
print(result.title)
print(result.content) # Clean markdown content
print(result.metadata) # Structured metadata
Search the Web
# Search via DuckDuckGo
results = client.search("python web scraping")
for item in results.data.get("web", []):
    print(f"{item['title']}: {item['url']}")
JavaScript-Heavy Sites
# Use browser rendering for SPAs and JS-heavy sites
result = client.scrape(
    "https://twitter.com/elonmusk",
    render=True,  # Enable browser mode
    wait=2000,    # Wait 2s for JS to load
)
Stealth Mode (Bypass Bot Detection)
# Bypass Cloudflare, reCAPTCHA, and anti-bot systems
result = client.scrape(
    "https://protected-site.com",
    stealth=True,  # Enable stealth mode
)
Structured Data Extraction
# Extract specific data using CSS selectors
result = client.scrape(
    "https://amazon.com/product/...",
    extract={
        "selectors": {
            "title": "h1#title",
            "price": "span.price",
            "rating": ".review-rating",
        }
    },
)
print(result.extracted)
# {"title": "Product Name", "price": "$29.99", "rating": "4.5"}
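In the example output above, every extracted field is a string. If typed values are needed downstream, a small post-processing step can convert them. The sketch below is not part of the SDK: `parse_number` and `normalize_product` are hypothetical helpers, and the input dict simply mirrors the sample output shown above.

```python
import re
from typing import Dict, Optional


def parse_number(text: str) -> Optional[float]:
    """Return the first decimal number found in text, or None if there is none."""
    match = re.search(r"\d+(?:,\d{3})*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(0).replace(",", ""))


def normalize_product(extracted: Dict[str, str]) -> Dict[str, object]:
    """Convert the string fields from the sample output into typed values."""
    return {
        "title": extracted.get("title", "").strip(),
        "price": parse_number(extracted.get("price", "")),
        "rating": parse_number(extracted.get("rating", "")),
    }


# Same shape as the sample result.extracted shown above.
sample = {"title": "Product Name", "price": "$29.99", "rating": "4.5"}
product = normalize_product(sample)
print(product)
```

With a real client, `normalize_product(result.extracted)` would be the equivalent call, assuming `result.extracted` is a plain dict of strings as in the sample.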
Crawl a Website
# Start an async crawl job (requires API key)
client = WebPeel(api_key="your-api-key")
job = client.crawl(
    "https://docs.example.com",
    limit=100,
    max_depth=3,
)
print(job.id) # Job ID for tracking
# Check status later
status = client.get_job(job.id)
print(status["status"]) # pending, running, completed, failed
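Because crawl jobs are asynchronous, callers usually poll `get_job` until the status reaches `completed` or `failed`. The following is a minimal polling sketch, not an SDK feature: `wait_for_job`, `poll_interval`, and `max_wait` are hypothetical names, and `fake_get_job` stands in for `client.get_job` so the example runs without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    poll_interval: float = 2.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_job(job_id) until the job reports completed or failed."""
    waited = 0.0
    while True:
        status = get_job(job_id)
        if status.get("status") in ("completed", "failed"):
            return status
        if waited >= max_wait:
            raise RuntimeError(
                f"Job {job_id} still {status.get('status')!r} after {max_wait}s"
            )
        sleep(poll_interval)
        waited += poll_interval


# Stand-in for client.get_job (assumption: it returns a dict with a "status" key).
_fake_states = iter(["pending", "running", "completed"])


def fake_get_job(job_id: str) -> Dict[str, Any]:
    return {"id": job_id, "status": next(_fake_states)}


final = wait_for_job(fake_get_job, "job-123", poll_interval=0.0, sleep=lambda s: None)
print(final["status"])
```

With a real client this would be called as `wait_for_job(client.get_job, job.id)`. The `sleep` parameter exists only so the wait can be skipped in tests.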
Map a Domain
# Discover all URLs on a domain
result = client.map("https://example.com")
print(f"Found {result.total} URLs")
for url in result.urls[:10]:
    print(url)
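A domain map can return many URLs, so it is often useful to narrow the list before scraping. The sketch below keeps only URLs under a given path prefix, using the standard library. `filter_by_path_prefix` is a hypothetical helper, and the sample list stands in for `result.urls`, which is assumed to be a list of URL strings.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def filter_by_path_prefix(urls: Iterable[str], prefix: str) -> List[str]:
    """Keep URLs whose path starts with prefix, dropping duplicates in order."""
    seen = set()
    kept = []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        if urlsplit(url).path.startswith(prefix):
            kept.append(url)
    return kept


# Stand-in for result.urls from client.map(...).
sample_urls = [
    "https://example.com/",
    "https://example.com/docs/intro",
    "https://example.com/blog/post-1",
    "https://example.com/docs/api",
    "https://example.com/docs/intro",
]
docs_urls = filter_by_path_prefix(sample_urls, "/docs/")
print(docs_urls)
```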
Batch Scraping
# Scrape multiple URLs in batch (requires API key)
client = WebPeel(api_key="your-api-key")
urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]
job = client.batch_scrape(urls, max_tokens=5000)
print(job.id)
API Reference
WebPeel Class
WebPeel(
    api_key: Optional[str] = None,
    base_url: str = "https://api.webpeel.dev",
    timeout: int = 30,
)
- api_key: API key for authentication (optional for free tier)
- base_url: Base URL for the WebPeel API
- timeout: Request timeout in seconds
Methods
scrape(url, **options) -> ScrapeResult
Scrape a URL and extract content.
Options:
- formats: Output formats (default: ["markdown"])
- max_tokens: Maximum token count for output
- render: Use headless browser (default: False)
- stealth: Bypass bot detection (default: False)
- wait: Wait time in ms after page load
- extract: Structured data extraction config
- headers: Custom HTTP headers
search(query, limit=5) -> SearchResult
Search the web via DuckDuckGo.
crawl(url, limit=50, max_depth=3) -> CrawlResult
Start an async crawl job (requires API key).
map(url) -> MapResult
Discover all URLs on a domain.
batch_scrape(urls, **options) -> BatchResult
Batch scrape multiple URLs (requires API key).
get_job(job_id) -> Dict
Check status of an async job.
WebPeel vs Firecrawl
| Feature | WebPeel | Firecrawl |
|---|---|---|
| Pricing | $0 local / $9-$29 cloud | $16-$333/mo |
| Free Tier | 125 fetches/week | 500 credits one-time |
| License | AGPL-3.0 | AGPL-3.0 |
| Python SDK Deps | Zero (pure stdlib) | httpx, pydantic |
| Smart Escalation | ✅ Auto HTTP→Browser→Stealth | Manual mode selection |
| Token Budget | ✅ --max-tokens | ❌ |
| Quality Scoring | ✅ 0-1 per response | ❌ |
| Local CLI | ✅ Free, unlimited | Requires API key |
| LangChain | ✅ | ✅ |
| LlamaIndex | ✅ | ✅ |
WebPeel is the free, fast, open-source alternative to Firecrawl.
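The "Smart Escalation" row describes trying plain HTTP first, then a browser, then stealth mode. The sketch below shows what that idea looks like when done by hand with the documented `render` and `stealth` options. It is an illustration only, not WebPeel's internal logic: `scrape_with_escalation`, `ESCALATION_STEPS`, and the `is_good` check are hypothetical, and `fake_scrape` stands in for `client.scrape`.

```python
from typing import Any, Callable, Dict, List, Tuple

# Option sets tried in order: plain HTTP, then browser rendering, then stealth.
# Assumption: each step maps to a single documented scrape option.
ESCALATION_STEPS: List[Tuple[str, Dict[str, Any]]] = [
    ("http", {}),
    ("browser", {"render": True}),
    ("stealth", {"stealth": True}),
]


def scrape_with_escalation(
    scrape: Callable[..., Any],
    url: str,
    is_good: Callable[[Any], bool],
) -> Tuple[str, Any]:
    """Try each step in order and return (step name, result) for the first good one."""
    last_error = None
    for name, options in ESCALATION_STEPS:
        try:
            result = scrape(url, **options)
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
            continue
        if is_good(result):
            return name, result
    raise RuntimeError(f"All escalation steps failed for {url}") from last_error


# Stand-in for client.scrape: only browser and stealth modes return content.
def fake_scrape(url: str, render: bool = False, stealth: bool = False) -> str:
    if render or stealth:
        return "# Page title\n\nRendered content"
    return ""


step, content = scrape_with_escalation(fake_scrape, "https://example.com", is_good=bool)
print(step)
```

With the real client, `is_good` might check that `result.content` is non-empty. What counts as a good result is a choice left to the caller here.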
Authentication
Free tier: No API key needed. Anonymous usage with rate limits.
Paid tier: Get an API key at webpeel.dev.
client = WebPeel(api_key="wp_...")
Error Handling
from webpeel import WebPeel, WebPeelError, RateLimitError, TimeoutError
client = WebPeel()
try:
    result = client.scrape("https://example.com")
except RateLimitError:
    print("Rate limit exceeded. Upgrade or wait.")
except TimeoutError:
    print("Request timeout. Try again.")
except WebPeelError as e:
    print(f"Error: {e}")
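Rate-limit and timeout errors are often transient, so one common pattern is to retry with exponential backoff instead of failing on the first error. The sketch below is a generic helper, not part of the SDK: `retry`, `attempts`, and `base_delay` are hypothetical names, and `FakeRateLimitError` stands in for `RateLimitError` so the example runs without the package installed.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff when it raises retry_on."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Stand-in for a rate-limited call.
class FakeRateLimitError(Exception):
    pass


calls = {"count": 0}


def flaky_scrape() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("slow down")
    return "ok"


delays = []
result = retry(flaky_scrape, (FakeRateLimitError,), base_delay=0.5, sleep=delays.append)
print(result, delays)
```

With the real SDK, the call might look like `retry(lambda: client.scrape(url), (RateLimitError, TimeoutError))`, using the exception classes imported above.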
License
AGPL-3.0 © Jake Liu
File details
Details for the file webpeel-0.13.0.tar.gz.
File metadata
- Download URL: webpeel-0.13.0.tar.gz
- Upload date:
- Size: 22.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a7ecd43cd84b63497b50c65055d7ed88dcb8376c463a499d317427ba00b77f9 |
| MD5 | 986f4ae6b455962b45951336b892a0d2 |
| BLAKE2b-256 | 987ac437b88466121dd2fbce27688683bd95641e49e68374cd879784de26822f |
File details
Details for the file webpeel-0.13.0-py3-none-any.whl.
File metadata
- Download URL: webpeel-0.13.0-py3-none-any.whl
- Upload date:
- Size: 21.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6dae9d4760565573a352dc025ef3cd6a466b81613de7e2b551928b29d8a1c172 |
| MD5 | 398fafcd019f1cb3fe72643697a383b5 |
| BLAKE2b-256 | 07ae933bd75660d2b797c5847c7c74b51ba2d34fb9ea7e8c942d60da4406d488 |