# ClearScrape Python SDK

Official Python client for the ClearScrape web scraping API.
## Features

- Simple, intuitive API
- Full async/await support
- Type hints throughout
- Automatic retries with exponential backoff
- Support for all ClearScrape features:
  - JavaScript rendering
  - Premium residential proxies
  - Antibot bypass
  - Screenshots
  - Domain-specific extractors (Amazon, Walmart, Google, etc.)
  - Scraping Browser (Playwright/Puppeteer)
  - Residential Proxy service
## Installation

```bash
pip install clearscrape
```
## Quick Start

```python
from clearscrape import ClearScrape

client = ClearScrape(api_key="your-api-key")

# Basic scrape
result = client.scrape("https://example.com")
print(result.html)
```
## Usage Examples

### Basic Scraping

```python
# Simple HTML fetch
result = client.scrape("https://example.com")

# Get just the HTML
html = client.get_html("https://example.com")

# Get just the text content
text = client.get_text("https://example.com")
```
### JavaScript Rendering

Enable JavaScript rendering for dynamic websites (SPAs, React, Vue, etc.):

```python
result = client.scrape(
    "https://example.com/spa-page",
    js_render=True,
    wait_for=".product-list",  # Wait for element
    wait=3000                  # Additional wait time (ms)
)
```
### Premium Proxies

Use residential proxies to avoid blocks and to geo-target requests:

```python
result = client.scrape(
    "https://example.com",
    premium_proxy=True,
    proxy_country="us"  # Target a specific country
)
```
### Antibot Bypass

Bypass Cloudflare, DataDome, PerimeterX and other bot protection:

```python
result = client.scrape(
    "https://protected-site.com",
    antibot=True,
    premium_proxy=True
)
```
### Screenshots

Capture screenshots of web pages:

```python
# Get a screenshot as bytes
screenshot = client.screenshot("https://example.com")

# Save to file
with open("screenshot.png", "wb") as f:
    f.write(screenshot)

# Screenshot a specific element
screenshot = client.screenshot(
    "https://example.com",
    selector=".product-card"
)
```
### Domain Extractors

Extract structured data from supported websites:

```python
# Amazon product data
product = client.extract(
    "https://www.amazon.com/dp/B09V3KXJPB",
    domain="amazon"
)
print(product["title"])         # "Apple AirPods Pro..."
print(product["price"])         # "$249.00"
print(product["rating"])        # "4.7"
print(product["review_count"])  # "125,432"

# Google SERP data
serp = client.extract(
    "https://www.google.com/search?q=best+laptops",
    domain="google"
)
print(serp["organic_results"][0]["title"])
print(serp["featured_snippet"])
print(serp["related_searches"])
```
Supported domains:

- `amazon` - Product pages
- `walmart` - Product pages
- `google` - Search results
- `google_shopping` - Shopping results
- `ebay` - Product pages
- `target` - Product pages
- `etsy` - Product pages
- `bestbuy` - Product pages
- `homedepot` - Product pages
- `zillow` - Property listings
- `yelp` - Business pages
- `indeed` - Job listings
- `linkedin_jobs` - Job listings
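Each extractor returns a plain `dict` (as the examples above show), so a mapping of URL to domain is enough to drive batch extraction. A minimal sketch; the URLs below are hypothetical placeholders:

```python
# Sketch: batch extraction across several supported domains.
# The product URLs here are placeholders, not real listings.
targets = {
    "https://www.walmart.com/ip/example-product/123456": "walmart",
    "https://www.ebay.com/itm/1234567890": "ebay",
}

for url, domain in targets.items():
    data = client.extract(url, domain=domain)
    # Field names vary by extractor, so inspect the keys before relying on them
    print(domain, sorted(data.keys()))
```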
### Scraping Browser (Playwright/Puppeteer)

Connect to cloud browsers with built-in antibot bypass:

```python
# With Playwright
from playwright.sync_api import sync_playwright

ws_url = client.get_browser_ws_url()

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_url)
    page = browser.new_page()
    page.goto("https://example.com")
    title = page.title()
    browser.close()

# With country targeting
ws_url = client.get_browser_ws_url(proxy_country="gb")
```
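The same endpoint also works with Playwright's async API, since `get_browser_ws_url()` returns a CDP WebSocket URL. A sketch of the equivalent async flow; `fetch_title` is just a name chosen here:

```python
import asyncio
from playwright.async_api import async_playwright

async def fetch_title() -> str:
    # get_browser_ws_url() is the sync helper shown above
    ws_url = client.get_browser_ws_url()
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(ws_url)
        page = await browser.new_page()
        await page.goto("https://example.com")
        title = await page.title()
        await browser.close()
        return title

print(asyncio.run(fetch_title()))
```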
### Residential Proxies

Use ClearScrape proxies with any HTTP client:

```python
# Get the proxy configuration
proxy = client.get_proxy_config()
# ProxyConfig(host='proxy.clearscrape.io', port=8000, username='...', password='...')

# Get the proxy URL as a string
proxy_url = client.get_proxy_url()
# 'http://apikey:apikey@proxy.clearscrape.io:8000'

# With country targeting
proxy_url = client.get_proxy_url(country="us")

# With a sticky-session IP
proxy_url = client.get_proxy_url(session="my-session-123")

# Combined
proxy_url = client.get_proxy_url(country="us", session="abc")
```
Use with `requests`:

```python
import requests

proxy = client.get_proxy_config(country="us")
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxy.as_dict()
)
```
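For multi-request workflows that should keep the same exit IP, one option is to pair the sticky `session` parameter shown above with `requests.Session`. A sketch; the session name and URL paths are arbitrary:

```python
import requests

# Reuse one sticky-session proxy URL across a whole crawl
proxy_url = client.get_proxy_url(country="us", session="crawl-42")

with requests.Session() as s:
    s.proxies = {"http": proxy_url, "https": proxy_url}
    for path in ("/ip", "/headers"):
        r = s.get(f"https://httpbin.org{path}", timeout=30)
        print(r.status_code)
```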
Use with `httpx` (0.26+ takes `proxy=`; older versions used `proxies=`):

```python
import httpx

proxy_url = client.get_proxy_url()
response = httpx.get(
    "https://httpbin.org/ip",
    proxy=proxy_url
)
```
## Async Usage

For async applications, use `AsyncClearScrape`:

```python
import asyncio
from clearscrape import AsyncClearScrape

async def main():
    async with AsyncClearScrape(api_key="your-api-key") as client:
        # All methods are async
        result = await client.scrape("https://example.com")
        print(result.html)

        # Scrape multiple URLs concurrently
        urls = [
            "https://example.com/page1",
            "https://example.com/page2",
            "https://example.com/page3",
        ]
        results = await asyncio.gather(*[
            client.scrape(url) for url in urls
        ])

asyncio.run(main())
```
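`asyncio.gather` launches every request at once, so for large URL lists it can help to bound concurrency with a semaphore. A minimal sketch; the limit of 5 is an arbitrary choice, not an API requirement:

```python
import asyncio
from clearscrape import AsyncClearScrape

async def scrape_all(urls, limit=5):
    sem = asyncio.Semaphore(limit)  # At most `limit` requests in flight

    async with AsyncClearScrape(api_key="your-api-key") as client:
        async def one(url):
            async with sem:
                return await client.scrape(url)

        return await asyncio.gather(*(one(u) for u in urls))

results = asyncio.run(scrape_all([f"https://example.com/page{i}" for i in range(20)]))
```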
## Configuration

```python
client = ClearScrape(
    # Required: your API key
    api_key="your-api-key",

    # Optional: custom base URL (default: https://api.clearscrape.io)
    base_url="https://api.clearscrape.io",

    # Optional: request timeout in seconds (default: 60)
    timeout=60,

    # Optional: number of retries (default: 3)
    retries=3
)
```
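To keep the key out of source control, one common pattern is to read it from an environment variable. A sketch; `CLEARSCRAPE_API_KEY` is a name chosen here, not one the SDK reads automatically:

```python
import os
from clearscrape import ClearScrape

# CLEARSCRAPE_API_KEY is this snippet's convention, not an SDK built-in
api_key = os.environ.get("CLEARSCRAPE_API_KEY")
if not api_key:
    raise RuntimeError("Set CLEARSCRAPE_API_KEY before creating the client")

client = ClearScrape(api_key=api_key)
```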
## Error Handling

```python
from clearscrape import (
    ClearScrape,
    ClearScrapeError,
    InsufficientCreditsError,
    RateLimitError,
    AuthenticationError,
)

try:
    result = client.scrape("https://example.com")
except AuthenticationError:
    print("Invalid API key")
except InsufficientCreditsError as e:
    print(f"Need {e.required} credits")
except RateLimitError:
    print("Rate limited, try again later")
except ClearScrapeError as e:
    print(f"Error {e.status_code}: {e.message}")
```
## Credits
| Feature | Cost |
|---|---|
| Base request | 1 credit |
| + JavaScript rendering | +5 credits |
| + Premium proxy | +10 credits |
| + Antibot bypass | +25 credits |
| Domain API extraction | 25 credits |
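Reading the `+` rows as additive, a single scrape with `js_render=True`, `premium_proxy=True`, and `antibot=True` would cost 1 + 5 + 10 + 25 = 41 credits.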
## Support

## License

MIT