Official Python SDK for ClearScrape - Web Scraping API

ClearScrape Python SDK

Official Python client for the ClearScrape web scraping API.

Features

  • Simple, intuitive API
  • Full async/await support
  • Type hints throughout
  • Automatic retries with exponential backoff
  • Support for all ClearScrape features:
    • JavaScript rendering
    • Premium residential proxies
    • Antibot bypass
    • Screenshots
    • Domain-specific extractors (Amazon, Walmart, Google, etc.)
    • Scraping Browser (Playwright/Puppeteer)
    • Residential Proxy service

Installation

pip install clearscrape

Quick Start

from clearscrape import ClearScrape

client = ClearScrape(api_key="your-api-key")

# Basic scrape
result = client.scrape("https://example.com")
print(result.html)

Usage Examples

Basic Scraping

# Simple HTML fetch
result = client.scrape("https://example.com")

# Get just the HTML
html = client.get_html("https://example.com")

# Get just the text content
text = client.get_text("https://example.com")

JavaScript Rendering

Enable JavaScript rendering for dynamic websites (SPAs, React, Vue, etc.):

result = client.scrape(
    "https://example.com/spa-page",
    js_render=True,
    wait_for=".product-list",  # Wait for element
    wait=3000                   # Additional wait time (ms)
)

Premium Proxies

Use residential proxies to avoid blocks and target specific countries:

result = client.scrape(
    "https://example.com",
    premium_proxy=True,
    proxy_country="us"  # Target specific country
)

Antibot Bypass

Bypass Cloudflare, DataDome, PerimeterX, and other bot-protection services:

result = client.scrape(
    "https://protected-site.com",
    antibot=True,
    premium_proxy=True
)

Screenshots

Capture screenshots of web pages:

# Get screenshot as bytes
screenshot = client.screenshot("https://example.com")

# Save to file
with open("screenshot.png", "wb") as f:
    f.write(screenshot)

# Screenshot specific element
screenshot = client.screenshot(
    "https://example.com",
    selector=".product-card"
)

Domain Extractors

Extract structured data from supported websites:

# Amazon product data
product = client.extract(
    "https://www.amazon.com/dp/B09V3KXJPB",
    domain="amazon"
)

print(product["title"])       # "Apple AirPods Pro..."
print(product["price"])       # "$249.00"
print(product["rating"])      # "4.7"
print(product["review_count"]) # "125,432"

# Google SERP data
serp = client.extract(
    "https://www.google.com/search?q=best+laptops",
    domain="google"
)

print(serp["organic_results"][0]["title"])
print(serp["featured_snippet"])
print(serp["related_searches"])

Supported domains:

  • amazon - Product pages
  • walmart - Product pages
  • google - Search results
  • google_shopping - Shopping results
  • ebay - Product pages
  • target - Product pages
  • etsy - Product pages
  • bestbuy - Product pages
  • homedepot - Product pages
  • zillow - Property listings
  • yelp - Business pages
  • indeed - Job listings
  • linkedin_jobs - Job listings
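
Since extract takes a domain string, a small lookup can derive it from the URL host. A minimal sketch, not part of the SDK; the mapping below covers only a few of the domains listed above and would need extending:

```python
from urllib.parse import urlparse

# Partial hostname -> extractor-domain mapping (hypothetical helper)
DOMAIN_MAP = {
    "www.amazon.com": "amazon",
    "www.walmart.com": "walmart",
    "www.google.com": "google",
    "www.zillow.com": "zillow",
}

def extractor_for(url):
    # Returns the extractor name for a URL, or None if unsupported
    return DOMAIN_MAP.get(urlparse(url).hostname)

# product = client.extract(url, domain=extractor_for(url))
```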

Scraping Browser (Playwright/Puppeteer)

Connect to cloud browsers with built-in antibot bypass:

# With Playwright
from playwright.sync_api import sync_playwright

ws_url = client.get_browser_ws_url()

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_url)
    page = browser.new_page()
    page.goto("https://example.com")

    title = page.title()
    browser.close()

# With country targeting
ws_url = client.get_browser_ws_url(proxy_country="gb")

Residential Proxies

Use ClearScrape proxies with any HTTP client:

# Get proxy configuration
proxy = client.get_proxy_config()
# ProxyConfig(host='proxy.clearscrape.io', port=8000, username='...', password='...')

# Get proxy URL string
proxy_url = client.get_proxy_url()
# 'http://apikey:apikey@proxy.clearscrape.io:8000'

# With country targeting
proxy_url = client.get_proxy_url(country="us")

# With session sticky IP
proxy_url = client.get_proxy_url(session="my-session-123")

# Combined
proxy_url = client.get_proxy_url(country="us", session="abc")

Use with requests:

import requests

proxy = client.get_proxy_config(country="us")
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxy.as_dict()
)

Use with httpx:

import httpx

proxy_url = client.get_proxy_url()
# httpx >= 0.26 takes a single `proxy` argument (`proxies` is deprecated there)
response = httpx.get(
    "https://httpbin.org/ip",
    proxy=proxy_url
)

Async Usage

For async applications, use AsyncClearScrape:

import asyncio
from clearscrape import AsyncClearScrape

async def main():
    async with AsyncClearScrape(api_key="your-api-key") as client:
        # All methods are async
        result = await client.scrape("https://example.com")
        print(result.html)

        # Scrape multiple URLs concurrently
        urls = [
            "https://example.com/page1",
            "https://example.com/page2",
            "https://example.com/page3",
        ]
        results = await asyncio.gather(*[
            client.scrape(url) for url in urls
        ])

asyncio.run(main())
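
When scraping many URLs, the unbounded gather above can fire every request at once; a semaphore caps concurrency. A sketch assuming nothing beyond the standard library; gather_limited is a hypothetical helper, not part of the SDK:

```python
import asyncio

async def gather_limited(coros, limit):
    # Run coroutines concurrently, but at most `limit` at a time
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*[bounded(c) for c in coros])

# Usage inside main():
#     results = await gather_limited(
#         [client.scrape(url) for url in urls], limit=5
#     )
```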

Configuration

client = ClearScrape(
    # Required: Your API key
    api_key="your-api-key",

    # Optional: Custom base URL (default: https://api.clearscrape.io)
    base_url="https://api.clearscrape.io",

    # Optional: Request timeout in seconds (default: 60)
    timeout=60,

    # Optional: Number of retries (default: 3)
    retries=3
)

Error Handling

from clearscrape import (
    ClearScrape,
    ClearScrapeError,
    InsufficientCreditsError,
    RateLimitError,
    AuthenticationError,
)

try:
    result = client.scrape("https://example.com")
except AuthenticationError:
    print("Invalid API key")
except InsufficientCreditsError as e:
    print(f"Need {e.required} credits")
except RateLimitError:
    print("Rate limited, try again later")
except ClearScrapeError as e:
    print(f"Error {e.status_code}: {e.message}")
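The client retries transient failures internally, but RateLimitError still surfaces to the caller; one way to handle it is a manual exponential-backoff loop. A sketch with a generic, hypothetical helper (not part of the SDK), shown here with the exception type as a parameter so it works with any retryable error:

```python
import time

def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0):
    # Call fn(), retrying on `retry_on` exceptions with exponential
    # backoff: base_delay, 2*base_delay, 4*base_delay, ...
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage (RateLimitError imported from clearscrape as above):
#     result = call_with_backoff(
#         lambda: client.scrape("https://example.com"),
#         retry_on=RateLimitError,
#     )
```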

Credits

Feature                    Cost
Base request               1 credit
+ JavaScript rendering     +5 credits
+ Premium proxy            +10 credits
+ Antibot bypass           +25 credits
Domain API extraction      25 credits
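
Per the table above, a scrape request's cost is additive over the features enabled. A hypothetical helper (not part of the SDK) for estimating cost before sending:

```python
def estimate_credits(js_render=False, premium_proxy=False, antibot=False):
    # Additive per-feature costs from the pricing table above
    cost = 1  # base request
    if js_render:
        cost += 5
    if premium_proxy:
        cost += 10
    if antibot:
        cost += 25
    return cost

# A JS-rendered request through a premium proxy with antibot bypass:
estimate_credits(js_render=True, premium_proxy=True, antibot=True)  # 41
```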

Support

License

MIT


