Official Python SDK for ClearScrape - Web Scraping API

Project description

ClearScrape Python SDK

Official Python client for the ClearScrape web scraping API.

Features

  • Simple, intuitive API
  • Full async/await support
  • Type hints throughout
  • Automatic retries with exponential backoff
  • Support for all ClearScrape features:
    • JavaScript rendering
    • Premium residential proxies
    • Antibot bypass
    • Screenshots
    • Domain-specific extractors (Amazon, Walmart, Google, etc.)
    • Scraping Browser (Playwright/Puppeteer)
    • Residential Proxy service
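The retry behavior listed above can be pictured as a standard exponential-backoff loop. A minimal stdlib-only sketch of the pattern; the delay values and the catch-all exception handling here are illustrative, not the SDK's actual internals:

```python
import random
import time

def backoff_delays(retries=3, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: up to 1s, 2s, 4s, ... capped at `cap`."""
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        # Full jitter spreads concurrent retries out to avoid thundering herds
        yield delay * random.random()

def with_retries(fn, retries=3):
    """Call `fn`, retrying on failure with exponential backoff between attempts."""
    last_exc = None
    for delay in backoff_delays(retries):
        try:
            return fn()
        except Exception as exc:  # a real client would catch specific error types
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```

With full jitter, each sleep is drawn uniformly from zero up to the capped exponential delay, which keeps retries from synchronizing across clients.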

Installation

pip install clearscrape

Quick Start

1. Create a .env file in your project root:

CLEARSCRAPE_API_KEY=your_api_key_here

Get your API key from the ClearScrape Dashboard. Never hardcode API keys in source code.

2. Use the SDK:

import os
from dotenv import load_dotenv  # pip install python-dotenv
from clearscrape import ClearScrape

load_dotenv()

client = ClearScrape(api_key=os.environ["CLEARSCRAPE_API_KEY"])

# Basic scrape
result = client.scrape("https://example.com")
print(result.html)

Usage Examples

Basic Scraping

# Simple HTML fetch
result = client.scrape("https://example.com")

# Get just the HTML
html = client.get_html("https://example.com")

# Get just the text content
text = client.get_text("https://example.com")

JavaScript Rendering

Enable JavaScript rendering for dynamic websites (SPAs, React, Vue, etc.):

result = client.scrape(
    "https://example.com/spa-page",
    js_render=True,
    wait_for=".product-list",  # Wait for element
    wait=3000                   # Additional wait time (ms)
)

Premium Proxies

Use residential proxies to avoid blocks and to geo-target requests:

result = client.scrape(
    "https://example.com",
    premium_proxy=True,
    proxy_country="us"  # Target specific country
)

Antibot Bypass

Bypass Cloudflare, DataDome, PerimeterX, and other bot-protection systems:

result = client.scrape(
    "https://protected-site.com",
    antibot=True,
    premium_proxy=True
)

Screenshots

Capture screenshots of web pages:

# Get screenshot as bytes
screenshot = client.screenshot("https://example.com")

# Save to file
with open("screenshot.png", "wb") as f:
    f.write(screenshot)

# Screenshot specific element
screenshot = client.screenshot(
    "https://example.com",
    selector=".product-card"
)

Domain Extractors

Extract structured data from supported websites:

# Amazon product data
product = client.extract(
    "https://www.amazon.com/dp/B09V3KXJPB",
    domain="amazon"
)

print(product["title"])       # "Apple AirPods Pro..."
print(product["price"])       # "$249.00"
print(product["rating"])      # "4.7"
print(product["review_count"]) # "125,432"

# Google SERP data
serp = client.extract(
    "https://www.google.com/search?q=best+laptops",
    domain="google"
)

print(serp["organic_results"][0]["title"])
print(serp["featured_snippet"])
print(serp["related_searches"])

Supported domains:

  • amazon - Product pages
  • walmart - Product pages
  • google - Search results
  • google_shopping - Shopping results
  • ebay - Product pages
  • target - Product pages
  • etsy - Product pages
  • bestbuy - Product pages
  • homedepot - Product pages
  • zillow - Property listings
  • yelp - Business pages
  • indeed - Job listings
  • linkedin_jobs - Job listings

Scraping Browser (Playwright/Puppeteer)

Connect to cloud browsers with built-in antibot bypass:

# With Playwright
from playwright.sync_api import sync_playwright

ws_url = client.get_browser_ws_url()

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_url)
    page = browser.new_page()
    page.goto("https://example.com")

    title = page.title()
    browser.close()

# With country targeting
ws_url = client.get_browser_ws_url(proxy_country="gb")

Residential Proxies

Use ClearScrape proxies with any HTTP client:

# Get proxy configuration
proxy = client.get_proxy_config()
# ProxyConfig(host='proxy.clearscrape.io', port=8000, username='...', password='...')

# Get proxy URL string
proxy_url = client.get_proxy_url()
# 'http://apikey:apikey@proxy.clearscrape.io:8000'

# With country targeting
proxy_url = client.get_proxy_url(country="us")

# With a sticky session (reuse the same IP)
proxy_url = client.get_proxy_url(session="my-session-123")

# Combined
proxy_url = client.get_proxy_url(country="us", session="abc")

Use with requests:

import requests

proxy = client.get_proxy_config(country="us")
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxy.as_dict()
)
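requests expects its `proxies` mapping keyed by URL scheme. Assuming `as_dict()` follows that convention, it would produce something like the hand-built mapping below (an illustration of the requests format, not the SDK's internals):

```python
def proxies_dict(proxy_url):
    """Build a requests-style proxies mapping from a single proxy URL."""
    # requests routes both plain and TLS traffic through the same proxy
    return {"http": proxy_url, "https": proxy_url}

proxies = proxies_dict("http://apikey:apikey@proxy.clearscrape.io:8000")
```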

Use with httpx:

import httpx

proxy_url = client.get_proxy_url()
response = httpx.get(
    "https://httpbin.org/ip",
    proxy=proxy_url  # httpx >= 0.26; older versions use proxies=proxy_url
)

Async Usage

For async applications, use AsyncClearScrape:

import asyncio
import os
from clearscrape import AsyncClearScrape

async def main():
    async with AsyncClearScrape(api_key=os.environ["CLEARSCRAPE_API_KEY"]) as client:
        # All methods are async
        result = await client.scrape("https://example.com")
        print(result.html)

        # Scrape multiple URLs concurrently
        urls = [
            "https://example.com/page1",
            "https://example.com/page2",
            "https://example.com/page3",
        ]
        results = await asyncio.gather(*[
            client.scrape(url) for url in urls
        ])

asyncio.run(main())
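`asyncio.gather` launches every request at once, which can overwhelm rate limits on large URL lists. A generic way to cap concurrency with `asyncio.Semaphore` (independent of the SDK; `fake_scrape` below is a stand-in for `client.scrape`):

```python
import asyncio

async def gather_limited(coro_factories, limit=5):
    """Run coroutines with at most `limit` in flight at a time."""
    sem = asyncio.Semaphore(limit)

    async def run(factory):
        async with sem:
            return await factory()

    return await asyncio.gather(*(run(f) for f in coro_factories))

# Demo with a stand-in coroutine; with the SDK you would pass
# factories like `lambda url=url: client.scrape(url)`.
async def fake_scrape(url):
    await asyncio.sleep(0)
    return url

async def demo():
    urls = [f"https://example.com/page{i}" for i in range(10)]
    return await gather_limited(
        [lambda u=u: fake_scrape(u) for u in urls], limit=3
    )

results = asyncio.run(demo())
```

Passing factories (callables that create the coroutine) rather than coroutines themselves means each request is only created once the semaphore is acquired.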

Configuration

client = ClearScrape(
    # Required: Your API key
    api_key="your-api-key",

    # Optional: Custom base URL (default: https://clearscrape.io/api)
    base_url="https://clearscrape.io/api",

    # Optional: Request timeout in seconds (default: 60)
    timeout=60,

    # Optional: Number of retries (default: 3)
    retries=3
)

Error Handling

from clearscrape import (
    ClearScrape,
    ClearScrapeError,
    InsufficientCreditsError,
    RateLimitError,
    AuthenticationError,
)

try:
    result = client.scrape("https://example.com")
except AuthenticationError:
    print("Invalid API key")
except InsufficientCreditsError as e:
    print(f"Need {e.required} credits")
except RateLimitError:
    print("Rate limited, try again later")
except ClearScrapeError as e:
    print(f"Error {e.status_code}: {e.message}")

Credits

Feature                  Cost
Base request             1 credit
+ JavaScript rendering   +5 credits
+ Premium proxy          +10 credits
+ Antibot bypass         +25 credits
Domain API extraction    25 credits
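Feature costs stack on top of the base request, so a quick helper can estimate the credits a call will consume. This mirrors the table above, assuming domain extraction is a flat cost rather than a surcharge:

```python
def estimate_credits(js_render=False, premium_proxy=False,
                     antibot=False, domain_extract=False):
    """Estimate credits for one request, per the pricing table."""
    if domain_extract:
        return 25  # Domain API extraction is priced as a flat 25 credits
    cost = 1  # base request
    if js_render:
        cost += 5
    if premium_proxy:
        cost += 10
    if antibot:
        cost += 25
    return cost
```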

License

MIT

Project details


Download files


Source Distribution

clearscrape-1.2.0.tar.gz (10.7 kB view details)

Uploaded Source

Built Distribution


clearscrape-1.2.0-py3-none-any.whl (11.7 kB view details)

Uploaded Python 3

File details

Details for the file clearscrape-1.2.0.tar.gz.

File metadata

  • Download URL: clearscrape-1.2.0.tar.gz
  • Upload date:
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for clearscrape-1.2.0.tar.gz
Algorithm Hash digest
SHA256 2ed82f8e0d684a23d417806b04d847d210b0bdbb0a435b6b9fef2faacdf37ce2
MD5 19b931333de4bb0a8a3ead2666a4bdec
BLAKE2b-256 29f2ab1362e301d123b739b209f5375b712a50a7e5e8445d513390bf2893e544


File details

Details for the file clearscrape-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: clearscrape-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for clearscrape-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 730c7b40f77eaa3d1d88e33db06b436c862693289af905df8b5763747a7166c2
MD5 f65dc7ddd36137f0df6080739ea589e4
BLAKE2b-256 8e6c50a39c5b8c6435828382120a9f524a37e4cf2311554ef4e38232bdc286e7

