
ClearScrape Python SDK

Official Python client for the ClearScrape web scraping API.

Features

  • Simple, intuitive API
  • Full async/await support
  • Type hints throughout
  • Automatic retries with exponential backoff
  • Support for all ClearScrape features:
    • JavaScript rendering
    • Premium residential proxies
    • Antibot bypass
    • Screenshots
    • Domain-specific extractors (Amazon, Walmart, Google, etc.)
    • Scraping Browser (Playwright/Puppeteer)
    • Residential Proxy service

Installation

pip install clearscrape

Quick Start

from clearscrape import ClearScrape

client = ClearScrape(api_key="your-api-key")

# Basic scrape
result = client.scrape("https://example.com")
print(result.html)

Usage Examples

Basic Scraping

# Simple HTML fetch
result = client.scrape("https://example.com")

# Get just the HTML
html = client.get_html("https://example.com")

# Get just the text content
text = client.get_text("https://example.com")

JavaScript Rendering

Enable JavaScript rendering for dynamic websites (SPAs, React, Vue, etc.):

result = client.scrape(
    "https://example.com/spa-page",
    js_render=True,
    wait_for=".product-list",  # Wait for element
    wait=3000                   # Additional wait time (ms)
)

Premium Proxies

Use residential proxies to avoid blocks and to route requests through a specific country:

result = client.scrape(
    "https://example.com",
    premium_proxy=True,
    proxy_country="us"  # Target specific country
)

Antibot Bypass

Bypass Cloudflare, DataDome, PerimeterX and other bot protection:

result = client.scrape(
    "https://protected-site.com",
    antibot=True,
    premium_proxy=True
)

Screenshots

Capture screenshots of web pages:

# Get screenshot as bytes
screenshot = client.screenshot("https://example.com")

# Save to file
with open("screenshot.png", "wb") as f:
    f.write(screenshot)

# Screenshot specific element
screenshot = client.screenshot(
    "https://example.com",
    selector=".product-card"
)

Domain Extractors

Extract structured data from supported websites:

# Amazon product data
product = client.extract(
    "https://www.amazon.com/dp/B09V3KXJPB",
    domain="amazon"
)

print(product["title"])       # "Apple AirPods Pro..."
print(product["price"])       # "$249.00"
print(product["rating"])      # "4.7"
print(product["review_count"]) # "125,432"

# Google SERP data
serp = client.extract(
    "https://www.google.com/search?q=best+laptops",
    domain="google"
)

print(serp["organic_results"][0]["title"])
print(serp["featured_snippet"])
print(serp["related_searches"])

Supported domains:

  • amazon - Product pages
  • walmart - Product pages
  • google - Search results
  • google_shopping - Shopping results
  • ebay - Product pages
  • target - Product pages
  • etsy - Product pages
  • bestbuy - Product pages
  • homedepot - Product pages
  • zillow - Property listings
  • yelp - Business pages
  • indeed - Job listings
  • linkedin_jobs - Job listings
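Every extractor uses the same extract(url, domain=...) call shown above. When scraping mixed URLs, you can pick the domain key from the hostname; a minimal sketch (the mapping covers a few of the keys above, and guess_domain is an illustrative helper, not part of the SDK):

```python
from urllib.parse import urlparse

# Illustrative hostname-to-extractor mapping (keys from the list above).
DOMAIN_EXTRACTORS = {
    "www.amazon.com": "amazon",
    "www.walmart.com": "walmart",
    "www.google.com": "google",
    "www.ebay.com": "ebay",
}

def guess_domain(url):
    """Return the extractor key for a URL, or None if the site is unsupported."""
    return DOMAIN_EXTRACTORS.get(urlparse(url).hostname)

# product = client.extract(url, domain=guess_domain(url))
```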

Scraping Browser (Playwright/Puppeteer)

Connect to cloud browsers with built-in antibot bypass:

# With Playwright
from playwright.sync_api import sync_playwright

ws_url = client.get_browser_ws_url()

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_url)
    page = browser.new_page()
    page.goto("https://example.com")

    title = page.title()
    browser.close()

# With country targeting
ws_url = client.get_browser_ws_url(proxy_country="gb")

Residential Proxies

Use ClearScrape proxies with any HTTP client:

# Get proxy configuration
proxy = client.get_proxy_config()
# ProxyConfig(host='proxy.clearscrape.io', port=8000, username='...', password='...')

# Get proxy URL string
proxy_url = client.get_proxy_url()
# 'http://apikey:apikey@proxy.clearscrape.io:8000'

# With country targeting
proxy_url = client.get_proxy_url(country="us")

# With a sticky-IP session
proxy_url = client.get_proxy_url(session="my-session-123")

# Combined
proxy_url = client.get_proxy_url(country="us", session="abc")
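Most HTTP clients accept either the proxy URL string directly or a scheme-to-URL mapping. For clients that want the mapping, it can be built from the single URL; a small sketch (proxies_from_url is an illustrative helper, not part of the SDK):

```python
def proxies_from_url(proxy_url):
    """Build the scheme-to-proxy mapping that requests expects from one proxy URL."""
    return {"http": proxy_url, "https": proxy_url}

# proxies = proxies_from_url(client.get_proxy_url(country="us"))
# requests.get("https://httpbin.org/ip", proxies=proxies)
```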

Use with requests:

import requests

proxy = client.get_proxy_config(country="us")
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxy.as_dict()
)

Use with httpx:

import httpx

proxy_url = client.get_proxy_url()
response = httpx.get(
    "https://httpbin.org/ip",
    proxy=proxy_url,  # httpx >= 0.26; older versions use proxies=proxy_url
)

Async Usage

For async applications, use AsyncClearScrape:

import asyncio
from clearscrape import AsyncClearScrape

async def main():
    async with AsyncClearScrape(api_key="your-api-key") as client:
        # All methods are async
        result = await client.scrape("https://example.com")
        print(result.html)

        # Scrape multiple URLs concurrently
        urls = [
            "https://example.com/page1",
            "https://example.com/page2",
            "https://example.com/page3",
        ]
        results = await asyncio.gather(*[
            client.scrape(url) for url in urls
        ])

asyncio.run(main())
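asyncio.gather above starts every request at once; for long URL lists you may want to cap how many scrapes are in flight. A minimal sketch using a semaphore (bounded_gather is an illustrative helper, not part of the SDK):

```python
import asyncio

async def bounded_gather(coros, limit=5):
    """Await the given coroutines with at most `limit` running concurrently."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))

# Usage with the async client from above:
# results = await bounded_gather((client.scrape(u) for u in urls), limit=5)
```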

Configuration

client = ClearScrape(
    # Required: Your API key
    api_key="your-api-key",

    # Optional: Custom base URL (default: https://clearscrape.io/api)
    base_url="https://clearscrape.io/api",

    # Optional: Request timeout in seconds (default: 60)
    timeout=60,

    # Optional: Number of retries (default: 3)
    retries=3
)

Error Handling

from clearscrape import (
    ClearScrape,
    ClearScrapeError,
    InsufficientCreditsError,
    RateLimitError,
    AuthenticationError,
)

try:
    result = client.scrape("https://example.com")
except AuthenticationError:
    print("Invalid API key")
except InsufficientCreditsError as e:
    print(f"Need {e.required} credits")
except RateLimitError:
    print("Rate limited, try again later")
except ClearScrapeError as e:
    print(f"Error {e.status_code}: {e.message}")
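The client already retries with exponential backoff, but if you catch RateLimitError yourself, the same pattern applies at the application level. A sketch of a jittered backoff delay (backoff_delay is illustrative; the SDK's internal retry policy may differ):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter exponential backoff: random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Usage sketch with the exceptions imported above:
# for attempt in range(5):
#     try:
#         result = client.scrape("https://example.com")
#         break
#     except RateLimitError:
#         time.sleep(backoff_delay(attempt))
```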

Credits

Feature                  Cost
-----------------------  -----------
Base request             1 credit
+ JavaScript rendering   +5 credits
+ Premium proxy          +10 credits
+ Antibot bypass         +25 credits
Domain API extraction    25 credits
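Costs are additive per request: a scrape with JavaScript rendering, a premium proxy, and antibot bypass costs 1 + 5 + 10 + 25 = 41 credits. A small sketch using the values from the table (estimate_credits is an illustrative helper, not part of the SDK):

```python
# Credit values from the table above; the helper itself is illustrative.
CREDIT_COSTS = {"base": 1, "js_render": 5, "premium_proxy": 10, "antibot": 25}

def estimate_credits(js_render=False, premium_proxy=False, antibot=False):
    """Estimate the credit cost of one scrape request."""
    total = CREDIT_COSTS["base"]
    if js_render:
        total += CREDIT_COSTS["js_render"]
    if premium_proxy:
        total += CREDIT_COSTS["premium_proxy"]
    if antibot:
        total += CREDIT_COSTS["antibot"]
    return total

print(estimate_credits(js_render=True, premium_proxy=True, antibot=True))  # 41
```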

Support

License

MIT
