
Scrapeless Python SDK

The official Python SDK for Scrapeless AI - End-to-End Data Infrastructure for AI Developers & Enterprises.

🌟 Features

  • Browser: Advanced browser session management supporting the Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows.
  • Universal Scraping API: Web interaction and data extraction with full browser capabilities. Execute JavaScript rendering, simulate user interactions (clicks, scrolls), bypass anti-scraping measures, and export structured data in multiple formats.
  • Crawl: Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
  • Scraping API: Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors.
  • Deep SerpApi: Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates.
  • Proxies: Geo-targeted proxy network with 195+ countries. Optimize requests for better success rates and regional data access.
  • Actor: Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management.
  • Storage Solutions: Scalable data storage solutions for crawled content, supporting seamless integration with cloud services and databases.

📦 Installation

Install the SDK using pip:

pip install scrapeless

🚀 Quick Start

Prerequisite

Log in to the Scrapeless Dashboard and copy your API key.

Basic Setup

from scrapeless import Scrapeless

client = Scrapeless({
    'api_key': 'your-api-key'  # Get your API key from https://scrapeless.com
})

Environment Variables

You can also configure the SDK using environment variables:

# Required
SCRAPELESS_API_KEY=your-api-key

# Optional - Custom API endpoints
SCRAPELESS_BASE_API_URL=https://api.scrapeless.com
SCRAPELESS_ACTOR_API_URL=https://actor.scrapeless.com
SCRAPELESS_STORAGE_API_URL=https://storage.scrapeless.com
SCRAPELESS_BROWSER_API_URL=https://browser.scrapeless.com
SCRAPELESS_CRAWL_API_URL=https://api.scrapeless.com
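In a POSIX shell, these can be exported before running your script (values below are placeholders; only the API key is required):

```shell
# Required: authentication for all SDK calls
export SCRAPELESS_API_KEY=your-api-key

# Optional: only needed when targeting non-default endpoints
export SCRAPELESS_BASE_API_URL=https://api.scrapeless.com

python your_script.py
```

With the variables set, `Scrapeless()` can be constructed without arguments, as in the examples below.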

📖 Usage Examples

Browser

Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows:

from scrapeless import Scrapeless
from scrapeless.types import ICreateBrowser
import asyncio
import pyppeteer

client = Scrapeless()


async def example():
    # Create a browser session
    config = ICreateBrowser(
        session_name='sdk_test',
        session_ttl=180,
        proxy_country='US',
        session_recording=True
    )
    session = client.browser.create(config).__dict__
    browser_ws_endpoint = session['browser_ws_endpoint']
    print('Browser WebSocket endpoint created:', browser_ws_endpoint)

    # Connect to the remote browser using pyppeteer
    browser = await pyppeteer.connect({'browserWSEndpoint': browser_ws_endpoint})
    # Open a new page and navigate to a website
    page = await browser.newPage()
    await page.goto('https://www.scrapeless.com')
    await browser.close()


asyncio.run(example())

Crawl

Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links:

from scrapeless import Scrapeless

client = Scrapeless()

result = client.scraping_crawl.scrape_url("https://example.com")
print(result)

Scraping API

Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors:

from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()
request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)
result = client.scraping.scrape(request=request)
print(result)

Deep SerpApi

Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates:

from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()
request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)
result = client.deepserp.scrape(request=request)
print(result)

Actor

Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management:

from scrapeless import Scrapeless
from scrapeless.types import IRunActorData, IActorRunOptions

client = Scrapeless()
data = IRunActorData(
    input={'url': 'https://example.com'},
    run_options=IActorRunOptions(
        CPU=2,
        memory=2048,
        timeout=600,
    )
)

run = client.actor.run(
    actor_id='your_actor_id',
    data=data
)
print('Actor run result:', run)

Error Handling

The SDK throws ScrapelessError for API-related errors:

from scrapeless import Scrapeless, ScrapelessError

client = Scrapeless()
try:
    result = client.scraping.scrape({'url': 'invalid-url'})
except ScrapelessError as error:
    print(f"Scrapeless API error: {error}")
    if hasattr(error, 'status_code'):
        print(f"Status code: {error.status_code}")
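Network calls can fail transiently, so it is often worth retrying before giving up. The helper below is a generic Python sketch, not part of the SDK; `retry_call` and the `flaky` stub are illustrative names. It retries on the exception types you pass in, with exponential backoff:

```python
import time


def retry_call(fn, exceptions, attempts=3, backoff=0.5):
    """Call fn(), retrying on the given exception types with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(backoff * 2 ** (attempt - 1))


# Demo with a stub that fails twice, then succeeds.
calls = {'n': 0}

def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('transient')
    return 'ok'

result = retry_call(flaky, (ConnectionError,), backoff=0.01)
print(result)  # → ok
```

A real call might look like `retry_call(lambda: client.scraping.scrape(request=request), (ScrapelessError,))`.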

🔧 API Reference

Client Configuration

from scrapeless.types import ScrapelessConfig 

config = ScrapelessConfig(
    api_key='', # Your API key
    timeout=30000, # Request timeout in milliseconds (default: 30000)
    base_api_url='', # Base API URL
    actor_api_url='', # Actor service URL
    storage_api_url='', # Storage service URL
    browser_api_url='', # Browser service URL
    scraping_crawl_api_url='' # Crawl service URL
)

Available Services

The SDK provides the following services through the main client:

  • client.browser - Browser automation with Playwright/Pyppeteer support, anti-detection tools (fingerprinting, CAPTCHA solving), and extensible workflows.
  • client.universal - JS rendering, user simulation (clicks/scrolls), anti-block bypass, and structured data export.
  • client.scraping_crawl - Recursive site crawling with multi-format export (Markdown, JSON, HTML, screenshots, links).
  • client.scraping - Pre-built connectors for sites (e.g., e-commerce, travel) to extract product data, pricing, and reviews.
  • client.deepserp - Search engine results extraction.
  • client.proxies - Proxy management.
  • client.actor - Scalable workflow automation with built-in scheduling and resource management.
  • client.storage - Data storage solutions.

📚 Examples

Check out the examples directory for comprehensive usage examples.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🏢 About Scrapeless

Scrapeless is a powerful web scraping and browser automation platform that helps businesses extract data from any website at scale. Our platform provides:

  • High-performance web scraping infrastructure
  • Global proxy network
  • Browser automation capabilities
  • Enterprise-grade reliability and support

Visit scrapeless.com to learn more and get started.


Made with ❤️ by the Scrapeless team
