Scrapeless Python SDK
The official Python SDK for Scrapeless AI - End-to-End Data Infrastructure for AI Developers & Enterprises.
📑 Table of Contents
- 🌟 Features
- 📦 Installation
- 🚀 Quick Start
- 📖 Usage Examples
- 🔧 API Reference
- 📚 Examples
- 📄 License
- 📞 Support
- 🏢 About Scrapeless
🌟 Features
- Browser: Advanced browser session management supporting Playwright and pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows.
- Universal Scraping API: Web interaction and data extraction with full browser capabilities. Execute JavaScript rendering, simulate user interactions (clicks, scrolls), bypass anti-scraping measures, and export structured data in multiple formats.
- Crawl: Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
- Scraping API: Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors.
- Deep SerpApi: Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates.
- Proxies: Geo-targeted proxy network with 195+ countries. Optimize requests for better success rates and regional data access.
- Actor: Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management.
- Storage Solutions: Scalable data storage solutions for crawled content, supporting seamless integration with cloud services and databases.
📦 Installation
Install the SDK using pip:

```bash
pip install scrapeless
```
🚀 Quick Start
Prerequisite
Log in to the Scrapeless Dashboard and get your API key.
Basic Setup
```python
from scrapeless import Scrapeless

client = Scrapeless({
    'api_key': 'your-api-key'  # Get your API key from https://scrapeless.com
})
```
Environment Variables
You can also configure the SDK using environment variables:
```bash
# Required
SCRAPELESS_API_KEY=your-api-key

# Optional - custom API endpoints
SCRAPELESS_BASE_API_URL=https://api.scrapeless.com
SCRAPELESS_ACTOR_API_URL=https://actor.scrapeless.com
SCRAPELESS_STORAGE_API_URL=https://storage.scrapeless.com
SCRAPELESS_BROWSER_API_URL=https://browser.scrapeless.com
SCRAPELESS_CRAWL_API_URL=https://api.scrapeless.com
```
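When no explicit config is passed, the client falls back to these variables. The lookup is roughly equivalent to the sketch below; the `resolve_config` helper is illustrative, not part of the SDK:

```python
import os

# Defaults mirroring the documented endpoints (illustrative only)
_DEFAULTS = {
    'SCRAPELESS_BASE_API_URL': 'https://api.scrapeless.com',
    'SCRAPELESS_ACTOR_API_URL': 'https://actor.scrapeless.com',
    'SCRAPELESS_STORAGE_API_URL': 'https://storage.scrapeless.com',
    'SCRAPELESS_BROWSER_API_URL': 'https://browser.scrapeless.com',
    'SCRAPELESS_CRAWL_API_URL': 'https://api.scrapeless.com',
}

def resolve_config() -> dict:
    """Build a config dict from the environment, applying endpoint defaults."""
    api_key = os.environ.get('SCRAPELESS_API_KEY')
    if not api_key:
        raise ValueError('SCRAPELESS_API_KEY is required')
    config = {'api_key': api_key}
    for var, default in _DEFAULTS.items():
        # SCRAPELESS_BASE_API_URL -> base_api_url, etc.
        config[var.removeprefix('SCRAPELESS_').lower()] = os.environ.get(var, default)
    return config
```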
📖 Usage Examples
Browser
Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows:
```python
import asyncio

import pyppeteer
from scrapeless import Scrapeless
from scrapeless.types import ICreateBrowser

client = Scrapeless()

async def example():
    # Create a browser session
    config = ICreateBrowser(
        session_name='sdk_test',
        session_ttl=180,
        proxy_country='US',
        session_recording=True
    )
    session = client.browser.create(config).__dict__
    browser_ws_endpoint = session['browser_ws_endpoint']
    print('Browser WebSocket endpoint created:', browser_ws_endpoint)

    # Connect to the session using pyppeteer
    browser = await pyppeteer.connect({'browserWSEndpoint': browser_ws_endpoint})

    # Open a new page and navigate to a website
    page = await browser.newPage()
    await page.goto('https://www.scrapeless.com')
    await browser.disconnect()

asyncio.run(example())
```
Crawl
Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
```python
from scrapeless import Scrapeless

client = Scrapeless()

result = client.scraping_crawl.scrape_url("https://example.com")
print(result)
```
Scraping API
Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors:
```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()

request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)
result = client.scraping.scrape(request=request)
print(result)
```
Deep SerpApi
Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates:
```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()

request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)
result = client.deepserp.scrape(request=request)
print(result)
```
Actor
Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management:
```python
from scrapeless import Scrapeless
from scrapeless.types import IRunActorData, IActorRunOptions

client = Scrapeless()

data = IRunActorData(
    input={'url': 'https://example.com'},
    run_options=IActorRunOptions(
        CPU=2,
        memory=2048,
        timeout=600,
    )
)
run = client.actor.run(
    actor_id='your_actor_id',
    data=data
)
print('Actor run result:', run)
```
Error Handling
The SDK raises ScrapelessError for API-related errors:
```python
from scrapeless import Scrapeless, ScrapelessError

client = Scrapeless()

try:
    result = client.scraping.scrape({'url': 'invalid-url'})
except ScrapelessError as error:
    print(f"Scrapeless API error: {error}")
    if hasattr(error, 'status_code'):
        print(f"Status code: {error.status_code}")
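Transient API failures (timeouts, rate limits) are often worth retrying. The SDK does not document a built-in retry policy, so here is a generic exponential-backoff wrapper you can put around any call; `with_retries` is illustrative, not part of the SDK:

```python
import time

def with_retries(call, exceptions=Exception, attempts=3, base_delay=1.0):
    """Retry `call` on failure with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return call()
        except exceptions:
            if attempt == attempts - 1:
                raise  # Out of retries: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage with the SDK:
# result = with_retries(
#     lambda: client.scraping.scrape(request=request),
#     exceptions=ScrapelessError,
# )
```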
🔧 API Reference
Client Configuration
```python
from scrapeless.types import ScrapelessConfig

config = ScrapelessConfig(
    api_key='',                # Your API key
    timeout=30000,             # Request timeout in milliseconds (default: 30000)
    base_api_url='',           # Base API URL
    actor_api_url='',          # Actor service URL
    storage_api_url='',        # Storage service URL
    browser_api_url='',        # Browser service URL
    scraping_crawl_api_url=''  # Crawl service URL
)
```
Available Services
The SDK provides the following services through the main client:
- client.browser - Browser automation with Playwright/pyppeteer support, anti-detection tools (fingerprinting, CAPTCHA solving), and extensible workflows.
- client.universal - JS rendering, user simulation (clicks/scrolls), anti-block bypass, and structured data export.
- client.scraping_crawl - Recursive site crawling with multi-format export (Markdown, JSON, HTML, screenshots, links).
- client.scraping - Pre-built connectors for sites (e.g., e-commerce, travel) to extract product data, pricing, and reviews.
- client.deepserp - Search engine results extraction.
- client.proxies - Proxy management.
- client.actor - Scalable workflow automation with built-in scheduling and resource management.
- client.storage - Data storage solutions.
📚 Examples
Check out the examples directory for comprehensive usage examples:
- Browser
- Playwright Integration
- Pyppeteer Integration
- Scraping API
- Actor
- Storage Usage
- Proxies
- Deep SerpApi
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📞 Support
- 📖 Documentation: https://docs.scrapeless.com
- 💬 Community: Join our Discord
- 🐛 Issues: GitHub Issues
- 📧 Email: support@scrapeless.com
🏢 About Scrapeless
Scrapeless is a powerful web scraping and browser automation platform that helps businesses extract data from any website at scale. Our platform provides:
- High-performance web scraping infrastructure
- Global proxy network
- Browser automation capabilities
- Enterprise-grade reliability and support
Visit scrapeless.com to learn more and get started.
Made with ❤️ by the Scrapeless team