A stealth scraping library with advanced proxy rotation and automatic response parsing.

Project description

Scrawlee

A stealth scraping library built on top of curl_cffi, with advanced proxy rotation, automatic parsing for JSON and HTML responses, and built-in rate-limit retries. Fully supports both synchronous and highly concurrent asynchronous scraping.

Key Features

  • Stealth: Rotates through real-world TLS/JA3 fingerprints (Chrome, Edge, Safari).
  • Asynchronous Engine: Ships AsyncScrawleeClient for highly concurrent scraping with asyncio.
  • Auto-Parsing Responses: The .auto property returns a parsed Python dictionary for JSON responses or a high-speed selectolax tree for HTML.
  • Dual-Parser Support: Run fast CSS queries via .html (selectolax) or robust XPath queries via .lxml (lxml).
  • Cookie Persistence: Save and load authenticated sessions to disk so you never have to log in or solve a Cloudflare challenge twice.
  • Smart Retries: Built-in exponential backoff for common HTTP error codes (429, 5xx); a sketch of the pattern follows this list.
  • Advanced Proxy Management: Supports Random, Round-Robin, and Sticky session rotation with out-of-band automated health checks.
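
The retries happen inside the client automatically. As a rough illustration of the backoff pattern, here is a minimal hand-rolled equivalent; the retry count and delay schedule are illustrative assumptions, not Scrawlee's documented defaults:

import time

from scrawlee import ScrawleeClient

def get_with_backoff(client, url, retries=4, base_delay=1.0):
    # Retry on 429 and 5xx with exponentially growing delays
    # (1s, 2s, 4s, 8s for the defaults above).
    for attempt in range(retries):
        res = client.get(url)
        if res.status_code != 429 and res.status_code < 500:
            return res
        time.sleep(base_delay * (2 ** attempt))
    return res

with ScrawleeClient() as client:
    res = get_with_backoff(client, "https://httpbin.org/status/200")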

Installation

pip install scrawlee

(Requires Python 3.8+)

Usage Guide

1. Basic Synchronous Scraping

from scrawlee import ScrawleeClient

with ScrawleeClient(impersonate="chrome120") as client:
    res = client.get("https://httpbin.org/get")
    
    # .auto returns a parsed dictionary for JSON API responses
    print(res.auto['headers']['User-Agent'])
    
    res_html = client.get("https://httpbin.org/html")
    
    # Lightning fast CSS queries via selectolax
    print(res_html.html.css_first("h1").text(strip=True))
    
    # Powerful XPath queries via lxml
    print(res_html.lxml.xpath("//h1/text()")[0])
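
The .auto property also covers the HTML case: per the feature list it returns the selectolax tree for HTML responses, so (assuming it exposes the same selectolax API as .html) the query above can be written as:

from scrawlee import ScrawleeClient

with ScrawleeClient() as client:
    res = client.get("https://httpbin.org/html")
    # For an HTML content type, .auto returns the selectolax tree,
    # making this equivalent to res.html.css_first(...).
    print(res.auto.css_first("h1").text(strip=True))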

2. Deep Dive: Extracting Data from HTML

Scrawlee removes the need for external parsing libraries such as BeautifulSoup: it ships with two fast, C-based parsing engines.

Extracting with CSS Selectors (via .html)

The .html property exposes the selectolax engine. It is the fastest way to parse data using standard CSS selectors.

with ScrawleeClient() as client:
    res = client.get("https://example-store.com/products")
    
    # 1. Extract text from a single element
    title = res.html.css_first("h1.product-title").text(strip=True)
    
    # 2. Extract HTML attributes (e.g. data-id, href, src)
    product_id = res.html.css_first("div.product").attributes.get("data-product-id")
    
    # 3. Loop through lists of elements
    for feature_li in res.html.css("ul.features li"):
        print("Feature:", feature_li.text(strip=True))

Extracting with XPath Queries (via .lxml)

If you need complex DOM traversal (e.g., finding a parent element based on its child's value), CSS selectors fall short. The .lxml property provides industry-standard XPath extraction; a parent-by-child sketch follows the example below.

with ScrawleeClient() as client:
    res = client.get("https://example-store.com/products")
    
    # Fetch an element exactly using an XPath query
    price = res.lxml.xpath('//div[@class="product-card" and @data-status="in-stock"]//span[@class="price"]/text()')[0]
    print(f"Price is: {price}")

3. High-Speed Asynchronous Scraping

If you need to scrape 1,000 pages concurrently, use AsyncScrawleeClient.

import asyncio
from scrawlee import AsyncScrawleeClient

async def run():
    async with AsyncScrawleeClient() as client:
        # Fire concurrent requests
        res1, res2 = await asyncio.gather(
            client.get("https://httpbin.org/get"),
            client.get("https://httpbin.org/html")
        )
        print("Async HTTPBin Status:", res1.status_code)

asyncio.run(run())
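
Launching all 1,000 requests at once will hammer the target, so it is common to cap concurrency with an asyncio.Semaphore. A minimal sketch (the limit of 20 is an arbitrary choice):

import asyncio

from scrawlee import AsyncScrawleeClient

async def scrape_all(urls, limit=20):
    sem = asyncio.Semaphore(limit)  # at most `limit` requests in flight

    async with AsyncScrawleeClient() as client:
        async def fetch(url):
            async with sem:
                return await client.get(url)

        return await asyncio.gather(*(fetch(u) for u in urls))

urls = [f"https://httpbin.org/get?page={i}" for i in range(1000)]
results = asyncio.run(scrape_all(urls))
print(len(results), "responses fetched")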

4. Persistent Sessions (Save/Load Cookies)

If you bypass a DataDome/Cloudflare wall or log into a website, save your cookies to disk so you can instantly resume the session tomorrow!

from scrawlee import ScrawleeClient

# Script 1: Save the session
with ScrawleeClient() as client:
    # ... Login logic or bypass challenge ...
    client.save_cookies("twitter_session.json")

# Script 2: Load the session instantly
with ScrawleeClient() as client:
    client.load_cookies("twitter_session.json")
    res = client.get("https://api.twitter.com/protected_route")

5. Advanced Proxy Management

Automatically rotates proxies and quarantines failing ones.

from scrawlee import ScrawleeClient, ProxyManager

pm = ProxyManager(rotation_strategy="round_robin")
# Accepts raw proxy data
pm.add_proxy(ip="12.34.56.78", port="8080", username="user", password="pwd")

with ScrawleeClient(proxy_manager=pm) as client:
    res = client.get("https://api.myip.com")
    print("Masked IP:", res.auto['ip'])

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrawlee-0.1.0.tar.gz (8.0 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

scrawlee-0.1.0-py3-none-any.whl (9.6 kB)


File details

Details for the file scrawlee-0.1.0.tar.gz.

File metadata

  • Download URL: scrawlee-0.1.0.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for scrawlee-0.1.0.tar.gz

  • SHA256: 6f75173cc7f3899502f16d88a960930e8d03b58db42fc62fe9f80986249635bd
  • MD5: 47060608bc989351786fddc28d11d129
  • BLAKE2b-256: ff05bba1b3f93df253f78dd360357ccf5d94a3c1e31354e9fda7d0c1f0cbe086

See more details on using hashes here.
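
To verify a downloaded archive against the SHA256 digest above, the standard library is enough. A minimal sketch (the local file path assumes you downloaded the sdist to the working directory):

import hashlib

EXPECTED = "6f75173cc7f3899502f16d88a960930e8d03b58db42fc62fe9f80986249635bd"

# Hash the archive in chunks and compare to the published digest.
digest = hashlib.sha256()
with open("scrawlee-0.1.0.tar.gz", "rb") as f:
    for chunk in iter(lambda: f.read(8192), b""):
        digest.update(chunk)

print("OK" if digest.hexdigest() == EXPECTED else "MISMATCH")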

File details

Details for the file scrawlee-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: scrawlee-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for scrawlee-0.1.0-py3-none-any.whl

  • SHA256: 0133a6b875e6d87ba59b4d544d20e69deb48776248a14553e87044ca0e05ded0
  • MD5: 702545a7b3e9d22764ac8f847012b87a
  • BLAKE2b-256: 7229a54ee446056c558af63c7da28d3ef16dff279cba8e360f08fc3dcb7fcafd

See more details on using hashes here.
