A Python library for interacting with the Skrape.ai API, with type-safe schemas defined using Pydantic

skrape-py

A Python library for easily interacting with the Skrape.ai API. Define your scraping schema using Pydantic and get type-safe results.

Features

🛡️ Type-safe: Define your schemas using Pydantic and get fully typed results
🚀 Simple API: Just define a schema and get your data
🔄 Async Support: Built with async/await for efficient scraping
🧩 Minimal Dependencies: Built on top of proven libraries like Pydantic and httpx
📝 Markdown Conversion: Convert any webpage to clean markdown
🕷️ Web Crawling: Crawl multiple pages with browser automation
🔄 Background Jobs: Handle long-running tasks asynchronously

Installation

pip install skrape-py

Or with Poetry:

poetry add skrape-py

Environment Setup

Set up your API key in a .env file:

SKRAPE_API_KEY="your_api_key_here"

Get your API key on Skrape.ai
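Note that os.getenv only sees variables already present in the process environment, and this README does not say whether the library reads .env on its own. A tool such as python-dotenv is the usual way to load the file; the sketch below is a minimal stdlib-only alternative for illustration. The names load_env_file and require_api_key are hypothetical helpers, not part of skrape-py.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY="value" lines and copy them into os.environ.

    Hypothetical helper (not part of skrape-py). Handles only plain
    KEY=value lines, optional quotes, blank lines, and # comments.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already set in the environment take precedence
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded


def require_api_key(name: str = "SKRAPE_API_KEY") -> str:
    """Return the API key, failing early with a clear message if unset."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key
```

Calling load_env_file() once at startup and then passing require_api_key() to the client makes a missing key fail immediately instead of surfacing later as an authentication error.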

Quick Start

Extract Structured Data

from skrape import Skrape
from pydantic import BaseModel
import os
import asyncio

# Define your schema using Pydantic
class ProductSchema(BaseModel):
    title: str
    price: float
    description: str
    rating: float

async def main():
    async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
        # Start extraction job
        job = await skrape.extract(
            "https://example.com/product",
            ProductSchema,
            {"renderJs": True}  # Enable JavaScript rendering if needed
        )
        
        # Poll until the job completes, pausing between status checks
        while job.status != "COMPLETED":
            await asyncio.sleep(1)
            job = await skrape.get_job(job.jobId)
        
        # Access the extracted data
        product = job.result
        print(f"Product: {product.title}")
        print(f"Price: ${product.price}")

asyncio.run(main())
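The polling loop in the example runs until the status becomes "COMPLETED", so a job that never completes would block forever. As a sketch of one way to bound the wait, the helper below adds a timeout. wait_for_job is a hypothetical name, not part of skrape-py, and "COMPLETED" is the only status value this README shows, so the timeout is the only other exit path.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


async def wait_for_job(
    get_job: Callable[[str], Awaitable[Any]],
    job_id: str,
    *,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Any:
    """Poll get_job(job_id) until status is "COMPLETED" or the timeout passes.

    get_job is any async callable returning an object with a .status
    attribute, for example skrape.get_job from the example above.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = await get_job(job_id)
        if job.status == "COMPLETED":
            return job
        # Give up if the next sleep would run past the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout}s")
        await asyncio.sleep(interval)
```

Inside the async with block, the loop could then be replaced by something like job = await wait_for_job(skrape.get_job, job.jobId, timeout=120).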

Convert to Markdown

# Single URL (run inside an async function with an open `skrape` client, as above)
response = await skrape.to_markdown(
    "https://example.com/article",
    {"renderJs": True}
)
print(response.result)  # Clean markdown content

# Multiple URLs (async)
job = await skrape.to_markdown_bulk(
    ["https://example.com/1", "https://example.com/2"],
    {"renderJs": True}
)

# Poll until the job completes, pausing between status checks
while job.status != "COMPLETED":
    await asyncio.sleep(1)
    job = await skrape.get_job(job.jobId)

for markdown in job.result:
    print(markdown)

Web Crawling

# Start crawling job
job = await skrape.crawl(
    ["https://example.com", "https://example.com/page2"],
    {
        "renderJs": True,
        "actions": [
            {"scroll": {"distance": 500}},  # Scroll down 500px
            {"wait_for": ".content"}  # Wait for content to load
        ]
    }
)

# Poll until the job completes, pausing between status checks
while job.status != "COMPLETED":
    await asyncio.sleep(1)
    job = await skrape.get_job(job.jobId)

for page in job.result:
    print(page)

API Options

Common options for all endpoints:

options = {
    "renderJs": True,  # Enable JavaScript rendering
    "actions": [
        {"click": {"selector": ".button"}},  # Click element
        {"scroll": {"distance": 500}},       # Scroll page
        {"wait_for": ".content"},           # Wait for element
        {"type": {                          # Type into input
            "selector": "input",
            "text": "search term"
        }}
    ],
    "callbackUrl": "https://your-server.com/webhook"  # For async jobs
}
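Because the options are a plain dict, a misspelled action name would only be caught by the API. The sketch below shows one way to assemble and check the dict locally. build_options and KNOWN_ACTIONS are hypothetical names, not part of skrape-py, and the allowed action keys are limited to the four shown above; the API may accept others.

```python
from typing import Any, Dict, List, Optional

# Action keys that appear in the options example above; the API may accept others.
KNOWN_ACTIONS = {"click", "scroll", "wait_for", "type"}


def build_options(
    render_js: bool = False,
    actions: Optional[List[Dict[str, Any]]] = None,
    callback_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an options dict, rejecting malformed or misspelled actions."""
    options: Dict[str, Any] = {"renderJs": render_js}
    if actions:
        for action in actions:
            # Each action is a single-key dict such as {"wait_for": ".content"}
            if len(action) != 1:
                raise ValueError(f"each action needs exactly one key: {action!r}")
            (name,) = action
            if name not in KNOWN_ACTIONS:
                raise ValueError(f"unknown action {name!r}")
        options["actions"] = list(actions)
    if callback_url is not None:
        options["callbackUrl"] = callback_url
    return options
```

For example, build_options(render_js=True, actions=[{"wait_for": ".content"}]) yields a dict that can be passed as the options argument in the calls shown earlier.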

Error Handling

The library provides typed exceptions for better error handling:

import os
from skrape import Skrape, SkrapeValidationError, SkrapeAPIError

# Run inside an async function; `url` and `schema` are supplied by the caller
async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
    try:
        response = await skrape.extract(url, schema)
    except SkrapeValidationError as e:
        print(f"Data doesn't match schema: {e}")
    except SkrapeAPIError as e:
        print(f"API error: {e}")
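Printing the error is enough for a demo, but a caller may want to retry a failed request. The sketch below shows a generic retry with exponential backoff. retry_async is a hypothetical helper, not part of skrape-py, and whether a given SkrapeAPIError is transient is not stated in this README, so which exceptions to retry is left to the caller.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    call: Callable[[], Awaitable[T]],
    *,
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await call() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call might look like await retry_async(lambda: skrape.extract(url, schema), retry_on=(SkrapeAPIError,)). Retrying on SkrapeValidationError is unlikely to help, since the same page would presumably fail the same schema again.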

Rate Limiting

The API response includes rate limit information that you can use to manage your requests:

response = await skrape.to_markdown(url)
usage = response.usage

print(f"Remaining credits: {usage.remaining}")
print("Rate limit info:")
print(f"  - Remaining: {usage.rateLimit.remaining}")
print(f"  - Base limit: {usage.rateLimit.baseLimit}")
print(f"  - Burst limit: {usage.rateLimit.burstLimit}")
print(f"  - Reset at: {usage.rateLimit.reset}")
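One way to act on these fields is to pause when no requests remain in the current window. The sketch below computes the pause length. seconds_to_wait is a hypothetical helper, not part of skrape-py, and it assumes the reset value is a Unix timestamp in seconds; this README does not document the format of usage.rateLimit.reset, so check the actual value before relying on it.

```python
import time
from typing import Optional


def seconds_to_wait(
    remaining: int,
    reset_epoch: float,
    now: Optional[float] = None,
) -> float:
    """Return how long to pause before sending the next request.

    Assumption: reset_epoch is a Unix timestamp in seconds. The format
    of usage.rateLimit.reset is not documented in this README.
    """
    if remaining > 0:
        return 0.0  # Requests are still available in this window
    current = time.time() if now is None else now
    # Never return a negative delay if the reset time has already passed
    return max(0.0, reset_epoch - current)
```

Under that assumption, a caller could run await asyncio.sleep(seconds_to_wait(usage.rateLimit.remaining, usage.rateLimit.reset)) before issuing the next request.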

Release history

1.0.3 (this version): Jan 29, 2025
1.0.1: Dec 8, 2024

Download files

Source Distribution

skrape_py-1.0.3.tar.gz (4.3 kB)

Uploaded Jan 29, 2025 Source

Built Distribution

skrape_py-1.0.3-py3-none-any.whl (5.3 kB)

Uploaded Jan 29, 2025 Python 3

File details

Details for the file skrape_py-1.0.3.tar.gz.

File metadata

Download URL: skrape_py-1.0.3.tar.gz
Upload date: Jan 29, 2025
Size: 4.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.1 CPython/3.9.21 Linux/6.8.0-1020-azure

File hashes

Hashes for skrape_py-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`864a48f032bce9d5ac3dcc24a896aebe0ff6275ab2c38b877eefef69994fc992`
MD5	`f09c72bdacc43f5345f97e387404d07b`
BLAKE2b-256	`c9175f081625996d4606c30012ac4000977037c9699658c32c821bbe77e6217a`

File details

Details for the file skrape_py-1.0.3-py3-none-any.whl.

File metadata

Download URL: skrape_py-1.0.3-py3-none-any.whl
Upload date: Jan 29, 2025
Size: 5.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.1 CPython/3.9.21 Linux/6.8.0-1020-azure

File hashes

Hashes for skrape_py-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07fd7f2062887a8f2a3cc300c1e67eadb797a1f2eb549dbef35c406375db58a7`
MD5	`d3340fa6ea84f37c317c13fe33991f01`
BLAKE2b-256	`9ca611593f99995c88d7b048225b5ff08f1bacce363388c4b79328649a0476b9`
