skrape-py

A Python library for easily interacting with the Skrape.ai API. Define your scraping schema using Pydantic and get type-safe results.

Features

  • 🛡️ Type-safe: Define your schemas using Pydantic and get fully typed results
  • 🚀 Simple API: Just define a schema and get your data
  • 🔄 Async Support: Built with async/await for efficient scraping
  • 🧩 Minimal Dependencies: Built on top of proven libraries like Pydantic and httpx
  • 📝 Markdown Conversion: Convert any webpage to clean markdown
  • 🕷️ Web Crawling: Crawl multiple pages with browser automation
  • 🔄 Background Jobs: Handle long-running tasks asynchronously

Installation

pip install skrape-py

Or with Poetry:

poetry add skrape-py

Environment Setup

Set up your API key in .env:

SKRAPE_API_KEY="your_api_key_here"

Get your API key on Skrape.ai
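
Note that os.getenv, used in the examples in this README, only sees variables already present in the process environment; Python does not read a .env file on its own. The following is a minimal standard-library sketch for loading one, assuming the file contains only simple KEY=VALUE lines. load_env_file is a hypothetical helper and not part of skrape-py; the python-dotenv package handles quoting and edge cases more thoroughly.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines and copy them into os.environ.

    Hypothetical helper, not part of skrape-py. Variables that are
    already set in the environment are left untouched.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and one layer of simple quoting
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

Calling load_env_file() once at startup, before creating the client, should make os.getenv("SKRAPE_API_KEY") return the value from the file.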

Quick Start

Extract Structured Data

from skrape import Skrape
from pydantic import BaseModel
import os
import asyncio

# Define your schema using Pydantic
class ProductSchema(BaseModel):
    title: str
    price: float
    description: str
    rating: float

async def main():
    async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
        # Start extraction job
        job = await skrape.extract(
            "https://example.com/product",
            ProductSchema,
            {"renderJs": True}  # Enable JavaScript rendering if needed
        )
        
        # Poll until the job completes. This loop never exits if the job
        # fails, so add a timeout or a failure check in production code.
        while job.status != "COMPLETED":
            await asyncio.sleep(1)
            job = await skrape.get_job(job.jobId)
        
        # Access the extracted data
        product = job.result
        print(f"Product: {product.title}")
        print(f"Price: ${product.price}")

asyncio.run(main())

Convert to Markdown

from skrape import Skrape
import os
import asyncio

async def main():
    async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
        # Single URL
        response = await skrape.to_markdown(
            "https://example.com/article",
            {"renderJs": True}
        )
        print(response.result)  # Clean markdown content

        # Multiple URLs (runs as an asynchronous job)
        job = await skrape.to_markdown_bulk(
            ["https://example.com/1", "https://example.com/2"],
            {"renderJs": True}
        )

        # Poll until the job completes
        while job.status != "COMPLETED":
            await asyncio.sleep(1)
            job = await skrape.get_job(job.jobId)

        for markdown in job.result:
            print(markdown)

asyncio.run(main())
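
When each page is needed as soon as it is ready, rather than as one bulk job, several single-URL conversions can run concurrently. The sketch below caps how many run at once with asyncio.Semaphore. convert_many and its parameters are hypothetical names, not part of skrape-py; convert stands for any coroutine function that takes a URL.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def convert_many(
    urls: Sequence[str],
    convert: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 3,
) -> List[Any]:
    """Run `convert` for every URL, at most `max_concurrency` at a time.

    Hypothetical helper. Results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> Any:
        # Only `max_concurrency` coroutines can hold the semaphore at once
        async with semaphore:
            return await convert(url)

    return await asyncio.gather(*(guarded(url) for url in urls))
```

With an open client, convert could be something like lambda url: skrape.to_markdown(url, {"renderJs": True}). A low max_concurrency is a cautious default, since each call counts against the rate limit.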

Web Crawling

from skrape import Skrape
import os
import asyncio

async def main():
    async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
        # Start crawling job
        job = await skrape.crawl(
            ["https://example.com", "https://example.com/page2"],
            {
                "renderJs": True,
                "actions": [
                    {"scroll": {"distance": 500}},  # Scroll down 500px
                    {"wait_for": ".content"}  # Wait for content to load
                ]
            }
        )

        # Poll until the job completes
        while job.status != "COMPLETED":
            await asyncio.sleep(1)
            job = await skrape.get_job(job.jobId)

        for page in job.result:
            print(page)

asyncio.run(main())
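
The polling loop that recurs in the examples above runs forever if a job never reaches "COMPLETED". One way to bound it is a small helper with a timeout, sketched here. wait_for_job is a hypothetical name and not part of skrape-py; it assumes only that the job object has a status attribute, as in the examples. Failure statuses are not listed in this README, so the sketch relies on the timeout alone.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


async def wait_for_job(
    fetch_job: Callable[[], Awaitable[Any]],
    done_status: str = "COMPLETED",
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Any:
    """Poll `fetch_job` until the job reports `done_status` or time runs out.

    Hypothetical helper, not part of skrape-py.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = await fetch_job()
        if job.status == done_status:
            return job
        # Give up if the next poll would land past the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job still {job.status!r} after {timeout} seconds")
        await asyncio.sleep(interval)
```

Usage might look like: job_id = job.jobId, then job = await wait_for_job(lambda: skrape.get_job(job_id), timeout=120).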

API Options

Common options for all endpoints:

options = {
    "renderJs": True,  # Enable JavaScript rendering
    "actions": [
        {"click": {"selector": ".button"}},  # Click element
        {"scroll": {"distance": 500}},       # Scroll page
        {"wait_for": ".content"},           # Wait for element
        {"type": {                          # Type into input
            "selector": "input",
            "text": "search term"
        }}
    ],
    "callbackUrl": "https://your-server.com/webhook"  # For async jobs
}
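
Since the same dict shape is passed to every endpoint, it can help to build it from small functions so that key names are spelled in one place. This is only a sketch: build_options, click, scroll, wait_for, and type_text are hypothetical helpers, not part of skrape-py, and the keys they emit are copied from the options listing above.

```python
from typing import Any, Dict, List, Optional


def click(selector: str) -> Dict[str, Any]:
    return {"click": {"selector": selector}}


def scroll(distance: int) -> Dict[str, Any]:
    return {"scroll": {"distance": distance}}


def wait_for(selector: str) -> Dict[str, Any]:
    return {"wait_for": selector}


def type_text(selector: str, text: str) -> Dict[str, Any]:
    return {"type": {"selector": selector, "text": text}}


def build_options(
    render_js: bool = False,
    actions: Optional[List[Dict[str, Any]]] = None,
    callback_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an options dict, leaving out anything that was not set."""
    options: Dict[str, Any] = {"renderJs": render_js}
    if actions:
        options["actions"] = list(actions)
    if callback_url is not None:
        options["callbackUrl"] = callback_url
    return options


options = build_options(
    render_js=True,
    actions=[click(".button"), scroll(500), wait_for(".content")],
)
print(options)
```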

Error Handling

The library provides typed exceptions for better error handling:

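from skrape import Skrape, SkrapeValidationError, SkrapeAPIError
import os
import asyncio

async def main():
    async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
        try:
            # ProductSchema is the Pydantic model defined in the Quick Start
            response = await skrape.extract("https://example.com/product", ProductSchema)
        except SkrapeValidationError as e:
            print(f"Data doesn't match schema: {e}")
        except SkrapeAPIError as e:
            print(f"API error: {e}")

asyncio.run(main())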
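
Because the exceptions are typed, a caller can retry selectively. The sketch below retries an awaitable with exponential backoff for a chosen set of exception types. retry_async is a hypothetical helper, not part of skrape-py, and whether a given SkrapeAPIError is actually transient is an assumption the caller has to make.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple, Type


async def retry_async(
    call: Callable[[], Awaitable[Any]],
    retry_on: Tuple[Type[Exception], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Any:
    """Await `call()`; on a `retry_on` exception, wait and try again.

    Hypothetical helper. The delay doubles after each failed attempt,
    and the exception from the final attempt is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))
```

A call might look like await retry_async(lambda: skrape.extract(url, schema), retry_on=(SkrapeAPIError,)). Leaving SkrapeValidationError out of retry_on is deliberate in this sketch, on the assumption that a schema mismatch would recur on a retry.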

Rate Limiting

The API response includes rate limit information that you can use to manage your requests:

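from skrape import Skrape
import os
import asyncio

async def main():
    async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
        response = await skrape.to_markdown("https://example.com/article")
        usage = response.usage

        print(f"Remaining credits: {usage.remaining}")
        print("Rate limit info:")
        print(f"  - Remaining: {usage.rateLimit.remaining}")
        print(f"  - Base limit: {usage.rateLimit.baseLimit}")
        print(f"  - Burst limit: {usage.rateLimit.burstLimit}")
        print(f"  - Reset at: {usage.rateLimit.reset}")

asyncio.run(main())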
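
One possible use of these fields is to pause when no requests remain. The sketch below computes how long to wait. It assumes that usage.rateLimit.reset is a Unix timestamp in seconds, which this README does not confirm, so check the actual format before relying on it. seconds_to_wait is a hypothetical helper, not part of skrape-py.

```python
import time
from typing import Optional


def seconds_to_wait(
    remaining: int,
    reset_epoch: float,
    now: Optional[float] = None,
) -> float:
    """Return how long to pause before sending the next request.

    Assumption: `reset_epoch` is a Unix timestamp in seconds. The real
    format of `usage.rateLimit.reset` may differ.
    """
    if remaining > 0:
        return 0.0
    current = time.time() if now is None else now
    # Never return a negative delay if the reset time has already passed
    return max(0.0, reset_epoch - current)
```

Inside an async function, this could be applied as await asyncio.sleep(seconds_to_wait(usage.rateLimit.remaining, usage.rateLimit.reset)) before the next call.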
