A Python library for easily interacting with the Skrape.ai API, with type-safe schemas using Pydantic
skrape-py
A Python library for easily interacting with the Skrape.ai API. Define your scraping schema using Pydantic and get type-safe results.
Features
- 🛡️ Type-safe: Define your schemas using Pydantic and get fully typed results
- 🚀 Simple API: Just define a schema and get your data
- 🔄 Async Support: Built with async/await for efficient scraping
- 🧩 Minimal Dependencies: Built on top of proven libraries like Pydantic and httpx
- 📝 Markdown Conversion: Convert any webpage to clean markdown
- 🕷️ Web Crawling: Crawl multiple pages with browser automation
- 🔄 Background Jobs: Handle long-running tasks asynchronously
Installation
pip install skrape-py
Or with Poetry:
poetry add skrape-py
Environment Setup
Set up your API key in a .env file:
SKRAPE_API_KEY="your_api_key_here"
Get your API key on Skrape.ai
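The examples in this README read the key with `os.getenv`, which only sees variables already present in the process environment. Whether skrape-py loads `.env` on its own is not stated here, so you may need to export the variable or load the file yourself (for example with a tool such as python-dotenv). A minimal sketch of a fail-fast lookup follows; `load_api_key` is a hypothetical helper, not part of skrape-py.

```python
import os


def load_api_key(var_name: str = "SKRAPE_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of skrape-py.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it or load your .env file first"
        )
    return key
```

Failing early like this gives a readable message instead of an authentication error from the first request.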
Quick Start
Extract Structured Data
import asyncio
import os

from pydantic import BaseModel
from skrape import Skrape


# Define your schema using Pydantic
class ProductSchema(BaseModel):
    title: str
    price: float
    description: str
    rating: float


async def main():
    async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
        # Start extraction job
        job = await skrape.extract(
            "https://example.com/product",
            ProductSchema,
            {"renderJs": True},  # Enable JavaScript rendering if needed
        )

        # Poll once per second until the job completes
        while job.status != "COMPLETED":
            await asyncio.sleep(1)
            job = await skrape.get_job(job.jobId)

        # Access the extracted data
        product = job.result
        print(f"Product: {product.title}")
        print(f"Price: ${product.price}")


asyncio.run(main())
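The polling loop above runs forever if a job never reaches `"COMPLETED"`. One way to bound it is a small wrapper with a timeout. This is a sketch, not part of skrape-py: `wait_for_job`, `poll_interval`, and `timeout` are hypothetical names, and the only things taken from the example above are `get_job`, `jobId`, and the `"COMPLETED"` status.

```python
import asyncio
import time


async def wait_for_job(client, job, poll_interval: float = 1.0, timeout: float = 60.0):
    """Poll client.get_job until job.status is "COMPLETED" or the timeout passes.

    Hypothetical helper; `client` is any object with an async get_job(job_id)
    method, such as the Skrape client used in the Quick Start.
    """
    deadline = time.monotonic() + timeout
    while job.status != "COMPLETED":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job.jobId} not completed after {timeout}s")
        await asyncio.sleep(poll_interval)
        job = await client.get_job(job.jobId)
    return job
```

Inside `main()` the loop would then become `job = await wait_for_job(skrape, job, timeout=120)`.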
Convert to Markdown
# Run inside an `async with Skrape(...) as skrape:` block (see Quick Start)

# Single URL
response = await skrape.to_markdown(
    "https://example.com/article",
    {"renderJs": True},
)
print(response.result)  # Clean markdown content

# Multiple URLs (runs as a background job)
job = await skrape.to_markdown_bulk(
    ["https://example.com/1", "https://example.com/2"],
    {"renderJs": True},
)

# Get results when ready
while job.status != "COMPLETED":
    await asyncio.sleep(1)
    job = await skrape.get_job(job.jobId)

for markdown in job.result:
    print(markdown)
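Instead of printing, the bulk results can be written to disk. The sketch below assumes each item in `job.result` is a plain markdown string, as the loop above suggests; `save_markdown` and the `page-NNN.md` naming are hypothetical choices, not part of skrape-py.

```python
from pathlib import Path


def save_markdown(documents, out_dir):
    """Write each markdown string to out_dir as page-001.md, page-002.md, ...

    Hypothetical helper; assumes every item in `documents` is a str.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, text in enumerate(documents, start=1):
        target = out_path / f"page-{index:03d}.md"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Usage would be `save_markdown(job.result, "output")` once the job has completed.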
Web Crawling
# Run inside an `async with Skrape(...) as skrape:` block (see Quick Start)

# Start crawling job
job = await skrape.crawl(
    ["https://example.com", "https://example.com/page2"],
    {
        "renderJs": True,
        "actions": [
            {"scroll": {"distance": 500}},  # Scroll down 500px
            {"wait_for": ".content"},       # Wait for content to load
        ],
    },
)

# Get results when ready
while job.status != "COMPLETED":
    await asyncio.sleep(1)
    job = await skrape.get_job(job.jobId)

for page in job.result:
    print(page)
API Options
Common options for all endpoints:
options = {
    "renderJs": True,  # Enable JavaScript rendering
    "actions": [
        {"click": {"selector": ".button"}},  # Click element
        {"scroll": {"distance": 500}},       # Scroll page
        {"wait_for": ".content"},            # Wait for element
        {"type": {                           # Type into input
            "selector": "input",
            "text": "search term",
        }},
    ],
    "callbackUrl": "https://your-server.com/webhook",  # For async jobs
}
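Hand-written nested dictionaries are easy to mistype, so it can help to build them with small functions. The sketch below only reproduces the dictionary shapes shown above; the function names (`click`, `scroll`, `wait_for`, `type_text`, `build_options`) are hypothetical and not provided by skrape-py.

```python
from typing import Any, Dict, List, Optional


def click(selector: str) -> Dict[str, Any]:
    return {"click": {"selector": selector}}


def scroll(distance: int) -> Dict[str, Any]:
    return {"scroll": {"distance": distance}}


def wait_for(selector: str) -> Dict[str, Any]:
    return {"wait_for": selector}


def type_text(selector: str, text: str) -> Dict[str, Any]:
    return {"type": {"selector": selector, "text": text}}


def build_options(
    actions: Optional[List[Dict[str, Any]]] = None,
    render_js: bool = True,
    callback_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an options dict, omitting keys that were not requested."""
    options: Dict[str, Any] = {"renderJs": render_js}
    if actions:
        options["actions"] = list(actions)
    if callback_url is not None:
        options["callbackUrl"] = callback_url
    return options
```

For example, `build_options([scroll(500), wait_for(".content")])` produces the same options dictionary used in the crawling example.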
Error Handling
The library provides typed exceptions for better error handling:
import os

from skrape import Skrape, SkrapeValidationError, SkrapeAPIError

# Run inside an async function; `url` and `schema` are supplied by the caller
async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
    try:
        response = await skrape.extract(url, schema)
    except SkrapeValidationError as e:
        print(f"Data doesn't match schema: {e}")
    except SkrapeAPIError as e:
        print(f"API error: {e}")
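Because the exceptions are typed, a caller can retry only the errors that might be transient. The sketch below is a generic retry with exponential backoff. `retry_async` is a hypothetical helper, not part of skrape-py, and whether a given `SkrapeAPIError` is actually worth retrying is an assumption you would need to check against the API's behaviour.

```python
import asyncio


async def retry_async(make_call, retry_on, attempts: int = 3, base_delay: float = 0.5):
    """Await make_call(), retrying on `retry_on` with exponential backoff.

    Hypothetical helper. `make_call` is a zero-argument function returning a
    fresh awaitable; `retry_on` is an exception class or tuple of classes.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
```

A call might look like `await retry_async(lambda: skrape.extract(url, schema), SkrapeAPIError)`. Exceptions not listed in `retry_on`, such as `SkrapeValidationError`, propagate immediately.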
Rate Limiting
The API response includes rate limit information that you can use to manage your requests:
response = await skrape.to_markdown(url)
usage = response.usage
print(f"Remaining credits: {usage.remaining}")
print("Rate limit info:")
print(f" - Remaining: {usage.rateLimit.remaining}")
print(f" - Base limit: {usage.rateLimit.baseLimit}")
print(f" - Burst limit: {usage.rateLimit.burstLimit}")
print(f" - Reset at: {usage.rateLimit.reset}")
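These fields can drive a simple pause before the next request. The sketch below makes a loud assumption: that `usage.rateLimit.reset` is a Unix timestamp in seconds. The format is not documented here, so verify it before relying on this. `seconds_until_reset` is a hypothetical helper, not part of skrape-py.

```python
import time
from typing import Optional


def seconds_until_reset(
    remaining: int, reset_epoch: float, now: Optional[float] = None
) -> float:
    """Return how long to pause before the next request.

    ASSUMPTION: reset_epoch is a Unix timestamp in seconds; check the real
    format of usage.rateLimit.reset before relying on this.
    """
    if remaining > 0:
        return 0.0  # Requests are still available, no pause needed
    current = time.time() if now is None else now
    return max(0.0, reset_epoch - current)
```

Under that assumption, `await asyncio.sleep(seconds_until_reset(usage.rateLimit.remaining, usage.rateLimit.reset))` would wait only when the limit is exhausted.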