
Official Python SDK for Krawly — AI-powered web scraping platform


Krawly — AI-Powered Web Scraping SDK

Requires Python 3.8+. License: MIT.

Turn any website into structured data with AI. No complex selectors, no external API keys — just describe what you want in plain English.

Installation

pip install krawly

Quick Start

from krawly import Krawly

# Initialize with your API key (get one at https://krawly.io)
client = Krawly(api_key="sai_your_key_here")

# One-line scraping — the simplest way
result = client.scrape(
    "https://books.toscrape.com",
    "Get all book titles and prices"
)

for item in result.data:
    print(f"{item['title']}: {item['price']}")

print(f"Total: {result.row_count} items")
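The Quick Start hardcodes the key for brevity. A common alternative is to read it from an environment variable so it never ends up in source control. This is a minimal sketch that assumes a variable named `KRAWLY_API_KEY`. That name is our choice, not something this page documents, and `get_api_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None, var: str = "KRAWLY_API_KEY") -> str:
    """Return the Krawly API key from the environment, failing loudly if it is missing.

    KRAWLY_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    source = os.environ if env is None else env
    key = source.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set {var} to your Krawly API key (get one at https://krawly.io)")
    return key


# Usage with the SDK (requires `pip install krawly`):
# from krawly import Krawly
# client = Krawly(api_key=get_api_key())
```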

Features

  • 🤖 AI-Powered — Describe what you want in plain English, the AI handles the rest
  • 🔧 No External Keys — Only your Krawly API key needed, no Claude/OpenAI keys
  • 📦 Config Management — Save, list, download, and reuse scraping configs
  • 🚀 Server Execution — Run scrapers on Krawly's cloud infrastructure
  • 📁 Local YAML — Read, write, and upload YAML configs from local files
  • 📊 Progress Tracking — Real-time progress callbacks during scraping

Usage

One-Line Scraping

result = client.scrape("https://example.com/products", "Get product names, prices, and ratings")
print(result.data)  # [{"name": "...", "price": "...", "rating": "..."}]
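The comment above shows `result.data` as a list of dictionaries, with one dictionary per scraped item. Assuming that shape, the standard library's `csv.DictWriter` is enough to save results to a file. This sketch builds the header from the union of all keys, so rows with missing fields still fit. `rows_to_csv` is a hypothetical helper, not part of the SDK.

```python
import csv
import io
from typing import Any, Dict, List


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize a list of dicts (e.g. result.data) to CSV text.

    The header is the union of all keys, in first-seen order, so rows that
    lack a field get an empty cell instead of raising an error.
    """
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Usage with the SDK:
# with open("products.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(rows_to_csv(result.data))
```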

Step-by-Step Control

# Step 1: Generate a config
job = client.generate("https://example.com/products", "Get all product details")

# Step 2: Wait with progress updates
def on_progress(status):
    print(f"[{status.progress}%] {status.status_message}")

final = client.wait_for_completion(job.job_id, on_progress=on_progress)
print(f"Config generated: {final.config_name}")
print(final.yaml_content)

# Step 3: Run the scraper
run = client.run(final.config_id)
result = client.wait_and_get_results(run.job_id)
print(f"Scraped {result.row_count} items")
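`wait_for_completion` polls the job and reports progress through your callback. This page does not document the SDK's internals, so the sketch below only illustrates the general polling-with-timeout pattern that such a helper typically follows. `JobStatus`, `fetch_status`, and `poll_until_done` are hypothetical names. The `progress` and `status_message` fields mirror the callback above, and the `done` flag is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobStatus:
    progress: int          # 0-100, like status.progress above
    status_message: str    # like status.status_message above
    done: bool             # hypothetical completion flag


def poll_until_done(
    fetch_status: Callable[[], JobStatus],
    on_progress: Optional[Callable[[JobStatus], None]] = None,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> JobStatus:
    """Call fetch_status until it reports done, invoking on_progress after each poll."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if on_progress is not None:
            on_progress(status)
        if status.done:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised without real waiting.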

Config Management

# List all your configs
configs = client.list_configs()
for c in configs:
    print(f"{c.name}: {c.target_url}")

# Get a specific config
config = client.get_config("config-uuid-here")
print(config.yaml_content)

# Create a new config
config = client.create_config(
    name="My Scraper",
    target_url="https://example.com",
    prompt="Get all items",
    yaml_content="url: https://example.com\n..."
)

# Delete a config
client.delete_config("config-uuid-here")
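The listing loop above shows that each config returned by `list_configs()` has `name` and `target_url` attributes. If you track configs by name, a small lookup helper saves you from hardcoding UUIDs. This sketch assumes each config also exposes an `id` attribute, as `config.id` in the upload example suggests. `find_config_by_name` is hypothetical.

```python
from typing import Any, Iterable, Optional


def find_config_by_name(configs: Iterable[Any], name: str) -> Optional[Any]:
    """Return the first config whose .name matches (case-insensitive), or None."""
    wanted = name.casefold()
    for config in configs:
        if (getattr(config, "name", None) or "").casefold() == wanted:
            return config
    return None


# Usage with the SDK (passing config.id to client.run is an assumption):
# config = find_config_by_name(client.list_configs(), "My Scraper")
# if config is not None:
#     run = client.run(config.id)
```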

Local YAML Files

# Read a local YAML file and run it on the server
result = client.scrape_with_file("my_config.yaml")
print(result.data)

# Download a config from server to local file
client.download_config("config-uuid-here", "downloaded_config.yaml")

# Upload a local YAML file to the server
config = client.upload_config("my_config.yaml", name="My Config")
print(f"Uploaded as: {config.id}")

# Load and parse YAML locally
content = Krawly.load_yaml("config.yaml")
parsed = Krawly.parse_yaml(content)
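This page does not specify the YAML format. The only example is the `books.toscrape.com` config in the next section, which has a `url` plus `selectors.items` and `selectors.fields`. Assuming `parse_yaml` returns a plain dict shaped like that example, a pre-flight check can catch typos before you upload or run a file. The required keys below are inferred from that single example, and real configs may accept more. `check_config` is a hypothetical helper.

```python
from typing import Any, Dict, List


def check_config(parsed: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed scraper config (empty if none).

    The expected shape (url, selectors.items, selectors.fields) is inferred from
    the example config in this README, not from a published schema.
    """
    problems: List[str] = []
    url = parsed.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        problems.append("'url' must be an http(s) URL")
    selectors = parsed.get("selectors")
    if not isinstance(selectors, dict):
        problems.append("'selectors' must be a mapping")
        return problems
    if not isinstance(selectors.get("items"), str):
        problems.append("'selectors.items' must be a CSS selector string")
    fields = selectors.get("fields")
    if not isinstance(fields, dict) or not fields:
        problems.append("'selectors.fields' must map field names to selectors")
    return problems


# Usage with the SDK:
# parsed = Krawly.parse_yaml(Krawly.load_yaml("config.yaml"))
# for problem in check_config(parsed):
#     print("config problem:", problem)
```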

Run YAML Content Directly

yaml_content = """
url: https://books.toscrape.com
selectors:
  items: article.product_pod
  fields:
    title: h3 a::attr(title)
    price: .price_color::text
"""

result = client.scrape_with_yaml(yaml_content)
for book in result.data:
    print(book)
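If you assemble selectors in code, you can render them into the same layout as the example above and pass the resulting string to `scrape_with_yaml`. This sketch writes every value as a JSON-encoded string, which YAML also parses as a double-quoted scalar. As a result, selectors containing `:` or `#` stay intact. `build_yaml` is hypothetical, and the layout is copied from the single example above.

```python
import json
from typing import Dict


def build_yaml(url: str, items: str, fields: Dict[str, str]) -> str:
    """Render a config in the url/selectors/items/fields layout shown above.

    Every scalar is written as a JSON string, which YAML parses as a
    double-quoted scalar, so characters like ':' or '#' need no extra escaping.
    """
    lines = [
        f"url: {json.dumps(url)}",
        "selectors:",
        f"  items: {json.dumps(items)}",
        "  fields:",
    ]
    for name, selector in fields.items():
        lines.append(f"    {json.dumps(name)}: {json.dumps(selector)}")
    return "\n".join(lines) + "\n"


# Usage with the SDK:
# yaml_content = build_yaml(
#     "https://books.toscrape.com",
#     "article.product_pod",
#     {"title": "h3 a::attr(title)", "price": ".price_color::text"},
# )
# result = client.scrape_with_yaml(yaml_content)
```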

Account Info

info = client.me()
print(f"Plan: {info.plan}")
print(f"Credits remaining: {info.generations_remaining}/{info.generations_limit}")
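The `me()` response exposes `generations_remaining` and `generations_limit`. Assuming both are plain integers, you can check your balance before starting work instead of waiting for `QuotaExceededError`. `has_credits` and `usage_percent` are hypothetical helpers. This page does not state how many credits a given call consumes.

```python
from typing import Any


def has_credits(info: Any, needed: int = 1) -> bool:
    """Return True if the account reported by client.me() has at least `needed` credits left."""
    remaining = getattr(info, "generations_remaining", 0) or 0
    return remaining >= needed


def usage_percent(info: Any) -> float:
    """Share of the plan's credits already used, as a percentage (0 when the limit is unknown)."""
    limit = getattr(info, "generations_limit", 0) or 0
    remaining = getattr(info, "generations_remaining", 0) or 0
    if limit <= 0:
        return 0.0
    return 100.0 * (limit - remaining) / limit


# Usage with the SDK:
# info = client.me()
# if has_credits(info):
#     result = client.scrape("https://example.com", "Get data")
```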

Error Handling

from krawly import Krawly
from krawly.client import AuthenticationError, QuotaExceededError, RateLimitError, KrawlyError

client = Krawly(api_key="sai_your_key_here")

try:
    result = client.scrape("https://example.com", "Get data")
except AuthenticationError:
    print("Invalid API key")
except QuotaExceededError:
    print("No credits remaining — upgrade your plan")
except RateLimitError:
    print("Too many requests — try again later")
except KrawlyError as e:
    print(f"API error: {e}")
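`RateLimitError` signals that you should try again later. A common way to handle it is to retry with exponential backoff and give up after a few attempts. This page does not say whether the SDK already retries internally, so the sketch below is generic and standalone. `retry_with_backoff` is hypothetical. With the SDK, you would pass `(RateLimitError,)` as `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff plus jitter.

    Delays grow as base_delay * 2**attempt (1s, 2s, 4s, ... by default), each
    scaled by a random factor between 0.5 and 1.0 to avoid synchronized retries.
    The last exception is re-raised once all attempts are used.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise ValueError("attempts must be at least 1")


# Usage with the SDK:
# from krawly.client import RateLimitError
# result = retry_with_backoff(
#     lambda: client.scrape("https://example.com", "Get data"),
#     retry_on=(RateLimitError,),
# )
```

Exceptions outside `retry_on` (for example `AuthenticationError`) propagate immediately, because retrying them would not help.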

Plans & Pricing

Plan      Credits      Server Execution   Price
Free      3/month                         $0
Starter   20/month                        $15/mo
Pro       100/month                       $29/mo

All plans include API, SDK, and Chrome Extension access.
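Taking the table's figures at face value, the effective price per credit drops on larger plans: $15 / 20 = $0.75 on Starter versus $29 / 100 = $0.29 on Pro. The snippet below just does that arithmetic. It ignores any other differences between plans.

```python
from fractions import Fraction

# Monthly price (USD) and monthly credits for the paid plans in the table above.
PLANS = {"Starter": (15, 20), "Pro": (29, 100)}


def cost_per_credit(price: int, credits: int) -> Fraction:
    """Exact price per credit, kept as a Fraction to avoid float rounding."""
    return Fraction(price, credits)


for plan, (price, credits) in PLANS.items():
    print(f"{plan}: ${float(cost_per_credit(price, credits)):.2f} per credit")
# Starter: $0.75 per credit
# Pro: $0.29 per credit
```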

Get your API key at krawly.io

Documentation

Full documentation: docs.krawly.io

License

MIT

Download files


Source Distribution

krawly-1.0.0.tar.gz (11.3 kB)

Built Distribution

krawly-1.0.0-py3-none-any.whl (10.4 kB)

File details

Details for the file krawly-1.0.0.tar.gz.

File metadata

  • Download URL: krawly-1.0.0.tar.gz
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for krawly-1.0.0.tar.gz
Algorithm Hash digest
SHA256 ba54e19ee67bf10980148f770c418e3d766be41dc48ed3629a66490bce450c55
MD5 c672278bec3aec2089eddf671b30146b
BLAKE2b-256 ff461e3b341005d1674bd1b8628bb41e1082ace983a201161c40f829ce46195f
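You can use the digests above to check a downloaded archive before installing it. Below is a minimal sketch using the standard library's `hashlib`, which streams the file in chunks so large files are not loaded into memory at once. `sha256_of` is a hypothetical helper. pip can also enforce digests itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 listed above for the source distribution.
EXPECTED_SDIST_SHA256 = "ba54e19ee67bf10980148f770c418e3d766be41dc48ed3629a66490bce450c55"

# Usage:
# assert sha256_of(Path("krawly-1.0.0.tar.gz")) == EXPECTED_SDIST_SHA256
```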


File details

Details for the file krawly-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: krawly-1.0.0-py3-none-any.whl
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for krawly-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dc584925049d642931267709219f7f0029704590931f5c35600eec593b9d9888
MD5 5ff604fbbd8f3e0cbcaa547135982548
BLAKE2b-256 c5c14d380a2333659c6c43fbed3b01bc32c15b13d5042de4672f22478bb3aef1

