Official Python SDK for Krawly — AI-powered web scraping platform

Krawly — AI-Powered Web Scraping SDK

Python 3.8+ | License: MIT

Turn any website into structured data with AI. No complex selectors, no external API keys — just describe what you want in plain English.

Installation

pip install krawly

Quick Start

from krawly import Krawly

# Initialize with your API key (get one at https://krawly.io)
client = Krawly(api_key="sai_your_key_here")

# One-line scraping — the simplest way
result = client.scrape(
    "https://books.toscrape.com",
    "Get all book titles and prices"
)

for item in result.data:
    print(f"{item['title']}: {item['price']}")

print(f"Total: {result.row_count} items")
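Hardcoding the key as in the snippet above is fine for a first test, but a key usually should not live in source control. A minimal sketch of reading it from the environment instead; the variable name KRAWLY_API_KEY and the helper load_api_key are assumptions for illustration, not part of the SDK:

```python
import os


def load_api_key(env=None, var_name="KRAWLY_API_KEY"):
    """Return the API key stored in the environment, or raise if it is missing.

    `var_name` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Intended use with the SDK (not executed here):
#   client = Krawly(api_key=load_api_key())
```

Passing a plain dict as env makes the helper easy to exercise in tests without touching the real environment.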

Features

  • 🤖 AI-Powered — Describe what you want in plain English, the AI handles the rest
  • 🔧 No External Keys — Only your Krawly API key needed, no Claude/OpenAI keys
  • 📦 Config Management — Save, list, download, and reuse scraping configs
  • 🚀 Server Execution — Run scrapers on Krawly's cloud infrastructure
  • 📁 Local YAML — Read, write, and upload YAML configs from local files
  • 📊 Progress Tracking — Real-time progress callbacks during scraping

Usage

One-Line Scraping

result = client.scrape("https://example.com/products", "Get product names, prices, and ratings")
print(result.data)  # [{"name": "...", "price": "...", "rating": "..."}]
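Since result.data is shown as a list of dictionaries, it can be written out with the standard library alone. A small sketch, assuming rows may not all carry the same keys; rows_to_csv is a hypothetical helper and the sample rows are made up for illustration:

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text, using the union of keys as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # missing keys are filled with restval
    return buffer.getvalue()


# Illustrative rows in the same shape as the result.data comment above.
sample = [
    {"name": "Widget", "price": "9.99", "rating": "4.5"},
    {"name": "Gadget", "price": "19.99"},
]
csv_text = rows_to_csv(sample)
```

With a real result, rows_to_csv(result.data) would produce text that can be written to a file with open(path, "w", newline="").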

Step-by-Step Control

# Step 1: Generate a config
job = client.generate("https://example.com/products", "Get all product details")

# Step 2: Wait with progress updates
def on_progress(status):
    print(f"[{status.progress}%] {status.status_message}")

final = client.wait_for_completion(job.job_id, on_progress=on_progress)
print(f"Config generated: {final.config_name}")
print(final.yaml_content)

# Step 3: Run the scraper
run = client.run(final.config_id)
result = client.wait_and_get_results(run.job_id)
print(f"Scraped {result.row_count} items")
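The README does not say how wait_for_completion works internally. The general pattern behind such helpers is a polling loop that fetches the job status, reports it to a callback, and stops on a terminal state. The sketch below illustrates that pattern only; poll_until_done, the dict-based status, and the "state" values are all assumptions, not the SDK's implementation:

```python
import time


def poll_until_done(fetch_status, on_progress=None, interval=2.0,
                    timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until the job reaches a terminal state.

    fetch_status is any zero-argument callable returning a dict with a
    hypothetical "state" key ("running", "completed", or "failed").
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if on_progress is not None:
            on_progress(status)  # report every status, including the last
        if status["state"] in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)


# Simulated job that finishes on the third poll.
_fake_statuses = iter([
    {"state": "running", "progress": 10},
    {"state": "running", "progress": 60},
    {"state": "completed", "progress": 100},
])
seen_progress = []
final_status = poll_until_done(
    lambda: next(_fake_statuses),
    on_progress=lambda s: seen_progress.append(s["progress"]),
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
```

Injecting sleep and clock keeps the loop testable without real delays.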

Config Management

# List all your configs
configs = client.list_configs()
for c in configs:
    print(f"{c.name}: {c.target_url}")

# Get a specific config
config = client.get_config("config-uuid-here")
print(config.yaml_content)

# Create a new config
config = client.create_config(
    name="My Scraper",
    target_url="https://example.com",
    prompt="Get all items",
    yaml_content="url: https://example.com\n..."
)

# Delete a config
client.delete_config("config-uuid-here")

Local YAML Files

# Read a local YAML file and run it on the server
result = client.scrape_with_file("my_config.yaml")
print(result.data)

# Download a config from server to local file
client.download_config("config-uuid-here", "downloaded_config.yaml")

# Upload a local YAML file to the server
config = client.upload_config("my_config.yaml", name="My Config")
print(f"Uploaded as: {config.id}")

# Load and parse YAML locally
content = Krawly.load_yaml("config.yaml")
parsed = Krawly.parse_yaml(content)

Run YAML Content Directly

yaml_content = """
url: https://books.toscrape.com
selectors:
  items: article.product_pod
  fields:
    title: h3 a::attr(title)
    price: .price_color::text
"""

result = client.scrape_with_yaml(yaml_content)
for book in result.data:
    print(book)
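The field selectors in the YAML above end in ::attr(title) and ::text, which resembles the pseudo-element syntax used by Scrapy's selectors. How Krawly interprets them is not documented here. As a rough illustration of what such a suffix encodes, the hypothetical helper split_selector below separates the CSS part from the extraction part:

```python
import re

# Matches "<css>::text" or "<css>::attr(<name>)" at the end of a selector.
_PSEUDO = re.compile(
    r"^(?P<css>.*?)::(?:(?P<text>text)|attr\((?P<attr>[^)]+)\))$"
)


def split_selector(selector):
    """Split a selector into (css, kind, attribute_name).

    kind is "text", "attr", or None when no extraction suffix is present.
    """
    selector = selector.strip()
    match = _PSEUDO.match(selector)
    if match is None:
        return selector, None, None
    css = match.group("css").strip()
    if match.group("text"):
        return css, "text", None
    return css, "attr", match.group("attr")


title_parts = split_selector("h3 a::attr(title)")
price_parts = split_selector(".price_color::text")
items_parts = split_selector("article.product_pod")
```

Read this way, title takes the title attribute of each matched link, price takes the element's text, and items carries no suffix because it only selects the repeating container.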

Account Info

info = client.me()
print(f"Plan: {info.plan}")
print(f"Credits remaining: {info.generations_remaining}/{info.generations_limit}")

Error Handling

from krawly import Krawly
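from krawly.client import AuthenticationError, QuotaExceededError, RateLimitError, KrawlyError

# Create the client first so the example runs on its own
client = Krawly(api_key="sai_your_key_here")
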
try:
    result = client.scrape("https://example.com", "Get data")
except AuthenticationError:
    print("Invalid API key")
except QuotaExceededError:
    print("No credits remaining — upgrade your plan")
except RateLimitError:
    print("Too many requests — try again later")
except KrawlyError as e:
    print(f"API error: {e}")

Plans & Pricing

Plan      Credits     Server Execution   Price
Free      3/month                        $0
Starter   20/month                       $15/mo
Pro       100/month                      $29/mo

All plans include API, SDK, and Chrome Extension access.

Get your API key at krawly.io

Documentation

Full documentation: docs.krawly.io

License

MIT

