
Official Python SDK for Krawly — AI-powered web scraping platform


Krawly — AI-Powered Web Scraping SDK

Python 3.8+ | License: MIT

Turn any website into structured data with AI. No complex selectors, no external API keys — just describe what you want in plain English.

Installation

pip install krawly

Quick Start

from krawly import Krawly

# Initialize with your API key (get one at https://krawly.io)
client = Krawly(api_key="sai_your_key_here")

# One-line scraping — the simplest way
result = client.scrape(
    "https://books.toscrape.com",
    "Get all book titles and prices"
)

for item in result.data:
    print(f"{item['title']}: {item['price']}")

print(f"Total: {result.row_count} items")
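Because `result.data` is a list of dictionaries, it can be exported with the standard library alone. The sketch below assumes each row is a flat dict, as in the example above. `rows_to_csv` is a hypothetical helper, not part of the Krawly SDK. Columns are the union of keys in first-seen order, and `restval=""` fills any field a row is missing.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def rows_to_csv(rows: Iterable[Dict[str, Any]]) -> str:
    """Serialize rows shaped like result.data into CSV text (hypothetical helper)."""
    materialized = list(rows)
    # Collect every key that appears in any row, keeping first-seen order
    fieldnames: List[str] = []
    for row in materialized:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval="" writes an empty cell when a row lacks one of the columns
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(materialized)
    return buffer.getvalue()


# With the SDK this would be rows_to_csv(result.data); stand-in rows shown here
sample = [
    {"title": "A Light in the Attic", "price": "51.77"},
    {"title": "Sharp Objects"},
]
csv_text = rows_to_csv(sample)
```

To save the result, open the file with `newline=""` (for example `open("books.csv", "w", newline="", encoding="utf-8")`) so the `\r\n` line endings the `csv` module emits are not translated a second time on Windows.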

Features

  • 🤖 AI-Powered — Describe what you want in plain English and the AI handles the rest
  • 🔧 No External Keys — Only your Krawly API key is needed; no Claude or OpenAI keys required
  • 📦 Config Management — Save, list, download, and reuse scraping configs
  • 🚀 Server Execution — Run scrapers on Krawly's cloud infrastructure
  • 📁 Local YAML — Read, write, and upload YAML configs from local files
  • 📊 Progress Tracking — Real-time progress callbacks during scraping

Usage

One-Line Scraping

result = client.scrape("https://example.com/products", "Get product names, prices, and ratings")
print(result.data)  # [{"name": "...", "price": "...", "rating": "..."}]
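In the output above every value is a string, so prices need converting before they can be sorted or summed. A minimal sketch, assuming a `.` decimal separator and optional `,` thousands separators (as in prices like `£51.77`). `parse_price` is hypothetical and not part of the SDK; it uses `Decimal` to avoid float rounding on money values.

```python
import re
from decimal import Decimal
from typing import Optional

# Digits with optional "," thousands separators and an optional ".xx" fraction
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[Decimal]:
    """Extract the first number from a price string; None if it has no digits."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


# Stand-in rows; sort by numeric price, dropping rows without a usable price
rows = [
    {"title": "B", "price": "£1,299.00"},
    {"title": "A", "price": "£51.77"},
    {"title": "C", "price": "N/A"},
]
priced = [r for r in rows if parse_price(r["price"]) is not None]
priced.sort(key=lambda r: parse_price(r["price"]))
```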

Step-by-Step Control

# Step 1: Generate a config
job = client.generate("https://example.com/products", "Get all product details")

# Step 2: Wait with progress updates
def on_progress(status):
    print(f"[{status.progress}%] {status.status_message}")

final = client.wait_for_completion(job.job_id, on_progress=on_progress)
print(f"Config generated: {final.config_name}")
print(final.yaml_content)

# Step 3: Run the scraper
run = client.run(final.config_id)
result = client.wait_and_get_results(run.job_id)
print(f"Scraped {result.row_count} items")
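`wait_for_completion` wraps a wait-and-report loop around the job. Its actual implementation is not documented here, so the following only sketches the general polling pattern. It assumes a hypothetical `fetch_status` callable returning an object with the `progress` and `status_message` fields used above, plus an assumed `done` flag; `Status` and `poll_until_done` are illustrative names, not SDK APIs.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Status:
    # Hypothetical status shape mirroring the fields read in on_progress above
    progress: int
    status_message: str
    done: bool = False


def poll_until_done(
    fetch_status: Callable[[], Status],
    on_progress: Optional[Callable[[Status], None]] = None,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Status:
    """Poll fetch_status until done, reporting each status; raise on timeout."""
    # monotonic() is unaffected by wall-clock changes, so the deadline is reliable
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if on_progress is not None:
            on_progress(status)
        if status.done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} s")
        time.sleep(interval)
```

Bounding the wait with a timeout means a stuck job surfaces as an exception instead of hanging the script.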

Config Management

# List all your configs
configs = client.list_configs()
for c in configs:
    print(f"{c.name}: {c.target_url}")

# Get a specific config
config = client.get_config("config-uuid-here")
print(config.yaml_content)

# Create a new config
config = client.create_config(
    name="My Scraper",
    target_url="https://example.com",
    prompt="Get all items",
    yaml_content="url: https://example.com\n..."
)

# Delete a config
client.delete_config("config-uuid-here")
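Most calls above take a config ID, while people usually remember configs by name. A sketch of resolving one to the other, assuming each item returned by `list_configs()` exposes `.name` (shown above) and `.id` (shown on the `upload_config` result below, assumed here for listed configs too). `config_id_by_name` is hypothetical. Since names may not be unique, ambiguity is reported rather than silently picking one.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def config_id_by_name(configs: Iterable[Any], name: str) -> str:
    """Return the .id of the single config whose .name equals name (hypothetical helper)."""
    matches = [c.id for c in configs if c.name == name]
    if not matches:
        raise LookupError(f"no config named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} configs named {name!r}; use an ID instead")
    return matches[0]


# Stand-in objects for illustration; with the SDK you would pass client.list_configs()
configs = [
    SimpleNamespace(id="cfg-1", name="Books", target_url="https://books.toscrape.com"),
    SimpleNamespace(id="cfg-2", name="My Scraper", target_url="https://example.com"),
]
config_id = config_id_by_name(configs, "My Scraper")
# e.g. run = client.run(config_id)
```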

Local YAML Files

# Read a local YAML file and run it on the server
result = client.scrape_with_file("my_config.yaml")
print(result.data)

# Download a config from server to local file
client.download_config("config-uuid-here", "downloaded_config.yaml")

# Upload a local YAML file to the server
config = client.upload_config("my_config.yaml", name="My Config")
print(f"Uploaded as: {config.id}")

# Load and parse YAML locally
content = Krawly.load_yaml("config.yaml")
parsed = Krawly.parse_yaml(content)

Run YAML Content Directly

yaml_content = """
url: https://books.toscrape.com
selectors:
  items: article.product_pod
  fields:
    title: h3 a::attr(title)
    price: .price_color::text
"""

result = client.scrape_with_yaml(yaml_content)
for book in result.data:
    print(book)
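The YAML above has three parts: a `url`, an `items` selector matching each repeated element, and a `fields` map from output column to CSS selector. The sketch below runs a local sanity check before calling `scrape_with_yaml` or `upload_config`. It assumes the config has already been parsed into a plain dict (for example with `Krawly.parse_yaml`, whose return type is an assumption here) and that the schema matches this single example; the real format may accept more keys. `check_config` is hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a parsed config; empty means it looks valid."""
    problems: List[str] = []
    url = cfg.get("url")
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        problems.append("url must be an http(s) URL")
    selectors = cfg.get("selectors")
    if not isinstance(selectors, dict):
        problems.append("selectors must be a mapping")
        return problems  # nothing further to check without selectors
    if not isinstance(selectors.get("items"), str):
        problems.append("selectors.items must be a CSS selector string")
    fields = selectors.get("fields")
    if not isinstance(fields, dict) or not fields:
        problems.append("selectors.fields must be a non-empty mapping")
    else:
        for name, selector in fields.items():
            if not isinstance(selector, str):
                problems.append(f"selectors.fields.{name} must be a string")
    return problems


# The example config above, written as the dict a YAML parser would produce
example = {
    "url": "https://books.toscrape.com",
    "selectors": {
        "items": "article.product_pod",
        "fields": {"title": "h3 a::attr(title)", "price": ".price_color::text"},
    },
}
```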

Account Info

info = client.me()
print(f"Plan: {info.plan}")
print(f"Credits remaining: {info.generations_remaining}/{info.generations_limit}")

Error Handling

from krawly import Krawly
from krawly.client import AuthenticationError, QuotaExceededError, RateLimitError, KrawlyError

client = Krawly(api_key="sai_your_key_here")

try:
    result = client.scrape("https://example.com", "Get data")
except AuthenticationError:
    print("Invalid API key")
except QuotaExceededError:
    print("No credits remaining — upgrade your plan")
except RateLimitError:
    print("Too many requests — try again later")
except KrawlyError as e:
    print(f"API error: {e}")
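A rate limit is normally temporary, whereas an invalid key or exhausted credits will not resolve on retry. A sketch of retrying only the transient case with exponential backoff and a little jitter. `with_retries` is a hypothetical helper, and whether the SDK already retries internally is not documented here.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on retry_on exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Wait base_delay * 1, 2, 4, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

With the SDK this might be used as `with_retries(lambda: client.scrape("https://example.com", "Get data"), retry_on=(RateLimitError,))`. Other exceptions, including `AuthenticationError` and `QuotaExceededError`, propagate immediately. The `sleep` parameter exists so tests can skip real waiting.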

Plans & Pricing

Plan      Credits     Price
Free      3/month     $0
Starter   20/month    $15/mo
Pro       100/month   $29/mo

All plans include API, SDK, and Chrome Extension access.

Get your API key at krawly.io

Documentation

Full documentation: docs.krawly.io

License

MIT



Download files


Source Distribution

krawly-1.0.2.tar.gz (11.5 kB)

Built Distribution


krawly-1.0.2-py3-none-any.whl (10.6 kB)

File details

Details for the file krawly-1.0.2.tar.gz.

File metadata

  • Download URL: krawly-1.0.2.tar.gz
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for krawly-1.0.2.tar.gz
Algorithm Hash digest
SHA256 9d79a374c011650c7f0fcae601142da614f7080feeab1cc1a9480225a5dcec63
MD5 e9bd1889c3318e7d41f6912375a03f26
BLAKE2b-256 23283893d59375337302ebd279a14f2d7bcb76874b0676a6ff580c289866f8c7


File details

Details for the file krawly-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: krawly-1.0.2-py3-none-any.whl
  • Size: 10.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for krawly-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 554adcefed588615e97cf42141c12a812d8d4b4db70ee3e2ca17d314e635a08f
MD5 f7e49ef61a019331f7c3ed97452f29a1
BLAKE2b-256 fe2ac11ea86591202556d6c566f97588c52b877e4aa55ecf55d0701a9ed7cc60

