
Crawl4AI Cloud SDK for Python

Lightweight Python SDK for Crawl4AI Cloud. Mirrors the OSS API exactly.

Note: This SDK is for Crawl4AI Cloud (api.crawl4ai.com), the managed cloud service. For the self-hosted open-source version, see github.com/unclecode/crawl4ai.


Installation

pip install crawl4ai-cloud-sdk

Get Your API Key

  1. Go to api.crawl4ai.com
  2. Sign up and get your API key

Quick Start

import asyncio
from crawl4ai_cloud import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(api_key="sk_live_...") as crawler:
        result = await crawler.run("https://example.com")
        print(result.markdown.raw_markdown)

asyncio.run(main())

Features

Single URL Crawl

result = await crawler.run("https://example.com")
print(result.success)
print(result.markdown.raw_markdown)
print(result.html)

Batch Crawl

urls = ["https://example.com", "https://httpbin.org/html"]

# Wait for results
results = await crawler.run_many(urls, wait=True)
for r in results:
    print(f"{r.url}: {r.success}")

# Fire and forget (returns job)
job = await crawler.run_many(urls, wait=False)
print(f"Job ID: {job.id}")

Configuration

from crawl4ai_cloud import CrawlerRunConfig, BrowserConfig

config = CrawlerRunConfig(
    word_count_threshold=10,
    exclude_external_links=True,
    screenshot=True,
)

browser_config = BrowserConfig(
    viewport_width=1920,
    viewport_height=1080,
)

result = await crawler.run(
    "https://example.com",
    config=config,
    browser_config=browser_config,
)

Proxy Support

# Shorthand
result = await crawler.run(url, proxy="datacenter")
result = await crawler.run(url, proxy="residential")

# Full config
result = await crawler.run(url, proxy={
    "mode": "residential",
    "country": "US"
})

Deep Crawl

result = await crawler.deep_crawl(
    "https://docs.example.com",
    strategy="bfs",
    max_depth=2,
    max_urls=50,
    wait=True,
)
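With `strategy="bfs"`, the crawler works breadth-first: it visits every link at depth 1 before moving to depth 2, stopping at `max_depth` hops or `max_urls` pages. A minimal, self-contained sketch of that ordering (not the SDK's actual implementation; `link_graph` is a hypothetical adjacency map standing in for the links discovered on each page):

```python
from collections import deque

def bfs_plan(link_graph, start, max_depth=2, max_urls=50):
    """Sketch of BFS crawl ordering: visit pages level by level,
    capped by max_depth hops from the start and max_urls total pages."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue and len(order) < max_urls:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand links beyond the depth limit
        for nxt in link_graph.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
                if len(order) == max_urls:
                    break
    return order
```

The real service applies the same level-by-level policy while fetching pages concurrently.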

Job Management

# List jobs
jobs = await crawler.list_jobs(status="completed", limit=10)

# Get job status
job = await crawler.get_job(job_id)

# Wait for job
job = await crawler.wait_job(job_id, poll_interval=2.0)

# Cancel job
await crawler.cancel_job(job_id)
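Under the hood, `wait_job` amounts to polling the job status every `poll_interval` seconds until it reaches a terminal state. A self-contained sketch of that loop (the real client calls `crawler.get_job(job_id)` over HTTP; `get_status` here is a hypothetical stand-in, and the set of terminal state names is an assumption):

```python
import asyncio

async def wait_for(get_status, poll_interval=2.0, timeout=60.0):
    """Poll an async get_status() callable until it returns a terminal
    state, sleeping poll_interval seconds between checks."""
    terminal = {"completed", "failed", "cancelled"}
    deadline = asyncio.get_running_loop().time() + timeout
    while True:
        status = await get_status()
        if status in terminal:
            return status
        if asyncio.get_running_loop().time() >= deadline:
            raise TimeoutError("job did not finish in time")
        await asyncio.sleep(poll_interval)
```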

Migration from OSS

Zero learning curve: change the import, pass your API key, and your existing code works:

# Before (OSS)
from crawl4ai import AsyncWebCrawler
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(url)

# After (Cloud)
from crawl4ai_cloud import AsyncWebCrawler
async with AsyncWebCrawler(api_key="sk_...") as crawler:
    result = await crawler.run(url)  # arun() also works!

Environment Variables

export CRAWL4AI_API_KEY=sk_live_...

# API key is auto-loaded from the environment
crawler = AsyncWebCrawler()
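If you prefer an explicit startup error over a failed first request, you can resolve the key yourself before constructing the crawler. `resolve_api_key` is a hypothetical helper, not part of the SDK:

```python
import os

def resolve_api_key(explicit=None):
    """Hypothetical helper: prefer an explicitly passed key, fall back
    to the CRAWL4AI_API_KEY environment variable, else fail fast."""
    key = explicit or os.environ.get("CRAWL4AI_API_KEY")
    if not key:
        raise RuntimeError("Set CRAWL4AI_API_KEY or pass api_key=...")
    return key

# crawler = AsyncWebCrawler(api_key=resolve_api_key())
```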

Error Handling

from crawl4ai_cloud import (
    CloudError,
    AuthenticationError,
    RateLimitError,
    QuotaExceededError,
    NotFoundError,
)

try:
    result = await crawler.run(url)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except QuotaExceededError:
    print("Quota exceeded")


License

Apache 2.0
