
Crawl4AI Cloud SDK for Python

Lightweight Python SDK for Crawl4AI Cloud. Mirrors the OSS API exactly.

Note: This SDK is for Crawl4AI Cloud (api.crawl4ai.com), the managed cloud service. For the self-hosted open-source version, see github.com/unclecode/crawl4ai.

Installation

# From PyPI (coming soon)
pip install crawl4ai-cloud

# From GitHub (available now)
pip install git+https://github.com/unclecode/crawl4ai-cloud-sdk.git#subdirectory=python

Get Your API Key

  1. Go to api.crawl4ai.com
  2. Sign up and get your API key

Quick Start

import asyncio
from crawl4ai_cloud import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(api_key="sk_live_...") as crawler:
        result = await crawler.run("https://example.com")
        print(result.markdown.raw_markdown)

asyncio.run(main())

Features

Single URL Crawl

result = await crawler.run("https://example.com")
print(result.success)
print(result.markdown.raw_markdown)
print(result.html)

Batch Crawl

urls = ["https://example.com", "https://httpbin.org/html"]

# Wait for results
results = await crawler.run_many(urls, wait=True)
for r in results:
    print(f"{r.url}: {r.success}")

# Fire and forget (returns job)
job = await crawler.run_many(urls, wait=False)
print(f"Job ID: {job.id}")

Configuration

from crawl4ai_cloud import CrawlerRunConfig, BrowserConfig

config = CrawlerRunConfig(
    word_count_threshold=10,
    exclude_external_links=True,
    screenshot=True,
)

browser_config = BrowserConfig(
    viewport_width=1920,
    viewport_height=1080,
)

result = await crawler.run(
    "https://example.com",
    config=config,
    browser_config=browser_config,
)
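
With screenshot=True the result should also carry image data. A sketch of saving it, assuming result.screenshot is a base64-encoded image string as in the OSS result model:

import base64

# Screenshot is only present when screenshot=True in the run config
if result.screenshot:
    with open("example.png", "wb") as f:
        f.write(base64.b64decode(result.screenshot))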

Proxy Support

# Shorthand
result = await crawler.run(url, proxy="datacenter")
result = await crawler.run(url, proxy="residential")

# Full config
result = await crawler.run(url, proxy={
    "mode": "residential",
    "country": "US"
})
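
Datacenter proxies are cheaper and residential ones are harder to block, so one common pattern is to escalate only on failure. A sketch using just the calls above:

# Try the cheaper datacenter pool first, escalate on failure
result = await crawler.run(url, proxy="datacenter")
if not result.success:
    result = await crawler.run(url, proxy={"mode": "residential", "country": "US"})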

Deep Crawl

result = await crawler.deep_crawl(
    "https://docs.example.com",
    strategy="bfs",
    max_depth=2,
    max_urls=50,
    wait=True,
)
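
For large sites you may not want to block on the crawl. A sketch of a fire-and-forget variant, assuming deep_crawl with wait=False returns a job handle the same way run_many does (an assumption; this README only shows wait=True):

# Assumed to mirror run_many: wait=False returns a job instead of results
job = await crawler.deep_crawl(
    "https://docs.example.com",
    strategy="bfs",
    max_depth=2,
    max_urls=50,
    wait=False,
)
print(f"Deep crawl queued as job {job.id}")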

Job Management

# List jobs
jobs = await crawler.list_jobs(status="completed", limit=10)

# Get job status
job = await crawler.get_job(job_id)

# Wait for job
job = await crawler.wait_job(job_id, poll_interval=2.0)

# Cancel job
await crawler.cancel_job(job_id)
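
These calls compose into a simple watchdog that gives up on stuck jobs. A sketch built only from the methods above; the job.status attribute and its "completed" value are inferred from the list_jobs filter and are assumptions:

import asyncio
import time

async def wait_with_timeout(crawler, job_id, timeout=120.0, poll=2.0):
    # Poll get_job() until the job completes or the deadline passes
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = await crawler.get_job(job_id)
        if job.status == "completed":  # status value from list_jobs above
            return job
        await asyncio.sleep(poll)
    await crawler.cancel_job(job_id)  # give up and free the slot
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")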

Migration from OSS

Zero learning curve — your existing code works:

# Before (OSS)
from crawl4ai import AsyncWebCrawler
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(url)

# After (Cloud)
from crawl4ai_cloud import AsyncWebCrawler
async with AsyncWebCrawler(api_key="sk_...") as crawler:
    result = await crawler.run(url)  # arun() also works!
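
Because the call surface matches, you can also pick the backend at runtime. A sketch, assuming both packages are installed and using the arun() alias mentioned above:

import os

if os.getenv("CRAWL4AI_API_KEY"):
    from crawl4ai_cloud import AsyncWebCrawler  # managed cloud
else:
    from crawl4ai import AsyncWebCrawler        # self-hosted OSS

async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(url)  # same call on both backends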

Environment Variables

# Shell: export the key once
export CRAWL4AI_API_KEY=sk_live_...

# Python: the key is auto-loaded from the environment
crawler = AsyncWebCrawler()
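
If you prefer an explicit failure when the key is missing, check the environment yourself. A minimal sketch:

import os

api_key = os.getenv("CRAWL4AI_API_KEY")
if not api_key:
    raise RuntimeError("Set CRAWL4AI_API_KEY before creating the crawler")
crawler = AsyncWebCrawler(api_key=api_key)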

Error Handling

from crawl4ai_cloud import (
    CloudError,
    AuthenticationError,
    RateLimitError,
    QuotaExceededError,
    NotFoundError,
)

try:
    result = await crawler.run(url)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except QuotaExceededError:
    print("Quota exceeded")

License

Apache 2.0

