
crawl4ai-client

Lightweight async Python client for Crawl4AI Docker server.

No browser dependencies required. Just httpx + pydantic (~2MB vs ~500MB for the full crawl4ai package).

Install

pip install crawl4ai-client

Quick Start

import asyncio
from crawl4ai_client import Crawl4aiDockerClient

async def main():
    async with Crawl4aiDockerClient(
        base_url="http://localhost:11235",
        api_token="your-token",  # optional
    ) as client:
        result = await client.crawl(["https://example.com"])
        print(result.raw_markdown)

asyncio.run(main())

Features

  • Crawl single or multiple URLs (/crawl)
  • Stream results as they complete (/crawl/stream)
  • Markdown extraction with filters (/md)
  • Screenshots as base64 PNG (/screenshot)
  • PDF generation (/pdf)
  • HTML preprocessing for schema extraction (/html)
  • JavaScript execution on pages (/execute_js)
  • LLM Q&A — ask questions about page content (/llm)
  • Per-URL configs for batch crawling (crawler_configs list)
  • Schema retrieval (/schema)
  • Async context manager with automatic cleanup

Usage

Basic crawl

from crawl4ai_client import Crawl4aiDockerClient, CrawlerRunConfig, CacheMode

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    result = await client.crawl(
        ["https://example.com"],
        crawler_config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS),
    )
    print(result.raw_markdown)

Multiple URLs with per-URL configs

from crawl4ai_client import Crawl4aiDockerClient, CrawlerRunConfig

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    results = await client.crawl(
        ["https://example.com", "https://httpbin.org/html"],
        crawler_configs=[
            CrawlerRunConfig(word_count_threshold=5),
            CrawlerRunConfig(word_count_threshold=50),
        ],
    )
    for r in results:
        print(f"{r.url}: {len(r.raw_markdown)} chars")

Deep crawl

from crawl4ai_client import Crawl4aiDockerClient, CrawlerRunConfig, BFSDeepCrawlStrategy

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    results = await client.crawl(
        ["https://example.com"],
        crawler_config=CrawlerRunConfig(
            deep_crawl_strategy=BFSDeepCrawlStrategy(max_depth=2, max_pages=10),
        ),
    )
    for r in results:
        print(f"{r.url}: {r.success}")

Also available: DFSDeepCrawlStrategy, BestFirstCrawlingStrategy.
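A minimal sketch of the depth-first variant, assuming DFSDeepCrawlStrategy accepts the same max_depth/max_pages parameters as its BFS counterpart:

from crawl4ai_client import Crawl4aiDockerClient, CrawlerRunConfig, DFSDeepCrawlStrategy

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    results = await client.crawl(
        ["https://example.com"],
        crawler_config=CrawlerRunConfig(
            # assumption: DFSDeepCrawlStrategy mirrors the BFS constructor shown above
            deep_crawl_strategy=DFSDeepCrawlStrategy(max_depth=2, max_pages=10),
        ),
    )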

Streaming

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    async for result in client.crawl_stream(["https://example.com", "https://httpbin.org/html"]):
        print(f"Got: {result.url}")

Markdown endpoint

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    md = await client.get_markdown("https://example.com", content_filter="fit")
    print(md)
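The Crawl4AI server also exposes other markdown filter types (such as raw, unfiltered output); passing one through content_filter here is an assumption extrapolated from the "fit" value above, not a documented guarantee of this client:

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    # "raw" is assumed to map to the server's unfiltered markdown output
    md_raw = await client.get_markdown("https://example.com", content_filter="raw")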

Screenshot

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    screenshot_b64 = await client.screenshot("https://example.com")

PDF generation

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    pdf_b64 = await client.get_pdf("https://example.com")
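Both the screenshot and PDF endpoints return base64-encoded strings; a minimal sketch for writing them to disk using only the standard library:

import base64

# decode the base64 payloads returned above and persist them
with open("page.png", "wb") as f:
    f.write(base64.b64decode(screenshot_b64))
with open("page.pdf", "wb") as f:
    f.write(base64.b64decode(pdf_b64))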

HTML preprocessing

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    html = await client.get_html("https://example.com")

JavaScript execution

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    result = await client.execute_js(
        "https://example.com",
        scripts=["document.title", "document.querySelectorAll('a').length"],
    )
    print(result.js_execution_result)

LLM Q&A

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    answer = await client.llm_query(
        "https://example.com",
        query="What is this page about?",
    )
    print(answer)

Why this package?

The full crawl4ai package installs 34+ dependencies (~500MB) including Playwright, browsers, numpy, and litellm. If you're running Crawl4AI as a Docker service and only need the client, this package gives you the same Crawl4aiDockerClient API with just two dependencies: httpx and pydantic.

Compatibility

This client is compatible with Crawl4AI Docker server v0.8.x+. The config classes (BrowserConfig, CrawlerRunConfig) produce the same serialized format as the full library.
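Since the client depends on pydantic, one quick way to inspect that serialized payload is to dump a config directly. This sketch assumes the config classes are pydantic v2 models exposing model_dump(), which is not documented above:

from crawl4ai_client import CrawlerRunConfig, CacheMode

cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
# assumption: configs are pydantic v2 models, so model_dump() shows the wire format
print(cfg.model_dump())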

License

Apache 2.0 — based on crawl4ai by unclecode.
