
Python SDK for CRW web scraper — scrape, crawl, and map any website from Python

crw

Python SDK for CRW — the open-source web scraper built for AI agents.

Install

# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh

# npm (zero install):
npx crw-mcp

# Python:
pip install crw

# Cargo:
cargo install crw-mcp

# Docker:
docker run -i ghcr.io/us/crw crw-mcp

CLI Usage

After installing, you can run crw-mcp as an MCP server for any MCP-compatible AI coding agent:

# Start the MCP stdio server
crw-mcp

# Add to Claude Code
claude mcp add crw -- npx crw-mcp

MCP client config (works with Cursor, Windsurf, Cline, Claude Desktop, etc.):

{
  "mcpServers": {
    "crw": {
      "command": "npx",
      "args": ["crw-mcp"]
    }
  }
}
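If your client's config file already lists other MCP servers, you can merge the crw entry into it instead of overwriting the file. This is a minimal sketch: the helper name add_crw_server is hypothetical, and where each client stores its config file is not covered by this README.

```python
import json


def add_crw_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the crw server entry added.

    Other servers are preserved; an existing "crw" entry is replaced.
    The input dict is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["crw"] = {"command": "npx", "args": ["crw-mcp"]}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
    print(json.dumps(add_crw_server(existing), indent=2))
```

Copying both the outer dict and the mcpServers dict keeps the caller's original config untouched, so you can diff old and new before writing the file back.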

SDK Usage

CRW is cloud-first. By default the client uses the managed cloud (api.fastcrw.com): sign up with GitHub or Google (about 10 seconds) to get 500 free credits, with no payment required and no monthly reset, then set CRW_API_KEY. To run the local engine instead (no server, no API key, zero config), set CRW_LOCAL=1.
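To fail fast with a readable message when neither mode is configured, you can check the environment before constructing a client. This is a sketch that mirrors only the rules stated above; resolve_crw_mode is a hypothetical helper, it does not call the SDK, and the precedence of CRW_LOCAL=1 over CRW_API_KEY is an assumption.

```python
import os
from typing import Mapping


def resolve_crw_mode(env: Mapping[str, str] = os.environ) -> str:
    """Report which mode the documented environment variables select.

    Hypothetical helper: assumes CRW_LOCAL=1 wins over CRW_API_KEY,
    which may not match the SDK's actual precedence.
    """
    if env.get("CRW_LOCAL") == "1":
        return "local"  # local zero-config engine, no key needed
    if env.get("CRW_API_KEY"):
        return "cloud"  # managed cloud at api.fastcrw.com
    raise RuntimeError(
        "Set CRW_API_KEY for the managed cloud, or CRW_LOCAL=1 for the local engine"
    )
```

Passing a plain dict instead of os.environ makes the check easy to test.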

from crw import CrwClient

# Cloud (default) — reads CRW_API_KEY from the environment:
client = CrwClient()
result = client.scrape("https://example.com")
print(result["markdown"])

# ...or pass the key explicitly:
client = CrwClient(api_key="fc-...")

# Self-hosted server:
client = CrwClient(api_url="http://localhost:3000")

# Local zero-config engine (no server, no key): run with CRW_LOCAL=1 in the env.

# Scrape with options:
result = client.scrape("https://example.com", formats=["markdown", "links"])
print(result["markdown"])
print(result["links"])

# Crawl a site:
job = client.crawl("https://example.com", max_depth=2, max_pages=10)
print(job["id"])

# Map all URLs on a site:
urls = client.map("https://example.com")
print(urls)
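crawl() returns a job whose id you can print; the pages arrive later. The SDK may provide its own status or wait call, so check its reference first. If you need to poll manually, a generic loop looks like this. The status callable, the "status" field, and the terminal values completed/failed/cancelled are assumptions borrowed from common crawl-job APIs, not confirmed by this README.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Poll fetch_status(job_id) until the job reports a terminal status.

    fetch_status is a placeholder for whatever status call you wire in;
    the "status" key and its terminal values are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status.get("status") in ("completed", "failed", "cancelled"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(interval)
```

For example, wait_for_job(my_status_fn, job["id"]), where my_status_fn is a hypothetical function returning the job's current state as a dict. Using time.monotonic() keeps the timeout correct even if the system clock changes.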

Search

Works in both modes (HTTP and local). In local subprocess mode (CRW_LOCAL=1), the engine needs a SearXNG URL configured, either as [search].searxng_url in the engine config or via the CRW_SEARCH__SEARXNG_URL environment variable; the managed cloud has one preconfigured.

from crw import CrwClient

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Basic search
results = client.search("web scraping tools 2026")

# Search with options
results = client.search(
    "AI news",
    limit=10,
    sources=["web", "news"],
    tbs="qdr:w",
)

# Search + scrape content
results = client.search(
    "python tutorials",
    scrape_options={"formats": ["markdown"]},
)

Note: If search isn't configured, the engine returns a clear search_disabled error.
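If your code should keep running when search is not configured, you can treat search_disabled as "no results". This sketch assumes the error surfaces as an exception whose message contains "search_disabled"; the SDK's actual exception class and message format are not documented here, so narrow the except clause once you know them. search_or_empty is a hypothetical helper.

```python
from typing import Any, Callable


def search_or_empty(search: Callable[..., Any], query: str, **options: Any) -> Any:
    """Call a search function, returning [] when the engine reports search_disabled.

    Any other error is re-raised unchanged.
    """
    try:
        return search(query, **options)
    except Exception as exc:  # narrow to the SDK's error class if it has one
        if "search_disabled" in str(exc):
            return []
        raise
```

Usage: results = search_or_empty(client.search, "AI news", limit=10).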

Scrape options & structured (LLM) extraction

# Force the renderer, wait for JS, pin a renderer tier:
result = client.scrape("https://example.com", render_js=True, wait_for=1500, renderer="chrome")

# Structured extraction with a JSON Schema (adds the `json` format automatically).
# Requires an LLM provider configured on the engine.
result = client.scrape(
    "https://example.com",
    json_schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
print(result["json"])
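LLM output is not guaranteed to match the schema you asked for, so a cheap local check before using result["json"] can save debugging later. This sketch checks only flat objects (the "required" list and each property's top-level primitive type); it is not a full JSON Schema validator, and for anything nested a dedicated library such as the jsonschema package is the better choice. check_properties is a hypothetical helper.

```python
from typing import Any, List

# Maps the JSON Schema primitive type names handled here to Python types.
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_properties(data: Any, schema: dict) -> List[str]:
    """Return a list of problems comparing data to a flat object schema."""
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field {name!r}")
    for name, spec in schema.get("properties", {}).items():
        type_name = spec.get("type")
        if name not in data or not isinstance(type_name, str):
            continue  # absent field, or untyped / union type: not checked here
        expected = _TYPES.get(type_name)
        if expected is None:
            continue  # unsupported type name (e.g. "null"): skip
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        wrong_bool = isinstance(value, bool) and type_name in ("number", "integer")
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"field {name!r} is not of type {type_name}")
    return problems
```

An empty list means the flat checks passed.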

Parse a document (PDF → markdown / JSON)

Works in both HTTP and local mode.

# From a path:
doc = client.parse_file("invoice.pdf", formats=["markdown"])
print(doc["markdown"], doc["metadata"]["numPages"])

# From bytes, with structured extraction:
from pathlib import Path

pdf_bytes = Path("invoice.pdf").read_bytes()
doc = client.parse_file(
    content=pdf_bytes,
    filename="invoice.pdf",
    json_schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
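When passing raw bytes, a quick local check that the file really is a PDF avoids spending a request on the wrong file. Well-formed PDF files start with the header %PDF-. This is a sketch for PDF input only (parse_file may accept other document types); read_pdf_bytes is a hypothetical helper.

```python
from pathlib import Path


def read_pdf_bytes(path: str) -> bytes:
    """Read a file for parse_file(content=...), rejecting non-PDF files.

    Checks only the leading %PDF- header, not the full file structure.
    """
    data = Path(path).read_bytes()
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{path} does not look like a PDF (missing %PDF- header)")
    return data
```

Usage: doc = client.parse_file(content=read_pdf_bytes("invoice.pdf"), filename="invoice.pdf", formats=["markdown"]).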

Extract, batch, capabilities, change-tracking (HTTP mode)

These need HTTP mode: the managed cloud (the default) or a self-hosted server set with api_url. They are not available in local mode (CRW_LOCAL=1):

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Structured LLM extraction across URLs (async job, polled to completion):
data = client.extract(
    ["https://example.com"],
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)

# Scrape many URLs in one async batch:
pages = client.batch_scrape(["https://a.com", "https://b.com"], formats=["markdown"])

# Feature-detect the server:
caps = client.capabilities()

# Diff a page against a prior snapshot (stateless):
diff = client.change_tracking_diff(
    current={"markdown": "new content"},
    previous={"markdown": "old content"},
)
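change_tracking_diff() is stateless, so you store the snapshots yourself. For a quick offline look at what changed between two stored markdown snapshots, Python's standard difflib produces a line-level unified diff. This swaps in difflib; it is not how the server computes its diff, so the output format and granularity will differ. preview_diff is a hypothetical helper.

```python
import difflib


def preview_diff(previous_md: str, current_md: str) -> str:
    """Return a unified diff of two markdown snapshots, computed locally with difflib."""
    lines = difflib.unified_diff(
        previous_md.splitlines(),
        current_md.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",  # splitlines() already removed line endings
    )
    return "\n".join(lines)
```

Identical snapshots produce an empty string, which makes "has anything changed?" a simple truthiness check.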


Download files


Source Distribution

crw-0.15.2.tar.gz (22.9 kB)

Built Distribution


crw-0.15.2-py3-none-any.whl (20.5 kB, Python 3)

File details

Details for the file crw-0.15.2.tar.gz.

File metadata

  • Download URL: crw-0.15.2.tar.gz
  • Size: 22.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for crw-0.15.2.tar.gz
  • SHA256: dfcb99e8ade6c88770e4968a45774f2be6b7b3d203ebbe5364ad0b72306138ae
  • MD5: ba322f3c8ab5054cdab6407011754c07
  • BLAKE2b-256: dea488305630781f1f712a3a95e13c040fe328db170e10d8fe243e92ac0ded02


File details

Details for the file crw-0.15.2-py3-none-any.whl.

File metadata

  • Download URL: crw-0.15.2-py3-none-any.whl
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for crw-0.15.2-py3-none-any.whl
  • SHA256: b962deb7cca6d0bf1f7f002e8ecdb63200fe324e961c2989c252ad333efbf346
  • MD5: d8c081c37e27dee2b50f961ee6e8d891
  • BLAKE2b-256: 52cece7422c2472768f93a220c43b3bdefed03bf26d20e2cc2b692063e63fd23

