Python SDK for CRW web scraper — scrape, crawl, and map any website from Python

crw

Python SDK for CRW — the open-source web scraper built for AI agents.

Install

# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh

# npm (zero install):
npx crw-mcp

# Python:
pip install crw

# Cargo:
cargo install crw-mcp

# Docker:
docker run -i ghcr.io/us/crw crw-mcp

CLI Usage

After installing, you can use crw-mcp as an MCP server for any AI coding agent:

# Start the MCP stdio server
crw-mcp

# Add to Claude Code
claude mcp add crw -- npx crw-mcp

MCP client config (works with Cursor, Windsurf, Cline, Claude Desktop, etc.):

{
  "mcpServers": {
    "crw": {
      "command": "npx",
      "args": ["crw-mcp"]
    }
  }
}
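
If you would rather script this than edit the file by hand, the entry above can be merged into an existing client config. This is a minimal sketch using only the standard library; the helper names `add_crw_server` and `update_config_file` are hypothetical, and each MCP client keeps its config file in its own location, so the path is left to the caller.

```python
import json
from pathlib import Path


def add_crw_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the crw server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the JSON snippet above: launch crw-mcp through npx.
    servers["crw"] = {"command": "npx", "args": ["crw-mcp"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the crw entry into the JSON file at `path` (location is client-specific)."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_crw_server(existing), indent=2) + "\n")
```

Existing server entries are preserved; only the `crw` key is added or overwritten.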

SDK Usage

CRW is cloud-first. By default, the client uses the managed cloud (api.fastcrw.com): sign up with GitHub or Google (about 10 seconds) for 500 free credits, with no payment required and no monthly reset, then set CRW_API_KEY. To self-host the engine locally instead, set CRW_LOCAL=1 (zero-config, no key).

from crw import CrwClient

# Cloud (default) — reads CRW_API_KEY from the environment:
client = CrwClient()
result = client.scrape("https://example.com")
print(result["markdown"])

# ...or pass the key explicitly:
client = CrwClient(api_key="fc-...")

# Self-hosted server:
client = CrwClient(api_url="http://localhost:3000")

# Local zero-config engine (no server, no key): run with CRW_LOCAL=1 in the env.

# Scrape with options:
result = client.scrape("https://example.com", formats=["markdown", "links"])
print(result["markdown"])
print(result["links"])

# Crawl a site:
job = client.crawl("https://example.com", max_depth=2, max_pages=10)
print(job["id"])

# Map all URLs on a site:
urls = client.map("https://example.com")
print(urls)
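
The three setups described above (managed cloud, self-hosted server, local engine) can be summarized as a small decision function, which is handy for a startup log line. This only illustrates the documented environment variables; the function `describe_mode` and its precedence order (explicit api_url, then CRW_LOCAL, then CRW_API_KEY) are assumptions, and the SDK may resolve them differently.

```python
import os
from typing import Mapping, Optional


def describe_mode(env: Mapping[str, str], api_url: Optional[str] = None) -> str:
    """Guess which backend a script is set up for (assumed precedence, see above)."""
    if api_url:
        return f"self-hosted server at {api_url}"
    if env.get("CRW_LOCAL") == "1":
        return "local engine (no key needed)"
    if env.get("CRW_API_KEY"):
        return "managed cloud (api.fastcrw.com)"
    return "managed cloud, but CRW_API_KEY is not set"


if __name__ == "__main__":
    # Report the setup implied by the current process environment.
    print(describe_mode(os.environ))
```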

Search

Works in both modes. In subprocess mode, the engine needs a SearXNG URL configured (the [search].searxng_url config key or the CRW_SEARCH__SEARXNG_URL environment variable); the managed cloud has one preconfigured.

from crw import CrwClient

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Basic search
results = client.search("web scraping tools 2026")

# Search with options
results = client.search(
    "AI news",
    limit=10,
    sources=["web", "news"],
    tbs="qdr:w",
)

# Search + scrape content
results = client.search(
    "python tutorials",
    scrape_options={"formats": ["markdown"]},
)

Note: If search isn't configured, the engine returns a clear search_disabled error.
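
One way to handle the unconfigured case is to treat it as "no results" rather than a crash. The sketch below assumes the error surfaces as a Python exception whose message contains search_disabled; this README does not say which exception type the SDK raises, so the broad except clause and the helper name `search_or_none` are assumptions.

```python
from typing import Any, Optional


def search_or_none(client: Any, query: str, **options: Any) -> Optional[Any]:
    """Run client.search, returning None when search is not configured."""
    try:
        return client.search(query, **options)
    except Exception as exc:  # assumption: the exact exception type is not documented
        if "search_disabled" in str(exc):
            return None
        raise  # anything else is a real failure; let it propagate
```

A caller can then write `results = search_or_none(client, "AI news", limit=10)` and fall back to another source when the result is None.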

Scrape options & structured (LLM) extraction

# Force the renderer, wait for JS, pin a renderer tier:
result = client.scrape("https://example.com", render_js=True, wait_for=1500, renderer="chrome")

# Structured extraction with a JSON Schema (adds the `json` format automatically).
# Requires an LLM provider configured on the engine.
result = client.scrape(
    "https://example.com",
    json_schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
print(result["json"])
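
Because the extracted data comes from an LLM, it can be worth checking it against the schema before using it. The following is a rough sketch, not a JSON Schema validator: it only compares the top-level property types, and it assumes result["json"] is a plain dict. The helper name `mismatched_fields` is hypothetical; for full validation a dedicated library such as jsonschema is the better choice.

```python
# Mapping from JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def mismatched_fields(data: dict, schema: dict) -> list:
    """Return names of top-level properties whose value has the wrong type."""
    bad = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue  # absent fields are not checked here
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or missing type: nothing to compare against
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and type_name in ("number", "integer"):
            bad.append(name)
        elif not isinstance(value, expected):
            bad.append(name)
    return bad
```

With the schema used above, `mismatched_fields(result["json"], schema)` returns an empty list when `title` is a string.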

Parse a document (PDF → markdown / JSON)

Works in both modes.

# From a path:
doc = client.parse_file("invoice.pdf", formats=["markdown"])
print(doc["markdown"], doc["metadata"]["numPages"])

# From bytes, with structured extraction:
with open("invoice.pdf", "rb") as f:
    pdf_bytes = f.read()
doc = client.parse_file(
    content=pdf_bytes,
    filename="invoice.pdf",
    json_schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
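
For a folder of documents, the path-based call can be wrapped in a loop. A minimal sketch, assuming parse_file accepts a string path and returns a dict with a "markdown" key as in the example above; the helper name `parse_folder` is hypothetical, and the client is passed in so the function works with any of the client setups.

```python
from pathlib import Path
from typing import Any, Dict


def parse_folder(client: Any, folder: str) -> Dict[str, str]:
    """Parse every PDF directly inside `folder` and return {file name: markdown}."""
    out = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        doc = client.parse_file(str(pdf), formats=["markdown"])
        out[pdf.name] = doc["markdown"]
    return out
```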

Extract, batch, capabilities, change-tracking (HTTP mode)

These require HTTP mode, that is, the managed cloud (the default) or a self-hosted server set via api_url:

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Structured LLM extraction across URLs (async job, polled to completion):
data = client.extract(
    ["https://example.com"],
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)

# Scrape many URLs in one async batch:
pages = client.batch_scrape(["https://a.com", "https://b.com"], formats=["markdown"])

# Feature-detect the server:
caps = client.capabilities()

# Diff a page against a prior snapshot (stateless):
diff = client.change_tracking_diff(
    current={"markdown": "new content"},
    previous={"markdown": "old content"},
)
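
To make "stateless" concrete: the caller supplies both snapshots on every call, and nothing is stored between calls. The response format of change_tracking_diff is not shown in this README, so the sketch below is not the engine's algorithm; it only illustrates the idea locally with the standard library's difflib, and the function name `markdown_diff` is hypothetical.

```python
import difflib


def markdown_diff(previous: str, current: str) -> str:
    """Return a unified diff between two markdown snapshots."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(lines)


# Both snapshots are passed in; an empty string means no change.
print(markdown_diff("old content", "new content"))
```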
