Python SDK for CRW web scraper — scrape, crawl, and map any website from Python

crw

Python SDK for CRW — the open-source web scraper built for AI agents.

Install

# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh

# npm (zero install):
npx crw-mcp

# Python:
pip install crw

# Cargo:
cargo install crw-mcp

# Docker:
docker run -i ghcr.io/us/crw crw-mcp

CLI Usage

After installing, you can use crw-mcp as an MCP server for any AI coding agent:

# Start the MCP stdio server
crw-mcp

# Add to Claude Code
claude mcp add crw -- npx crw-mcp

MCP client config (works with Cursor, Windsurf, Cline, Claude Desktop, etc.):

{
  "mcpServers": {
    "crw": {
      "command": "npx",
      "args": ["crw-mcp"]
    }
  }
}
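If the client already has a config file with other servers in it, the entry above can be merged in rather than pasted by hand. The following is a minimal sketch, not part of the crw package: `add_crw_server` is a hypothetical helper, the config path is whatever your MCP client uses, and the file is assumed to contain valid JSON if it exists.

```python
import json
from pathlib import Path


def add_crw_server(config_path: Path) -> dict:
    """Merge the crw entry into an MCP client config, keeping other servers."""
    config = {}
    if config_path.exists():
        # Assumption: an existing file holds a JSON object.
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    # Same command and args as the config snippet shown above.
    servers["crw"] = {"command": "npx", "args": ["crw-mcp"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling it twice is harmless: the second call overwrites the `crw` entry with the same value and leaves every other server untouched.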

SDK Usage

CRW is cloud-first: by default the client talks to the managed cloud (api.fastcrw.com). Sign up with GitHub or Google (about 10 seconds, no payment) to get 500 free credits with no monthly reset, then set CRW_API_KEY. To run the engine locally instead, set CRW_LOCAL=1 (zero-config, no key needed).

from crw import CrwClient

# Cloud (default) — reads CRW_API_KEY from the environment:
client = CrwClient()
result = client.scrape("https://example.com")
print(result["markdown"])

# ...or pass the key explicitly:
client = CrwClient(api_key="fc-...")

# Self-hosted server:
client = CrwClient(api_url="http://localhost:3000")

# Local zero-config engine (no server, no key): run with CRW_LOCAL=1 in the env.

# Scrape with options:
result = client.scrape("https://example.com", formats=["markdown", "links"])
print(result["markdown"])
print(result["links"])

# Crawl a site:
job = client.crawl("https://example.com", max_depth=2, max_pages=10)
print(job["id"])

# Map all URLs on a site:
urls = client.map("https://example.com")
print(urls)
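Before constructing a client it can help to check which mode the environment would select. The following is a rough sketch based only on the two variables named above; `crw_mode` is a hypothetical helper, and giving CRW_LOCAL precedence over CRW_API_KEY is an assumption, since the SDK's own resolution order is not described here.

```python
import os


def crw_mode(env=None) -> str:
    """Return "local", "cloud", or "unconfigured" for the given environment."""
    env = os.environ if env is None else env
    # Assumption: CRW_LOCAL=1 wins when both variables are set.
    if env.get("CRW_LOCAL") == "1":
        return "local"
    if env.get("CRW_API_KEY"):
        return "cloud"
    return "unconfigured"
```

A script could call `crw_mode()` at startup and print a hint to set CRW_API_KEY or CRW_LOCAL=1 when it returns "unconfigured".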

Search

Search works in both modes. In local (subprocess) mode the engine needs a SearXNG URL configured, either as [search].searxng_url in the config file or as the CRW_SEARCH__SEARXNG_URL environment variable; the managed cloud has one preconfigured.

from crw import CrwClient

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Basic search
results = client.search("web scraping tools 2026")

# Search with options
results = client.search(
    "AI news",
    limit=10,
    sources=["web", "news"],
    tbs="qdr:w",
)

# Search + scrape content
results = client.search(
    "python tutorials",
    scrape_options={"formats": ["markdown"]},
)

Note: If search isn't configured, the engine returns a clear search_disabled error.

Scrape options & structured (LLM) extraction

# Force the renderer, wait for JS, pin a renderer tier:
result = client.scrape("https://example.com", render_js=True, wait_for=1500, renderer="chrome")

# Structured extraction with a JSON Schema (adds the `json` format automatically).
# Requires an LLM provider configured on the engine.
result = client.scrape(
    "https://example.com",
    json_schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
print(result["json"])
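For flat records, the schema dictionary can be generated instead of written out by hand. The helper below is a small sketch, not part of the SDK: `object_schema` is a hypothetical name, and `required` is a standard JSON Schema keyword whose handling by the engine is an assumption.

```python
def object_schema(fields: dict, required=()) -> dict:
    """Build a flat JSON Schema object from a {field_name: json_type} mapping."""
    schema = {
        "type": "object",
        "properties": {name: {"type": json_type} for name, json_type in fields.items()},
    }
    if required:
        # Standard JSON Schema keyword; whether the engine enforces it is an assumption.
        schema["required"] = list(required)
    return schema


# Produces the same schema as the scrape example above.
title_schema = object_schema({"title": "string"})
```

The result can then be passed as `json_schema=title_schema` in the `client.scrape` call.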

Parse a document (PDF → markdown / JSON)

Works in both modes.

# From a path:
doc = client.parse_file("invoice.pdf", formats=["markdown"])
print(doc["markdown"], doc["metadata"]["numPages"])

# From bytes, with structured extraction:
with open("invoice.pdf", "rb") as f:
    pdf_bytes = f.read()

doc = client.parse_file(
    content=pdf_bytes,
    filename="invoice.pdf",
    json_schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
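When the bytes come from an upload or a download, a cheap local check can catch non-PDF input before a parse request is spent on it. The sketch below relies only on the fact that PDF files begin with the `%PDF-` header; `looks_like_pdf` and `load_pdf` are hypothetical helpers, not SDK functions.

```python
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """True if the bytes start with the PDF header marker."""
    return data.startswith(b"%PDF-")


def load_pdf(path: str) -> bytes:
    """Read a file and fail early if it does not look like a PDF."""
    data = Path(path).read_bytes()
    if not looks_like_pdf(data):
        raise ValueError(f"{path} does not start with a %PDF- header")
    return data
```

The returned bytes can be passed as `content=` to `client.parse_file`, as in the example above.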

Extract, batch, capabilities, change-tracking (HTTP mode)

These require HTTP mode, that is, a running server (set api_url) or the managed cloud:

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Structured LLM extraction across URLs (async job, polled to completion):
data = client.extract(
    ["https://example.com"],
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)

# Scrape many URLs in one async batch:
pages = client.batch_scrape(["https://a.com", "https://b.com"], formats=["markdown"])

# Feature-detect the server:
caps = client.capabilities()

# Diff a page against a prior snapshot (stateless):
diff = client.change_tracking_diff(
    current={"markdown": "new content"},
    previous={"markdown": "old content"},
)
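To get a feel for what a diff between two markdown snapshots contains, the same comparison can be approximated locally with the standard library. This is purely illustrative: it uses `difflib` and is not the algorithm or the response format of `change_tracking_diff`, which is not described here. `markdown_diff` is a hypothetical helper.

```python
import difflib


def markdown_diff(previous: str, current: str) -> str:
    """Return a unified diff between two markdown snapshots ("" if identical)."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(lines)


print(markdown_diff("old content", "new content"))
```

Removed lines are prefixed with `-` and added lines with `+`; identical snapshots yield an empty string.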

Download files

Source Distribution

crw-0.15.0.tar.gz (22.9 kB view details)

Uploaded Source

Built Distribution

crw-0.15.0-py3-none-any.whl (20.5 kB view details)

Uploaded Python 3

File details

Details for the file crw-0.15.0.tar.gz.

File metadata

  • Download URL: crw-0.15.0.tar.gz
  • Upload date:
  • Size: 22.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for crw-0.15.0.tar.gz
Algorithm Hash digest
SHA256 afe4201d94e3bb17a9d8f538edc5ced64913e621809cacd39fec6738ea8c2c38
MD5 538a6fa26793157dcae0529c0126740d
BLAKE2b-256 240ea9efe4b3376e53a8d9571b844f8263ca640d42ca1a8ab88c339c609c5987

File details

Details for the file crw-0.15.0-py3-none-any.whl.

File metadata

  • Download URL: crw-0.15.0-py3-none-any.whl
  • Upload date:
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for crw-0.15.0-py3-none-any.whl
Algorithm Hash digest
SHA256 64562f153ed75b61a40628eb2c8b408d6195f074ac54d7538c6f11f54d14975b
MD5 62087302a48aa3c9efe2bd424686bf33
BLAKE2b-256 0aea82c24d0667d4d49a3ebed1da2757613b8adb79dbfb44843aa65171bcee65
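A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; `sha256_matches` is a hypothetical helper and the file path is wherever you saved the download.

```python
import hashlib
import hmac


def sha256_matches(path: str, expected_hex: str) -> bool:
    """Hash a file in chunks and compare it with an expected SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 64 KiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())


# Usage, with the digest listed for the source distribution:
# sha256_matches(
#     "crw-0.15.0.tar.gz",
#     "afe4201d94e3bb17a9d8f538edc5ced64913e621809cacd39fec6738ea8c2c38",
# )
```

pip can perform the same check at install time when a requirements file pins hashes and `--require-hashes` is used.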
