Python SDK for CRW web scraper — scrape, crawl, and map any website from Python

crw

Python SDK for CRW — the open-source web scraper built for AI agents.

Install

# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh

# npm (zero install):
npx crw-mcp

# Python:
pip install crw

# Cargo:
cargo install crw-mcp

# Docker:
docker run -i ghcr.io/us/crw crw-mcp

CLI Usage

After installing, you can use crw-mcp as an MCP server for any AI coding agent:

# Start the MCP stdio server
crw-mcp

# Add to Claude Code
claude mcp add crw -- npx crw-mcp

MCP client config (works with Cursor, Windsurf, Cline, Claude Desktop, etc.):

{
  "mcpServers": {
    "crw": {
      "command": "npx",
      "args": ["crw-mcp"]
    }
  }
}
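If you would rather generate this config from a script, it can be built with Python's standard `json` module. This is a minimal sketch: `build_mcp_config` is a hypothetical helper, not part of the SDK, and where the resulting JSON has to be saved depends on the MCP client you use.

```python
import json


def build_mcp_config(command="npx", args=("crw-mcp",)):
    """Return the MCP client config shown above as a plain dict."""
    return {"mcpServers": {"crw": {"command": command, "args": list(args)}}}


config = build_mcp_config()
config_text = json.dumps(config, indent=2)
print(config_text)
```

Passing a different `command` and `args` (for example the Docker invocation from the Install section) yields the same structure with another launcher.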

SDK Usage

CRW is cloud-first. By default, the client uses the managed cloud (api.fastcrw.com): sign up with GitHub or Google (about 10 seconds) for 500 free credits, with no payment required and no monthly reset, then set CRW_API_KEY. To run the engine locally instead, set CRW_LOCAL=1 (zero-config, no key needed).

from crw import CrwClient

# Cloud (default) — reads CRW_API_KEY from the environment:
client = CrwClient()
result = client.scrape("https://example.com")
print(result["markdown"])

# ...or pass the key explicitly:
client = CrwClient(api_key="fc-...")

# Self-hosted server:
client = CrwClient(api_url="http://localhost:3000")

# Local zero-config engine (no server, no key): run with CRW_LOCAL=1 in the env.

# Scrape with options:
result = client.scrape("https://example.com", formats=["markdown", "links"])
print(result["markdown"])
print(result["links"])

# Crawl a site:
job = client.crawl("https://example.com", max_depth=2, max_pages=10)
print(job["id"])

# Map all URLs on a site:
urls = client.map("https://example.com")
print(urls)
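Because the default client is configured through environment variables, a quick pre-flight check can show which mode a script is likely to run in. The sketch below is illustrative only: `describe_mode` is a hypothetical helper, and the assumption that CRW_LOCAL=1 takes precedence over CRW_API_KEY is not confirmed by this README.

```python
import os


def describe_mode(env=None):
    """Guess which mode a default CrwClient() would use.

    Assumption: CRW_LOCAL=1 wins over CRW_API_KEY. The SDK's real
    precedence is not documented here.
    """
    env = os.environ if env is None else env
    if env.get("CRW_LOCAL") == "1":
        return "local"
    if env.get("CRW_API_KEY"):
        return "cloud"
    return "cloud (CRW_API_KEY not set)"


print(describe_mode())
```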

Search

Works in both HTTP mode (cloud or self-hosted server) and subprocess mode (the local engine). In subprocess mode, the engine needs a SearXNG URL configured ([search].searxng_url or CRW_SEARCH__SEARXNG_URL); the managed cloud has one preconfigured.

from crw import CrwClient

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Basic search
results = client.search("web scraping tools 2026")

# Search with options
results = client.search(
    "AI news",
    limit=10,
    sources=["web", "news"],
    tbs="qdr:w",
)

# Search + scrape content
results = client.search(
    "python tutorials",
    scrape_options={"formats": ["markdown"]},
)

Note: If search isn't configured, the engine returns a clear search_disabled error.

Scrape options & structured (LLM) extraction

# Force the renderer, wait for JS, pin a renderer tier:
result = client.scrape("https://example.com", render_js=True, wait_for=1500, renderer="chrome")

# Structured extraction with a JSON Schema (adds the `json` format automatically).
# Requires an LLM provider configured on the engine.
result = client.scrape(
    "https://example.com",
    json_schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
print(result["json"])
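Since the extracted JSON comes from an LLM, it can be worth a light sanity check before using it. The sketch below compares top-level values against the types declared in the schema. It is not a full JSON Schema validator, `mismatched_fields` is a hypothetical helper, and the `sample` payload is made up for illustration.

```python
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def mismatched_fields(schema, data):
    """Return names of top-level properties whose value has the wrong type."""
    bad = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue  # missing keys are not checked in this sketch
        declared = spec.get("type")
        expected = _JSON_TYPES.get(declared)
        if expected is None:
            continue  # unknown or absent type: nothing to compare
        value = data[name]
        # bool is a subclass of int in Python, but JSON keeps them distinct
        if isinstance(value, bool) and declared in ("number", "integer"):
            bad.append(name)
        elif not isinstance(value, expected):
            bad.append(name)
    return bad


schema = {"type": "object", "properties": {"title": {"type": "string"}}}
sample = {"title": "Example Domain"}  # hypothetical result["json"] payload
print(mismatched_fields(schema, sample))
```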

Parse a document (PDF → markdown / JSON)

Works in both modes.

# From a path:
doc = client.parse_file("invoice.pdf", formats=["markdown"])
print(doc["markdown"], doc["metadata"]["numPages"])

# From bytes, with structured extraction:
with open("invoice.pdf", "rb") as f:
    pdf_bytes = f.read()

doc = client.parse_file(
    content=pdf_bytes,
    filename="invoice.pdf",
    json_schema={"type": "object", "properties": {"total": {"type": "number"}}},
)

Extract, batch, capabilities, change-tracking (HTTP mode)

These require HTTP mode, that is, a running server (api_url) or the managed cloud:

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Structured LLM extraction across URLs (async job, polled to completion):
data = client.extract(
    ["https://example.com"],
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)

# Scrape many URLs in one async batch:
pages = client.batch_scrape(["https://a.com", "https://b.com"], formats=["markdown"])

# Feature-detect the server:
caps = client.capabilities()

# Diff a page against a prior snapshot (stateless):
diff = client.change_tracking_diff(
    current={"markdown": "new content"},
    previous={"markdown": "old content"},
)
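To get a feel for what diffing two markdown snapshots involves, the same inputs can be compared locally with Python's standard `difflib`. This is only an illustration of the idea: it is not the algorithm the server uses, and the shape of the value returned by `change_tracking_diff` is not documented here. `markdown_diff` is a hypothetical helper.

```python
import difflib


def markdown_diff(previous, current):
    """Return a unified diff between two markdown snapshots as a list of lines."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


lines = markdown_diff("old content", "new content")
print("\n".join(lines))
```

Identical snapshots produce an empty list, which makes a cheap "nothing changed" check.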
