Python SDK for CRW web scraper — scrape, crawl, and map any website from Python
crw
Python SDK for CRW — the open-source web scraper built for AI agents.
Install
# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh
# npm (zero install):
npx crw-mcp
# Python:
pip install crw
# Cargo:
cargo install crw-mcp
# Docker:
docker run -i ghcr.io/us/crw crw-mcp
CLI Usage
After installing, you can use crw-mcp as an MCP server for any AI coding agent:
# Start the MCP stdio server
crw-mcp
# Add to Claude Code
claude mcp add crw -- npx crw-mcp
MCP client config (works with Cursor, Windsurf, Cline, Claude Desktop, etc.):
{
  "mcpServers": {
    "crw": {
      "command": "npx",
      "args": ["crw-mcp"]
    }
  }
}
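If the client already has a config file with other servers in it, the crw entry has to be merged in rather than pasted over the top. A minimal sketch of that merge follows; the helper name `add_crw_server` and the `other` server entry are hypothetical, and only the `crw` entry itself comes from the config above.

```python
import json


def add_crw_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the crw server entry added."""
    merged = dict(config)
    # Copy the server table so the caller's dict is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    # Same command and args as the config shown above.
    servers["crw"] = {"command": "npx", "args": ["crw-mcp"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server already registered.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_crw_server(existing), indent=2))
```

Where the resulting JSON is written depends on the client; each of the clients listed above keeps its config in its own location.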
SDK Usage
CRW is cloud-first. By default the client uses the managed cloud
(api.fastcrw.com). Sign up for 500 free credits (no payment required, no
monthly reset; sign-in with GitHub or Google takes about 10 seconds) and set
CRW_API_KEY. To run the engine locally instead, set CRW_LOCAL=1 (zero-config,
no key).
from crw import CrwClient
# Cloud (default) — reads CRW_API_KEY from the environment:
client = CrwClient()
result = client.scrape("https://example.com")
print(result["markdown"])
# ...or pass the key explicitly:
client = CrwClient(api_key="fc-...")
# Self-hosted server:
client = CrwClient(api_url="http://localhost:3000")
# Local zero-config engine (no server, no key): run with CRW_LOCAL=1 in the env.
# Scrape with options:
result = client.scrape("https://example.com", formats=["markdown", "links"])
print(result["markdown"])
print(result["links"])
# Crawl a site:
job = client.crawl("https://example.com", max_depth=2, max_pages=10)
print(job["id"])
# Map all URLs on a site:
urls = client.map("https://example.com")
print(urls)
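Because a bare CrwClient() relies on environment variables, a script may want to check them before constructing the client. The sketch below does that with the standard library only. The function name `pick_mode` is hypothetical, and the precedence it applies (CRW_LOCAL=1 wins over CRW_API_KEY when both are set) is an assumption, not something this README states.

```python
import os
from typing import Mapping


def pick_mode(env: Mapping[str, str] = os.environ) -> str:
    """Guess which mode a bare CrwClient() would run in, from env vars alone."""
    # Assumption: CRW_LOCAL=1 takes precedence over an API key if both are set.
    if env.get("CRW_LOCAL") == "1":
        return "local"
    if env.get("CRW_API_KEY"):
        return "cloud"
    return "unconfigured"


mode = pick_mode()
if mode == "unconfigured":
    print("Set CRW_API_KEY for the cloud, or CRW_LOCAL=1 for the local engine.")
else:
    print(f"Using {mode} mode")
```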
Search
Works in both modes. In subprocess mode (the local engine), a SearXNG URL
must be configured via [search].searxng_url or CRW_SEARCH__SEARXNG_URL; the
managed cloud has one preconfigured.
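The two spellings of the SearXNG setting above suggest a naming pattern: a CRW prefix, the config section, a double underscore, then the key, all upper-cased. The sketch below encodes that pattern and prepares an environment for a local run. Only the search example is confirmed by this README, so treat the general rule as an assumption; the helper `env_name` and the address http://localhost:8080 are hypothetical.

```python
import os


def env_name(section: str, key: str, prefix: str = "CRW") -> str:
    """Build the environment variable name for a [section].key setting."""
    # Assumed pattern, inferred from [search].searxng_url above.
    return f"{prefix}_{section.upper()}__{key.upper()}"


# Work on a copy so the current process environment is not modified.
env = dict(os.environ)
env["CRW_LOCAL"] = "1"
# Hypothetical SearXNG address; replace it with your own instance.
env[env_name("search", "searxng_url")] = "http://localhost:8080"

print(env_name("search", "searxng_url"))  # prints CRW_SEARCH__SEARXNG_URL
```

The `env` mapping can then be passed to whatever launches the script that uses the SDK, for example through the `env` argument of `subprocess.run`.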
from crw import CrwClient
client = CrwClient(api_key="YOUR_KEY") # cloud (default)
# Basic search
results = client.search("web scraping tools 2026")
# Search with options
results = client.search(
    "AI news",
    limit=10,
    sources=["web", "news"],
    tbs="qdr:w",
)
# Search + scrape content
results = client.search(
    "python tutorials",
    scrape_options={"formats": ["markdown"]},
)
Note: If search isn't configured, the engine returns a clear
search_disabled error.
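A caller that treats search as optional can turn that error into a fallback instead of a crash. The sketch below is one way to do it, under two assumptions: the SDK raises an exception when the engine reports the error, and the exception message contains the text search_disabled. Neither the exception class nor the message format is documented here, and the helper name `search_or_none` is hypothetical.

```python
def search_or_none(client, query: str, **options):
    """Run client.search(); return None when search is not configured."""
    try:
        return client.search(query, **options)
    except Exception as exc:  # the SDK's exception class is not named in this README
        # Assumption: the error message carries the code "search_disabled".
        if "search_disabled" in str(exc):
            return None
        # Anything else (network failure, bad key, ...) is re-raised unchanged.
        raise


# Usage, with a client built as in the examples above:
# results = search_or_none(client, "AI news", limit=10)
# if results is None:
#     print("Search is not configured on this engine.")
```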
Scrape options & structured (LLM) extraction
# Force the renderer, wait for JS, pin a renderer tier:
result = client.scrape("https://example.com", render_js=True, wait_for=1500, renderer="chrome")
# Structured extraction with a JSON Schema (adds the `json` format automatically).
# Requires an LLM provider configured on the engine.
result = client.scrape(
    "https://example.com",
    json_schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
print(result["json"])
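Writing flat schemas by hand gets repetitive once there are more than a few fields. As a small convenience sketch, the hypothetical helper `schema_for` below builds the same kind of object schema from a mapping of field names to Python types. It covers only flat objects with scalar fields; nested objects and arrays would need a hand-written schema.

```python
# Python scalar types mapped to their JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_for(fields: dict) -> dict:
    """Build a flat JSON Schema object from {field_name: python_type}."""
    return {
        "type": "object",
        "properties": {
            name: {"type": JSON_TYPES[py_type]} for name, py_type in fields.items()
        },
    }


schema = schema_for({"title": str, "price": float})
print(schema)

# Usage, with a client built as in the examples above:
# result = client.scrape("https://example.com", json_schema=schema)
```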
Parse a document (PDF → markdown / JSON)
Works in both modes.
# From a path:
doc = client.parse_file("invoice.pdf", formats=["markdown"])
print(doc["markdown"], doc["metadata"]["numPages"])
# From bytes, with structured extraction:
with open("invoice.pdf", "rb") as f:
    pdf_bytes = f.read()
doc = client.parse_file(
    content=pdf_bytes,
    filename="invoice.pdf",
    json_schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
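For a whole folder of documents, the path form of parse_file can be wrapped in a loop. The sketch below assumes only what the example above shows: parse_file accepts a path string plus formats, and the result has a "markdown" key. The helper name `parse_folder` and the folder name in the usage comment are hypothetical.

```python
from pathlib import Path


def parse_folder(client, folder: str) -> dict:
    """Parse every PDF in a folder and map each file name to its markdown."""
    results = {}
    # sorted() gives a stable, alphabetical processing order.
    for path in sorted(Path(folder).glob("*.pdf")):
        doc = client.parse_file(str(path), formats=["markdown"])
        results[path.name] = doc["markdown"]
    return results


# Usage, with a client built as in the examples above:
# for name, markdown in parse_folder(client, "invoices").items():
#     print(name, len(markdown))
```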
Extract, batch, capabilities, change-tracking (HTTP mode)
These require HTTP mode, that is, a running server (api_url) or the managed cloud:
client = CrwClient(api_key="YOUR_KEY") # cloud (default)
# Structured LLM extraction across URLs (async job, polled to completion):
data = client.extract(
    ["https://example.com"],
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
# Scrape many URLs in one async batch:
pages = client.batch_scrape(["https://a.com", "https://b.com"], formats=["markdown"])
# Feature-detect the server:
caps = client.capabilities()
# Diff a page against a prior snapshot (stateless):
diff = client.change_tracking_diff(
    current={"markdown": "new content"},
    previous={"markdown": "old content"},
)
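With a long URL list it can be convenient to submit several smaller batches rather than one large one. The sketch below splits the list and calls batch_scrape once per group. It assumes batch_scrape returns an iterable of page results, which the variable name in the example above suggests but does not confirm. The helper names and the default group size of 50 are hypothetical, and 50 is not a documented limit.

```python
def chunked(items: list, size: int):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_scrape_in_chunks(client, urls: list, size: int = 50) -> list:
    """Scrape urls in groups of size and collect all page results in order."""
    pages = []
    for group in chunked(urls, size):
        # Assumption: batch_scrape returns an iterable of per-page results.
        pages.extend(client.batch_scrape(group, formats=["markdown"]))
    return pages


# Usage, with a client built as in the examples above:
# pages = batch_scrape_in_chunks(client, ["https://a.com", "https://b.com"], size=1)
```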
File details
Details for the file crw-0.15.1.tar.gz.
File metadata
- Download URL: crw-0.15.1.tar.gz
- Upload date:
- Size: 22.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 38eb428446ffcad623c70f40aaf59e4fd0f8744aa2ab599cf1584dfc6ad5c882 |
| MD5 | 308956b66191a07f2e3868b76c135240 |
| BLAKE2b-256 | c30878c0d8f1a0679340ab4ebac6f7e097bd8968ae9d41fe8cb8fe3fe18c5709 |
File details
Details for the file crw-0.15.1-py3-none-any.whl.
File metadata
- Download URL: crw-0.15.1-py3-none-any.whl
- Upload date:
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2547cef29cb486ddcd675276b2be8b9502db8a3a94b17d0ef9eed8e9dfc57300 |
| MD5 | 701bd6db15fb1d96e07506fd3f3472cd |
| BLAKE2b-256 | abccd828f1c605515be957c746f467d24bca586c6052585255e20df695b31c12 |