Python SDK for CRW web scraper — scrape, crawl, and map any website from Python

crw

Python SDK for CRW — the open-source web scraper built for AI agents.

Install

# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh

# npm (zero install):
npx crw-mcp

# Python:
pip install crw

# Cargo:
cargo install crw-mcp

# Docker:
docker run -i ghcr.io/us/crw crw-mcp

CLI Usage

After installing, you can run crw-mcp as an MCP server for any MCP-compatible AI coding agent:

# Start the MCP stdio server
crw-mcp

# Add to Claude Code
claude mcp add crw -- npx crw-mcp

MCP client config (works with Cursor, Windsurf, Cline, Claude Desktop, etc.):

{
  "mcpServers": {
    "crw": {
      "command": "npx",
      "args": ["crw-mcp"]
    }
  }
}
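If your client's config file already lists other servers, you may want to add the crw entry without overwriting them. The following is a minimal sketch, assuming the config is a JSON file with a top-level mcpServers object like the one above. The helper name add_crw_server is hypothetical, and where each client stores its config file varies, so check your client's documentation for the path.

```python
import json
from pathlib import Path


def add_crw_server(config_path: Path) -> dict:
    """Add (or replace) the "crw" entry in an MCP client config, keeping other servers.

    Hypothetical helper: assumes the file holds a JSON object with an optional
    top-level "mcpServers" mapping, as in the example config above.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["crw"] = {"command": "npx", "args": ["crw-mcp"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Existing entries under mcpServers are left untouched. A missing file is created with just the crw entry.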

SDK Usage

CRW is cloud-first. By default the client uses the managed cloud at api.fastcrw.com. To use it, sign up with GitHub or Google (about 10 seconds) for 500 free credits (no payment, no monthly reset), then set CRW_API_KEY. To use your own server instead, pass api_url. To run the engine locally with zero configuration and no key, set CRW_LOCAL=1.

from crw import CrwClient

# Cloud (default) — reads CRW_API_KEY from the environment:
client = CrwClient()
result = client.scrape("https://example.com")
print(result["markdown"])

# ...or pass the key explicitly:
client = CrwClient(api_key="fc-...")

# Self-hosted server:
client = CrwClient(api_url="http://localhost:3000")

# Local zero-config engine (no server, no key): run with CRW_LOCAL=1 in the env.

# Scrape with options:
result = client.scrape("https://example.com", formats=["markdown", "links"])
print(result["markdown"])
print(result["links"])

# Crawl a site:
job = client.crawl("https://example.com", max_depth=2, max_pages=10)
print(job["id"])

# Map all URLs on a site:
urls = client.map("https://example.com")
print(urls)
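A mapped site can include links to other hosts, or duplicates that differ only by a #fragment. The following is a minimal post-processing sketch. It assumes client.map returns a list of absolute URL strings; the return shape is not documented here, so adjust it if you get a dict back. same_site_urls is a hypothetical helper, not part of the SDK.

```python
from urllib.parse import urldefrag, urlsplit


def same_site_urls(urls: list[str], base_url: str) -> list[str]:
    """Keep URLs on the same host as base_url, drop #fragments, de-duplicate in order."""
    host = urlsplit(base_url).netloc.lower()
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        clean, _fragment = urldefrag(url)  # "https://x/a#top" -> "https://x/a"
        if urlsplit(clean).netloc.lower() != host or clean in seen:
            continue
        seen.add(clean)
        kept.append(clean)
    return kept
```

For example: urls = same_site_urls(client.map("https://example.com"), "https://example.com").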

Search

Works in both HTTP mode and local subprocess mode (CRW_LOCAL=1). In subprocess mode, the engine needs a SearXNG URL, configured either via [search].searxng_url or the CRW_SEARCH__SEARXNG_URL environment variable. The managed cloud has one preconfigured.

from crw import CrwClient

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Basic search
results = client.search("web scraping tools 2026")

# Search with options
results = client.search(
    "AI news",
    limit=10,
    sources=["web", "news"],
    tbs="qdr:w",
)

# Search + scrape content
results = client.search(
    "python tutorials",
    scrape_options={"formats": ["markdown"]},
)

Note: if search isn't configured, the engine returns a search_disabled error.
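If the same code runs against engines with and without search configured, it can help to treat search_disabled as "no results" rather than a crash. The following is a minimal sketch. It assumes the SDK raises an exception whose message contains the search_disabled code. The exact exception class is not documented here, so the sketch catches Exception and re-raises anything else. search_or_none is a hypothetical helper.

```python
from typing import Any, Optional


def search_or_none(client: Any, query: str, **options: Any) -> Optional[Any]:
    """Run client.search, returning None if the engine reports search_disabled.

    Assumption: the error code appears in the exception message.
    Any other error is re-raised unchanged.
    """
    try:
        return client.search(query, **options)
    except Exception as exc:
        if "search_disabled" in str(exc):
            return None
        raise
```

Options such as limit or sources are passed through to client.search unchanged.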

Scrape options & structured (LLM) extraction

# Force the renderer, wait for JS, pin a renderer tier:
result = client.scrape("https://example.com", render_js=True, wait_for=1500, renderer="chrome")

# Structured extraction with a JSON Schema (adds the `json` format automatically).
# Requires an LLM provider configured on the engine.
result = client.scrape(
    "https://example.com",
    json_schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
print(result["json"])
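LLM extraction can return fields that are missing or have the wrong type, so it may be worth checking result["json"] before using it. The following is a minimal sketch that only handles flat object schemas with primitive property types. check_flat_object is a hypothetical helper, not a full JSON Schema validator; the jsonschema package on PyPI covers the full specification.

```python
from typing import Any

# Map JSON Schema primitive type names to Python types.
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
}


def check_flat_object(data: Any, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means data matches the flat schema."""
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field {key!r}")
    for key, prop in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(prop.get("type"))
        if key not in data or expected is None:
            continue  # absent field or non-primitive type: not checked here
        value = data[key]
        # bool is a subclass of int in Python, so reject it unless the schema wants a boolean.
        if (isinstance(value, bool) and bool not in expected) or not isinstance(value, expected):
            problems.append(f"{key!r} should be {prop['type']}")
    return problems
```

For example: problems = check_flat_object(result["json"], schema). An empty list means the data passed these checks.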

Parse a document (PDF → markdown / JSON)

Works in both HTTP mode and local subprocess mode.

# From a path:
doc = client.parse_file("invoice.pdf", formats=["markdown"])
print(doc["markdown"], doc["metadata"]["numPages"])

# From bytes, with structured extraction:
from pathlib import Path
pdf_bytes = Path("invoice.pdf").read_bytes()
doc = client.parse_file(
    content=pdf_bytes,
    filename="invoice.pdf",
    json_schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
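To convert a whole folder of PDFs, you can loop over the files and call parse_file once per document. The following is a minimal sketch. It assumes parse_file accepts a path as its first argument, as in the path example above, and returns a dict with a "markdown" key. parse_folder and the output layout (one .md file next to each PDF) are hypothetical choices.

```python
from pathlib import Path
from typing import Any


def parse_folder(client: Any, folder: Path) -> list[Path]:
    """Parse every *.pdf in folder and write its markdown to a sibling .md file."""
    written = []
    for pdf in sorted(folder.glob("*.pdf")):
        doc = client.parse_file(str(pdf), formats=["markdown"])
        out = pdf.with_suffix(".md")
        out.write_text(doc["markdown"], encoding="utf-8")
        written.append(out)
    return written
```

Files are processed in sorted order, so reruns produce the same sequence of output files.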

Extract, batch, capabilities, change-tracking (HTTP mode)

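These need an HTTP endpoint: either the managed cloud (the default) or your own server via api_url. They are not available in local CRW_LOCAL=1 mode.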

client = CrwClient(api_key="YOUR_KEY")  # cloud (default)

# Structured LLM extraction across URLs (async job, polled to completion):
data = client.extract(
    ["https://example.com"],
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)

# Scrape many URLs in one async batch:
pages = client.batch_scrape(["https://a.com", "https://b.com"], formats=["markdown"])

# Feature-detect the server:
caps = client.capabilities()

# Diff a page against a prior snapshot (stateless):
diff = client.change_tracking_diff(
    current={"markdown": "new content"},
    previous={"markdown": "old content"},
)
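Because change_tracking_diff is stateless, your code has to keep the previous snapshot between runs. The following is a minimal sketch that stores the last snapshot per URL in a local JSON file. SnapshotStore and its file location are hypothetical. The snapshots use the {"markdown": ...} shape from the example above.

```python
import json
from pathlib import Path
from typing import Optional


class SnapshotStore:
    """Persist the latest {"markdown": ...} snapshot per URL in one JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._data: dict[str, dict] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def previous(self, url: str) -> Optional[dict]:
        """Return the stored snapshot for url, or None on the first run."""
        return self._data.get(url)

    def save(self, url: str, snapshot: dict) -> None:
        """Record snapshot as the new baseline for url and flush it to disk."""
        self._data[url] = snapshot
        self.path.write_text(json.dumps(self._data, indent=2), encoding="utf-8")
```

On each run:

1. Build current from a fresh scrape.
2. If store.previous(url) is not None, pass it as previous to change_tracking_diff.
3. Call store.save(url, current), so the next run diffs against today's content.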

Release history

0.16.0 (this version): Jun 14, 2026
0.15.2: Jun 12, 2026
0.15.1: Jun 11, 2026
0.15.0: Jun 10, 2026
0.14.0: Jun 8, 2026
0.13.4: Jun 7, 2026
0.13.3: Jun 7, 2026
0.13.2: Jun 6, 2026
0.13.1: Jun 6, 2026
0.13.0: Jun 6, 2026
0.12.1: Jun 5, 2026
0.11.0: Jun 3, 2026
0.10.0: May 20, 2026
0.9.1: May 16, 2026
0.8.3: May 15, 2026
0.8.2: May 14, 2026
0.7.1: May 12, 2026
0.6.4: May 12, 2026
0.6.3: May 12, 2026
0.6.2: May 10, 2026
0.6.1: May 9, 2026
0.6.0: May 9, 2026
0.5.0: May 4, 2026
0.4.2: Apr 29, 2026
0.4.1: Apr 29, 2026
0.4.0: Apr 23, 2026
0.3.6: Apr 21, 2026
0.3.5: Apr 12, 2026
0.3.2: Apr 8, 2026
0.3.1: Apr 8, 2026
0.3.0: Apr 4, 2026
0.2.2: Apr 2, 2026
0.2.1: Mar 29, 2026

Download files

Source Distribution

crw-0.16.0.tar.gz (22.9 kB)

Uploaded Jun 14, 2026 Source

Built Distribution

crw-0.16.0-py3-none-any.whl (20.5 kB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file crw-0.16.0.tar.gz.

File metadata

Download URL: crw-0.16.0.tar.gz
Upload date: Jun 14, 2026
Size: 22.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for crw-0.16.0.tar.gz
Algorithm	Hash digest
SHA256	`dd0b3905bee7de73e6a9c2a81effc001b939dc30b8c87ab76925f95892694e98`
MD5	`6aa131f24d1327b2e1b59d9101701dd9`
BLAKE2b-256	`73ac13b0e29e32967d62f08467b5a5d3b0480ea4b0d8667b2c9b04a695c47043`

File details

Details for the file crw-0.16.0-py3-none-any.whl.

File metadata

Download URL: crw-0.16.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 20.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for crw-0.16.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`335d2a90bf2dc0c5bf0e0391217148c49e8ef329d2c716411f76a066b5a83f97`
MD5	`cff5bc2c9116c8ffe7b6afb948c10614`
BLAKE2b-256	`f285291934465b263097d3fba084221f719d1d4df65044b6b7582666cc026dc1`
