# crewai-crw
CRW web scraping tools for CrewAI — scrape, crawl, map, and search the web with AI agents.
CRW is an open-source web scraper built for AI agents. Single Rust binary, ~6 MB idle RAM, Firecrawl-compatible API.
## Installation

```bash
pip install crewai crewai-crw
# or
uv add crewai crewai-crw
```

That's it. No server to install, no `cargo install`, no Docker: the `crw` SDK automatically downloads and manages the CRW binary for you.
## Quick Start — Zero Config (Subprocess Mode)

```python
from crewai_crw import CrwScrapeWebsiteTool

# Just works — the crw SDK handles everything locally
scrape_tool = CrwScrapeWebsiteTool()
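```

You can also call a tool directly, outside an agent, to smoke-test it. A minimal sketch, assuming the scrape tool's input schema names its argument `website_url` (the exact field name is an assumption; check the tool's schema):

```python
from crewai_crw import CrwScrapeWebsiteTool

scrape_tool = CrwScrapeWebsiteTool()

# CrewAI tools are invoked with run(**kwargs); the "website_url"
# keyword below is assumed, not confirmed by these docs.
markdown = scrape_tool.run(website_url="https://example.com")
print(markdown)
```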
## Cloud Mode (fastcrw.com)

No local binary needed. Sign up at fastcrw.com and get 500 free credits:

```python
from crewai_crw import CrwScrapeWebsiteTool

scrape_tool = CrwScrapeWebsiteTool(
    api_url="https://fastcrw.com/api",
    api_key="crw_live_...",  # or set the CRW_API_KEY env var
)
```
## Advanced: Self-hosted Server

If you prefer running a persistent CRW server (e.g., one shared across services):

```bash
# Option A: install the binary
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | bash
crw  # starts on http://localhost:3000

# Option B: Docker
docker run -d -p 3000:3000 ghcr.io/us/crw:latest
```

Then point the tool at it:

```python
scrape_tool = CrwScrapeWebsiteTool(api_url="http://localhost:3000")
```
## Tools

| Tool | Description |
|---|---|
| `CrwScrapeWebsiteTool` | Scrape a single URL and get clean markdown |
| `CrwCrawlWebsiteTool` | BFS-crawl a website, collecting content from multiple pages |
| `CrwMapWebsiteTool` | Discover all URLs on a website |
| `CrwSearchWebTool` | Search the web and get results (cloud only) |
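All four tools are importable from the package top level, as the per-tool examples below do individually:

```python
from crewai_crw import (
    CrwCrawlWebsiteTool,
    CrwMapWebsiteTool,
    CrwScrapeWebsiteTool,
    CrwSearchWebTool,
)
```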
## CrewAI Example

```python
from crewai import Agent, Task, Crew
from crewai_crw import CrwScrapeWebsiteTool

# Zero config — just works out of the box
scrape_tool = CrwScrapeWebsiteTool()

researcher = Agent(
    role="Web Researcher",
    goal="Research and summarize information from websites",
    backstory="Expert at extracting key information from web pages",
    tools=[scrape_tool],
)

task = Task(
    description="Scrape https://example.com and summarize the content",
    expected_output="A summary of the page content",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
```
## Crawl an entire site

```python
from crewai import Agent
from crewai_crw import CrwCrawlWebsiteTool

crawl_tool = CrwCrawlWebsiteTool(
    config={
        "maxDepth": 3,
        "maxPages": 50,
        "formats": ["markdown"],
        "onlyMainContent": True,
    }
)

# Use in an agent
researcher = Agent(
    role="Deep Researcher",
    goal="Crawl documentation sites and extract comprehensive information",
    backstory="Expert at gathering information across multiple pages",
    tools=[crawl_tool],
)
```
## Discover all URLs on a site

```python
from crewai import Agent
from crewai_crw import CrwMapWebsiteTool

map_tool = CrwMapWebsiteTool()

# Use in an agent
mapper = Agent(
    role="Site Mapper",
    goal="Discover and catalog all pages on a website",
    backstory="Expert at understanding website structure",
    tools=[map_tool],
)
```
## Search the web (Cloud Only)

> **Note:** Web search is a cloud-only feature. It requires `api_url` to point to a CRW cloud instance (e.g. fastcrw.com); subprocess mode does not support search.

```python
from crewai import Agent
from crewai_crw import CrwSearchWebTool

search_tool = CrwSearchWebTool(
    api_url="https://fastcrw.com/api",
    api_key="YOUR_KEY",
)

# Use in an agent
researcher = Agent(
    role="Web Researcher",
    goal="Find the latest information on any topic",
    backstory="Expert at searching the web for relevant information",
    tools=[search_tool],
)
```
## Configuration

### Constructor Arguments

| Argument | Type | Default | Description |
|---|---|---|---|
| `api_url` | `str \| None` | `None` | CRW server URL. If unset, uses subprocess mode (no server needed) |
| `api_key` | `str \| None` | `None` | API key (required for fastcrw.com) |
| `config` | `dict` | varies per tool | Tool-specific configuration |
### Environment Variables

`CRW_API_URL` and `CRW_API_KEY` serve as environment-variable fallbacks for the constructor arguments:

```bash
export CRW_API_URL=https://fastcrw.com/api  # or http://localhost:3000
export CRW_API_KEY=your_api_key             # required for cloud, optional for self-hosted
```

```python
# With the env vars set, no constructor args are needed:
tool = CrwScrapeWebsiteTool()
```
### Scrape Config

| Key | Type | Default | Description |
|---|---|---|---|
| `formats` | `list[str]` | `["markdown"]` | Output formats: `markdown`, `html`, `rawHtml`, `plainText`, `links`, `json` |
| `onlyMainContent` | `bool` | `true` | Strip nav/footer/sidebar |
| `renderJs` | `bool \| null` | `null` | `null` = auto, `true` = force JS rendering, `false` = HTTP only |
| `waitFor` | `int` | — | Milliseconds to wait after JS rendering |
| `includeTags` | `list[str]` | `[]` | CSS selectors to include |
| `excludeTags` | `list[str]` | `[]` | CSS selectors to exclude |
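Putting these keys together, a sketch of a configured scrape tool (the values are illustrative, not recommended defaults):

```python
from crewai_crw import CrwScrapeWebsiteTool

scrape_tool = CrwScrapeWebsiteTool(
    config={
        "formats": ["markdown", "links"],  # return markdown plus extracted links
        "onlyMainContent": True,           # strip nav/footer/sidebar
        "renderJs": None,                  # auto-detect whether JS rendering is needed
        "waitFor": 1000,                   # wait 1000 ms after JS rendering
        "includeTags": ["article"],        # illustrative CSS selector to include
        "excludeTags": [".sidebar"],       # illustrative CSS selector to exclude
    }
)
```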
### Crawl Config

| Key | Type | Default | Description |
|---|---|---|---|
| `maxDepth` | `int` | `2` | Maximum link-follow depth |
| `maxPages` | `int` | `10` | Maximum pages to scrape |
| `formats` | `list[str]` | `["markdown"]` | Output formats per page |
| `onlyMainContent` | `bool` | `true` | Strip boilerplate |
### Map Config

| Key | Type | Default | Description |
|---|---|---|---|
| `maxDepth` | `int` | `2` | Maximum discovery depth |
| `useSitemap` | `bool` | `true` | Also read sitemap.xml |
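The map tool takes the same constructor-level `config`; a minimal sketch with illustrative values:

```python
from crewai_crw import CrwMapWebsiteTool

map_tool = CrwMapWebsiteTool(
    config={
        "maxDepth": 3,       # discover links three levels deep instead of the default 2
        "useSitemap": True,  # also read sitemap.xml for URL discovery
    }
)
```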
### Search Config (Cloud Only)

| Key | Type | Default | Description |
|---|---|---|---|
| `limit` | `int` | `5` | Maximum number of results to return |
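For example, to raise the result cap above the default of 5 (cloud credentials as in the search example above):

```python
from crewai_crw import CrwSearchWebTool

search_tool = CrwSearchWebTool(
    api_url="https://fastcrw.com/api",
    api_key="YOUR_KEY",
    config={"limit": 10},  # return up to 10 search results
)
```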
## Compared to Firecrawl Tools

| Feature | crewai-crw | Firecrawl Tools |
|---|---|---|
| Requires SDK package | No (uses `requests`) | Yes (`firecrawl-py`) |
| Requires API key | No (subprocess or self-hosted) | Yes (always) |
| Server required | No (`pip install` is all you need) | Yes (always) |
| Self-hosted option | Yes (single binary, auto-managed) | Complex (5+ containers) |
| Cloud option | Yes (fastcrw.com) | Yes (firecrawl.dev) |
| Idle RAM | ~6 MB | ~500 MB+ |
## License

MIT