# crewai-crw

CRW web scraping tools for CrewAI — scrape, crawl, and map websites with AI agents.

CRW is an open-source web scraper built for AI agents: a single Rust binary, ~6 MB idle RAM, with a Firecrawl-compatible API.
## Installation

```bash
pip install crewai-crw
```

You also need a CRW backend, either self-hosted or cloud:

**Option A: Self-hosted (free)**

```bash
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | bash
crw  # starts on http://localhost:3000
```

**Option B: Cloud (fastcrw.com)**

```bash
export CRW_API_URL=https://fastcrw.com/api
export CRW_API_KEY=your_api_key
```
## Tools

| Tool | Description |
|---|---|
| `CrwScrapeWebsiteTool` | Scrape a single URL and get clean markdown |
| `CrwCrawlWebsiteTool` | BFS-crawl a website and collect content from multiple pages |
| `CrwMapWebsiteTool` | Discover all URLs on a website |
## Quick Start

```python
from crewai import Agent, Task, Crew
from crewai_crw import CrwScrapeWebsiteTool

# Self-hosted (default: http://localhost:3000)
scrape_tool = CrwScrapeWebsiteTool()

# Or use the cloud
scrape_tool = CrwScrapeWebsiteTool(
    api_url="https://fastcrw.com/api",
    api_key="YOUR_KEY",
)

researcher = Agent(
    role="Web Researcher",
    goal="Research and summarize information from websites",
    backstory="Expert at extracting key information from web pages",
    tools=[scrape_tool],
)

task = Task(
    description="Scrape https://example.com and summarize the content",
    expected_output="A summary of the page content",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
```
### Crawl an entire site

```python
from crewai_crw import CrwCrawlWebsiteTool

crawl_tool = CrwCrawlWebsiteTool(
    config={
        "maxDepth": 3,
        "maxPages": 50,
        "formats": ["markdown"],
        "onlyMainContent": True,
    }
)
```
### Discover all URLs on a site

```python
from crewai_crw import CrwMapWebsiteTool

map_tool = CrwMapWebsiteTool()
```
## Configuration

### Constructor Arguments

| Argument | Type | Default | Description |
|---|---|---|---|
| `api_url` | `str` | `http://localhost:3000` | CRW server URL |
| `api_key` | `str \| None` | `None` | API key (required for fastcrw.com) |
| `config` | `dict` | varies per tool | Tool-specific configuration |
### Environment Variables

Both `CRW_API_URL` and `CRW_API_KEY` can be set via environment variables, which act as fallbacks when the constructor arguments are omitted:

```bash
export CRW_API_URL=https://fastcrw.com/api  # or http://localhost:3000
export CRW_API_KEY=your_api_key             # required for cloud, optional for self-hosted
```

```python
# With the env vars set, no constructor args are needed:
tool = CrwScrapeWebsiteTool()
```
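The precedence between constructor arguments and environment variables can be pictured with a small helper. `resolve_config` below is a hypothetical illustration of that fallback order, not part of the package:

```python
import os

def resolve_config(api_url=None, api_key=None):
    # Hypothetical sketch of the fallback order: explicit constructor
    # arguments win, then CRW_API_URL / CRW_API_KEY, then the
    # self-hosted default URL.
    url = api_url or os.environ.get("CRW_API_URL", "http://localhost:3000")
    key = api_key or os.environ.get("CRW_API_KEY")  # None is fine for self-hosted
    return url, key

os.environ["CRW_API_URL"] = "https://fastcrw.com/api"
os.environ["CRW_API_KEY"] = "secret"

print(resolve_config())                                 # ('https://fastcrw.com/api', 'secret')
print(resolve_config(api_url="http://localhost:3000"))  # ('http://localhost:3000', 'secret')
```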
### Scrape Config

| Key | Type | Default | Description |
|---|---|---|---|
| `formats` | `list[str]` | `["markdown"]` | Output formats: `markdown`, `html`, `rawHtml`, `plainText`, `links`, `json` |
| `onlyMainContent` | `bool` | `true` | Strip nav/footer/sidebar |
| `renderJs` | `bool \| null` | `null` | `null` = auto, `true` = force JS, `false` = HTTP only |
| `waitFor` | `int` | — | Milliseconds to wait after JS rendering |
| `includeTags` | `list[str]` | `[]` | CSS selectors to include |
| `excludeTags` | `list[str]` | `[]` | CSS selectors to exclude |
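Putting several of these keys together, a config for a JS-heavy article page might look like the following. The selector strings are illustrative examples, not defaults:

```python
# Illustrative scrape config; the selector values are made up for the example.
scrape_config = {
    "formats": ["markdown", "links"],   # markdown body plus discovered links
    "onlyMainContent": True,            # strip nav/footer/sidebar
    "renderJs": True,                   # force headless JS rendering
    "waitFor": 500,                     # wait 500 ms after rendering
    "excludeTags": [".cookie-banner", "#newsletter-signup"],
}
# Passed to the tool as: CrwScrapeWebsiteTool(config=scrape_config)
print(sorted(scrape_config))
```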
### Crawl Config

| Key | Type | Default | Description |
|---|---|---|---|
| `maxDepth` | `int` | `2` | Maximum link-follow depth |
| `maxPages` | `int` | `10` | Maximum pages to scrape |
| `formats` | `list[str]` | `["markdown"]` | Output formats per page |
| `onlyMainContent` | `bool` | `true` | Strip boilerplate |
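`maxDepth` and `maxPages` bound a breadth-first traversal. A toy sketch of those two bounds (not CRW's actual implementation), where `neighbors` stands in for the links extracted from each fetched page:

```python
from collections import deque

def bfs_urls(start, neighbors, max_depth=2, max_pages=10):
    """Toy BFS with the depth/page bounds described by the crawl config.
    `neighbors` maps a URL to the links found on that page."""
    seen, order = {start}, []
    queue = deque([(start, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # don't follow links past maxDepth
        for nxt in neighbors.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order

site = {"/": ["/a", "/b"], "/a": ["/a/x"], "/a/x": ["/a/x/deep"]}
print(bfs_urls("/", site, max_depth=2))  # ['/', '/a', '/b', '/a/x']; /a/x/deep is beyond depth 2
```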
### Map Config

| Key | Type | Default | Description |
|---|---|---|---|
| `maxDepth` | `int` | `2` | Maximum discovery depth |
| `useSitemap` | `bool` | `true` | Also read `sitemap.xml` |
## Compared to Firecrawl Tools

| Feature | crewai-crw | Firecrawl Tools |
|---|---|---|
| Requires SDK package | No (uses `requests`) | Yes (`firecrawl-py`) |
| Requires API key | No (self-hosted) | Yes (always) |
| Self-hosted option | Yes (single binary) | Complex (5+ containers) |
| Cloud option | Yes (fastcrw.com) | Yes (firecrawl.dev) |
| Idle RAM | ~6 MB | ~500 MB+ |
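Since the tools talk to the server over plain HTTP with `requests`, you can also call the API directly without any tool wrapper. In this sketch the `/v1/scrape` path and payload shape follow the Firecrawl convention and are assumptions, not taken from this README:

```python
def build_scrape_request(url, base_url="http://localhost:3000", api_key=None):
    # Hypothetical helper: the /v1/scrape path and payload shape follow
    # the Firecrawl convention and are assumed, not documented above.
    endpoint = f"{base_url.rstrip('/')}/v1/scrape"
    payload = {"url": url, "formats": ["markdown"]}
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return endpoint, payload, headers

def scrape_direct(url, **kwargs):
    import requests  # the same HTTP client the tools themselves use
    endpoint, payload, headers = build_scrape_request(url, **kwargs)
    resp = requests.post(endpoint, json=payload, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()

print(build_scrape_request("https://example.com", api_key="k"))
```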
## License

MIT
## File details

Details for the file `crewai_crw-0.1.0.tar.gz` (source distribution).

### File metadata

- Download URL: crewai_crw-0.1.0.tar.gz
- Size: 277.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.25

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1b85cbd8a47e2a6b35aa70282a1d00a9f88ffcd546c846bce5eea768d72058e7` |
| MD5 | `22cd4455b5a1b9effb1196b6e56d85b0` |
| BLAKE2b-256 | `f4572047a71606f88286de4d671faff168bf27f68b07f449d70ca497e5cd682a` |
## File details

Details for the file `crewai_crw-0.1.0-py3-none-any.whl` (built distribution).

### File metadata

- Download URL: crewai_crw-0.1.0-py3-none-any.whl
- Size: 6.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.25

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `5e95c7b559118447102bcd48d021e1dfa72f1f5a66c3ce51d0b251e5a5403f14` |
| MD5 | `450346c12f32568ee51239931f5af392` |
| BLAKE2b-256 | `b78a199d8ad0f91e796ef0e32a86ce1da3724722986c5953f579bac23cf9b1a1` |