
crewai-crw


CRW web scraping tools for CrewAI — scrape, crawl, map, and search the web with AI agents.

CRW is an open-source web scraper built for AI agents. Single Rust binary, ~6 MB idle RAM, Firecrawl-compatible API.

Installation

pip install crewai crewai-crw
# or
uv add crewai crewai-crw

That's it. No server to install, no cargo install, no Docker. The crw SDK automatically downloads and manages the CRW binary for you.

Quick Start — Zero Config (Subprocess Mode)

from crewai_crw import CrwScrapeWebsiteTool

# Just works — crw SDK handles everything locally
scrape_tool = CrwScrapeWebsiteTool()

Cloud Mode (fastcrw.com)

No local binary needed. Sign up at fastcrw.com and get 500 free credits:

from crewai_crw import CrwScrapeWebsiteTool

scrape_tool = CrwScrapeWebsiteTool(
    api_url="https://fastcrw.com/api",
    api_key="crw_live_...",  # or set CRW_API_KEY env var
)

Advanced: Self-hosted Server

If you prefer running a persistent CRW server (e.g., shared across services):

# Option A: install the binary
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | bash
crw  # starts on http://localhost:3000

# Option B: Docker
docker run -d -p 3000:3000 ghcr.io/us/crw:latest

Then point any tool at the server:

scrape_tool = CrwScrapeWebsiteTool(api_url="http://localhost:3000")
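When targeting a self-hosted server, a quick TCP reachability check before constructing the tool turns a confusing downstream failure into an immediate, clear error. This is a generic sketch, not part of the crw SDK; the host and port match the defaults above:

```python
import socket

def crw_server_reachable(host: str = "localhost", port: int = 3000,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the CRW server succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```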

Tools

Tool Description
CrwScrapeWebsiteTool Scrape a single URL and get clean markdown
CrwCrawlWebsiteTool BFS crawl a website, collect content from multiple pages
CrwMapWebsiteTool Discover all URLs on a website
CrwSearchWebTool Search the web and get results (cloud only)

CrewAI Example

from crewai import Agent, Task, Crew
from crewai_crw import CrwScrapeWebsiteTool

# Zero config — just works out of the box
scrape_tool = CrwScrapeWebsiteTool()

researcher = Agent(
    role="Web Researcher",
    goal="Research and summarize information from websites",
    backstory="Expert at extracting key information from web pages",
    tools=[scrape_tool],
)

task = Task(
    description="Scrape https://example.com and summarize the content",
    expected_output="A summary of the page content",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()

Crawl an entire site

from crewai_crw import CrwCrawlWebsiteTool

crawl_tool = CrwCrawlWebsiteTool(
    config={
        "maxDepth": 3,
        "maxPages": 50,
        "formats": ["markdown"],
        "onlyMainContent": True,
    }
)

# Use in an agent
researcher = Agent(
    role="Deep Researcher",
    goal="Crawl documentation sites and extract comprehensive information",
    backstory="Expert at gathering information across multiple pages",
    tools=[crawl_tool],
)

Discover all URLs on a site

from crewai_crw import CrwMapWebsiteTool

map_tool = CrwMapWebsiteTool()

# Use in an agent
mapper = Agent(
    role="Site Mapper",
    goal="Discover and catalog all pages on a website",
    backstory="Expert at understanding website structure",
    tools=[map_tool],
)

Search the web (Cloud Only)

Note: Web search is a cloud-only feature. It requires api_url pointing to a CRW cloud instance (e.g. fastcrw.com). Subprocess mode is not supported for search.

from crewai_crw import CrwSearchWebTool

search_tool = CrwSearchWebTool(
    api_url="https://fastcrw.com/api",
    api_key="YOUR_KEY",
)

# Use in an agent
researcher = Agent(
    role="Web Researcher",
    goal="Find the latest information on any topic",
    backstory="Expert at searching the web for relevant information",
    tools=[search_tool],
)

Configuration

Constructor Arguments

Argument Type Default Description
api_url str | None None CRW server URL. If unset, uses subprocess mode (no server needed)
api_key str | None None API key (required for fastcrw.com)
config dict varies per tool Tool-specific configuration

Environment Variables

If the constructor arguments are omitted, the tools fall back to the CRW_API_URL and CRW_API_KEY environment variables:

export CRW_API_URL=https://fastcrw.com/api  # or http://localhost:3000
export CRW_API_KEY=your_api_key              # required for cloud, optional for self-hosted
# With env vars set, no constructor args needed:
tool = CrwScrapeWebsiteTool()
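The resulting precedence is: constructor argument first, environment variable second, subprocess mode last. A sketch of that resolution logic (illustrative, not the SDK's actual code):

```python
import os

def resolve_connection(api_url=None, api_key=None):
    """Resolve where a tool connects, mirroring the fallback order."""
    url = api_url or os.environ.get("CRW_API_URL")
    key = api_key or os.environ.get("CRW_API_KEY")
    if url is None:
        return {"mode": "subprocess"}  # no server configured: local binary
    return {"mode": "server", "api_url": url, "api_key": key}
```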

Scrape Config

Key Type Default Description
formats list[str] ["markdown"] Output formats: markdown, html, rawHtml, plainText, links, json
onlyMainContent bool true Strip nav/footer/sidebar
renderJs bool|null null null=auto, true=force JS, false=HTTP only
waitFor int – Milliseconds to wait after JS rendering
includeTags list[str] [] CSS selectors to include
excludeTags list[str] [] CSS selectors to exclude
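Both onlyMainContent and excludeTags drop parts of the page before conversion. A toy illustration of the idea using bare tag names and Python's html.parser (the real scraper matches CSS selectors and is implemented in Rust):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Drop all text nested inside excluded tags (e.g. nav, footer)."""
    def __init__(self, exclude):
        super().__init__()
        self.exclude = set(exclude)
        self.depth = 0   # nesting level inside excluded subtrees
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.exclude:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.exclude and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

def strip_tags(html, exclude=("nav", "footer", "aside")):
    parser = TagStripper(exclude)
    parser.feed(html)
    return "".join(parser.out).strip()
```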

Crawl Config

Key Type Default Description
maxDepth int 2 Maximum link-follow depth
maxPages int 10 Maximum pages to scrape
formats list[str] ["markdown"] Output formats per page
onlyMainContent bool true Strip boilerplate
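maxDepth and maxPages bound a breadth-first traversal: depth counts link hops from the start URL, pages counts total fetches. An illustrative sketch over an in-memory link graph (the real crawler fetches pages over HTTP):

```python
from collections import deque

def bfs_crawl(start, links, max_depth=2, max_pages=10):
    """Breadth-first crawl: visit pages level by level until a limit hits."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)          # "scrape" this page
        if depth == max_depth:
            continue                 # don't follow links past maxDepth
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited
```

With site = {"/": ["/a", "/b"], "/a": ["/c"], "/c": ["/d"]} and the defaults, bfs_crawl("/", site) visits ["/", "/a", "/b", "/c"]: "/d" sits at depth 3, past the default maxDepth of 2.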

Map Config

Key Type Default Description
maxDepth int 2 Maximum discovery depth
useSitemap bool true Also read sitemap.xml

Search Config (Cloud Only)

Key Type Default Description
limit int 5 Maximum number of results to return

Compared to Firecrawl Tools

Feature crewai-crw Firecrawl Tools
Requires SDK package No (uses requests) Yes (firecrawl-py)
Requires API key No (subprocess or self-hosted) Yes (always)
Server required No (pip install is all you need) Yes (always)
Self-hosted option Yes (single binary, auto-managed) Complex (5+ containers)
Cloud option Yes (fastcrw.com) Yes (firecrawl.dev)
Idle RAM ~6 MB ~500 MB+

License

MIT
