
crewai-crw

CRW web scraping tools for CrewAI — scrape, crawl, and map websites with AI agents.

CRW is an open-source web scraper built for AI agents. Single Rust binary, ~6 MB idle RAM, Firecrawl-compatible API.

Installation

pip install crewai-crw

You also need a CRW backend — either self-hosted or cloud:

Option A: Self-hosted (free)

curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | bash
crw  # starts on http://localhost:3000

Option B: Cloud (fastcrw.com)

export CRW_API_URL=https://fastcrw.com/api
export CRW_API_KEY=your_api_key

Tools

Tool                  Description
CrwScrapeWebsiteTool  Scrape a single URL and get clean markdown
CrwCrawlWebsiteTool   BFS-crawl a website and collect content from multiple pages
CrwMapWebsiteTool     Discover all URLs on a website

Quick Start

from crewai import Agent, Task, Crew
from crewai_crw import CrwScrapeWebsiteTool

# Self-hosted (default: localhost:3000)
scrape_tool = CrwScrapeWebsiteTool()

# Or use the cloud
scrape_tool = CrwScrapeWebsiteTool(
    api_url="https://fastcrw.com/api",
    api_key="YOUR_KEY",
)

researcher = Agent(
    role="Web Researcher",
    goal="Research and summarize information from websites",
    backstory="Expert at extracting key information from web pages",
    tools=[scrape_tool],
)

task = Task(
    description="Scrape https://example.com and summarize the content",
    expected_output="A summary of the page content",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
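Under the hood, the tools talk to CRW over plain HTTP with the `requests` library. A minimal sketch of how such a request might be assembled, assuming the Firecrawl-compatible `POST /v1/scrape` endpoint (the endpoint path, payload shape, and helper name here are assumptions for illustration, not taken from the crewai-crw source):

```python
def build_scrape_request(url, api_url="http://localhost:3000", api_key=None, config=None):
    """Assemble endpoint, headers, and JSON payload for a scrape call (hypothetical helper)."""
    endpoint = api_url.rstrip("/") + "/v1/scrape"
    headers = {"Content-Type": "application/json"}
    if api_key:  # fastcrw.com requires a key; self-hosted usually does not
        headers["Authorization"] = f"Bearer {api_key}"
    payload = {"url": url, **(config or {"formats": ["markdown"]})}
    return endpoint, headers, payload
```

The returned triple would then be sent with `requests.post(endpoint, headers=headers, json=payload)`.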

Crawl an entire site

from crewai_crw import CrwCrawlWebsiteTool

crawl_tool = CrwCrawlWebsiteTool(
    config={
        "maxDepth": 3,
        "maxPages": 50,
        "formats": ["markdown"],
        "onlyMainContent": True,
    }
)
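The crawl is breadth-first: pages at depth 1 are visited before any page at depth 2, and the walk stops once either limit is hit. A toy sketch of that bookkeeping (illustration only, not the tool's Rust implementation), where `get_links` stands in for fetching a page and extracting its links:

```python
from collections import deque

def bfs_crawl(start_url, get_links, max_depth=2, max_pages=10):
    """Visit pages breadth-first, honoring maxDepth and maxPages limits."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)          # the real tool scrapes the page here
        if depth < max_depth:
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited
```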

Discover all URLs on a site

from crewai_crw import CrwMapWebsiteTool

map_tool = CrwMapWebsiteTool()

Configuration

Constructor Arguments

Argument  Type        Default                Description
api_url   str         http://localhost:3000  CRW server URL
api_key   str | None  None                   API key (required for fastcrw.com)
config    dict        varies per tool        Tool-specific configuration

Environment Variables

If the constructor arguments are omitted, the tools fall back to the CRW_API_URL and CRW_API_KEY environment variables:

export CRW_API_URL=https://fastcrw.com/api  # or http://localhost:3000
export CRW_API_KEY=your_api_key              # required for cloud, optional for self-hosted
# With env vars set, no constructor args needed:
tool = CrwScrapeWebsiteTool()
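The resolution order is constructor arguments first, then environment variables, then the self-hosted default. A sketch of that fallback logic (the real package may differ; `resolve_connection` is a hypothetical name):

```python
import os

def resolve_connection(api_url=None, api_key=None):
    """Constructor args win; environment variables fill in what's missing."""
    api_url = api_url or os.environ.get("CRW_API_URL", "http://localhost:3000")
    api_key = api_key or os.environ.get("CRW_API_KEY")
    return api_url, api_key
```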

Scrape Config

Key              Type       Default       Description
formats          list[str]  ["markdown"]  Output formats: markdown, html, rawHtml, plainText, links, json
onlyMainContent  bool       true          Strip nav/footer/sidebar
renderJs         bool|null  null          null = auto, true = force JS rendering, false = HTTP only
waitFor          int                      Milliseconds to wait after JS rendering
includeTags      list[str]  []            CSS selectors to include
excludeTags      list[str]  []            CSS selectors to exclude
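For example, a scrape config that forces JavaScript rendering and keeps only the article body (key names from the table above; the selector value is a placeholder, not a real site's markup):

```python
scrape_config = {
    "formats": ["markdown", "links"],   # markdown plus every link on the page
    "onlyMainContent": True,            # drop nav/footer/sidebar
    "renderJs": True,                   # force headless-browser rendering
    "waitFor": 2000,                    # wait 2 s after JS rendering settles
    "excludeTags": [".cookie-banner"],  # placeholder CSS selector to strip
}
```

Pass it as `CrwScrapeWebsiteTool(config=scrape_config)`.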

Crawl Config

Key              Type       Default       Description
maxDepth         int        2             Maximum link-follow depth
maxPages         int        10            Maximum pages to scrape
formats          list[str]  ["markdown"]  Output formats per page
onlyMainContent  bool       true          Strip boilerplate

Map Config

Key         Type  Default  Description
maxDepth    int   2        Maximum discovery depth
useSitemap  bool  true     Also read sitemap.xml
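For instance, a map config that looks one level deeper than the default while still merging in sitemap.xml (keys from the table above):

```python
map_config = {
    "maxDepth": 3,       # follow links three levels deep
    "useSitemap": True,  # also merge URLs listed in sitemap.xml
}
```

Pass it as `CrwMapWebsiteTool(config=map_config)`.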

Compared to Firecrawl Tools

Feature               crewai-crw           Firecrawl Tools
Requires SDK package  No (uses requests)   Yes (firecrawl-py)
Requires API key      No (self-hosted)     Yes (always)
Self-hosted option    Yes (single binary)  Complex (5+ containers)
Cloud option          Yes (fastcrw.com)    Yes (firecrawl.dev)
Idle RAM              ~6 MB                ~500 MB+

License

MIT

