
A CrewAI tool for smart web scraping via the Browserless /smart-scrape API.


crewai-browserless

A CrewAI tool for scraping web pages using the Browserless /smart-scrape API. Handles anti-bot detection, captchas, and proxying automatically.

Installation

pip install crewai-browserless

Or with uv:

uv add crewai-browserless

Setup

Get a Browserless API token at https://www.browserless.io/, then set the BROWSERLESS_API_TOKEN environment variable:

export BROWSERLESS_API_TOKEN="your-token"

# for private deployments:
# export BROWSERLESS_API_URL="https://your-browserless-instance.com"

Usage

With a CrewAI agent

from crewai import Agent, Crew, Task
from crewai_browserless import BrowserlessSmartScrapeTool

agent = Agent(
    role="Web Researcher",
    goal="Scrape web pages and summarize their content.",
    backstory="An expert at extracting useful information from websites.",
    tools=[BrowserlessSmartScrapeTool()],
)

task = Task(
    description="Scrape https://www.browserless.io/blog/java-memory-leak and summarize what the page is about.",
    expected_output="A short summary of the page content.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()

print(result)

Standalone

from crewai_browserless import BrowserlessSmartScrapeTool

tool = BrowserlessSmartScrapeTool()

# Call the tool directly
result = tool.run(url="https://en.wikipedia.org/wiki/Headless_browser", formats=["markdown"])

print(result)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The URL to scrape (http/https only) |
| formats | list[str] | ["markdown"] | Output formats: markdown, html, screenshot, pdf, links |
| timeout | int or None | None | Timeout in milliseconds (the server default applies if not set) |
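The constraints in the parameter table can be checked before any request is made. The following is a minimal sketch, not part of the package: `build_smart_scrape_payload` is a hypothetical helper, and it assumes the JSON body sent to /smart-scrape uses the same field names as the tool parameters (`url`, `formats`, `timeout`), which this README does not confirm.

```python
import json
from typing import Optional
from urllib.parse import urlparse

# Output formats listed in the parameter table.
ALLOWED_FORMATS = {"markdown", "html", "screenshot", "pdf", "links"}


def build_smart_scrape_payload(
    url: str,
    formats: Optional[list[str]] = None,
    timeout: Optional[int] = None,
) -> str:
    """Hypothetical helper: validate arguments and return a JSON request body.

    The field names mirror the tool parameters; the real /smart-scrape
    request schema is an assumption here.
    """
    # Only absolute http/https URLs are accepted, per the parameter table.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"unsupported URL: {url!r}")

    # Fall back to the documented default format.
    if formats is None:
        formats = ["markdown"]
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unknown formats: {sorted(unknown)}")

    body: dict = {"url": url, "formats": list(formats)}
    # Timeout is in milliseconds; leave it out so the server default applies.
    if timeout is not None:
        if timeout <= 0:
            raise ValueError("timeout must be a positive number of milliseconds")
        body["timeout"] = timeout
    return json.dumps(body)
```

Leaving `timeout` out of the body entirely, rather than sending `null`, matches the table's note that the server default is used when it is not set.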

Environment Variables

| Variable | Required | Description |
|---|---|---|
| BROWSERLESS_API_TOKEN | Yes | API token for authentication |
| BROWSERLESS_API_URL | No | Base URL of your Browserless instance (only needed for private deployments) |
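The Setup section treats the token as the credential you must provide and the URL as an override for private deployments. A minimal sketch of resolving the two variables along those lines, failing fast when the token is missing; `resolve_browserless_config` and `BrowserlessConfig` are hypothetical names, and `api_url=None` simply means "not overridden" (this sketch does not know the package's built-in default endpoint).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrowserlessConfig:
    token: str
    # None means "not overridden": the tool's own default endpoint applies.
    api_url: Optional[str]


def resolve_browserless_config(env: Mapping[str, str] = os.environ) -> BrowserlessConfig:
    """Hypothetical helper: read both variables, requiring only the token."""
    token = env.get("BROWSERLESS_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("BROWSERLESS_API_TOKEN is not set")
    # Set only for private deployments; strip a trailing slash for clean joins.
    api_url = env.get("BROWSERLESS_API_URL", "").strip().rstrip("/") or None
    return BrowserlessConfig(token=token, api_url=api_url)
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment.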

How It Works

The tool sends a POST request to the Browserless /smart-scrape endpoint, which uses a cascading strategy pipeline:

  1. HTTP fetch — fast, direct request
  2. HTTP fetch with proxy — retries through a residential proxy
  3. Browser rendering — headless browser for JavaScript-heavy pages
  4. Browser with captcha solving — handles captcha challenges automatically

The result comes from the first strategy that succeeds. If the screenshot or pdf format is requested, the browser strategies are used automatically, since those outputs require a rendered page.

License

SSPL-1.0



Download files


Source Distribution

crewai_browserless-1.0.0.tar.gz (5.4 kB)

Uploaded Source

Built Distribution


crewai_browserless-1.0.0-py3-none-any.whl (4.0 kB)

Uploaded Python 3

File details

Details for the file crewai_browserless-1.0.0.tar.gz.

File metadata

  • Download URL: crewai_browserless-1.0.0.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.30

File hashes

Hashes for crewai_browserless-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4aa825c76f7b37a9a12deab4bd03f62344b3de2fe752ac0037372a56c03f380 |
| MD5 | e654c3940f1f13b5d476d85ccea1454f |
| BLAKE2b-256 | 83b6ccdd2d09f5849680d85e5df07acdfb0fc24acaabba69f41186d867dd54c0 |


File details

Details for the file crewai_browserless-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: crewai_browserless-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.30

File hashes

Hashes for crewai_browserless-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12dec62193595ba1f6a99fd07195e6894c75fdc309b384a44756002971e69db5 |
| MD5 | 81d63d5932e56189bf2ed8b2dfb22f8e |
| BLAKE2b-256 | 30809d2c383a1257c34b26ea3ec3a8ff9075e995c67715079248435823bea04c |

