A CrewAI tool for smart web scraping via the Browserless /smart-scrape API.
crewai-browserless
A CrewAI tool for scraping web pages using the Browserless /smart-scrape API. Handles anti-bot detection, captchas, and proxying automatically.
Installation
pip install crewai-browserless
Or with uv:
uv add crewai-browserless
Setup
Get a Browserless API token at https://www.browserless.io/, then set the BROWSERLESS_API_TOKEN environment variable:
export BROWSERLESS_API_TOKEN="your-token"
# for private deployments:
# export BROWSERLESS_API_URL="https://your-browserless-instance.com"
Usage
With a CrewAI agent
from crewai import Agent, Crew, Task
from crewai_browserless import BrowserlessSmartScrapeTool

# Agent that has the scraping tool available
agent = Agent(
    role="Web Researcher",
    goal="Scrape web pages and summarize their content.",
    backstory="An expert at extracting useful information from websites.",
    tools=[BrowserlessSmartScrapeTool()],
)

# Task that asks the agent to scrape and summarize a single page
task = Task(
    description="Scrape https://www.browserless.io/blog/java-memory-leak and summarize what the page is about.",
    expected_output="A short summary of the page content.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
print(result)
Standalone
from crewai_browserless import BrowserlessSmartScrapeTool
tool = BrowserlessSmartScrapeTool()
# Call the tool directly
result = tool.run(url="https://en.wikipedia.org/wiki/Headless_browser", formats=["markdown"])
print(result)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The URL to scrape (http/https only) |
| formats | list[str] | ["markdown"] | Output formats: markdown, html, screenshot, pdf, links |
| timeout | int \| None | None | Timeout in milliseconds (uses server default if not set) |
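The constraints in the table can be checked before a call is made. The sketch below is not the tool's own validation. `validate_scrape_args` is a hypothetical helper written for illustration, and it assumes the format names listed above are the complete set.

```python
from urllib.parse import urlparse

# Format names copied from the Parameters table; assumed to be exhaustive.
ALLOWED_FORMATS = {"markdown", "html", "screenshot", "pdf", "links"}


def validate_scrape_args(url, formats=None, timeout=None):
    """Hypothetical helper: return a cleaned argument dict or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"unsupported URL: {url!r}")

    # Fall back to the documented default when no formats are given.
    formats = list(formats) if formats is not None else ["markdown"]
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unknown formats: {unknown}")

    if timeout is not None:
        is_int = isinstance(timeout, int) and not isinstance(timeout, bool)
        if not is_int or timeout <= 0:
            raise ValueError("timeout must be a positive integer (milliseconds)")

    args = {"url": url, "formats": formats}
    if timeout is not None:
        # Omit the key entirely so the server default applies when unset.
        args["timeout"] = timeout
    return args
```

The returned dict can be unpacked into the call shown in the Standalone section, for example `tool.run(**validate_scrape_args(url, ["markdown"]))`.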
Environment Variables
| Variable | Required | Description |
|---|---|---|
| BROWSERLESS_API_URL | Yes | Base URL of your Browserless instance |
| BROWSERLESS_API_TOKEN | No | API token for authentication |
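Before constructing the tool, it can help to confirm which of these variables the Python process actually sees. The following is a minimal sketch: `browserless_env` is a hypothetical helper, not part of the package, and it only reports what is set without enforcing either variable.

```python
import os


def browserless_env(env=None):
    """Hypothetical helper: snapshot the two variables without enforcing either."""
    env = os.environ if env is None else env
    # Normalize whitespace and a trailing slash; treat empty strings as unset.
    api_url = (env.get("BROWSERLESS_API_URL") or "").strip().rstrip("/") or None
    token = (env.get("BROWSERLESS_API_TOKEN") or "").strip() or None
    # Report only whether a token exists so the secret never reaches logs.
    return {"api_url": api_url, "has_token": token is not None}


if __name__ == "__main__":
    print(browserless_env())
```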
How It Works
The tool sends a POST request to the Browserless /smart-scrape endpoint, which uses a cascading strategy pipeline:
- HTTP fetch — fast, direct request
- HTTP fetch with proxy — retries through a residential proxy
- Browser rendering — headless browser for JavaScript-heavy pages
- Browser with captcha solving — handles captcha challenges automatically
The first strategy that succeeds returns the result. If screenshot or pdf formats are requested, browser strategies are used automatically.
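The cascade described above runs on the Browserless side, so the following is only a local sketch of the first-success pattern, not the service's implementation. `Strategy`, `cascade`, and the stub functions are hypothetical names, and skipping the non-browser strategies for screenshot or pdf requests is an assumed reading of the last sentence.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Strategy:
    name: str
    uses_browser: bool
    run: Callable[[str], Optional[str]]  # returns content, or None on failure


def cascade(strategies, url, formats):
    """Try strategies in order and return the first successful result."""
    # Assumption: screenshot and pdf output can only come from a browser.
    browser_only = any(f in ("screenshot", "pdf") for f in formats)
    attempted = []
    for strategy in strategies:
        if browser_only and not strategy.uses_browser:
            continue
        attempted.append(strategy.name)
        content = strategy.run(url)
        if content is not None:
            return {"strategy": strategy.name, "content": content, "attempted": attempted}
    raise RuntimeError(f"all strategies failed for {url}: tried {attempted}")


# Stub strategies standing in for the four stages listed above.
STRATEGIES = [
    Strategy("http-fetch", False, lambda url: None),        # stub: blocked
    Strategy("http-fetch-proxy", False, lambda url: None),  # stub: blocked
    Strategy("browser", True, lambda url: "<rendered page>"),
    Strategy("browser-captcha", True, lambda url: "<solved page>"),
]

if __name__ == "__main__":
    print(cascade(STRATEGIES, "https://example.com", ["markdown"]))
    print(cascade(STRATEGIES, "https://example.com", ["pdf"]))
```

With these stubs, a markdown request falls through both HTTP stages before the browser stage succeeds, while a pdf request starts at the browser stage directly.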
License
SSPL-1.0