llama-index tools Bright Data integration
LlamaIndex Tools Integration: Bright Data
This tool connects to Bright Data to enable your agent to crawl websites, search the web, and access structured data from platforms like LinkedIn, Amazon, and social media.
Bright Data's tools provide robust web scraping capabilities with built-in CAPTCHA solving and bot detection avoidance, allowing you to reliably extract data from the web.
Installation
pip install llama-index llama-index-core llama-index-tools-brightdata
Authentication
Sign up at Bright Data and retrieve your API key from your account settings. In the examples below, replace "your-api-key" with your actual API key.
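To avoid hard-coding the key, one option is to read it from an environment variable. The sketch below assumes a variable named BRIGHTDATA_API_KEY; that name and the load_api_key helper are illustrative choices, not something the package defines or reads on its own.

```python
import os


def load_api_key(env_var: str = "BRIGHTDATA_API_KEY") -> str:
    """Return the API key stored in ``env_var`` or raise a clear error.

    The variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return api_key
```

The returned value can then be passed where the examples use the placeholder, for instance BrightDataToolSpec(api_key=load_api_key(), zone="unlocker").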
Usage
Here's an example of how to use the BrightDataToolSpec with LlamaIndex:
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.brightdata import BrightDataToolSpec

brightdata_tool = BrightDataToolSpec(api_key="your-api-key", zone="unlocker")
tool_list = brightdata_tool.to_tool_list()

# Keep each tool's full description, then give the agent a short one.
for tool in tool_list:
    tool.original_description = tool.metadata.description
    tool.metadata.description = "Bright Data web scraping tool"

agent = FunctionAgent(
    tools=tool_list,
    llm=OpenAI(model="gpt-4.1"),
)

query = (
    "Find and summarize the latest news about AI from major tech news sites"
)

# Pass the full tool descriptions to the agent as part of the prompt.
tool_descriptions = "\n\n".join(
    [
        f"Tool Name: {tool.metadata.name}\nTool Description: {tool.original_description}"
        for tool in tool_list
    ]
)
query_with_descriptions = f"{tool_descriptions}\n\nQuery: {query}"


async def main():
    # agent.run is a coroutine, so it must be awaited inside an async function.
    response = await agent.run(query_with_descriptions)
    print(response)


asyncio.run(main())
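To see what the agent actually receives, the prompt assembly from the example can be isolated into a small function and run without any network access. This is only a sketch: build_prompt is a hypothetical helper, and the SimpleNamespace object stands in for a real tool returned by to_tool_list().

```python
from types import SimpleNamespace


def build_prompt(tools, query: str) -> str:
    """Prefix the query with each tool's name and full description."""
    sections = [
        f"Tool Name: {tool.metadata.name}\n"
        f"Tool Description: {tool.original_description}"
        for tool in tools
    ]
    return "\n\n".join(sections) + f"\n\nQuery: {query}"


# Stand-in object for illustration only; a real tool comes from to_tool_list().
fake_tool = SimpleNamespace(
    metadata=SimpleNamespace(name="scrape_as_markdown"),
    original_description="Scrape a webpage and return Markdown.",
)
prompt = build_prompt([fake_tool], "Summarize https://example.com")
print(prompt)
```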
Features
The Bright Data tool provides the following capabilities:
Web Scraping
scrape_as_markdown: Scrape a webpage and convert the content to Markdown format. This tool can bypass CAPTCHA and bot detection.
result = brightdata_tool.scrape_as_markdown("https://example.com")
print(result.text)
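When scraping several pages, it can help to collect failures instead of stopping at the first one. The following is a minimal sketch, assuming the scrape callable returns an object with a text attribute as in the example above; scrape_many is a hypothetical helper, not part of the package.

```python
from typing import Callable, Dict, Iterable, Tuple


def scrape_many(
    scrape: Callable[[str], object], urls: Iterable[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Scrape each URL and return (pages, errors), both keyed by URL."""
    pages: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for url in urls:
        try:
            pages[url] = scrape(url).text
        except Exception as exc:  # record the failure and keep going
            errors[url] = str(exc)
    return pages, errors
```

With a configured tool, a call might look like pages, errors = scrape_many(brightdata_tool.scrape_as_markdown, ["https://example.com"]).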
Visual Capture
get_screenshot: Take a screenshot of a webpage and save it to a file.
screenshot_path = brightdata_tool.get_screenshot(
    "https://example.com", output_path="example_screenshot.png"
)
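A quick sanity check after capturing a screenshot is to confirm that the output file exists and starts with the PNG signature. This sketch assumes the screenshot is written as a PNG file at the requested path, which the .png file name in the example suggests but the source does not state; looks_like_png is a hypothetical helper.

```python
from pathlib import Path

# Every PNG file begins with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path: str) -> bool:
    """Return True if ``path`` is a file that starts with the PNG signature."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        return handle.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```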
Search Engine Access
search_engine: Search Google, Bing, or Yandex and get structured search results as JSON or Markdown. Supports advanced parameters for more specific searches.
search_results = brightdata_tool.search_engine(
    query="climate change solutions",
    engine="google",
    language="en",
    country_code="us",
    num_results=20,
)
print(search_results.text)
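Since only three engines are listed, it may be worth validating the engine name before spending a request. The sketch below is a hypothetical wrapper, not part of the package; "google" appears in the example above, while the lowercase identifiers "bing" and "yandex" are assumed to follow the same pattern.

```python
# "bing" and "yandex" are assumed spellings; only "google" appears in the example.
SUPPORTED_ENGINES = ("google", "bing", "yandex")


def search_kwargs(query: str, engine: str = "google", **options) -> dict:
    """Validate the basics and build keyword arguments for search_engine."""
    if not query.strip():
        raise ValueError("query must not be empty")
    engine = engine.lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"engine must be one of {SUPPORTED_ENGINES}")
    # Drop options left as None so only explicit settings are forwarded.
    extras = {key: value for key, value in options.items() if value is not None}
    return {"query": query, "engine": engine, **extras}
```

The result can be unpacked into the call, for example brightdata_tool.search_engine(**search_kwargs("climate change solutions", language="en")).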
Structured Web Data Extraction
web_data_feed: Retrieve structured data from various platforms including LinkedIn, Amazon, Instagram, Facebook, X (Twitter), Zillow, and more.
linkedin_profile = brightdata_tool.web_data_feed(
    source_type="linkedin_person_profile",
    url="https://www.linkedin.com/in/username/",
)
print(linkedin_profile)
amazon_product = brightdata_tool.web_data_feed(
    source_type="amazon_product", url="https://www.amazon.com/dp/B08N5KWB9H"
)
print(amazon_product)
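If URLs arrive from mixed sources, a small lookup can choose the source_type before calling web_data_feed. This sketch covers only the two source types shown in the examples above; the URL rules and the guess_source_type helper are assumptions for illustration, and other platforms would need their own entries.

```python
from typing import Optional
from urllib.parse import urlparse

# (domain, required path fragment, source_type); only the two types shown above.
SOURCE_RULES = [
    ("linkedin.com", "/in/", "linkedin_person_profile"),
    ("amazon.com", "/dp/", "amazon_product"),
]


def guess_source_type(url: str) -> Optional[str]:
    """Return a source_type for ``url``, or None if no rule matches."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    for domain, fragment, source_type in SOURCE_RULES:
        on_domain = host == domain or host.endswith("." + domain)
        if on_domain and fragment in parsed.path:
            return source_type
    return None
```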
Advanced Configuration
The Bright Data tool offers various configuration options for specialized use cases:
Search Engine Parameters
The search_engine function supports advanced parameters like:
- Language targeting (language parameter)
- Country-specific search (country_code parameter)
- Different search types (images, shopping, news, etc.)
- Pagination controls
- Mobile device emulation
- Geolocation targeting
- Hotel search parameters
results = brightdata_tool.search_engine(
    query="best hotels in paris",
    engine="google",
    language="fr",
    country_code="fr",
    search_type="shopping",
    device="mobile",
    hotel_dates="2025-06-01,2025-06-05",
    hotel_occupancy=2,
)
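The hotel_dates value in the example is a pair of ISO dates joined by a comma. A small helper can build that string from date objects and catch reversed ranges early. This is a sketch that assumes the "check-in,check-out" format inferred from the example; format_hotel_dates is a hypothetical helper.

```python
from datetime import date


def format_hotel_dates(check_in: date, check_out: date) -> str:
    """Format a stay as 'YYYY-MM-DD,YYYY-MM-DD' (format inferred from the example)."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    return f"{check_in.isoformat()},{check_out.isoformat()}"


print(format_hotel_dates(date(2025, 6, 1), date(2025, 6, 5)))
```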
Supported Web Data Sources
The web_data_feed function supports retrieving structured data from:
- LinkedIn (profiles and companies)
- Amazon (products and reviews)
- Instagram (profiles, posts, reels, comments)
- Facebook (posts, marketplace listings, company reviews)
- X/Twitter (posts)
- Zillow (property listings)
- Booking.com (hotel listings)
- YouTube (videos)
- ZoomInfo (company profiles)
For more information, visit the Bright Data documentation.