llama-index tools Bright Data integration

LlamaIndex Tools Integration: Bright Data

This tool connects your agent to Bright Data so it can crawl websites, search the web, and access structured data from platforms such as LinkedIn, Amazon, and social media sites.

Bright Data's tools provide robust web scraping capabilities with built-in CAPTCHA solving and bot detection avoidance, allowing you to reliably extract data from the web.

Installation

pip install llama-index llama-index-core llama-index-tools-brightdata
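To confirm that the packages were installed, one option is to look up each distribution with the standard library. This is a minimal sketch; the helper name installed_version is hypothetical and not part of the integration.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are taken from the pip command above.
    for name in ("llama-index-core", "llama-index-tools-brightdata"):
        print(name, installed_version(name) or "not installed")
```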

Authentication

Sign up at Bright Data and retrieve your API key from your account settings. Replace "your-api-key" with your actual API key in the examples below.
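Rather than pasting the key into source files, you may prefer to read it from the environment. The sketch below assumes an environment variable named BRIGHTDATA_API_KEY and a hypothetical helper load_api_key; neither is defined by the integration.

```python
import os


def load_api_key(env_var: str = "BRIGHTDATA_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    # Reject an unset variable and the placeholder used in the examples.
    if not key or key == "your-api-key":
        raise RuntimeError(f"Set {env_var} to your Bright Data API key")
    return key


# Possible use with the constructor shown in this README:
# brightdata_tool = BrightDataToolSpec(api_key=load_api_key(), zone="unlocker")
```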

Usage

Here's an example of how to use the BrightDataToolSpec with LlamaIndex:

import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.brightdata import BrightDataToolSpec

brightdata_tool = BrightDataToolSpec(api_key="your-api-key", zone="unlocker")

tool_list = brightdata_tool.to_tool_list()

# Keep each tool's full description aside, then register a short one.
for tool in tool_list:
    tool.original_description = tool.metadata.description
    tool.metadata.description = "Bright Data web scraping tool"

agent = FunctionAgent(
    tools=tool_list,
    llm=OpenAI(model="gpt-4.1"),
)

query = (
    "Find and summarize the latest news about AI from major tech news sites"
)

# Put the full descriptions in the prompt so the model can still read them.
tool_descriptions = "\n\n".join(
    f"Tool Name: {tool.metadata.name}\nTool Description: {tool.original_description}"
    for tool in tool_list
)

query_with_descriptions = f"{tool_descriptions}\n\nQuery: {query}"


async def main():
    # agent.run returns an awaitable, so await it inside a coroutine.
    response = await agent.run(query_with_descriptions)
    print(response)


asyncio.run(main())
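The usage example moves each tool's full description out of the tool metadata and into the prompt. To see what the agent actually receives, the sketch below rebuilds that prompt from stand-in objects. The helper build_prompt, the fake tools, and their description strings are hypothetical; only the attribute names mirror the example.

```python
from types import SimpleNamespace


def build_prompt(tools, query: str) -> str:
    """Prefix the query with each tool's name and full description."""
    blocks = [
        f"Tool Name: {t.metadata.name}\nTool Description: {t.original_description}"
        for t in tools
    ]
    return "\n\n".join(blocks) + f"\n\nQuery: {query}"


# Stand-in objects exposing the same attributes the usage example reads.
fake_tools = [
    SimpleNamespace(
        metadata=SimpleNamespace(name="scrape_as_markdown"),
        original_description="Scrape a webpage and return Markdown.",
    ),
    SimpleNamespace(
        metadata=SimpleNamespace(name="search_engine"),
        original_description="Search the web and return results.",
    ),
]

prompt = build_prompt(fake_tools, "Find the latest AI news")
print(prompt)
```

Printing the prompt this way is a quick check on its length before sending it to a model.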

Features

The Bright Data tool provides the following capabilities:

Web Scraping

  • scrape_as_markdown: Scrape a webpage and convert the content to Markdown format. This tool can bypass CAPTCHA and bot detection.
result = brightdata_tool.scrape_as_markdown("https://example.com")
print(result.text)
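Once a page has been scraped, a common next step when crawling is to collect the links it contains. The sketch below assumes result.text is a Markdown string, as the example suggests, and uses a hypothetical helper extract_links built on a simple regular expression. It only handles inline links of the form [label](url), and it also picks up image links.

```python
import re
from typing import List, Tuple

# Matches inline Markdown links such as [label](https://example.com/page).
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_links(markdown: str) -> List[Tuple[str, str]]:
    """Return (label, url) pairs in document order, keeping the first label per URL."""
    seen = set()
    links = []
    for label, url in LINK_RE.findall(markdown):
        if url not in seen:
            seen.add(url)
            links.append((label, url))
    return links


sample = (
    "See [docs](https://example.com/docs) and "
    "[docs again](https://example.com/docs), or [home](https://example.com/)."
)
print(extract_links(sample))
```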

Visual Capture

  • get_screenshot: Take a screenshot of a webpage and save it to a file.
screenshot_path = brightdata_tool.get_screenshot(
    "https://example.com", output_path="example_screenshot.png"
)
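After saving a screenshot, it can be worth checking that the file really is an image before passing it on. The sketch below assumes the output is a PNG, as the file name in the example suggests; looks_like_png is a hypothetical helper that compares the first bytes of the file with the standard PNG signature.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def looks_like_png(path: str) -> bool:
    """Return True if the file exists and starts with the PNG signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```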

Search Engine Access

  • search_engine: Search Google, Bing, or Yandex and get structured search results as JSON or Markdown. Supports advanced parameters for more specific searches.
search_results = brightdata_tool.search_engine(
    query="climate change solutions",
    engine="google",
    language="en",
    country_code="us",
    num_results=20,
)
print(search_results.text)
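When search arguments come from user input or an agent, validating them before the call gives clearer errors. The sketch below uses the parameter names from the example above. The helper build_search_kwargs is hypothetical, and the lowercase spellings "bing" and "yandex" are assumed by analogy with "google".

```python
from typing import Any, Dict, Optional

SUPPORTED_ENGINES = ("google", "bing", "yandex")  # spellings other than "google" are assumed


def build_search_kwargs(
    query: str,
    engine: str = "google",
    language: Optional[str] = None,
    country_code: Optional[str] = None,
    num_results: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate search arguments and drop the ones that were not set."""
    if not query.strip():
        raise ValueError("query must not be empty")
    engine = engine.lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"engine must be one of {SUPPORTED_ENGINES}, got {engine!r}")
    if num_results is not None and num_results < 1:
        raise ValueError("num_results must be at least 1")
    kwargs = {
        "query": query.strip(),
        "engine": engine,
        "language": language,
        "country_code": country_code,
        "num_results": num_results,
    }
    return {k: v for k, v in kwargs.items() if v is not None}


kwargs = build_search_kwargs(
    "climate change solutions", language="en", country_code="us", num_results=20
)
# Possible use: search_results = brightdata_tool.search_engine(**kwargs)
```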

Structured Web Data Extraction

  • web_data_feed: Retrieve structured data from various platforms including LinkedIn, Amazon, Instagram, Facebook, X (Twitter), Zillow, and more.
linkedin_profile = brightdata_tool.web_data_feed(
    source_type="linkedin_person_profile",
    url="https://www.linkedin.com/in/username/",
)
print(linkedin_profile)

amazon_product = brightdata_tool.web_data_feed(
    source_type="amazon_product", url="https://www.amazon.com/dp/B08N5KWB9H"
)
print(amazon_product)
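Because web_data_feed needs a source_type that matches the URL, a small dispatcher can pick one from the URL itself. The sketch below covers only the two source_type values shown in the examples; guess_source_type is a hypothetical helper, and the path checks (/in/ for LinkedIn profiles, /dp/ for Amazon products) are assumptions based on the example URLs.

```python
from typing import Optional
from urllib.parse import urlparse

# Only the source_type values that appear in the examples; others are left out
# rather than guessed.
HOST_TO_SOURCE_TYPE = {
    "linkedin.com": "linkedin_person_profile",
    "amazon.com": "amazon_product",
}


def guess_source_type(url: str) -> Optional[str]:
    """Return a source_type for known URL shapes, or None when unsure."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Other page kinds on the same site would need a different source_type.
    if host == "linkedin.com" and not parsed.path.startswith("/in/"):
        return None
    if host == "amazon.com" and "/dp/" not in parsed.path:
        return None
    return HOST_TO_SOURCE_TYPE.get(host)


print(guess_source_type("https://www.linkedin.com/in/username/"))
print(guess_source_type("https://www.amazon.com/dp/B08N5KWB9H"))
```

Returning None for unrecognized URLs lets the caller fall back to scrape_as_markdown or ask for an explicit source_type.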

Advanced Configuration

The Bright Data tool offers various configuration options for specialized use cases:

Search Engine Parameters

The search_engine function supports advanced parameters like:

  • Language targeting (language parameter)
  • Country-specific search (country_code parameter)
  • Different search types (images, shopping, news, etc.)
  • Pagination controls
  • Mobile device emulation
  • Geolocation targeting
  • Hotel search parameters
results = brightdata_tool.search_engine(
    query="best hotels in paris",
    engine="google",
    language="fr",
    country_code="fr",
    search_type="shopping",
    device="mobile",
    hotel_dates="2025-06-01,2025-06-05",
    hotel_occupancy=2,
)
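The hotel_dates value in the example is a pair of ISO dates joined by a comma. Assuming that format, a small helper can build the string from date objects and catch reversed ranges early. The name format_hotel_dates is hypothetical.

```python
from datetime import date


def format_hotel_dates(check_in: date, check_out: date) -> str:
    """Build a 'YYYY-MM-DD,YYYY-MM-DD' string, the format assumed from the example."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    return f"{check_in.isoformat()},{check_out.isoformat()}"


print(format_hotel_dates(date(2025, 6, 1), date(2025, 6, 5)))  # prints 2025-06-01,2025-06-05
```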

Supported Web Data Sources

The web_data_feed function supports retrieving structured data from:

  • LinkedIn (profiles and companies)
  • Amazon (products and reviews)
  • Instagram (profiles, posts, reels, comments)
  • Facebook (posts, marketplace listings, company reviews)
  • X/Twitter (posts)
  • Zillow (property listings)
  • Booking.com (hotel listings)
  • YouTube (videos)
  • ZoomInfo (company profiles)

For more information, visit the Bright Data documentation.
