
LangChain integration for Bright Data web scraping and SERP APIs


langchain-brightdata

LangChain integration for Bright Data's web data APIs



Overview

langchain-brightdata provides LangChain tools for Bright Data's web data APIs, enabling your AI agents to:

  • Search - Query search engines with geo-targeting and language customization
  • Unlock - Access geo-restricted or bot-protected websites
  • Scrape - Extract structured data from Amazon, LinkedIn, and 100+ domains

Installation

pip install langchain-brightdata

Requirements: Python 3.9+


Quick Start

1. Get your API key

Sign up at Bright Data and get your API key from the dashboard.

2. Set up authentication

import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"

Or pass it directly:

from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
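To fail early when the key is missing, a small check can run before any tool is constructed. This is a minimal sketch: `require_api_key` is a hypothetical helper and not part of langchain-brightdata; only the variable name `BRIGHT_DATA_API_KEY` comes from the example above.

```python
import os


def require_api_key(env=None, name="BRIGHT_DATA_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper (not part of langchain-brightdata).
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it or pass bright_data_api_key=..."
        )
    return key
```

The returned value can then be passed as `bright_data_api_key` when creating a tool.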

3. Use with LangChain agents

from langchain_brightdata import BrightDataSERP, BrightDataUnlocker, BrightDataWebScraperAPI
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

# Initialize tools
tools = [
    BrightDataSERP(),
    BrightDataUnlocker(),
    BrightDataWebScraperAPI()
]

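# Create agent
# Note: initialize_agent and agent.run are legacy LangChain APIs (deprecated since
# LangChain 0.1); they may be unavailable in newer releases.
llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)

# Run
result = agent.run("Search for the latest AI news and summarize the top result")
print(result)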

Tools

BrightDataSERP

Search engine results with geo-targeting and customization.

from langchain_brightdata import BrightDataSERP

serp = BrightDataSERP()

# Simple search
results = serp.invoke("latest AI research")

# Advanced search
results = serp.invoke({
    "query": "electric vehicles",
    "country": "de",
    "language": "de",
    "search_type": "nws",  # news; see the parameter table for accepted values
    "results_count": 20
})

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | Search query |
| zone | str | "serp" | Bright Data zone name |
| search_engine | str | "google" | Search engine (google, bing, yahoo) |
| country | str | "us" | Two-letter country code |
| language | str | "en" | Two-letter language code |
| results_count | int | 10 | Number of results (max 100) |
| search_type | str | None | None (web), "isch" (images), "shop", "nws" (news), "jobs" |
| device_type | str | None | None (desktop), "mobile", "ios", "android" |
| parse_results | bool | False | Return structured JSON |
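Because several parameters accept only a fixed set of values, it can help to validate a request before sending it. The following is a sketch under stated assumptions: `build_serp_input` is a hypothetical helper, not part of langchain-brightdata, and the lower bound of 1 for `results_count` is an assumption. The allowed values are copied from the parameter table above.

```python
# Allowed values copied from the parameter table above.
SEARCH_ENGINES = {"google", "bing", "yahoo"}
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}
MAX_RESULTS = 100


def build_serp_input(query, **options):
    """Validate options and return a dict suitable for serp.invoke().

    Hypothetical helper, not part of langchain-brightdata.
    """
    if not query:
        raise ValueError("query is required")
    if options.get("search_engine", "google") not in SEARCH_ENGINES:
        raise ValueError("search_engine must be google, bing, or yahoo")
    if options.get("search_type") not in SEARCH_TYPES:
        raise ValueError('search_type must be None, "isch", "shop", "nws", or "jobs"')
    if options.get("device_type") not in DEVICE_TYPES:
        raise ValueError('device_type must be None, "mobile", "ios", or "android"')
    count = options.get("results_count", 10)
    # Assumption: at least one result; the table only documents the maximum.
    if not 1 <= count <= MAX_RESULTS:
        raise ValueError("results_count must be between 1 and 100")
    return {"query": query, **options}
```

The resulting dict can be passed directly, for example `serp.invoke(build_serp_input("electric vehicles", country="de", search_type="nws"))`.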

BrightDataUnlocker

Access any public website, bypassing geo-restrictions and bot protection.

from langchain_brightdata import BrightDataUnlocker

unlocker = BrightDataUnlocker()

# Simple access
content = unlocker.invoke("https://example.com")

# With options
content = unlocker.invoke({
    "url": "https://example.com/restricted",
    "country": "gb",
    "data_format": "markdown"
})

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | URL to access |
| zone | str | "unlocker" | Bright Data zone name |
| country | str | None | Two-letter country code |
| data_format | str | None | None (HTML), "markdown", "screenshot" |
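When several pages need the same options, a small loop can collect results and keep going if one URL fails. This is only a sketch: `fetch_pages` is a hypothetical helper, not part of langchain-brightdata. It takes any callable as `invoke`, so `unlocker.invoke` can be passed in, and it uses only the `url`, `country`, and `data_format` parameters from the table above.

```python
def fetch_pages(invoke, urls, country=None, data_format=None):
    """Call `invoke` once per URL; return (results, errors) keyed by URL.

    Hypothetical helper. `invoke` is any callable that accepts the
    request dict, for example `unlocker.invoke`.
    """
    results, errors = {}, {}
    for url in urls:
        payload = {"url": url}
        # Only include optional parameters that were actually set.
        if country is not None:
            payload["country"] = country
        if data_format is not None:
            payload["data_format"] = data_format
        try:
            results[url] = invoke(payload)
        except Exception as exc:
            # Record the failure and continue with the remaining URLs.
            errors[url] = exc
    return results, errors
```

For example, `fetch_pages(unlocker.invoke, ["https://example.com"], data_format="markdown")` would return the page content in `results` and any exceptions in `errors`.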

BrightDataWebScraperAPI

Extract structured data from popular websites.

from langchain_brightdata import BrightDataWebScraperAPI

scraper = BrightDataWebScraperAPI()

# Amazon product
product = scraper.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})

# LinkedIn profile
profile = scraper.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | URL to scrape |
| dataset_type | str | required | Type of data to extract |
| zipcode | str | None | Zipcode for location-specific data |

Supported Dataset Types

| Type | Description |
|---|---|
| amazon_product | Product details, pricing, specs |
| amazon_product_reviews | Customer reviews and ratings |
| linkedin_person_profile | Professional profile data |
| linkedin_company_profile | Company information |
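Since `dataset_type` is required, callers that handle mixed URLs need some way to choose it. One possible approach is a URL-based heuristic, sketched below. `guess_dataset_type` is a hypothetical helper, not part of langchain-brightdata, and the URL patterns are assumptions based on the two example URLs above. It never returns `amazon_product_reviews`, because a product URL alone does not say whether details or reviews are wanted.

```python
from urllib.parse import urlparse


def guess_dataset_type(url):
    """Guess a dataset_type from the URL shape, or return None.

    Hypothetical heuristic; the patterns are assumptions, not library rules.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path

    is_amazon = host.startswith("amazon.") or ".amazon." in host
    is_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")

    if is_amazon and "/dp/" in path:
        return "amazon_product"
    if is_linkedin and path.startswith("/in/"):
        return "linkedin_person_profile"
    if is_linkedin and path.startswith("/company/"):
        return "linkedin_company_profile"
    return None
```

A caller might fall back to an explicit `dataset_type` whenever the function returns `None`.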

Configuration

Zone Configuration

Bright Data uses "zones" to manage different API configurations. You can set the zone at initialization or per-request.

Setting zone at initialization

from langchain_brightdata import BrightDataSERP, BrightDataUnlocker

# SERP with custom zone
serp = BrightDataSERP(
    bright_data_api_key="your-api-key",
    zone="my_serp_zone"
)

# Unlocker with custom zone
unlocker = BrightDataUnlocker(
    bright_data_api_key="your-api-key",
    zone="my_unlocker_zone"
)

Setting zone per-request

# Override zone for a specific request
results = serp.invoke({
    "query": "AI news",
    "zone": "different_zone"
})

Default zones

| Tool | Default Zone |
|---|---|
| BrightDataSERP | serp |
| BrightDataUnlocker | unlocker |

Note: Zone names must match the zones configured in your Bright Data dashboard.
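The precedence described in this section (a per-request zone overrides the zone set at initialization, which overrides the default) can be illustrated as follows. This is only a sketch of that ordering: `resolve_zone` is a hypothetical function, and the library's internal logic may differ.

```python
# Defaults copied from the "Default zones" table above.
DEFAULT_ZONES = {
    "BrightDataSERP": "serp",
    "BrightDataUnlocker": "unlocker",
}


def resolve_zone(tool_name, init_zone=None, request_zone=None):
    """Pick the zone to use: per-request, then initialization, then default.

    Hypothetical illustration; not the library's actual implementation.
    """
    if request_zone:
        return request_zone
    if init_zone:
        return init_zone
    return DEFAULT_ZONES[tool_name]
```

For example, `resolve_zone("BrightDataSERP", init_zone="my_serp_zone", request_zone="different_zone")` selects the per-request value.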


Resources


License

MIT License - see LICENSE for details.

