LangChain integration for Bright Data web scraping and SERP APIs

langchain-brightdata

LangChain integration for Bright Data's web data APIs



Overview

langchain-brightdata provides LangChain tools for Bright Data's web data APIs, enabling your AI agents to:

  • Search - Query search engines with geo-targeting and language customization
  • Unlock - Access geo-restricted or bot-protected websites
  • Scrape - Extract structured data from Amazon, LinkedIn, and 100+ domains

Installation

pip install langchain-brightdata

Requirements: Python 3.9+


Quick Start

1. Get your API key

Sign up at Bright Data and get your API key from the dashboard.

2. Set up authentication

import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"

Or pass it directly:

from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
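
A small pre-flight check can make a missing key fail early with a readable message. The sketch below uses only the standard library. `require_api_key` is a hypothetical helper, not part of langchain-brightdata, and it assumes the tools read `BRIGHT_DATA_API_KEY` from the environment as shown in step 2.

```python
import os


def require_api_key(env_var: str = "BRIGHT_DATA_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of langchain-brightdata.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or pass "
            "bright_data_api_key=... when constructing the tool."
        )
    return key
```

One way to use it is to call `require_api_key()` once at startup and pass the result as `bright_data_api_key` when constructing a tool.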

3. Use with LangChain agents

from langchain_brightdata import BrightDataSERP, BrightDataUnlocker, BrightDataWebScraperAPI
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

# Initialize tools
tools = [
    BrightDataSERP(),
    BrightDataUnlocker(),
    BrightDataWebScraperAPI()
]

# Create agent
llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)

# Run the agent and print its final answer
result = agent.invoke({"input": "Search for the latest AI news and summarize the top result"})
print(result["output"])

Tools

BrightDataSERP

Search engine results with geo-targeting and customization.

from langchain_brightdata import BrightDataSERP

serp = BrightDataSERP()

# Simple search
results = serp.invoke("latest AI research")

# Advanced search
results = serp.invoke({
    "query": "electric vehicles",
    "country": "de",
    "language": "de",
    "search_type": "nws",  # news, per the parameter table
    "results_count": 20
})

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | Search query |
| zone | str | "serp" | Bright Data zone name |
| search_engine | str | "google" | Search engine (google, bing, yahoo) |
| country | str | "us" | Two-letter country code |
| language | str | "en" | Two-letter language code |
| results_count | int | 10 | Number of results (max 100) |
| search_type | str | None | None (web), "isch" (images), "shop", "nws" (news), "jobs" |
| device_type | str | None | None (desktop), "mobile", "ios", "android" |
| parse_results | bool | False | Return structured JSON |
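
Because several of these parameters accept only a fixed set of values, it can help to validate a request locally before sending it. The following is a minimal sketch, assuming the allowed values are exactly those listed in the table above. `build_serp_request` is a hypothetical helper (not part of the package), and the lower bound of 1 on `results_count` is an assumption.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the parameter table; None means the default
# (web search / desktop device).
SEARCH_ENGINES = {"google", "bing", "yahoo"}
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}


def build_serp_request(
    query: str,
    *,
    search_engine: str = "google",
    country: str = "us",
    language: str = "en",
    results_count: int = 10,
    search_type: Optional[str] = None,
    device_type: Optional[str] = None,
    parse_results: bool = False,
) -> Dict[str, Any]:
    """Validate arguments locally and return a dict for serp.invoke()."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if search_engine not in SEARCH_ENGINES:
        raise ValueError(f"unsupported search_engine: {search_engine!r}")
    for name, code in (("country", country), ("language", language)):
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"{name} must be a two-letter code, got {code!r}")
    # Upper bound of 100 comes from the table; lower bound is an assumption.
    if not 1 <= results_count <= 100:
        raise ValueError("results_count must be between 1 and 100")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unsupported device_type: {device_type!r}")

    request: Dict[str, Any] = {
        "query": query,
        "search_engine": search_engine,
        "country": country.lower(),
        "language": language.lower(),
        "results_count": results_count,
        "parse_results": parse_results,
    }
    # Include optional keys only when set, so the tool's own defaults apply.
    if search_type is not None:
        request["search_type"] = search_type
    if device_type is not None:
        request["device_type"] = device_type
    return request
```

The returned dict can be passed straight to `serp.invoke(...)`. A typo such as `search_type="news"` then raises a `ValueError` locally before any request is sent.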

BrightDataUnlocker

Access any public website, bypassing geo-restrictions and bot protection.

from langchain_brightdata import BrightDataUnlocker

unlocker = BrightDataUnlocker()

# Simple access
content = unlocker.invoke("https://example.com")

# With options
content = unlocker.invoke({
    "url": "https://example.com/restricted",
    "country": "gb",
    "data_format": "markdown"
})

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | required | URL to access |
| zone | str | "unlocker" | Bright Data zone name |
| country | str | None | Two-letter country code |
| data_format | str | None | None (HTML), "markdown", "screenshot" |
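
The same local-validation idea applies to the Unlocker parameters. This sketch assumes the `data_format` values are exactly those in the table above and that only `http`/`https` URLs are meaningful. `build_unlocker_request` is a hypothetical helper, not part of langchain-brightdata.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Allowed values copied from the parameter table; None means raw HTML.
DATA_FORMATS = {None, "markdown", "screenshot"}


def build_unlocker_request(
    url: str,
    *,
    country: Optional[str] = None,
    data_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate arguments locally and return a dict for unlocker.invoke()."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs make sense for this tool.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    if country is not None and (len(country) != 2 or not country.isalpha()):
        raise ValueError(f"country must be a two-letter code, got {country!r}")
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unsupported data_format: {data_format!r}")

    request: Dict[str, Any] = {"url": url}
    # Include optional keys only when set, so the tool's own defaults apply.
    if country is not None:
        request["country"] = country.lower()
    if data_format is not None:
        request["data_format"] = data_format
    return request
```

The result can be passed to `unlocker.invoke(...)` in place of a hand-written dict.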

BrightDataWebScraperAPI

Extract structured data from popular websites.

from langchain_brightdata import BrightDataWebScraperAPI

scraper = BrightDataWebScraperAPI()

# Amazon product
product = scraper.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})

# LinkedIn profile
profile = scraper.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | required | URL to scrape |
| dataset_type | str | required | Type of data to extract |
| zipcode | str | None | Zipcode for location-specific data |

Supported Dataset Types

| Type | Description |
| --- | --- |
| amazon_product | Product details, pricing, specs |
| amazon_product_reviews | Customer reviews and ratings |
| linkedin_person_profile | Professional profile data |
| linkedin_company_profile | Company information |
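
Since `dataset_type` is required, an agent or script may want to pick it from the URL. The sketch below is one possible heuristic, not something the package provides. `guess_dataset_type` is a hypothetical helper, and the URL patterns (`/dp/`, `/in/`, `/company/`) are assumptions based on the example URLs above and common site layouts. It never returns `amazon_product_reviews`, because a URL alone does not say whether details or reviews are wanted.

```python
from typing import Optional
from urllib.parse import urlparse


def guess_dataset_type(url: str) -> Optional[str]:
    """Guess a dataset_type from a URL, or return None if unsure.

    Heuristic only; the URL patterns here are assumptions.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path

    # Amazon product pages carry the ASIN after a /dp/ path segment.
    if host.startswith("amazon.") and "/dp/" in path:
        return "amazon_product"
    if host == "linkedin.com":
        if path.startswith("/in/"):
            return "linkedin_person_profile"
        if path.startswith("/company/"):
            return "linkedin_company_profile"
    return None
```

When the function returns `None`, the caller should fall back to an explicit `dataset_type` rather than guessing further.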

Configuration

Zone Configuration

Bright Data uses "zones" to manage different API configurations. You can set the zone at initialization or per-request.

Setting zone at initialization

from langchain_brightdata import BrightDataSERP, BrightDataUnlocker

# SERP with custom zone
serp = BrightDataSERP(
    bright_data_api_key="your-api-key",
    zone="my_serp_zone"
)

# Unlocker with custom zone
unlocker = BrightDataUnlocker(
    bright_data_api_key="your-api-key",
    zone="my_unlocker_zone"
)

Setting zone per-request

# Override zone for a specific request
results = serp.invoke({
    "query": "AI news",
    "zone": "different_zone"
})

Default zones

| Tool | Default Zone |
| --- | --- |
| BrightDataSERP | serp |
| BrightDataUnlocker | unlocker |

Note: Zone names must match the zones configured in your Bright Data dashboard.
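
The precedence described in this section (a per-request zone overrides the zone set at initialization, which overrides the default) can be modeled in a few lines. This is an illustration of that rule only, not the library's internal code. `resolve_zone` and `DEFAULT_ZONES` are hypothetical names.

```python
from typing import Any, Dict, Optional

# Defaults copied from the "Default zones" table above.
DEFAULT_ZONES = {"BrightDataSERP": "serp", "BrightDataUnlocker": "unlocker"}


def resolve_zone(
    tool_name: str,
    init_zone: Optional[str] = None,
    request: Optional[Dict[str, Any]] = None,
) -> str:
    """Return the zone that would apply under the documented precedence."""
    # 1. A zone given in the request wins.
    if request and request.get("zone"):
        return request["zone"]
    # 2. Otherwise use the zone passed at initialization.
    if init_zone:
        return init_zone
    # 3. Otherwise fall back to the tool's default.
    return DEFAULT_ZONES[tool_name]
```

For example, `resolve_zone("BrightDataSERP", "my_serp_zone", {"query": "AI news", "zone": "different_zone"})` yields `"different_zone"`, matching the per-request override shown earlier.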


Resources


License

MIT License - see LICENSE for details.
