LangChain integration for Bright Data web scraping and SERP APIs

langchain-brightdata

LangChain integration for Bright Data's web data APIs


Overview

langchain-brightdata provides LangChain tools for Bright Data's web data APIs, enabling your AI agents to:

  • Search - Query search engines with geo-targeting and language customization
  • Unlock - Access geo-restricted or bot-protected websites
  • Scrape - Extract structured data from Amazon, LinkedIn, and 100+ domains

Installation

pip install langchain-brightdata

Requirements: Python 3.9+


Quick Start

1. Get your API key

Sign up at Bright Data and get your API key from the dashboard.

2. Set up authentication

import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"

Or pass it directly:

from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
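
The two options above can be combined so that a missing key fails early with a clear message. The following is a minimal sketch: `resolve_api_key` is a hypothetical helper, not part of langchain-brightdata, and the precedence it applies (explicit argument first, then the environment variable) is an assumption made for illustration.

```python
import os
from typing import Optional

ENV_VAR = "BRIGHT_DATA_API_KEY"  # variable name taken from the step above


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise the environment variable.

    Hypothetical helper for illustration; not part of langchain-brightdata.
    """
    key = explicit_key or os.environ.get(ENV_VAR)
    if not key:
        raise ValueError(
            f"No API key found: pass one explicitly or set {ENV_VAR}."
        )
    return key
```

The result could then be passed on, for example as `BrightDataSERP(bright_data_api_key=resolve_api_key())`.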

3. Use with LangChain agents

from langchain_brightdata import BrightDataSERP, BrightDataUnlocker, BrightDataWebScraperAPI
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

# Initialize tools
tools = [
    BrightDataSERP(),
    BrightDataUnlocker(),
    BrightDataWebScraperAPI()
]

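# Create agent
# Note: initialize_agent is LangChain's legacy agent constructor and is
# deprecated in newer LangChain releases.
llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)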

# Run
agent.run("Search for the latest AI news and summarize the top result")

Tools

BrightDataSERP

Search engine results with geo-targeting and customization.

from langchain_brightdata import BrightDataSERP

serp = BrightDataSERP()

# Simple search
results = serp.invoke("latest AI research")

# Advanced search
results = serp.invoke({
    "query": "electric vehicles",
    "country": "de",
    "language": "de",
    "search_type": "nws",  # news results; see the search_type values under Parameters
    "results_count": 20
})

Parameters

Parameter      Type  Default   Description
query          str   required  Search query
zone           str   "serp"    Bright Data zone name
search_engine  str   "google"  Search engine (google, bing, yahoo)
country        str   "us"      Two-letter country code
language       str   "en"      Two-letter language code
results_count  int   10        Number of results (max 100)
search_type    str   None      None (web), "isch" (images), "shop", "nws" (news), "jobs"
device_type    str   None      None (desktop), "mobile", "ios", "android"
parse_results  bool  False     Return structured JSON
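
The constraints in this table can be checked before a request is sent. The sketch below assumes nothing about the library's internals: `build_serp_input` is a hypothetical helper, the allowed values are copied from the table, and the lower bound of 1 for `results_count` is an assumption.

```python
from typing import Any, Dict

# Allowed values copied from the parameter table above.
SEARCH_ENGINES = {"google", "bing", "yahoo"}
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}
MAX_RESULTS = 100


def build_serp_input(query: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: check options, then return an invoke() payload."""
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    if options.get("search_engine", "google") not in SEARCH_ENGINES:
        raise ValueError(f"search_engine must be one of {sorted(SEARCH_ENGINES)}")
    if options.get("search_type") not in SEARCH_TYPES:
        raise ValueError('search_type must be None, "isch", "shop", "nws" or "jobs"')
    if options.get("device_type") not in DEVICE_TYPES:
        raise ValueError('device_type must be None, "mobile", "ios" or "android"')
    # The lower bound of 1 is an assumption; the table only states the maximum.
    if not 1 <= options.get("results_count", 10) <= MAX_RESULTS:
        raise ValueError(f"results_count must be between 1 and {MAX_RESULTS}")
    # Drop None values so the tool can fall back to its own defaults.
    payload: Dict[str, Any] = {"query": query}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

The returned dictionary has the same shape as the advanced search input, so it could be passed straight to `serp.invoke(...)`.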

BrightDataUnlocker

Access any public website, bypassing geo-restrictions and bot protection.

from langchain_brightdata import BrightDataUnlocker

unlocker = BrightDataUnlocker()

# Simple access
content = unlocker.invoke("https://example.com")

# With options
content = unlocker.invoke({
    "url": "https://example.com/restricted",
    "country": "gb",
    "data_format": "markdown"
})

Parameters

Parameter    Type  Default     Description
url          str   required    URL to access
zone         str   "unlocker"  Bright Data zone name
country      str   None        Two-letter country code
data_format  str   None        None (HTML), "markdown", "screenshot"
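
A malformed URL or an unsupported `data_format` is cheaper to catch locally than after a request has been made. The following sketch uses only the standard library; `build_unlocker_input` is a hypothetical helper, and the rule that only `http` and `https` URLs are accepted is an assumption rather than documented behavior.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Values copied from the data_format row of the table above.
DATA_FORMATS = {None, "markdown", "screenshot"}


def build_unlocker_input(
    url: str,
    country: Optional[str] = None,
    data_format: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: validate arguments and return an invoke() payload."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs make sense for this tool.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if country is not None and not (len(country) == 2 and country.isalpha()):
        raise ValueError("country must be a two-letter code")
    if data_format not in DATA_FORMATS:
        raise ValueError('data_format must be None, "markdown" or "screenshot"')
    payload = {"url": url}
    if country is not None:
        payload["country"] = country
    if data_format is not None:
        payload["data_format"] = data_format
    return payload
```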

BrightDataWebScraperAPI

Extract structured data from popular websites.

from langchain_brightdata import BrightDataWebScraperAPI

scraper = BrightDataWebScraperAPI()

# Amazon product
product = scraper.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})

# LinkedIn profile
profile = scraper.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})

Parameters

Parameter     Type  Default   Description
url           str   required  URL to scrape
dataset_type  str   required  Type of data to extract
zipcode       str   None      Zipcode for location-specific data

Supported Dataset Types

Type                      Description
amazon_product            Product details, pricing, specs
amazon_product_reviews    Customer reviews and ratings
linkedin_person_profile   Professional profile data
linkedin_company_profile  Company information
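
Because `dataset_type` is required, an agent or script that receives arbitrary URLs has to pick one of these values first. The sketch below is one possible heuristic, not library behavior: `guess_dataset_type` is a hypothetical helper, and the URL patterns it matches (`/dp/`, `/product-reviews/`, `/in/`, `/company/`) are assumptions about how the sites structure their links. It only handles `amazon.com` and `linkedin.com` hosts.

```python
from typing import Optional
from urllib.parse import urlparse


def _host_matches(host: str, domain: str) -> bool:
    """True for the domain itself or any of its subdomains."""
    return host == domain or host.endswith("." + domain)


def guess_dataset_type(url: str) -> Optional[str]:
    """Hypothetical heuristic: map a URL to one of the dataset types above."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path
    if _host_matches(host, "amazon.com"):
        # Assumed URL shapes: review pages vs. product detail pages.
        if "/product-reviews/" in path:
            return "amazon_product_reviews"
        if "/dp/" in path:
            return "amazon_product"
    elif _host_matches(host, "linkedin.com"):
        if path.startswith("/in/"):
            return "linkedin_person_profile"
        if path.startswith("/company/"):
            return "linkedin_company_profile"
    return None  # unknown URL: let the caller decide
```

Returning `None` for unrecognized URLs leaves the decision to the caller instead of guessing a dataset type.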

Configuration

Zone Configuration

Bright Data uses "zones" to manage different API configurations. You can set the zone at initialization or per-request.

Setting zone at initialization

from langchain_brightdata import BrightDataSERP, BrightDataUnlocker

# SERP with custom zone
serp = BrightDataSERP(
    bright_data_api_key="your-api-key",
    zone="my_serp_zone"
)

# Unlocker with custom zone
unlocker = BrightDataUnlocker(
    bright_data_api_key="your-api-key",
    zone="my_unlocker_zone"
)

Setting zone per-request

# Override zone for a specific request
results = serp.invoke({
    "query": "AI news",
    "zone": "different_zone"
})

Default zones

Tool                Default Zone
BrightDataSERP      serp
BrightDataUnlocker  unlocker

Note: Zone names must match the zones configured in your Bright Data dashboard.
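
The three places a zone can come from suggest a simple precedence order: per-request value, then the value set at initialization, then the tool's default. The sketch below illustrates that order only; `resolve_zone` is a hypothetical function, and the library's actual resolution logic may differ.

```python
from typing import Any, Dict, Optional

# Defaults copied from the table above.
DEFAULT_ZONES = {
    "BrightDataSERP": "serp",
    "BrightDataUnlocker": "unlocker",
}


def resolve_zone(
    tool_name: str,
    init_zone: Optional[str] = None,
    request: Optional[Dict[str, Any]] = None,
) -> str:
    """Hypothetical illustration of zone precedence, most specific first."""
    if request and request.get("zone"):
        return request["zone"]  # per-request override
    if init_zone:
        return init_zone  # zone passed at initialization
    return DEFAULT_ZONES[tool_name]  # tool default
```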


License

MIT License - see LICENSE for details.
