

langchain-brightdata

LangChain integration for Bright Data's web data APIs



Overview

langchain-brightdata provides LangChain tools for Bright Data's web data APIs, enabling your AI agents to:

  • Search - Query search engines with geo-targeting and language customization
  • Unlock - Access geo-restricted or bot-protected websites
  • Scrape - Extract structured data from Amazon, LinkedIn, and 100+ domains

Installation

pip install langchain-brightdata

Requirements: Python 3.9+


Quick Start

1. Get your API key

Sign up at Bright Data and get your API key from the dashboard.

2. Set up authentication

import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"

Or pass it directly:

from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")

3. Use with LangChain agents

from langchain_brightdata import BrightDataSERP, BrightDataUnlocker, BrightDataWebScraperAPI
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

# Initialize tools
tools = [
    BrightDataSERP(),
    BrightDataUnlocker(),
    BrightDataWebScraperAPI()
]

# Create agent
llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)

# Run
agent.run("Search for the latest AI news and summarize the top result")

Tools

BrightDataSERP

Search engine results with geo-targeting and customization.

from langchain_brightdata import BrightDataSERP

serp = BrightDataSERP()

# Simple search
results = serp.invoke("latest AI research")

# Advanced search
results = serp.invoke({
    "query": "electric vehicles",
    "country": "de",
    "language": "de",
    "search_type": "news",
    "results_count": 20
})

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | str | required | Search query |
| zone | str | "serp" | Bright Data zone name |
| search_engine | str | "google" | Search engine (google, bing, yahoo) |
| country | str | "us" | Two-letter country code |
| language | str | "en" | Two-letter language code |
| results_count | int | 10 | Number of results (max 100) |
| search_type | str | None | None (web), "isch" (images), "shop" (shopping), "nws" (news), "jobs" |
| device_type | str | None | None (desktop), "mobile", "ios", "android" |
| parse_results | bool | False | Return structured JSON |
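The table above maps directly onto the dict passed to `invoke()`. As a sketch, a small helper (hypothetical, not part of langchain-brightdata) can assemble and sanity-check such a dict before sending it:

```python
# Hypothetical helper: build and validate a BrightDataSERP parameter dict
# against the table above. Not part of langchain-brightdata.
VALID_SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
VALID_DEVICE_TYPES = {None, "mobile", "ios", "android"}

def build_serp_params(query, *, country="us", language="en", results_count=10,
                      search_type=None, device_type=None, parse_results=False):
    if not 1 <= results_count <= 100:
        raise ValueError("results_count must be between 1 and 100")
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    if device_type not in VALID_DEVICE_TYPES:
        raise ValueError(f"unsupported device_type: {device_type!r}")
    params = {"query": query, "country": country, "language": language,
              "results_count": results_count, "parse_results": parse_results}
    # Omit the optional keys when unset, matching the defaults above.
    if search_type is not None:
        params["search_type"] = search_type
    if device_type is not None:
        params["device_type"] = device_type
    return params

params = build_serp_params("electric vehicles", country="de",
                           language="de", search_type="nws")
```

The resulting dict can then be passed to `serp.invoke(params)` as in the advanced-search example above.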

BrightDataUnlocker

Access any public website, bypassing geo-restrictions and bot protection.

from langchain_brightdata import BrightDataUnlocker

unlocker = BrightDataUnlocker()

# Simple access
content = unlocker.invoke("https://example.com")

# With options
content = unlocker.invoke({
    "url": "https://example.com/restricted",
    "country": "gb",
    "data_format": "markdown"
})

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | str | required | URL to access |
| zone | str | "unlocker" | Bright Data zone name |
| country | str | None | Two-letter country code |
| data_format | str | None | None (HTML), "markdown", "screenshot" |
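When persisting Unlocker responses, the `data_format` value determines what kind of file you are writing. A small hypothetical helper (not part of langchain-brightdata; the `.png` extension for screenshots is an assumption) makes that mapping explicit:

```python
# Hypothetical helper: choose a filename extension for saving Unlocker
# output, based on the data_format values in the table above.
# The screenshot format is assumed to be PNG.
EXTENSIONS = {None: ".html", "markdown": ".md", "screenshot": ".png"}

def output_path(slug, data_format=None):
    if data_format not in EXTENSIONS:
        raise ValueError(f"unsupported data_format: {data_format!r}")
    return slug + EXTENSIONS[data_format]
```

For example, `output_path("example", "markdown")` yields a `.md` path for the markdown variant of the page.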

BrightDataWebScraperAPI

Extract structured data from popular websites.

from langchain_brightdata import BrightDataWebScraperAPI

scraper = BrightDataWebScraperAPI()

# Amazon product
product = scraper.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})

# LinkedIn profile
profile = scraper.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | str | required | URL to scrape |
| dataset_type | str | required | Type of data to extract |
| zipcode | str | None | Zipcode for location-specific data |
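Since `zipcode` is optional, a payload builder can attach it only when provided. This helper is a hypothetical sketch, not part of langchain-brightdata:

```python
# Hypothetical helper: assemble the dict passed to
# BrightDataWebScraperAPI.invoke(), attaching the optional zipcode
# only when it is given.
def build_scrape_payload(url, dataset_type, zipcode=None):
    payload = {"url": url, "dataset_type": dataset_type}
    if zipcode is not None:
        payload["zipcode"] = zipcode
    return payload

payload = build_scrape_payload("https://www.amazon.com/dp/B08L5TNJHG",
                               "amazon_product", zipcode="10001")
```

The resulting dict can be passed to `scraper.invoke(payload)` to get, for instance, pricing as seen from that zipcode.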

Supported Dataset Types (44 Datasets)

E-Commerce (10 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| amazon_product | Product details, pricing, specs | url (with /dp/) |
| amazon_product_reviews | Customer reviews and ratings | url (with /dp/) |
| amazon_product_search | Search results from Amazon | keyword, url |
| walmart_product | Walmart product data | url (with /ip/) |
| walmart_seller | Walmart seller information | url |
| ebay_product | eBay product data | url |
| homedepot_products | Home Depot product data | url |
| zara_products | Zara product data | url |
| etsy_products | Etsy product data | url |
| bestbuy_products | Best Buy product data | url |

LinkedIn (5 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| linkedin_person_profile | Professional profile data | url |
| linkedin_company_profile | Company information | url |
| linkedin_job_listings | Job listing details | url |
| linkedin_posts | Post content and engagement | url |
| linkedin_people_search | Search for people | url, first_name, last_name |

Business Intelligence (2 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| crunchbase_company | Company funding, investors, metrics | url |
| zoominfo_company_profile | B2B company intelligence | url |

Instagram (4 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| instagram_profiles | Profile data and stats | url |
| instagram_posts | Post content and engagement | url |
| instagram_reels | Reel content and metrics | url |
| instagram_comments | Comments on posts | url |

Facebook (4 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| facebook_posts | Post content and engagement | url |
| facebook_marketplace_listings | Marketplace listing data | url |
| facebook_company_reviews | Company reviews | url, num_of_reviews |
| facebook_events | Event details | url |

TikTok (4 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| tiktok_profiles | Profile data and stats | url |
| tiktok_posts | Video content and metrics | url |
| tiktok_shop | Shop product data | url |
| tiktok_comments | Comments on videos | url |

YouTube (3 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| youtube_profiles | Channel profile data | url |
| youtube_videos | Video content and metrics | url |
| youtube_comments | Comments on videos | url, num_of_comments (default: 10) |

Google (3 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| google_maps_reviews | Business reviews from Maps | url, days_limit (default: 3) |
| google_shopping | Shopping product data | url |
| google_play_store | App store data | url |

Other Platforms (9 datasets)

| Type | Description | Required Inputs |
|------|-------------|-----------------|
| apple_app_store | iOS app data | url |
| x_posts | X (Twitter) post data | url |
| reddit_posts | Reddit post data | url |
| github_repository_file | GitHub file content | url |
| yahoo_finance_business | Financial business data | url |
| reuter_news | News article data | url |
| zillow_properties_listing | Real estate listing data | url |
| booking_hotel_listings | Hotel listing data | url |
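Several dataset types can be inferred from the URL's shape, as hinted by the "Required Inputs" column. As a sketch, a hypothetical helper (not part of langchain-brightdata, covering only a few entries above) could pick a `dataset_type` automatically:

```python
from urllib.parse import urlparse

# Hypothetical helper: guess a dataset_type for BrightDataWebScraperAPI
# from the URL's host and path. Covers only a handful of the dataset
# types listed above; returns None when no rule matches.
def guess_dataset_type(url):
    parts = urlparse(url)
    host, path = parts.netloc.lower(), parts.path
    if "amazon." in host and "/dp/" in path:
        return "amazon_product"
    if "linkedin.com" in host:
        if path.startswith("/in/"):
            return "linkedin_person_profile"
        if path.startswith("/company/"):
            return "linkedin_company_profile"
    if "walmart.com" in host and "/ip/" in path:
        return "walmart_product"
    return None
```

For example, `guess_dataset_type("https://www.linkedin.com/in/satyanadella/")` resolves to `linkedin_person_profile`, matching the scraper example earlier.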

Configuration

Zone Configuration

Bright Data uses "zones" to manage different API configurations. You can set the zone at initialization or per-request.

Setting zone at initialization

from langchain_brightdata import BrightDataSERP, BrightDataUnlocker

# SERP with custom zone
serp = BrightDataSERP(
    bright_data_api_key="your-api-key",
    zone="my_serp_zone"
)

# Unlocker with custom zone
unlocker = BrightDataUnlocker(
    bright_data_api_key="your-api-key",
    zone="my_unlocker_zone"
)

Setting zone per-request

# Override zone for a specific request
results = serp.invoke({
    "query": "AI news",
    "zone": "different_zone"
})

Default zones

| Tool | Default Zone |
|------|--------------|
| BrightDataSERP | serp |
| BrightDataUnlocker | unlocker |

Note: Zone names must match the zones configured in your Bright Data dashboard.
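Because zone names are specific to your dashboard, one common pattern (an assumption, not something the package requires) is to read them from environment variables, falling back to the defaults above:

```python
import os

# Assumed pattern: read zone names from the environment so code does not
# hard-code dashboard-specific configuration; fall back to the defaults.
serp_zone = os.environ.get("BRIGHT_DATA_SERP_ZONE", "serp")
unlocker_zone = os.environ.get("BRIGHT_DATA_UNLOCKER_ZONE", "unlocker")
```

These values can then be passed as `zone=` when constructing BrightDataSERP and BrightDataUnlocker.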



License

MIT License - see LICENSE for details.
