LangChain integration for Bright Data web scraping and SERP APIs
langchain-brightdata
LangChain integration for Bright Data's web data APIs
Installation • Quick Start • Tools • Configuration • Resources
Overview
langchain-brightdata provides LangChain tools for Bright Data's web data APIs, enabling your AI agents to:
- Search - Query search engines with geo-targeting and language customization
- Unlock - Access geo-restricted or bot-protected websites
- Scrape - Extract structured data from Amazon, LinkedIn, and 100+ domains
Installation
pip install langchain-brightdata
Requirements: Python 3.9+
Quick Start
1. Get your API key
Sign up at Bright Data and get your API key from the dashboard.
2. Set up authentication
import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"
Or pass it directly:
from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
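A small guard can make a missing key fail early with a clear message rather than at the first request. The sketch below is illustrative only: `resolve_api_key` is a hypothetical helper, not part of langchain-brightdata, and it relies only on the `BRIGHT_DATA_API_KEY` environment variable named above.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    env = os.environ if env is None else env
    key = (explicit or env.get("BRIGHT_DATA_API_KEY", "")).strip()
    if not key:
        raise RuntimeError(
            "Set BRIGHT_DATA_API_KEY or pass bright_data_api_key=..."
        )
    return key
```

The returned string can then be passed as `bright_data_api_key=` when constructing a tool.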
3. Use with LangChain agents
from langchain_brightdata import BrightDataSERP, BrightDataUnlocker, BrightDataWebScraperAPI
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
# Initialize tools
tools = [
    BrightDataSERP(),
    BrightDataUnlocker(),
    BrightDataWebScraperAPI(),
]
# Create agent
llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)
# Run
agent.run("Search for the latest AI news and summarize the top result")
Tools
BrightDataSERP
Search engine results with geo-targeting and customization.
from langchain_brightdata import BrightDataSERP
serp = BrightDataSERP()
# Simple search
results = serp.invoke("latest AI research")
# Advanced search
results = serp.invoke({
    "query": "electric vehicles",
    "country": "de",
    "language": "de",
    "search_type": "nws",  # news results, per the parameter table
    "results_count": 20
})
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | Search query |
| zone | str | "serp" | Bright Data zone name |
| search_engine | str | "google" | Search engine (google, bing, yahoo) |
| country | str | "us" | Two-letter country code |
| language | str | "en" | Two-letter language code |
| results_count | int | 10 | Number of results (max 100) |
| search_type | str | None | None (web), "isch" (images), "shop", "nws" (news), "jobs" |
| device_type | str | None | None (desktop), "mobile", "ios", "android" |
| parse_results | bool | False | Return structured JSON |
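Because the tool accepts a plain dict, it can help to validate that dict locally before sending it. The following is a minimal sketch, assuming only the values listed in the table above; `build_serp_input` is a hypothetical helper (not part of the package), and the lower bound of 1 on `results_count` is an assumption, since the table documents only the maximum.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the parameter table above.
SEARCH_ENGINES = {"google", "bing", "yahoo"}
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}


def build_serp_input(
    query: str,
    *,
    search_engine: str = "google",
    country: str = "us",
    language: str = "en",
    results_count: int = 10,
    search_type: Optional[str] = None,
    device_type: Optional[str] = None,
    parse_results: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and return a dict for serp.invoke()."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if search_engine not in SEARCH_ENGINES:
        raise ValueError(f"unsupported search_engine: {search_engine!r}")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unsupported device_type: {device_type!r}")
    # Assumption: at least one result; the table only states the max of 100.
    if not 1 <= results_count <= 100:
        raise ValueError("results_count must be between 1 and 100")
    for name, code in (("country", country), ("language", language)):
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"{name} must be a two-letter code, got {code!r}")

    payload: Dict[str, Any] = {
        "query": query,
        "search_engine": search_engine,
        "country": country.lower(),
        "language": language.lower(),
        "results_count": results_count,
        "parse_results": parse_results,
    }
    # Optional keys are left out when unset, matching their None defaults.
    if search_type is not None:
        payload["search_type"] = search_type
    if device_type is not None:
        payload["device_type"] = device_type
    return payload
```

The resulting dict has the same shape as the advanced search example, so it could be passed to `serp.invoke(...)`. Whether the tool performs similar checks itself is not stated here.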
BrightDataUnlocker
Access any public website, bypassing geo-restrictions and bot protection.
from langchain_brightdata import BrightDataUnlocker
unlocker = BrightDataUnlocker()
# Simple access
content = unlocker.invoke("https://example.com")
# With options
content = unlocker.invoke({
    "url": "https://example.com/restricted",
    "country": "gb",
    "data_format": "markdown"
})
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | URL to access |
| zone | str | "unlocker" | Bright Data zone name |
| country | str | None | Two-letter country code |
| data_format | str | None | None (HTML), "markdown", "screenshot" |
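A malformed URL or an unsupported `data_format` is cheaper to catch before a request is made. The sketch below shows one way to do that with the standard library; `build_unlocker_input` is a hypothetical helper, and the restriction to `http`/`https` URLs is an assumption rather than a documented rule.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Allowed values copied from the parameter table above.
DATA_FORMATS = {None, "markdown", "screenshot"}


def build_unlocker_input(
    url: str,
    country: Optional[str] = None,
    data_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and return a dict for unlocker.invoke()."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs make sense for this tool.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unsupported data_format: {data_format!r}")
    if country is not None and (len(country) != 2 or not country.isalpha()):
        raise ValueError(f"country must be a two-letter code, got {country!r}")

    payload: Dict[str, Any] = {"url": url}
    if country is not None:
        payload["country"] = country.lower()
    if data_format is not None:
        payload["data_format"] = data_format
    return payload
```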
BrightDataWebScraperAPI
Extract structured data from popular websites.
from langchain_brightdata import BrightDataWebScraperAPI
scraper = BrightDataWebScraperAPI()
# Amazon product
product = scraper.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})
# LinkedIn profile
profile = scraper.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | URL to scrape |
| dataset_type | str | required | Type of data to extract |
| zipcode | str | None | Zipcode for location-specific data |
Supported Dataset Types (44 Datasets)
E-Commerce (10 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| amazon_product | Product details, pricing, specs | url (with /dp/) |
| amazon_product_reviews | Customer reviews and ratings | url (with /dp/) |
| amazon_product_search | Search results from Amazon | keyword, url |
| walmart_product | Walmart product data | url (with /ip/) |
| walmart_seller | Walmart seller information | url |
| ebay_product | eBay product data | url |
| homedepot_products | Home Depot product data | url |
| zara_products | Zara product data | url |
| etsy_products | Etsy product data | url |
| bestbuy_products | Best Buy product data | url |
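Some rows above require a particular path segment in the URL (`/dp/` for Amazon products, `/ip/` for Walmart products). A quick local check might look like the sketch below; `url_matches_dataset` and `REQUIRED_PATH_MARKERS` are hypothetical names, and the mapping covers only the markers listed in this table.

```python
from urllib.parse import urlparse

# Path markers taken from the "Required Inputs" column above.
REQUIRED_PATH_MARKERS = {
    "amazon_product": "/dp/",
    "amazon_product_reviews": "/dp/",
    "walmart_product": "/ip/",
}


def url_matches_dataset(url: str, dataset_type: str) -> bool:
    """Return True if the URL path contains the marker listed for the dataset."""
    marker = REQUIRED_PATH_MARKERS.get(dataset_type)
    if marker is None:
        # No path requirement is listed for this dataset type.
        return True
    return marker in urlparse(url).path
```

This checks only the URL shape; it says nothing about whether the product page actually exists.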
LinkedIn (5 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| linkedin_person_profile | Professional profile data | url |
| linkedin_company_profile | Company information | url |
| linkedin_job_listings | Job listing details | url |
| linkedin_posts | Post content and engagement | url |
| linkedin_people_search | Search for people | url, first_name, last_name |
Business Intelligence (2 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| crunchbase_company | Company funding, investors, metrics | url |
| zoominfo_company_profile | B2B company intelligence | url |
Instagram (4 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| instagram_profiles | Profile data and stats | url |
| instagram_posts | Post content and engagement | url |
| instagram_reels | Reel content and metrics | url |
| instagram_comments | Comments on posts | url |
Facebook (4 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| facebook_posts | Post content and engagement | url |
| facebook_marketplace_listings | Marketplace listing data | url |
| facebook_company_reviews | Company reviews | url, num_of_reviews |
| facebook_events | Event details | url |
TikTok (4 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| tiktok_profiles | Profile data and stats | url |
| tiktok_posts | Video content and metrics | url |
| tiktok_shop | Shop product data | url |
| tiktok_comments | Comments on videos | url |
YouTube (3 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| youtube_profiles | Channel profile data | url |
| youtube_videos | Video content and metrics | url |
| youtube_comments | Comments on videos | url, num_of_comments (default: 10) |
Google (3 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| google_maps_reviews | Business reviews from Maps | url, days_limit (default: 3) |
| google_shopping | Shopping product data | url |
| google_play_store | App store data | url |
Other Platforms (9 datasets)
| Type | Description | Required Inputs |
|---|---|---|
| apple_app_store | iOS app data | url |
| x_posts | X (Twitter) post data | url |
| reddit_posts | Reddit post data | url |
| github_repository_file | GitHub file content | url |
| yahoo_finance_business | Financial business data | url |
| reuter_news | News article data | url |
| zillow_properties_listing | Real estate listing data | url |
| booking_hotel_listings | Hotel listing data | url |
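A few dataset types in the tables above need more than a `url`, and two list a default value. The sketch below collects those rules in one place so a missing input is reported before any request is made. `prepare_scraper_input` is a hypothetical helper, and passing the extra inputs as keys of the same dict given to `scraper.invoke(...)` is an assumption; only `url`, `dataset_type`, and `zipcode` appear in the parameter table.

```python
from typing import Any, Dict, List

# Inputs required in addition to "url", copied from the tables above.
EXTRA_REQUIRED: Dict[str, List[str]] = {
    "amazon_product_search": ["keyword"],
    "linkedin_people_search": ["first_name", "last_name"],
    "facebook_company_reviews": ["num_of_reviews"],
}

# Inputs that the tables list together with a default value.
DEFAULTED: Dict[str, Dict[str, Any]] = {
    "youtube_comments": {"num_of_comments": 10},
    "google_maps_reviews": {"days_limit": 3},
}


def prepare_scraper_input(dataset_type: str, **inputs: Any) -> Dict[str, Any]:
    """Hypothetical helper: check required inputs and fill documented defaults."""
    required = ["url"] + EXTRA_REQUIRED.get(dataset_type, [])
    missing = [name for name in required if inputs.get(name) in (None, "")]
    if missing:
        raise ValueError(
            f"{dataset_type} is missing required input(s): {', '.join(missing)}"
        )
    # Caller-supplied values override the documented defaults.
    return {
        "dataset_type": dataset_type,
        **DEFAULTED.get(dataset_type, {}),
        **inputs,
    }
```

The helper does not reject unknown dataset types; it only knows about the extra inputs listed in the tables.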
Configuration
Zone Configuration
Bright Data uses "zones" to manage different API configurations. You can set the zone at initialization or per-request.
Setting zone at initialization
from langchain_brightdata import BrightDataSERP, BrightDataUnlocker
# SERP with custom zone
serp = BrightDataSERP(
    bright_data_api_key="your-api-key",
    zone="my_serp_zone"
)
# Unlocker with custom zone
unlocker = BrightDataUnlocker(
    bright_data_api_key="your-api-key",
    zone="my_unlocker_zone"
)
Setting zone per-request
# Override zone for a specific request
results = serp.invoke({
    "query": "AI news",
    "zone": "different_zone"
})
Default zones
| Tool | Default Zone |
|---|---|
| BrightDataSERP | serp |
| BrightDataUnlocker | unlocker |
Note: Zone names must match the zones configured in your Bright Data dashboard.
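The precedence described in this section (a per-request zone overrides the zone set at initialization, which overrides the tool default) can be modelled in a few lines. This is a sketch of the documented behavior, not the library's internals; `resolve_zone` and `DEFAULT_ZONES` are hypothetical names.

```python
from typing import Optional

# Defaults copied from the "Default zones" table above.
DEFAULT_ZONES = {
    "BrightDataSERP": "serp",
    "BrightDataUnlocker": "unlocker",
}


def resolve_zone(
    tool_name: str,
    init_zone: Optional[str] = None,
    request_zone: Optional[str] = None,
) -> str:
    """Pick the zone: per-request first, then initialization, then the default."""
    if request_zone:
        return request_zone
    if init_zone:
        return init_zone
    return DEFAULT_ZONES[tool_name]
```

For example, `resolve_zone("BrightDataSERP", init_zone="my_serp_zone", request_zone="different_zone")` selects the per-request value.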
Resources
License
MIT License - see LICENSE for details.
File details
Details for the file langchain_brightdata-0.2.0.tar.gz.
File metadata
- Download URL: langchain_brightdata-0.2.0.tar.gz
- Upload date:
- Size: 13.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | baec311fda3eeed6a8e5c83adf7a61616c2ee64b81aea70a12b5f3f82b3e15a4 |
| MD5 | c2111a0228c4e743bb1d8fc456ab0ebd |
| BLAKE2b-256 | fcc401de93a196e83ed348f80fa55723d286f94d9a18a0237ec499f241d7b0e6 |
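A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. The sketch below is a generic file-hash comparison; `sha256_matches` is a hypothetical helper name, and the file path in the usage comment assumes the archive was saved under its published name in the current directory.

```python
import hashlib


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """Hash the file in chunks and compare with the expected hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Usage (assumes the sdist was downloaded to the working directory):
# sha256_matches(
#     "langchain_brightdata-0.2.0.tar.gz",
#     "baec311fda3eeed6a8e5c83adf7a61616c2ee64b81aea70a12b5f3f82b3e15a4",
# )
```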
Provenance
The following attestation bundles were made for langchain_brightdata-0.2.0.tar.gz:
Publisher: publish.yml on luminati-io/langchain-brightdata
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langchain_brightdata-0.2.0.tar.gz
- Subject digest: baec311fda3eeed6a8e5c83adf7a61616c2ee64b81aea70a12b5f3f82b3e15a4
- Sigstore transparency entry: 786987063
- Sigstore integration time:
- Permalink: luminati-io/langchain-brightdata@8ce8df93158dabb0ae9fd39f7695236865216a4c
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/luminati-io
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8ce8df93158dabb0ae9fd39f7695236865216a4c
- Trigger Event: push
File details
Details for the file langchain_brightdata-0.2.0-py3-none-any.whl.
File metadata
- Download URL: langchain_brightdata-0.2.0-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4db50e9a52e723a32141d577e6ad8070b6bd33ffde16e32933e513ce12fbb92 |
| MD5 | c2f932d652f41bd20418f3dfde1ddfac |
| BLAKE2b-256 | 4ba4162786e2ff1eeef819cc217f38d2e9f176c06c09cd4b7a9d6ce6649697d5 |
Provenance
The following attestation bundles were made for langchain_brightdata-0.2.0-py3-none-any.whl:
Publisher: publish.yml on luminati-io/langchain-brightdata
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langchain_brightdata-0.2.0-py3-none-any.whl
- Subject digest: a4db50e9a52e723a32141d577e6ad8070b6bd33ffde16e32933e513ce12fbb92
- Sigstore transparency entry: 786987066
- Sigstore integration time:
- Permalink: luminati-io/langchain-brightdata@8ce8df93158dabb0ae9fd39f7695236865216a4c
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/luminati-io
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8ce8df93158dabb0ae9fd39f7695236865216a4c
- Trigger Event: push