LangChain integration for Bright Data web scraping and SERP APIs
langchain-brightdata
LangChain integration for Bright Data's web data APIs
Overview
langchain-brightdata provides LangChain tools for Bright Data's web data APIs, enabling your AI agents to:
- Search - Query search engines with geo-targeting and language customization
- Unlock - Access geo-restricted or bot-protected websites
- Scrape - Extract structured data from Amazon, LinkedIn, and 100+ domains
Installation
pip install langchain-brightdata
Requirements: Python 3.9+
Quick Start
1. Get your API key
Sign up at Bright Data and get your API key from the dashboard.
2. Set up authentication
import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"
Or pass it directly:
from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
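The two authentication options above can be wrapped in one lookup that fails early with a readable message. This is a minimal sketch, not part of langchain-brightdata: `resolve_api_key` is a hypothetical helper, and the rule that an explicitly passed key wins over the environment variable is an assumption of this sketch, not documented library behavior.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "BRIGHT_DATA_API_KEY"  # variable name taken from the Quick Start


def resolve_api_key(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, otherwise the environment value.

    Hypothetical helper; the precedence (explicit first) is an assumption.
    """
    key = (explicit or env.get(ENV_VAR, "")).strip()
    if not key:
        raise RuntimeError(
            f"No Bright Data API key found: pass one explicitly or set {ENV_VAR}"
        )
    return key


# Possible usage, with the constructor argument shown in the Quick Start:
# tool = BrightDataSERP(bright_data_api_key=resolve_api_key())
```

Checking the key before constructing any tool keeps a missing credential from surfacing later as a failed request inside an agent run.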
3. Use with LangChain agents
from langchain_brightdata import BrightDataSERP, BrightDataUnlocker, BrightDataWebScraperAPI
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
# Initialize tools
tools = [
    BrightDataSERP(),
    BrightDataUnlocker(),
    BrightDataWebScraperAPI(),
]
# Create agent
llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)
# Run
agent.run("Search for the latest AI news and summarize the top result")
Tools
BrightDataSERP
Search engine results with geo-targeting and customization.
from langchain_brightdata import BrightDataSERP
serp = BrightDataSERP()
# Simple search
results = serp.invoke("latest AI research")
# Advanced search
results = serp.invoke({
    "query": "electric vehicles",
    "country": "de",
    "language": "de",
    "search_type": "nws",  # news, per the Parameters table
    "results_count": 20
})
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | Search query |
| zone | str | "serp" | Bright Data zone name |
| search_engine | str | "google" | Search engine (google, bing, yahoo) |
| country | str | "us" | Two-letter country code |
| language | str | "en" | Two-letter language code |
| results_count | int | 10 | Number of results (max 100) |
| search_type | str | None | None (web), "isch" (images), "shop", "nws" (news), "jobs" |
| device_type | str | None | None (desktop), "mobile", "ios", "android" |
| parse_results | bool | False | Return structured JSON |
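The limits in the parameter table can be checked before a request is sent. The following is a sketch only: `build_serp_request` is a hypothetical helper, the allowed values are copied from the table, and the lower bound of 1 on `results_count` is an assumption. The library may perform its own validation; this just catches typos client-side.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the parameter table.
SEARCH_ENGINES = {"google", "bing", "yahoo"}
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}
MAX_RESULTS = 100


def build_serp_request(
    query: str,
    *,
    search_engine: str = "google",
    country: str = "us",
    language: str = "en",
    results_count: int = 10,
    search_type: Optional[str] = None,
    device_type: Optional[str] = None,
    parse_results: bool = False,
) -> Dict[str, Any]:
    """Validate arguments and return a dict shaped like the invoke() input."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if search_engine not in SEARCH_ENGINES:
        raise ValueError(f"unsupported search_engine: {search_engine!r}")
    for name, code in (("country", country), ("language", language)):
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"{name} must be a two-letter code, got {code!r}")
    if not 1 <= results_count <= MAX_RESULTS:  # lower bound is an assumption
        raise ValueError(f"results_count must be between 1 and {MAX_RESULTS}")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unsupported device_type: {device_type!r}")

    request = {
        "query": query,
        "search_engine": search_engine,
        "country": country,
        "language": language,
        "results_count": results_count,
        "search_type": search_type,
        "device_type": device_type,
        "parse_results": parse_results,
    }
    # Drop None values so the tool falls back to its own defaults.
    return {key: value for key, value in request.items() if value is not None}


# Possible usage: results = serp.invoke(build_serp_request("latest AI research"))
```

The `zone` parameter is left out on purpose so that the zone set at initialization, or the default, still applies.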
BrightDataUnlocker
Access any public website, bypassing geo-restrictions and bot protection.
from langchain_brightdata import BrightDataUnlocker
unlocker = BrightDataUnlocker()
# Simple access
content = unlocker.invoke("https://example.com")
# With options
content = unlocker.invoke({
    "url": "https://example.com/restricted",
    "country": "gb",
    "data_format": "markdown"
})
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | URL to access |
| zone | str | "unlocker" | Bright Data zone name |
| country | str | None | Two-letter country code |
| data_format | str | None | None (HTML), "markdown", "screenshot" |
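A similar pre-flight check can be written for the Unlocker inputs. This sketch assumes only what the table states; `build_unlocker_request` is a hypothetical helper, and both the restriction to `http`/`https` URLs and the lowercasing of the country code (to match the examples) are assumptions rather than documented requirements.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Allowed values copied from the parameter table; None means raw HTML.
DATA_FORMATS = {None, "markdown", "screenshot"}


def build_unlocker_request(
    url: str,
    *,
    country: Optional[str] = None,
    data_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate arguments and return a dict shaped like the invoke() input."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unsupported data_format: {data_format!r}")

    request: Dict[str, Any] = {"url": url}
    if country is not None:
        if len(country) != 2 or not country.isalpha():
            raise ValueError(f"country must be a two-letter code, got {country!r}")
        request["country"] = country.lower()  # assumption: lowercase, as in the examples
    if data_format is not None:
        request["data_format"] = data_format
    return request


# Possible usage:
# content = unlocker.invoke(build_unlocker_request("https://example.com", data_format="markdown"))
```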
BrightDataWebScraperAPI
Extract structured data from popular websites.
from langchain_brightdata import BrightDataWebScraperAPI
scraper = BrightDataWebScraperAPI()
# Amazon product
product = scraper.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})
# LinkedIn profile
profile = scraper.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | URL to scrape |
| dataset_type | str | required | Type of data to extract |
| zipcode | str | None | Zipcode for location-specific data |
Supported Dataset Types
| Type | Description |
|---|---|
| amazon_product | Product details, pricing, specs |
| amazon_product_reviews | Customer reviews and ratings |
| linkedin_person_profile | Professional profile data |
| linkedin_company_profile | Company information |
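Because `dataset_type` is required, callers that handle mixed URLs may want to guess it from the URL. The sketch below is a heuristic based only on the URL shapes in the examples above; `guess_dataset_type` is a hypothetical helper, and the `/company/` path pattern for LinkedIn company pages is an assumption. It does not try to guess `amazon_product_reviews`, since a URL alone does not say whether product details or reviews are wanted.

```python
from typing import Optional
from urllib.parse import urlparse


def guess_dataset_type(url: str) -> Optional[str]:
    """Best-effort mapping from a URL to a dataset type; None if unsure."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    if host.startswith("amazon."):
        # Product pages in the example above use the /dp/<ASIN> form.
        if "/dp/" in path:
            return "amazon_product"
    elif host == "linkedin.com":
        if path.startswith("/in/"):
            return "linkedin_person_profile"
        if path.startswith("/company/"):  # assumed path pattern
            return "linkedin_company_profile"
    return None


# Possible usage:
# url = "https://www.amazon.com/dp/B08L5TNJHG"
# dataset_type = guess_dataset_type(url)
# if dataset_type is not None:
#     product = scraper.invoke({"url": url, "dataset_type": dataset_type})
```

Returning `None` instead of raising lets the caller decide whether to skip the URL or fall back to another tool.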
Configuration
Zone Configuration
Bright Data uses "zones" to manage different API configurations. You can set the zone at initialization or per-request.
Setting zone at initialization
from langchain_brightdata import BrightDataSERP, BrightDataUnlocker
# SERP with custom zone
serp = BrightDataSERP(
    bright_data_api_key="your-api-key",
    zone="my_serp_zone"
)
# Unlocker with custom zone
unlocker = BrightDataUnlocker(
    bright_data_api_key="your-api-key",
    zone="my_unlocker_zone"
)
Setting zone per-request
# Override zone for a specific request
results = serp.invoke({
    "query": "AI news",
    "zone": "different_zone"
})
Default zones
| Tool | Default Zone |
|---|---|
| BrightDataSERP | serp |
| BrightDataUnlocker | unlocker |
Note: Zone names must match the zones configured in your Bright Data dashboard.
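The zone rules in this section amount to a three-level lookup: a per-request zone overrides the one set at initialization, which overrides the tool's default. The sketch below models that order in plain Python for illustration; `effective_zone` is a hypothetical function and does not reflect how the library is implemented internally.

```python
from typing import Optional

# Defaults copied from the "Default zones" table.
DEFAULT_ZONES = {
    "BrightDataSERP": "serp",
    "BrightDataUnlocker": "unlocker",
}


def effective_zone(tool_name: str,
                   init_zone: Optional[str] = None,
                   request_zone: Optional[str] = None) -> str:
    """Return the zone a request would use: request, then init, then default."""
    if request_zone:
        return request_zone
    if init_zone:
        return init_zone
    return DEFAULT_ZONES[tool_name]


print(effective_zone("BrightDataSERP"))
print(effective_zone("BrightDataSERP", init_zone="my_serp_zone"))
print(effective_zone("BrightDataSERP", init_zone="my_serp_zone",
                     request_zone="different_zone"))
```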
License
MIT License - see LICENSE for details.