LangChain integration for Bright Data web scraping and SERP APIs
🌟 langchain-brightdata
Access powerful web data capabilities for your AI agents with Bright Data! 🚀
📋 Overview
This package provides LangChain integrations for Bright Data's suite of web data collection tools, allowing your AI agents to:
- 🔍 Collect search engine results with geo-targeting
- 🌐 Access websites that might be geo-restricted or protected by anti-bot systems
- 📊 Extract structured data from popular websites such as Amazon and LinkedIn
Perfect for AI agents that need real-time web data!
🛠️ Installation
pip install langchain-brightdata
🔑 Setup
You'll need a Bright Data API key to use these tools. Set it as an environment variable:
import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"
Or pass it directly when initializing tools:
from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
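Hardcoding the key works for a quick test, but a script usually benefits from failing early with a clear message when the key is missing. The following is a minimal sketch that assumes only the `BRIGHT_DATA_API_KEY` variable name shown above; `get_bright_data_api_key` is a hypothetical helper and not part of the package.

```python
import os


def get_bright_data_api_key(env=None):
    """Return the Bright Data API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of langchain-brightdata.
    """
    # Allow a plain dict to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env
    key = env.get("BRIGHT_DATA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BRIGHT_DATA_API_KEY is not set; export it or pass "
            "bright_data_api_key=... when creating the tool."
        )
    return key
```

The returned value can then be passed as `bright_data_api_key=` when initializing any of the tools.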
🧰 Available Tools
🔍 BrightDataSERP
Perform search engine queries with customizable geo-targeting, device type, and language settings.
from langchain_brightdata import BrightDataSERP
# Basic usage
serp_tool = BrightDataSERP(bright_data_api_key="your-api-key")
results = serp_tool.invoke("latest AI research papers")
# Advanced usage with parameters
results = serp_tool.invoke({
    "query": "best electric vehicles",
    "country": "de",  # Get results as if searching from Germany
    "language": "de",  # Get results in German
    "search_type": "shop",  # Get shopping results
    "device_type": "mobile",  # Simulate a mobile device
    "results_count": 15
})
🎛️ Customization Options
| Parameter | Type | Description |
|---|---|---|
| query | str | The search query to perform |
| search_engine | str | Search engine to use (default: "google") |
| country | str | Two-letter country code for localized results (default: "us") |
| language | str | Two-letter language code (default: "en") |
| results_count | int | Number of results to return (default: 10) |
| search_type | str | Type of search: None (web), "isch" (images), "shop", "nws" (news), "jobs" |
| device_type | str | Device type: None (desktop), "mobile", "ios", "android" |
| parse_results | bool | Whether to return structured JSON (default: False) |
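Because `search_type` and `device_type` accept only a handful of values, it can help to validate them before spending an API call. The sketch below builds the input dict locally; `build_serp_input` is a hypothetical helper, the allowed values are copied from the table above, and it assumes every listed parameter can be passed through `invoke` the way the advanced example passes several of them.

```python
# Allowed values copied from the parameter table; None means the default behaviour.
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}


def build_serp_input(query, country="us", language="en", results_count=10,
                     search_type=None, device_type=None, parse_results=False):
    """Build the dict passed to BrightDataSERP.invoke, rejecting unknown values.

    Hypothetical helper; this validation is local and not done by the package.
    """
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unsupported device_type: {device_type!r}")
    if results_count < 1:
        raise ValueError("results_count must be positive")

    payload = {
        "query": query.strip(),
        "country": country.lower(),
        "language": language.lower(),
        "results_count": results_count,
        "parse_results": parse_results,
    }
    # Only send optional keys when they are set, so the defaults stay in effect.
    if search_type is not None:
        payload["search_type"] = search_type
    if device_type is not None:
        payload["device_type"] = device_type
    return payload
```

For example, `serp_tool.invoke(build_serp_input("best electric vehicles", country="de", language="de", search_type="shop"))` sends roughly the same request as the advanced example, while a typo such as `search_type="shopping"` is caught before any request is made.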
🌐 BrightDataUnlocker
Access public websites that may be geo-restricted or protected by anti-bot systems.
from langchain_brightdata import BrightDataUnlocker
# Basic usage
unlocker_tool = BrightDataUnlocker(bright_data_api_key="your-api-key")
result = unlocker_tool.invoke("https://example.com")
# Advanced usage with parameters
result = unlocker_tool.invoke({
    "url": "https://example.com/region-restricted-content",
    "country": "gb",  # Access as if from Great Britain
    "data_format": "markdown",  # Get content in markdown format
    "zone": "unlocker"  # Use the unlocker zone
})
🎛️ Customization Options
| Parameter | Type | Description |
|---|---|---|
| url | str | The URL to access |
| format | str | Format of the response content (default: "raw") |
| country | str | Two-letter country code for geo-specific access (e.g., "us", "gb") |
| zone | str | Bright Data zone to use (default: "unblocker") |
| data_format | str | Output format: None (HTML), "markdown", or "screenshot" |
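A common use of the `country` parameter is comparing how the same page looks from several regions. The sketch below loops over country codes and collects results and failures separately; `fetch_by_country` is a hypothetical helper that relies only on the tool exposing `invoke` with a dict, as in the examples above.

```python
def fetch_by_country(tool, url, countries, data_format="markdown"):
    """Invoke an unlocker-style tool once per country and collect the outcomes.

    `tool` is any object with an `invoke(dict)` method, such as a
    BrightDataUnlocker instance. Hypothetical helper for illustration.
    Returns (results, errors), both keyed by country code.
    """
    results = {}
    errors = {}
    for country in countries:
        payload = {"url": url, "country": country, "data_format": data_format}
        try:
            results[country] = tool.invoke(payload)
        except Exception as exc:
            # Record the failure and keep going so one region cannot abort the run.
            errors[country] = exc
    return results, errors
```

With a real tool this might be called as `fetch_by_country(unlocker_tool, "https://example.com", ["us", "gb", "de"])`. Each country triggers a separate request.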
📊 BrightDataWebScraperAPI
Extract structured data from 100+ popular domains, including Amazon and LinkedIn.
from langchain_brightdata import BrightDataWebScraperAPI
# Initialize the tool
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")
# Extract Amazon product data
results = scraper_tool.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})
# Extract LinkedIn profile data
linkedin_results = scraper_tool.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})
🎛️ Customization Options
| Parameter | Type | Description |
|---|---|---|
| url | str | The URL to extract data from |
| dataset_type | str | Type of dataset to use (e.g., "amazon_product") |
| zipcode | str | Optional zipcode for location-specific data |
📂 Available Dataset Types
| Dataset Type | Description |
|---|---|
| amazon_product | Extract detailed Amazon product data |
| amazon_product_reviews | Extract Amazon product reviews |
| linkedin_person_profile | Extract LinkedIn person profile data |
| linkedin_company_profile | Extract LinkedIn company profile data |
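Since `dataset_type` has to match the URL being scraped, an agent pipeline may want to pick it automatically. The following is a heuristic sketch: `guess_dataset_type` is a hypothetical helper, and its URL patterns are assumptions drawn from the example URLs above (plus an assumed `/company/` path for LinkedIn company pages), not rules defined by the package.

```python
from urllib.parse import urlparse


def guess_dataset_type(url):
    """Guess a dataset_type from the URL, or return None if nothing matches.

    Heuristic and hypothetical: the patterns below are assumptions, not
    behaviour provided by langchain-brightdata.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path
    if host == "amazon.com" or host.endswith(".amazon.com"):
        if "/dp/" in path:
            return "amazon_product"
    elif host == "linkedin.com" or host.endswith(".linkedin.com"):
        if path.startswith("/in/"):
            return "linkedin_person_profile"
        if path.startswith("/company/"):
            return "linkedin_company_profile"
    return None
```

`amazon_product_reviews` is not inferred here and has to be requested explicitly. When the function returns `None`, the caller should ask for a dataset type rather than guess.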
📚 Additional Resources