LangChain integration for Bright Data web scraping and SERP APIs

🌟 langchain-brightdata

PyPI - Version License: MIT LangChain

Access powerful web data capabilities for your AI agents with Bright Data! 🚀

📋 Overview

This package provides LangChain integrations for Bright Data's suite of web data collection tools, allowing your AI agents to:

  • 🔍 Collect search engine results with geo-targeting
  • 🌐 Access websites that might be geo-restricted or protected by anti-bot systems
  • 📊 Extract structured data from popular websites like Amazon, LinkedIn, and more

Perfect for AI agents that need real-time web data!

🛠️ Installation

pip install langchain-brightdata

🔑 Setup

You'll need a Bright Data API key to use these tools. Set it as an environment variable:

import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"

Or pass it directly when initializing tools:

from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
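A hardcoded key is easy to leak into version control, so it can help to read the key from the environment and fail early when it is missing. The sketch below is one possible approach; the helper name `get_bright_data_api_key` is hypothetical and not part of the package.

```python
import os


def get_bright_data_api_key(env_var: str = "BRIGHT_DATA_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not provided by langchain-brightdata.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating Bright Data tools."
        )
    return api_key


# Usage (requires langchain-brightdata to be installed):
# from langchain_brightdata import BrightDataSERP
# tool = BrightDataSERP(bright_data_api_key=get_bright_data_api_key())
```

Failing at startup with a readable message is usually easier to debug than an authentication error returned later by the first tool call.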

🧰 Available Tools

🔍 BrightDataSERP

Perform search engine queries with customizable geo-targeting, device type, and language settings.

from langchain_brightdata import BrightDataSERP

# Basic usage
serp_tool = BrightDataSERP(bright_data_api_key="your-api-key")
results = serp_tool.invoke("latest AI research papers")

# Advanced usage with parameters
results = serp_tool.invoke({
    "query": "best electric vehicles",
    "country": "de",  # Get results as if searching from Germany
    "language": "de",  # Get results in German
    "search_type": "shop",  # Get shopping results
    "device_type": "mobile",  # Simulate a mobile device
    "results_count": 15
})

🎛️ Customization Options

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The search query to perform |
| search_engine | str | Search engine to use (default: "google") |
| country | str | Two-letter country code for localized results (default: "us") |
| language | str | Two-letter language code (default: "en") |
| results_count | int | Number of results to return (default: 10) |
| search_type | str | Type of search: None (web), "isch" (images), "shop", "nws" (news), "jobs" |
| device_type | str | Device type: None (desktop), "mobile", "ios", "android" |
| parse_results | bool | Whether to return structured JSON (default: False) |
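When an agent or other code assembles the search parameters dynamically, it can be useful to validate them before calling the tool. The following is a minimal sketch, assuming the allowed values are exactly those listed in the table above; `build_serp_input` is a hypothetical helper, not part of the package.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the options table; None means the default
# behavior (web search, desktop device).
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}


def build_serp_input(
    query: str,
    *,
    country: str = "us",
    language: str = "en",
    results_count: int = 10,
    search_type: Optional[str] = None,
    device_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a dict suitable for BrightDataSERP.invoke (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unsupported device_type: {device_type!r}")
    if results_count < 1:
        raise ValueError("results_count must be at least 1")

    payload = {
        "query": query,
        "country": country.lower(),
        "language": language.lower(),
        "results_count": results_count,
        "search_type": search_type,
        "device_type": device_type,
    }
    # Drop unset options so only explicit choices are sent to the tool.
    return {key: value for key, value in payload.items() if value is not None}


# Usage (requires langchain-brightdata and an API key):
# results = serp_tool.invoke(build_serp_input("best electric vehicles", country="de"))
```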

🌐 BrightDataUnlocker

Access ANY public website that might be geo-restricted or protected by anti-bot systems.

from langchain_brightdata import BrightDataUnlocker

# Basic usage
unlocker_tool = BrightDataUnlocker(bright_data_api_key="your-api-key")
result = unlocker_tool.invoke("https://example.com")

# Advanced usage with parameters
result = unlocker_tool.invoke({
    "url": "https://example.com/region-restricted-content",
    "country": "gb",  # Access as if from the United Kingdom
    "data_format": "markdown",  # Get content in markdown format
    "zone": "unlocker"  # Zone name from your Bright Data account (documented default: "unblocker")
})

🎛️ Customization Options

| Parameter | Type | Description |
| --- | --- | --- |
| url | str | The URL to access |
| format | str | Format of the response content (default: "raw") |
| country | str | Two-letter country code for geo-specific access (e.g., "us", "gb") |
| zone | str | Bright Data zone to use (default: "unblocker") |
| data_format | str | Output format: None (HTML), "markdown", or "screenshot" |
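A similar pre-flight check can catch malformed URLs or unsupported output formats before a request is made. This is a sketch under the assumption that `data_format` accepts only the values listed above; `build_unlocker_input` is a hypothetical helper, not part of the package.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Output formats listed in the options table; None means plain HTML.
DATA_FORMATS = {None, "markdown", "screenshot"}


def build_unlocker_input(
    url: str,
    *,
    country: Optional[str] = None,
    data_format: Optional[str] = None,
    zone: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a dict suitable for BrightDataUnlocker.invoke (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got: {url!r}")
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unsupported data_format: {data_format!r}")

    payload = {
        "url": url,
        "country": country.lower() if country else None,
        "data_format": data_format,
        "zone": zone,
    }
    # Omit unset options so the tool falls back to its own defaults.
    return {key: value for key, value in payload.items() if value is not None}


# Usage (requires langchain-brightdata and an API key):
# result = unlocker_tool.invoke(build_unlocker_input("https://example.com", country="gb"))
```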

📊 BrightDataWebScraperAPI

Extract structured data from 100+ popular domains, including Amazon, LinkedIn, and more.

from langchain_brightdata import BrightDataWebScraperAPI

# Initialize the tool
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

# Extract Amazon product data
results = scraper_tool.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})

# Extract LinkedIn profile data
linkedin_results = scraper_tool.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})

🎛️ Customization Options

| Parameter | Type | Description |
| --- | --- | --- |
| url | str | The URL to extract data from |
| dataset_type | str | Type of dataset to use (e.g., "amazon_product") |
| zipcode | str | Optional zipcode for location-specific data |

📂 Available Dataset Types

| Dataset Type | Description |
| --- | --- |
| amazon_product | Extract detailed Amazon product data |
| amazon_product_reviews | Extract Amazon product reviews |
| linkedin_person_profile | Extract LinkedIn person profile data |
| linkedin_company_profile | Extract LinkedIn company profile data |
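If the URLs come from user input, the dataset type can sometimes be inferred from the URL itself. The sketch below is a heuristic under stated assumptions: the URL patterns (`/dp/`, `/in/`, `/company/`) are guesses based on the example URLs above, and `guess_dataset_type` is a hypothetical helper, not part of the package. It never returns `amazon_product_reviews`, because a product URL alone does not say whether product data or reviews are wanted.

```python
from typing import Optional
from urllib.parse import urlparse


def guess_dataset_type(url: str) -> Optional[str]:
    """Guess a dataset_type from a URL, or return None (hypothetical heuristic)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path

    # Amazon product pages in the example above use a /dp/<id> path.
    if host.startswith("amazon.") and "/dp/" in path:
        return "amazon_product"
    if host == "linkedin.com" or host.endswith(".linkedin.com"):
        if path.startswith("/in/"):
            return "linkedin_person_profile"
        if path.startswith("/company/"):
            return "linkedin_company_profile"
    # Unknown pattern: let the caller choose the dataset type explicitly.
    return None


# Usage (requires langchain-brightdata and an API key):
# url = "https://www.amazon.com/dp/B08L5TNJHG"
# results = scraper_tool.invoke({"url": url, "dataset_type": guess_dataset_type(url)})
```

Returning `None` for unrecognized URLs keeps the decision with the caller instead of silently picking a wrong dataset.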
