
LangChain integration for Bright Data web scraping and SERP APIs


🌟 langchain-brightdata

License: MIT

Access powerful web data capabilities for your AI agents with Bright Data! 🚀

📋 Overview

This package provides LangChain integrations for Bright Data's suite of web data collection tools, allowing your AI agents to:

  • 🔍 Collect search engine results with geo-targeting
  • 🌐 Access websites that might be geo-restricted or protected by anti-bot systems
  • 📊 Extract structured data from popular websites like Amazon, LinkedIn, and more

Perfect for AI agents that need real-time web data!

🛠️ Installation

pip install langchain-brightdata

🔑 Setup

You'll need a Bright Data API key to use these tools. Set it as an environment variable:

import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"

Or pass it directly when initializing tools:

from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
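The two setup options above suggest a lookup order: a key passed to the constructor, otherwise the `BRIGHT_DATA_API_KEY` environment variable. The sketch below illustrates that order under that assumption. `resolve_api_key` is a hypothetical helper, not part of `langchain_brightdata`, and the package's own resolution logic may differ (for example, in which source wins when both are set).

```python
import os
from typing import Optional

ENV_VAR = "BRIGHT_DATA_API_KEY"


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Hypothetical helper (not part of langchain_brightdata).

    Assumed precedence: an explicitly passed key wins, otherwise the
    BRIGHT_DATA_API_KEY environment variable is used.
    """
    if explicit_key:
        return explicit_key
    env_key = os.environ.get(ENV_VAR)
    if env_key:
        return env_key
    # Neither source is set: fail early with an actionable message.
    raise ValueError(
        f"No Bright Data API key found: pass one explicitly or set {ENV_VAR}."
    )
```

Failing early with a clear message is usually friendlier inside an agent loop than an authentication error surfacing from the first remote call.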

🧰 Available Tools

🔍 BrightDataSERP

Perform search engine queries with customizable geo-targeting, device type, and language settings.

from langchain_brightdata import BrightDataSERP

# Basic usage
serp_tool = BrightDataSERP(bright_data_api_key="your-api-key")
results = serp_tool.invoke("latest AI research papers")

# Advanced usage with parameters
results = serp_tool.invoke({
    "query": "best electric vehicles",
    "country": "de",  # Get results as if searching from Germany
    "language": "de",  # Get results in German
    "search_type": "shop",  # Get shopping results
    "device_type": "mobile",  # Simulate a mobile device
    "results_count": 15
})
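Because geo-targeting is controlled per call through `country` and `language`, comparing one query across several markets amounts to building one input dict per locale. `localized_serp_inputs` below is a hypothetical helper for illustration only; it produces plain dicts in the same shape as the advanced example above and makes no network calls.

```python
from typing import Any, Dict, Iterable, List, Tuple


def localized_serp_inputs(
    query: str, locales: Iterable[Tuple[str, str]], **shared: Any
) -> List[Dict[str, Any]]:
    """Hypothetical helper: one BrightDataSERP input dict per (country, language) pair.

    Keyword arguments in `shared` (e.g. search_type, results_count) are copied
    into every dict unchanged.
    """
    return [
        {"query": query, "country": country, "language": language, **shared}
        for country, language in locales
    ]


# Compare shopping results for the same query across three markets.
inputs = localized_serp_inputs(
    "best electric vehicles",
    [("us", "en"), ("de", "de"), ("fr", "fr")],
    search_type="shop",
    results_count=15,
)
```

LangChain tools are Runnables, so with a configured `serp_tool` the list could then be passed to `serp_tool.batch(inputs)`; each dict becomes a separate SERP request.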

🎛️ Customization Options

| Parameter | Type | Description |
|---|---|---|
| query | str | The search query to perform |
| search_engine | str | Search engine to use (default: "google") |
| country | str | Two-letter country code for localized results (default: "us") |
| language | str | Two-letter language code (default: "en") |
| results_count | int | Number of results to return (default: 10) |
| search_type | str | Type of search: None (web), "isch" (images), "shop", "nws" (news), "jobs" |
| device_type | str | Device type: None (desktop), "mobile", "ios", "android" |
| parse_results | bool | Whether to return structured JSON (default: False) |
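The defaults and allowed values in the table can be turned into a small pre-flight check that catches typos (say, `search_type="news"` instead of `"nws"`) before a request is sent. `build_serp_input` is a hypothetical helper written from the table alone; the tool performs its own validation and may accept values or parameters the table does not list.

```python
from typing import Any, Dict

# Defaults and allowed values copied from the parameter table above.
SERP_DEFAULTS: Dict[str, Any] = {
    "search_engine": "google",
    "country": "us",
    "language": "en",
    "results_count": 10,
    "search_type": None,  # None means a regular web search
    "device_type": None,  # None means desktop
    "parse_results": False,
}
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}


def build_serp_input(query: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(SERP_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown SERP parameter(s): {sorted(unknown)}")
    params = {"query": query, **SERP_DEFAULTS, **overrides}
    if params["search_type"] not in SEARCH_TYPES:
        raise ValueError(f"Unsupported search_type: {params['search_type']!r}")
    if params["device_type"] not in DEVICE_TYPES:
        raise ValueError(f"Unsupported device_type: {params['device_type']!r}")
    for key in ("country", "language"):
        if len(params[key]) != 2:
            raise ValueError(f"{key} must be a two-letter code, got {params[key]!r}")
    if params["results_count"] < 1:
        raise ValueError("results_count must be a positive integer")
    return params
```

The returned dict has the same shape as the advanced example, so it can be handed straight to `serp_tool.invoke`.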

🌐 BrightDataUnlocker

Access public websites, including pages that are geo-restricted or protected by anti-bot systems.

from langchain_brightdata import BrightDataUnlocker

# Basic usage
unlocker_tool = BrightDataUnlocker(bright_data_api_key="your-api-key")
result = unlocker_tool.invoke("https://example.com")

# Advanced usage with parameters
result = unlocker_tool.invoke({
    "url": "https://example.com/region-restricted-content",
    "country": "gb",  # Access as if from Great Britain
    "data_format": "markdown",  # Get content in markdown format
    "zone": "unlocker"  # Name of the Web Unlocker zone configured in your Bright Data account
})

🎛️ Customization Options

| Parameter | Type | Description |
|---|---|---|
| url | str | The URL to access |
| format | str | Format of the response content (default: "raw") |
| country | str | Two-letter country code for geo-specific access (e.g., "us", "gb") |
| zone | str | Bright Data zone to use (default: "unblocker") |
| data_format | str | Output format: None (HTML), "markdown", or "screenshot" |
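A similar pre-flight helper can assemble `BrightDataUnlocker` inputs. This sketch sends only the options you actually set, so anything omitted (including `zone`) falls back to the tool's own defaults, and it checks `data_format` against the three values in the table. `build_unlocker_input` is hypothetical and illustrative only, not part of the package.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

DATA_FORMATS = {None, "markdown", "screenshot"}  # None means raw HTML


def build_unlocker_input(
    url: str,
    country: Optional[str] = None,
    data_format: Optional[str] = None,
    zone: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a BrightDataUnlocker input dict, omitting unset options."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got {url!r}")
    if country is not None and len(country) != 2:
        raise ValueError(f"country must be a two-letter code, got {country!r}")
    if data_format not in DATA_FORMATS:
        raise ValueError(f"Unsupported data_format: {data_format!r}")
    params: Dict[str, Any] = {"url": url}
    # Only include options that were set, so the tool's own defaults apply otherwise.
    for key, value in (("country", country), ("data_format", data_format), ("zone", zone)):
        if value is not None:
            params[key] = value
    return params
```

Leaving `zone` out unless it is explicitly given avoids hard-coding a zone name that may not exist in your account.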

📊 BrightDataWebScraperAPI

Extract structured data from 100+ popular domains, including Amazon, LinkedIn, and more.

from langchain_brightdata import BrightDataWebScraperAPI

# Initialize the tool
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

# Extract Amazon product data
results = scraper_tool.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})

# Extract LinkedIn profile data
linkedin_results = scraper_tool.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})

🎛️ Customization Options

| Parameter | Type | Description |
|---|---|---|
| url | str | The URL to extract data from |
| dataset_type | str | Type of dataset to use (e.g., "amazon_product") |
| zipcode | str | Optional zipcode for location-specific data |

📂 Available Dataset Types

| Dataset Type | Description |
|---|---|
| amazon_product | Extract detailed Amazon product data |
| amazon_product_reviews | Extract Amazon product reviews |
| linkedin_person_profile | Extract LinkedIn person profile data |
| linkedin_company_profile | Extract LinkedIn company profile data |
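When URLs come from an agent or a crawl, `dataset_type` has to be chosen per URL. The hypothetical heuristic below maps common Amazon and LinkedIn URL shapes onto the four dataset types listed above. The URL patterns (`/dp/`, `/product-reviews/`, `/in/`, `/company/`) are assumptions about typical links, not rules enforced by Bright Data, and unrecognized URLs return `None` so the caller can decide what to do.

```python
from typing import Optional
from urllib.parse import urlparse


def guess_dataset_type(url: str) -> Optional[str]:
    """Hypothetical heuristic: map a URL to one of the listed dataset types, or None.

    The path patterns checked here are assumptions about typical URLs.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path
    if host == "amazon.com" or host.endswith(".amazon.com"):
        if "/product-reviews/" in path:
            return "amazon_product_reviews"
        if "/dp/" in path:
            return "amazon_product"
    if host == "linkedin.com" or host.endswith(".linkedin.com"):
        if path.startswith("/in/"):
            return "linkedin_person_profile"
        if path.startswith("/company/"):
            return "linkedin_company_profile"
    return None
```

A caller would then pass `{"url": url, "dataset_type": guess_dataset_type(url)}` to `scraper_tool.invoke` only when the guess is not `None`.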

📚 Additional Resources



Download files

Source Distribution

langchain_brightdata-0.1.4.tar.gz (9.0 kB)

Built Distribution

langchain_brightdata-0.1.4-py3-none-any.whl (12.2 kB)

File details

Details for the file langchain_brightdata-0.1.4.tar.gz.

File metadata

  • Download URL: langchain_brightdata-0.1.4.tar.gz
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for langchain_brightdata-0.1.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6410b1e2c90a584fb73861fcce0d838ac918444f8ca7fce004fc7efd58fc706d |
| MD5 | 619a47216c8fd3e836f866f3e9f55f3e |
| BLAKE2b-256 | eeb803cb9a831e87153d770b68f47d2ff514dc5dd694a91fc48d70834ea99329 |
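To check a downloaded archive against the published SHA256 digest, the standard library is enough. This is a minimal sketch; the file path is wherever you saved the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha256(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_sha256(Path("langchain_brightdata-0.1.4.tar.gz"), "6410b1e2c90a584fb73861fcce0d838ac918444f8ca7fce004fc7efd58fc706d")` should return `True` for an intact download. pip can enforce the same check at install time when a requirements file pins each package with `--hash=sha256:<digest>`.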


Provenance

The following attestation bundles were made for langchain_brightdata-0.1.4.tar.gz:

Publisher: publish.yml on luminati-io/langchain-brightdata

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file langchain_brightdata-0.1.4-py3-none-any.whl.


File hashes

Hashes for langchain_brightdata-0.1.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ba61e5b9b5a5cc18147cbf4b7c1ebe58649b8d16af26767f429ecf53efad4f5 |
| MD5 | 9bcdd6bb585af8785e88490adb3ae95f |
| BLAKE2b-256 | 7da952e1306109ca04adffb98fb8f66e861720e53e462e098c2702e52e8e097f |


Provenance

The following attestation bundles were made for langchain_brightdata-0.1.4-py3-none-any.whl:

Publisher: publish.yml on luminati-io/langchain-brightdata

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
