Bright Data tools for LangChain

🌟 langchain-brightdata

PyPI - Version License: MIT LangChain

Access powerful web data capabilities for your AI agents with Bright Data! 🚀

📋 Overview

This package provides LangChain integrations for Bright Data's suite of web data collection tools, allowing your AI agents to:

  • 🔍 Collect search engine results with geo-targeting
  • 🌐 Access websites that might be geo-restricted or protected by anti-bot systems
  • 📊 Extract structured data from popular websites like Amazon, LinkedIn, and more

Perfect for AI agents that need real-time web data!

🛠️ Installation

pip install langchain-brightdata

🔑 Setup

You'll need a Bright Data API key to use these tools. Set it as an environment variable:

import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"

Or pass it directly when initializing tools:

from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
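
Hard-coded keys are easy to leak through version control. As a minimal sketch, the helper below prefers an explicitly passed key and falls back to the `BRIGHT_DATA_API_KEY` environment variable described above. `resolve_api_key` and `make_serp_tool` are hypothetical names introduced here for illustration; they are not part of the package.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "BRIGHT_DATA_API_KEY"  # variable name taken from the setup step above


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, otherwise the value from the environment."""
    source = os.environ if env is None else env
    key = explicit_key or source.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} or pass bright_data_api_key explicitly")
    return key


def make_serp_tool(explicit_key: Optional[str] = None):
    """Build a BrightDataSERP tool using the resolved key (hypothetical helper)."""
    # Imported lazily so this module still loads where the package is not installed.
    from langchain_brightdata import BrightDataSERP

    return BrightDataSERP(bright_data_api_key=resolve_api_key(explicit_key))
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the first request.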

🧰 Available Tools

🔍 BrightDataSERP

Perform search engine queries with customizable geo-targeting, device type, and language settings.

from langchain_brightdata import BrightDataSERP

# Basic usage
serp_tool = BrightDataSERP(bright_data_api_key="your-api-key")
results = serp_tool.invoke("latest AI research papers")

# Advanced usage with parameters
results = serp_tool.invoke({
    "query": "best electric vehicles",
    "country": "de",  # Get results as if searching from Germany
    "language": "de",  # Get results in German
    "search_type": "shop",  # Get shopping results
    "device_type": "mobile",  # Simulate a mobile device
    "results_count": 15
})

🎛️ Customization Options

| Parameter | Type | Description |
|---|---|---|
| query | str | The search query to perform |
| search_engine | str | Search engine to use (default: "google") |
| country | str | Two-letter country code for localized results (default: "us") |
| language | str | Two-letter language code (default: "en") |
| results_count | int | Number of results to return (default: 10) |
| search_type | str | Type of search: None (web), "isch" (images), "shop", "nws" (news), "jobs" |
| device_type | str | Device type: None (desktop), "mobile", "ios", "android" |
| parse_results | bool | Whether to return structured JSON (default: False) |
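
When an agent or user supplies these parameters, it can help to validate them before calling the tool. The sketch below checks `search_type` and `device_type` against the values listed in the table and omits unset options so the tool's defaults apply. `build_serp_input` is a hypothetical helper, and omitting `None` values rather than passing them through is an assumption made here, not documented behavior.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the customization table above.
SEARCH_TYPES = {None, "isch", "shop", "nws", "jobs"}
DEVICE_TYPES = {None, "mobile", "ios", "android"}


def _two_letter(code: str, label: str) -> str:
    """Normalize a two-letter code to lowercase, rejecting anything else."""
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"{label} must be a two-letter code, got {code!r}")
    return code.lower()


def build_serp_input(
    query: str,
    country: str = "us",
    language: str = "en",
    results_count: int = 10,
    search_type: Optional[str] = None,
    device_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the dict passed to serp_tool.invoke(...) (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unsupported device_type: {device_type!r}")

    payload: Dict[str, Any] = {
        "query": query,
        "country": _two_letter(country, "country"),
        "language": _two_letter(language, "language"),
        "results_count": results_count,
    }
    # Leave unset options out so the tool falls back to web search on desktop.
    if search_type is not None:
        payload["search_type"] = search_type
    if device_type is not None:
        payload["device_type"] = device_type
    return payload
```

The result can be passed straight to the tool, for example `serp_tool.invoke(build_serp_input("best electric vehicles", country="de", language="de", search_type="shop"))`.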

🌐 BrightDataUnlocker

Access public websites, including those that are geo-restricted or protected by anti-bot systems.

from langchain_brightdata import BrightDataUnlocker

# Basic usage
unlocker_tool = BrightDataUnlocker(bright_data_api_key="your-api-key")
result = unlocker_tool.invoke("https://example.com")

# Advanced usage with parameters
result = unlocker_tool.invoke({
    "url": "https://example.com/region-restricted-content",
    "country": "gb",  # Access as if from the United Kingdom
    "data_format": "markdown",  # Get content in markdown format
    "zone": "unlocker"  # Use the unlocker zone
})

🎛️ Customization Options

| Parameter | Type | Description |
|---|---|---|
| url | str | The URL to access |
| format | str | Format of the response content (default: "raw") |
| country | str | Two-letter country code for geo-specific access (e.g., "us", "gb") |
| zone | str | Bright Data zone to use (default: "unlocker") |
| data_format | str | Output format: None (HTML), "markdown", or "screenshot" |
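
Agents often need several pages at once, and a single blocked URL should not abort the whole batch. The sketch below loops over URLs, builds the input dict from the parameters in the table, and records failures per URL. `fetch_pages` is a hypothetical helper; it takes the `invoke` callable as an argument so it can be exercised with a stub instead of a live tool.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def fetch_pages(
    urls: Iterable[str],
    invoke: Callable[[Dict[str, Any]], Any],
    country: Optional[str] = None,
    data_format: Optional[str] = "markdown",
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Call `invoke` once per URL and return (pages, errors) keyed by URL."""
    pages: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for url in urls:
        payload: Dict[str, Any] = {"url": url}
        # Only include optional keys that were actually requested.
        if country is not None:
            payload["country"] = country
        if data_format is not None:
            payload["data_format"] = data_format
        try:
            pages[url] = invoke(payload)
        except Exception as exc:  # keep going; report the failure for this URL
            errors[url] = str(exc)
    return pages, errors
```

With a real tool this would be called as `pages, errors = fetch_pages(urls, unlocker_tool.invoke, country="gb")`.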

📊 BrightDataWebScraperAPI

Extract structured data from 100+ popular domains, including Amazon, LinkedIn, and more.

from langchain_brightdata import BrightDataWebScraperAPI

# Initialize the tool
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

# Extract Amazon product data
results = scraper_tool.invoke({
    "url": "https://www.amazon.com/dp/B08L5TNJHG",
    "dataset_type": "amazon_product"
})

# Extract LinkedIn profile data
linkedin_results = scraper_tool.invoke({
    "url": "https://www.linkedin.com/in/satyanadella/",
    "dataset_type": "linkedin_person_profile"
})

🎛️ Customization Options

| Parameter | Type | Description |
|---|---|---|
| url | str | The URL to extract data from |
| dataset_type | str | Type of dataset to use (e.g., "amazon_product") |
| zipcode | str | Optional zipcode for location-specific data |

📂 Available Dataset Types

| Dataset Type | Description |
|---|---|
| amazon_product | Extract detailed Amazon product data |
| amazon_product_reviews | Extract Amazon product reviews |
| linkedin_person_profile | Extract LinkedIn person profile data |
| linkedin_company_profile | Extract LinkedIn company profile data |
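
Because `dataset_type` has to match the URL, an agent may want to pick it automatically. The sketch below guesses a dataset type from the URL's host and path. `guess_dataset_type` is a hypothetical helper, and the URL patterns are assumptions based on the two examples above (`/dp/` for Amazon products, `/in/` for LinkedIn profiles) plus an assumed `/company/` prefix for LinkedIn company pages. Reviews are not inferred, since a product URL alone does not say whether product data or reviews are wanted.

```python
from typing import Optional
from urllib.parse import urlparse


def _host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def guess_dataset_type(url: str) -> Optional[str]:
    """Return a likely dataset_type for the URL, or None if unrecognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path

    if _host_matches(host, "amazon.com"):
        # Assumed pattern: product pages contain a /dp/<ASIN> segment.
        return "amazon_product" if "/dp/" in path else None
    if _host_matches(host, "linkedin.com"):
        if path.startswith("/in/"):
            return "linkedin_person_profile"
        if path.startswith("/company/"):  # assumed path prefix
            return "linkedin_company_profile"
    return None
```

Returning `None` for unknown URLs lets the caller fall back to another tool, such as the unlocker, rather than sending a mismatched dataset type.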
