An integration package connecting ScrapingBee and LangChain

🐝 langchain-scrapingbee

The Best Web Scraping API to Avoid Getting Blocked

Overview

The ScrapingBee web scraping API handles headless browsers, rotates proxies for you, and offers AI-powered data extraction.

This package contains the LangChain integration with ScrapingBee.

Installation

pip install -U langchain-scrapingbee

Configure your credentials by setting the following environment variable:

  • SCRAPINGBEE_API_KEY
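
In a script it can help to fail early with a clear message when the variable is missing. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and not part of langchain-scrapingbee.

```python
import os


def require_api_key(env=None, name="SCRAPINGBEE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not provided by the package.
    `env` defaults to os.environ and can be any mapping (useful in tests).
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the tools.")
    return value
```

The returned value can then be passed as the `api_key` argument when constructing the tools, as in the example further down.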

Tools

The ScrapingBee integration provides access to the following tools:

  • ScrapeUrlTool - Scrape the contents of any public website.
  • GoogleSearchTool - Search Google for regular (classic), news, maps, and image results.
  • CheckUsageTool - Monitor your ScrapingBee credit and concurrency usage.

Example

import os
import getpass
from langchain_scrapingbee import (
    ScrapeUrlTool, 
    GoogleSearchTool, 
    CheckUsageTool,
)

# Read the API key from the environment, prompting for it if it is missing
api_key = os.environ.get("SCRAPINGBEE_API_KEY")
if not api_key:
    # getpass hides the key while it is typed
    api_key = getpass.getpass("SCRAPINGBEE_API_KEY is not set. Enter your API key: ")
    os.environ["SCRAPINGBEE_API_KEY"] = api_key

# All three tools share the same API key
scrape_tool = ScrapeUrlTool(api_key=api_key)
search_tool = GoogleSearchTool(api_key=api_key)
usage_tool = CheckUsageTool(api_key=api_key)

# --- Test Case 1: Scrape a standard HTML page ---
print("--- 1. Testing ScrapeUrlTool (HTML) ---")
html_result = scrape_tool.invoke({
    'url': 'http://httpbin.org/html'
})
print(html_result)


# --- Test Case 2: Scrape a PDF file ---
print("--- 2. Testing ScrapeUrlTool (PDF) ---")
pdf_result = scrape_tool.invoke({
    'url': 'https://treaties.un.org/doc/publication/ctc/uncharter.pdf',
    'params': {'render_js': False} 
})
print(pdf_result)


# --- Test Case 3: Google Search ---
print("--- 3. Testing GoogleSearchTool ---")
search_result = search_tool.invoke({
    'search': 'What is LangChain?'
})
print(search_result)


# --- Test Case 4: Check Usage ---
print("--- 4. Testing CheckUsageTool ---")
usage_result = usage_tool.invoke({}) # No arguments needed
print(usage_result)

Example Using Agent

import os
from langchain_scrapingbee import (
    ScrapeUrlTool, 
    GoogleSearchTool, 
    CheckUsageTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent

if not os.environ.get("GOOGLE_API_KEY") or not os.environ.get("SCRAPINGBEE_API_KEY"):
    raise ValueError("Google and ScrapingBee API keys must be set in environment variables.")

llm = ChatGoogleGenerativeAI(temperature=0, model="gemini-2.5-flash")
scrapingbee_api_key = os.environ.get("SCRAPINGBEE_API_KEY")

tools = [
    ScrapeUrlTool(api_key=scrapingbee_api_key),
    GoogleSearchTool(api_key=scrapingbee_api_key),
    CheckUsageTool(api_key=scrapingbee_api_key),
]

agent = create_react_agent(llm, tools)

user_input = "If I have enough API Credits, search for pdfs about langchain and save 3 pdfs."

# Stream the agent's output step-by-step
for step in agent.stream(
    {"messages": user_input},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
