An integration package connecting Scrapeless and LangChain

LangChain Scrapeless: an all-in-one, highly scalable web scraping toolkit for enterprises and developers that also integrates with LangChain’s AI tools. Maintained by Scrapeless.


langchain-scrapeless is designed for seamless integration with LangChain, enabling you to:

  • Run custom scraping tasks using your own crawlers or scraping logic.
  • Automate data extraction and processing workflows in Python.
  • Manage and interact with datasets produced by your scraping jobs.
  • Access scraping and data handling capabilities as LangChain tools, making them easy to compose with LLM-powered chains and agents.

📦 Installation

pip install langchain-scrapeless

✅ Prerequisites

Set your Scrapeless API credentials as environment variables before using any of the tools:

  • SCRAPELESS_API_KEY: Your Scrapeless API key.

If you don't have an API key, you can register here and learn how to get your API key in the Scrapeless documentation.
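A minimal sketch of checking the key before constructing any tool, assuming the package reads `SCRAPELESS_API_KEY` from the process environment as described above. The helper name `require_scrapeless_api_key` is hypothetical and is not part of langchain-scrapeless.

```python
import os


def require_scrapeless_api_key(env=None):
    """Return the Scrapeless API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    Illustrative helper only: it is not provided by langchain-scrapeless.
    """
    env = os.environ if env is None else env
    key = env.get("SCRAPELESS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCRAPELESS_API_KEY is not set; export it before creating a Scrapeless tool."
        )
    return key
```

Calling `require_scrapeless_api_key()` at startup turns a missing key into an immediate, readable error instead of a failure on the first request.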

🛠️ Available Tools

🔍 DeepSerp

🌐 ScrapelessDeepSerpGoogleSearchTool

Perform Google search queries and get the results.

from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool

tool = ScrapelessDeepSerpGoogleSearchTool()

# Basic usage
# result = tool.invoke("I want to know Scrapeless")
# print(result)

# Advanced usage
result = tool.invoke({
    "q": "Scrapeless",
    "hl": "en",
    "google_domain": "google.com"
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessDeepSerpGoogleSearchTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "I want to know what Scrapeless is")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()

Visit here to learn more about the customization options.
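The advanced call above passes a dictionary with `q`, `hl`, and `google_domain`. A small sketch of wrapping that in a helper with defaults follows; the names `build_search_input` and `google_search` are hypothetical, and only the three keys shown in the example above are assumed to be accepted.

```python
def build_search_input(query, language="en", google_domain="google.com"):
    """Build the input dict for ScrapelessDeepSerpGoogleSearchTool.invoke.

    Uses only the keys shown in the advanced usage example (q, hl, google_domain).
    """
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")
    return {"q": query, "hl": language, "google_domain": google_domain}


def google_search(query, **options):
    """Run a single search; requires langchain-scrapeless and SCRAPELESS_API_KEY."""
    # Imported lazily so build_search_input works without the package installed.
    from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool

    tool = ScrapelessDeepSerpGoogleSearchTool()
    return tool.invoke(build_search_input(query, **options))
```

For example, `google_search("Scrapeless", language="en")` sends the same payload as the advanced usage call.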

🌐 ScrapelessDeepSerpGoogleTrendsTool

Perform Google Trends queries and get the results.

from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool

tool = ScrapelessDeepSerpGoogleTrendsTool()

# Basic usage
# result = tool.invoke("Funny 2048,negamon monster trainer")
# print(result)

# Advanced usage
result = tool.invoke({
    "q": "Scrapeless",
    "data_type": "related_topics",
    "hl": "en"
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessDeepSerpGoogleTrendsTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "I want to know the iphone keyword trends")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()

Visit here to learn more about the customization options.
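The basic usage example above passes two keywords in one comma-separated string. The sketch below builds the advanced input from a list of keywords on that assumption. The helper name `build_trends_input` is hypothetical, and whether every `data_type` accepts multiple keywords is not confirmed here.

```python
def build_trends_input(keywords, data_type="related_topics", language="en"):
    """Build the input dict for ScrapelessDeepSerpGoogleTrendsTool.invoke.

    Assumption: several keywords are sent as one comma-separated `q` string,
    as the basic usage example suggests.
    """
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    return {"q": ",".join(cleaned), "data_type": data_type, "hl": language}
```

The resulting dictionary can be passed straight to `tool.invoke(...)` in place of the literal shown above.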

🔓 ScrapelessUniversalScrapingTool

Access any website at scale and say goodbye to blocks.

from langchain_scrapeless import ScrapelessUniversalScrapingTool

tool = ScrapelessUniversalScrapingTool()

# Basic usage
# result = tool.invoke("https://example.com")
# print(result)

# Advanced usage
result = tool.invoke({
    "url": "https://example.com",
    "response_type": "markdown"
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessUniversalScrapingTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessUniversalScrapingTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "Use the scrapeless scraping tool to fetch https://www.scrapeless.com/en and extract the h1 tag.")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()

Visit here to learn more about the customization options.
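The agent example above asks for the page's h1. When calling the tool directly with `"response_type": "markdown"`, the same thing can be done without an LLM. This is a sketch that assumes the result is (or contains) a Markdown string; the return shape is not documented here, and `first_h1` is a hypothetical helper.

```python
import re

# A level-1 ATX heading: a single '#' at the start of a line, then whitespace.
_H1_PATTERN = re.compile(r"^#[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def first_h1(markdown):
    """Return the text of the first level-1 Markdown heading, or None.

    Assumes `markdown` is a plain string; headings inside fenced code
    blocks are not treated specially.
    """
    match = _H1_PATTERN.search(markdown)
    return match.group(1).strip() if match else None
```

A regular expression keeps the sketch dependency-free; a full Markdown parser would be more robust for pages that contain code blocks.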

🕷️ Crawler

🌐 ScrapelessCrawlerCrawlTool

Crawl a website and its linked pages to extract comprehensive data.

from langchain_scrapeless import ScrapelessCrawlerCrawlTool

tool = ScrapelessCrawlerCrawlTool()

# Basic usage
# result = tool.invoke("https://example.com")
# print(result)

# Advanced usage
result = tool.invoke({
    "url": "https://example.com",
    "limit": 4
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerCrawlTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessCrawlerCrawlTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "Use the scrapeless crawler crawl tool to crawl the website https://example.com and output the markdown content as a string.")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()

Visit here to learn more about the customization options.
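Since a crawl follows links and can fetch many pages, it may help to validate the input before sending it. The sketch below checks the URL and the `limit` value shown in the advanced usage example; `build_crawl_input` is a hypothetical helper, and only the `url` and `limit` keys are assumed.

```python
from urllib.parse import urlparse


def build_crawl_input(url, limit=4):
    """Build the input dict for ScrapelessCrawlerCrawlTool.invoke.

    Rejects URLs without an http(s) scheme and host, and non-positive limits.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return {"url": url, "limit": limit}
```

Note that this only checks the shape of the URL; a misspelled host name still passes and fails later at request time.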

🌐 ScrapelessCrawlerScrapeTool

Extract data from one or more webpages.

from langchain_scrapeless import ScrapelessCrawlerScrapeTool

tool = ScrapelessCrawlerScrapeTool()

result = tool.invoke({
    "urls": ["https://example.com", "https://www.scrapeless.com/en"],
    "formats": ["markdown"]
})
print(result)

# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerScrapeTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessCrawlerScrapeTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
        {"messages": [("human", "Use the scrapeless crawler scrape tool to get the website content of https://example.com and output the html content as a string.")]},
        stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
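Each example above gives the agent a single tool, but the `tools` list can hold several. The following sketch gives one agent both the search tool and the scrape tool so the model can choose between them. It uses only the classes and the `create_react_agent(llm, tools)` call shown above; the function names `human_request` and `build_scrapeless_agent` are hypothetical.

```python
def human_request(text):
    """Wrap a prompt in the input shape used by the agent examples above."""
    return {"messages": [("human", text)]}


def build_scrapeless_agent(llm):
    """Create one ReAct agent that can both search Google and scrape pages.

    Requires langchain-scrapeless, langgraph, and SCRAPELESS_API_KEY.
    """
    # Imported lazily so this module loads even without the optional packages.
    from langchain_scrapeless import (
        ScrapelessCrawlerScrapeTool,
        ScrapelessDeepSerpGoogleSearchTool,
    )
    from langgraph.prebuilt import create_react_agent

    tools = [ScrapelessDeepSerpGoogleSearchTool(), ScrapelessCrawlerScrapeTool()]
    return create_react_agent(llm, tools)
```

With `agent = build_scrapeless_agent(ChatOpenAI())`, a call such as `agent.stream(human_request("Search for Scrapeless and summarize its homepage"), stream_mode="values")` streams the conversation in the same way as the single-tool examples.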

