An integration package connecting Scrapeless and LangChain
LangChain Scrapeless: an all-in-one, highly scalable web scraping toolkit for enterprises and developers that also integrates with LangChain’s AI tools. Maintained by Scrapeless.
langchain-scrapeless is designed for seamless integration with LangChain, enabling you to:
- Run custom scraping tasks using your own crawlers or scraping logic.
- Automate data extraction and processing workflows in Python.
- Manage and interact with datasets produced by your scraping jobs.
- Access scraping and data handling capabilities as LangChain tools, making them easy to compose with LLM-powered chains and agents.
📦 Installation
```bash
pip install langchain-scrapeless
```
✅ Prerequisites
Configure your Scrapeless API credentials as environment variables:
- `SCRAPELESS_API_KEY`: your Scrapeless API key.

If you don't have an API key yet, register for a Scrapeless account. The Scrapeless documentation explains how to obtain your key.
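A missing key typically shows up only when a tool makes its first request. This minimal sketch checks for it earlier. It assumes only the `SCRAPELESS_API_KEY` variable named above. The helper `require_scrapeless_api_key` is hypothetical and not part of langchain-scrapeless.

```python
import os
from typing import Mapping


def require_scrapeless_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Scrapeless API key, failing fast if it is missing or blank."""
    # SCRAPELESS_API_KEY is the variable named in the prerequisites;
    # this helper itself is hypothetical, not part of langchain-scrapeless.
    key = env.get("SCRAPELESS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCRAPELESS_API_KEY is not set; export it before creating Scrapeless tools."
        )
    return key
```

Call it once at startup, before instantiating any tool. The `env` parameter defaults to the process environment. You can pass a plain dictionary to test the check in isolation.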
🛠️ Available Tools
🔍 DeepSerp
🌐 ScrapelessDeepSerpGoogleSearchTool
Perform Google search queries and get the results.
```python
from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool

tool = ScrapelessDeepSerpGoogleSearchTool()

# Basic usage
# result = tool.invoke("I want to know Scrapeless")
# print(result)

# Advanced usage
result = tool.invoke({
    "q": "Scrapeless",
    "hl": "en",
    "google_domain": "google.com"
})
print(result)
```
```python
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
tool = ScrapelessDeepSerpGoogleSearchTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
    {"messages": [("human", "I want to know what Scrapeless is")]},
    stream_mode="values"
):
    # Each chunk is the current agent state; print its newest message
    chunk["messages"][-1].pretty_print()
```
See the Scrapeless documentation for more customization options.
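When queries are built programmatically, a small helper keeps the input dictionary consistent. This is a sketch. The name `build_google_search_input` is hypothetical. The keys `q`, `hl` and `google_domain` are the ones from the advanced-usage example. Any other parameters the tool may accept are not covered here.

```python
def build_google_search_input(
    q: str, hl: str = "en", google_domain: str = "google.com"
) -> dict:
    """Assemble the dict passed to ScrapelessDeepSerpGoogleSearchTool.invoke()."""
    # Keys mirror the advanced-usage example; this helper is hypothetical.
    query = q.strip()
    if not query:
        raise ValueError("q must be a non-empty search query")
    return {"q": query, "hl": hl, "google_domain": google_domain}
```

With the tool from the example, you would call `tool.invoke(build_google_search_input("Scrapeless"))`.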
🌐 ScrapelessDeepSerpGoogleTrendsTool
Perform Google Trends queries and get the results.
```python
from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool

tool = ScrapelessDeepSerpGoogleTrendsTool()

# Basic usage
# result = tool.invoke("Funny 2048,negamon monster trainer")
# print(result)

# Advanced usage
result = tool.invoke({
    "q": "Scrapeless",
    "data_type": "related_topics",
    "hl": "en"
})
print(result)
```
```python
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
tool = ScrapelessDeepSerpGoogleTrendsTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
    {"messages": [("human", "I want to know the iPhone keyword trends")]},
    stream_mode="values"
):
    # Each chunk is the current agent state; print its newest message
    chunk["messages"][-1].pretty_print()
```
See the Scrapeless documentation for more customization options.
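The basic-usage example passes two search terms in one string, separated by a comma ("Funny 2048,negamon monster trainer"). This sketch builds that string from a list. It assumes the tool splits `q` on commas, which that example suggests but this page does not state. The helper name `join_trend_keywords` is hypothetical.

```python
def join_trend_keywords(keywords: list[str]) -> str:
    """Join several search terms into one comma-separated query string."""
    # Assumption: terms are separated by commas, as in the basic-usage example,
    # so a comma inside a single term would split it into two terms.
    cleaned = []
    for kw in keywords:
        term = kw.strip()
        if not term:
            continue  # skip blank entries
        if "," in term:
            raise ValueError(f"keyword contains a comma and would be split: {term!r}")
        cleaned.append(term)
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    return ",".join(cleaned)
```

You could pass the result as `q`, for example `tool.invoke({"q": join_trend_keywords(["Funny 2048", "negamon monster trainer"]), "hl": "en"})`.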
🔓 ScrapelessUniversalScrapingTool
Access any website at scale and say goodbye to blocks.
```python
from langchain_scrapeless import ScrapelessUniversalScrapingTool

tool = ScrapelessUniversalScrapingTool()

# Basic usage
# result = tool.invoke("https://example.com")
# print(result)

# Advanced usage
result = tool.invoke({
    "url": "https://example.com",
    "response_type": "markdown"
})
print(result)
```
```python
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessUniversalScrapingTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
tool = ScrapelessUniversalScrapingTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
    {"messages": [("human", "Use the scrapeless scraping tool to fetch https://www.scrapeless.com/en and extract the h1 tag.")]},
    stream_mode="values"
):
    # Each chunk is the current agent state; print its newest message
    chunk["messages"][-1].pretty_print()
```
See the Scrapeless documentation for more customization options.
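A malformed or relative URL wastes a scraping request. This sketch rejects such URLs locally with the standard library's `urllib.parse` before the input dictionary is built. The helper `build_scrape_input` is hypothetical. `"markdown"` is the only `response_type` value shown on this page, so no other values are assumed here.

```python
from urllib.parse import urlparse


def build_scrape_input(url: str, response_type: str = "markdown") -> dict:
    """Validate a URL and build the input for ScrapelessUniversalScrapingTool.invoke()."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    # Require an absolute http(s) URL with a host; "example.com" alone has no scheme.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return {"url": cleaned, "response_type": response_type}
```

This check catches structural mistakes such as a missing scheme. It cannot catch a misspelled but well-formed host name.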
🕷️ Crawler
🌐 ScrapelessCrawlerCrawlTool
Crawl a website and its linked pages to extract comprehensive data.
```python
from langchain_scrapeless import ScrapelessCrawlerCrawlTool

tool = ScrapelessCrawlerCrawlTool()

# Basic usage
# result = tool.invoke("https://example.com")
# print(result)

# Advanced usage
result = tool.invoke({
    "url": "https://example.com",
    "limit": 4
})
print(result)
```
```python
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerCrawlTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
tool = ScrapelessCrawlerCrawlTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
    {"messages": [("human", "Use the scrapeless crawler crawl tool to crawl the website https://example.com and output the markdown content as a string.")]},
    stream_mode="values"
):
    # Each chunk is the current agent state; print its newest message
    chunk["messages"][-1].pretty_print()
```
See the Scrapeless documentation for more customization options.
🌐 ScrapelessCrawlerScrapeTool
Extract data from one or more webpages.
```python
from langchain_scrapeless import ScrapelessCrawlerScrapeTool

tool = ScrapelessCrawlerScrapeTool()

result = tool.invoke({
    "urls": ["https://example.com", "https://www.scrapeless.com/en"],
    "formats": ["markdown"]
})
print(result)
```
```python
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerScrapeTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
tool = ScrapelessCrawlerScrapeTool()

# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
    {"messages": [("human", "Use the scrapeless crawler scrape tool to get the website content of https://example.com and output the html content as a string.")]},
    stream_mode="values"
):
    # Each chunk is the current agent state; print its newest message
    chunk["messages"][-1].pretty_print()
```
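`ScrapelessCrawlerScrapeTool` takes a list of URLs. For longer lists, you may want to drop duplicates and split the work across several calls. This sketch does both with the standard library. The helper `batch_urls` is hypothetical. The default batch size of 10 is an arbitrary choice, not a documented Scrapeless limit.

```python
from urllib.parse import urlparse


def batch_urls(urls: list[str], batch_size: int = 10) -> list[list[str]]:
    """Drop blanks and duplicates (keeping first-seen order), then split into batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    seen = set()
    unique = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue  # skip empty strings and URLs already queued
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        seen.add(url)
        unique.append(url)
    # Slice the de-duplicated list into consecutive chunks of at most batch_size
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]
```

Each batch could then be passed in its own call, for example `tool.invoke({"urls": batch, "formats": ["markdown"]})` inside a loop over `batch_urls(...)`.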
File details

Details for the file langchain_scrapeless-0.1.3.tar.gz.

File metadata
- Download URL: langchain_scrapeless-0.1.3.tar.gz
- Upload date:
- Size: 18.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7eb799342c875b8074016cf2beec57a594763392e3110643263111b0abc35f59 |
| MD5 | d7132e9c1fa545ce4c3a010692fcecef |
| BLAKE2b-256 | 427f65d0aa635bdeab98bfb8c36745079846f47547a9cc955a8750cdea152a5e |

Provenance

The following attestation bundles were made for langchain_scrapeless-0.1.3.tar.gz:

Publisher: publish.yml on scrapeless-ai/langchain-scrapeless

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langchain_scrapeless-0.1.3.tar.gz
- Subject digest: 7eb799342c875b8074016cf2beec57a594763392e3110643263111b0abc35f59
- Sigstore transparency entry: 280646761
- Sigstore integration time:
- Permalink: scrapeless-ai/langchain-scrapeless@953d3a9194dfec9bf46a70492adde239afead6a8
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/scrapeless-ai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@953d3a9194dfec9bf46a70492adde239afead6a8
- Trigger Event: release
File details

Details for the file langchain_scrapeless-0.1.3-py3-none-any.whl.

File metadata
- Download URL: langchain_scrapeless-0.1.3-py3-none-any.whl
- Upload date:
- Size: 26.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 29f4f49f8d7a3017e7e311454c5b71cba76845c2e8a29a4508486bd7284a592a |
| MD5 | 51a765ed51ab4d19047d168488cc7790 |
| BLAKE2b-256 | 929407bcdc6caf7652963d165a7740b420639466b621b67a265a9d42d09f5dae |

Provenance

The following attestation bundles were made for langchain_scrapeless-0.1.3-py3-none-any.whl:

Publisher: publish.yml on scrapeless-ai/langchain-scrapeless

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langchain_scrapeless-0.1.3-py3-none-any.whl
- Subject digest: 29f4f49f8d7a3017e7e311454c5b71cba76845c2e8a29a4508486bd7284a592a
- Sigstore transparency entry: 280646769
- Sigstore integration time:
- Permalink: scrapeless-ai/langchain-scrapeless@953d3a9194dfec9bf46a70492adde239afead6a8
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/scrapeless-ai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@953d3a9194dfec9bf46a70492adde239afead6a8
- Trigger Event: release