Library for extracting structured data from websites using ScrapeGraphAI
🕷️🦜 langchain-scrapegraph
Supercharge your LangChain agents with AI-powered web scraping capabilities. LangChain-ScrapeGraph provides a seamless integration between LangChain and ScrapeGraph AI, enabling your agents to extract structured data from websites using natural language.
📦 Installation
pip install langchain-scrapegraph
🛠️ Available Tools
📝 MarkdownifyTool
Convert any webpage into clean, formatted markdown.
from langchain_scrapegraph.tools import MarkdownifyTool
tool = MarkdownifyTool()
markdown = tool.invoke({"website_url": "https://example.com"})
print(markdown)
🔍 SmartscraperTool
Extract structured data from any webpage using natural language prompts.
from langchain_scrapegraph.tools import SmartscraperTool
# Initialize the tool (uses SGAI_API_KEY from environment)
tool = SmartscraperTool()
# Extract information using natural language
result = tool.invoke({
    "website_url": "https://www.example.com",
    "user_prompt": "Extract the main heading and first paragraph",
})
print(result)
💻 LocalscraperTool
Extract information from HTML content using AI.
from langchain_scrapegraph.tools import LocalscraperTool
tool = LocalscraperTool()
result = tool.invoke({
    "user_prompt": "Extract all contact information",
    "website_html": "<html>...</html>",
})
print(result)
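In practice the HTML usually comes from a file saved earlier rather than an inline string. The sketch below shows one way to build the input dictionary from a file on disk. `build_localscraper_input` is a hypothetical helper, not part of langchain-scrapegraph; only the `user_prompt` and `website_html` keys come from the example above.

```python
from pathlib import Path


def build_localscraper_input(html_path, user_prompt):
    """Read a saved HTML file and build the input dict for LocalscraperTool.

    Hypothetical helper: it only assembles the two keys used in the
    example above and does not call the tool itself.
    """
    html = Path(html_path).read_text(encoding="utf-8")
    return {"user_prompt": user_prompt, "website_html": html}
```

The returned dictionary can then be passed straight to `tool.invoke(...)`, for instance `tool.invoke(build_localscraper_input("page.html", "Extract all contact information"))`, where `page.html` is a placeholder path.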
🌟 Key Features
- 🐦 LangChain Integration: Seamlessly works with LangChain agents and chains
- 🔍 AI-Powered Extraction: Use natural language to describe what data to extract
- 📊 Structured Output: Get clean, structured data ready for your agents
- 🔄 Flexible Tools: Choose from multiple specialized scraping tools
- ⚡ Async Support: Built-in support for async operations
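As a rough illustration of the async support listed above, the sketch below runs one extraction per URL concurrently. It assumes the tool exposes the async `ainvoke` method that LangChain tools provide alongside `invoke`; `scrape_many` is a hypothetical helper, and how much real concurrency you get depends on how the tool implements its async path.

```python
import asyncio


async def scrape_many(tool, urls, user_prompt):
    """Run one extraction per URL concurrently; results keep the URL order.

    `tool` is assumed to be any object with an async `ainvoke(input)`
    method, e.g. a SmartscraperTool instance. Hypothetical helper.
    """
    tasks = [
        tool.ainvoke({"website_url": url, "user_prompt": user_prompt})
        for url in urls
    ]
    # gather() awaits all tasks and returns results in submission order
    return await asyncio.gather(*tasks)
```

A call would look like `asyncio.run(scrape_many(SmartscraperTool(), ["https://example.com"], "Extract the main heading"))`.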
💡 Use Cases
- 📖 Research Agents: Create agents that gather and analyze web data
- 📊 Data Collection: Automate structured data extraction from websites
- 📝 Content Processing: Convert web content into markdown for further processing
- 🔍 Information Extraction: Extract specific data points using natural language
🤖 Example Agent
from langchain.agents import initialize_agent, AgentType
from langchain_scrapegraph.tools import SmartscraperTool
from langchain_openai import ChatOpenAI
# Initialize tools
tools = [
    SmartscraperTool(),
]
# Create an agent
agent = initialize_agent(
    tools=tools,
    llm=ChatOpenAI(temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
# Use the agent
response = agent.run(
    "Visit example.com, summarize the content, and extract the main heading and first paragraph"
)
print(response)
⚙️ Configuration
Set your ScrapeGraph API key in your environment:
export SGAI_API_KEY="your-api-key-here"
Or set it programmatically:
import os
os.environ["SGAI_API_KEY"] = "your-api-key-here"
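Because the tools read the key from the environment, a missing key tends to surface only when a tool is created or first called. A small pre-flight check can fail earlier with a clearer message. This is a minimal sketch; `require_sgai_api_key` is a hypothetical helper and not part of the package.

```python
import os


def require_sgai_api_key(env=None):
    """Return the ScrapeGraph API key, or raise if it is missing or blank.

    Hypothetical helper. `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("SGAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SGAI_API_KEY is not set; export it in your shell or assign "
            "os.environ['SGAI_API_KEY'] before creating any tools"
        )
    return key
```

Calling `require_sgai_api_key()` once at startup, before constructing any tools, keeps the failure close to the configuration step.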
📚 Documentation
💬 Support & Feedback
- 📧 Email: support@scrapegraphai.com
- 💻 GitHub Issues: Create an issue
- 🌟 Feature Requests: Request a feature
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
This project is built on top of LangChain and ScrapeGraph AI.
Made with ❤️ by ScrapeGraph AI
File details
Details for the file langchain_scrapegraph-1.1.0b1.tar.gz.
File metadata
- Download URL: langchain_scrapegraph-1.1.0b1.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 479c7093c857a0ce14e3a55835cf168f9f10468dd87f57b7c47411735e050fa3 |
| MD5 | a232a32aad35055d9a531deb6cd9a2e4 |
| BLAKE2b-256 | b7e168941ebed4a9a89b9f6a98757122b4ab1cba448f8c63231565c28fe9de8b |
File details
Details for the file langchain_scrapegraph-1.1.0b1-py3-none-any.whl.
File metadata
- Download URL: langchain_scrapegraph-1.1.0b1-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2749fcf8abd02716813f58ba8fd8125809f4dc0828559d68473f70598f265fee |
| MD5 | f2fb81f75dd3a6e86ca3396d7dadc68f |
| BLAKE2b-256 | 18f2366cf362a7e503d46d31f40d3498def056801e68f08fe19b44da789ec61c |