
Library for extracting structured data from websites using ScrapeGraphAI


🕷️🦜 langchain-scrapegraph


Supercharge your LangChain agents with AI-powered web scraping capabilities. LangChain-ScrapeGraph provides a seamless integration between LangChain and ScrapeGraph AI, enabling your agents to extract structured data from websites using natural language.

📦 Installation

pip install langchain-scrapegraph

🛠️ Available Tools

📝 MarkdownifyTool

Convert any webpage into clean, formatted markdown.

from langchain_scrapegraph.tools import MarkdownifyTool

tool = MarkdownifyTool()
markdown = tool.invoke({"website_url": "https://example.com"})

print(markdown)
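To convert several pages in one call, the Runnable `batch` method that LangChain tools inherit can be used. The sketch below assumes MarkdownifyTool behaves like any other LangChain tool in this respect; `markdownify_many` is a hypothetical helper, not part of langchain-scrapegraph.

```python
from typing import Any, Dict, List


def markdownify_many(tool: Any, urls: List[str]) -> Dict[str, Any]:
    """Convert several pages via the LangChain Runnable ``batch`` API.

    ``tool`` is assumed to be a MarkdownifyTool (or anything exposing a
    LangChain-style ``batch`` method). With ``return_exceptions=True`` a
    failing URL yields its exception object instead of aborting the batch.
    """
    inputs = [{"website_url": url} for url in urls]
    results = tool.batch(inputs, return_exceptions=True)
    # batch preserves input order, so results can be zipped back onto the URLs;
    # duplicate URLs collapse into one dict entry (the last result wins)
    return dict(zip(urls, results))
```

For example, `markdownify_many(MarkdownifyTool(), ["https://example.com", "https://example.org"])` would return a dict keyed by URL, holding an exception object in place of the markdown for any page that failed.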

🔍 SmartscraperTool

Extract structured data from any webpage using natural language prompts.

from langchain_scrapegraph.tools import SmartscraperTool

# Initialize the tool (uses SGAI_API_KEY from environment)
tool = SmartscraperTool()

# Extract information using natural language
result = tool.invoke({
    "website_url": "https://www.example.com",
    "user_prompt": "Extract the main heading and first paragraph"
})

print(result)

💻 LocalscraperTool

Extract information from HTML content using AI.

from langchain_scrapegraph.tools import LocalscraperTool

tool = LocalscraperTool()
result = tool.invoke({
    "user_prompt": "Extract all contact information",
    "website_html": "<html>...</html>"
})

print(result)
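Because LocalscraperTool works on HTML you already have, it pairs naturally with pages saved to disk. A minimal sketch, assuming the tool accepts the same `user_prompt` and `website_html` input keys used in the example above; `scrape_local_file` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Any, Union


def scrape_local_file(tool: Any, path: Union[str, Path], prompt: str) -> Any:
    """Run a LocalscraperTool-style tool over an HTML file on disk."""
    html = Path(path).read_text(encoding="utf-8")
    # Fail early instead of sending an empty document to the API
    if not html.strip():
        raise ValueError(f"{path} is empty; nothing to scrape")
    return tool.invoke({"user_prompt": prompt, "website_html": html})
```

Usage would look like `scrape_local_file(LocalscraperTool(), "saved_page.html", "Extract all contact information")`, where `saved_page.html` is a placeholder path.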

🌟 Key Features

  • 🐦 LangChain Integration: Seamlessly works with LangChain agents and chains
  • 🔍 AI-Powered Extraction: Use natural language to describe what data to extract
  • 📊 Structured Output: Get clean, structured data ready for your agents
  • 🔄 Flexible Tools: Choose from multiple specialized scraping tools
  • Async Support: Built-in support for async operations
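LangChain tools can be awaited through the standard `ainvoke` method. The sketch below assumes SmartscraperTool accepts the same input dict via `ainvoke` as via `invoke`; it fans out one request per URL while capping the number in flight with `asyncio.Semaphore`. `scrape_concurrently` and its `max_concurrency` parameter are hypothetical names, not part of the package.

```python
import asyncio
from typing import Any, Dict, List


async def scrape_concurrently(
    tool: Any, urls: List[str], prompt: str, max_concurrency: int = 5
) -> Dict[str, Any]:
    """Run one ``ainvoke`` call per URL, at most ``max_concurrency`` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(url: str) -> Any:
        # Each call waits for a free slot before hitting the API
        async with semaphore:
            return await tool.ainvoke({"website_url": url, "user_prompt": prompt})

    # gather returns results in the same order as the coroutines passed in
    results = await asyncio.gather(*(scrape_one(url) for url in urls))
    return dict(zip(urls, results))
```

A call would look like `asyncio.run(scrape_concurrently(SmartscraperTool(), urls, "Extract the main heading"))`. A semaphore keeps a long URL list from firing every request at once, which is gentler on rate limits.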

💡 Use Cases

  • 📖 Research Agents: Create agents that gather and analyze web data
  • 📊 Data Collection: Automate structured data extraction from websites
  • 📝 Content Processing: Convert web content into markdown for further processing
  • 🔍 Information Extraction: Extract specific data points using natural language

🤖 Example Agent

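from langchain.agents import AgentType, initialize_agent
from langchain_openai import ChatOpenAI
from langchain_scrapegraph.tools import SmartscraperTool

# Requires SGAI_API_KEY (ScrapeGraph) and OPENAI_API_KEY (OpenAI) in the environment
tools = [
    SmartscraperTool(),
]

# SmartscraperTool takes two inputs (website_url, user_prompt), so use a
# structured-chat agent: ZERO_SHOT_REACT_DESCRIPTION only accepts single-input tools.
# Note: initialize_agent is deprecated in recent LangChain releases.
agent = initialize_agent(
    tools=tools,
    llm=ChatOpenAI(temperature=0),
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

# Run the agent; invoke returns a dict with the final answer under "output"
response = agent.invoke({
    "input": "Visit https://example.com, summarize the content, and extract the main heading and first paragraph"
})
print(response["output"])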

⚙️ Configuration

Set your ScrapeGraph API key in your environment:

export SGAI_API_KEY="your-api-key-here"

Or set it programmatically:

import os
os.environ["SGAI_API_KEY"] = "your-api-key-here"
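If the key might be missing (for example in CI), checking it once at startup gives a single clear error from your own code. A small sketch; `require_sgai_api_key` is a hypothetical helper that only reads the `SGAI_API_KEY` variable named above.

```python
import os


def require_sgai_api_key(env_var: str = "SGAI_API_KEY") -> str:
    """Return the ScrapeGraph API key, or raise a clear error if it is unset."""
    # Treat an empty or whitespace-only value the same as a missing one
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or assign os.environ[{env_var!r}] before creating tools"
        )
    return key
```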


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

This project is built on top of LangChain and ScrapeGraph AI.

Made with ❤️ by ScrapeGraph AI



Download files


Source Distribution

langchain_scrapegraph-1.0.0b1.tar.gz (7.2 kB)

Built Distribution

langchain_scrapegraph-1.0.0b1-py3-none-any.whl (10.1 kB)

File details

Details for the file langchain_scrapegraph-1.0.0b1.tar.gz.

File metadata

  • Download URL: langchain_scrapegraph-1.0.0b1.tar.gz
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for langchain_scrapegraph-1.0.0b1.tar.gz
Algorithm     Hash digest
SHA256        15d326f4bd74256c580cf5046c2ba7af91df91f1cca7f7a7bc60fce425b916f7
MD5           6b0bd9fc34d29edccfd005b6931d90c3
BLAKE2b-256   36701f7e4a6ee1fef6067fe0c9d921e7acd8a88e2235730ff8de247aad93e491


File details

Details for the file langchain_scrapegraph-1.0.0b1-py3-none-any.whl.


File hashes

Hashes for langchain_scrapegraph-1.0.0b1-py3-none-any.whl
Algorithm     Hash digest
SHA256        4280c543464dbded66ae2c1119e6d268ffbae95d29137d46f9387ac7dddc14ba
MD5           2b7468520bb82c2a251e4bc8c2c310b7
BLAKE2b-256   826f3dcfd62dfd77468770876fb7ae2d0971652a2c11e6e85ba0a5ccf357f0ff

