🌐 ScrapeGraph Python SDK

Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.

📦 Installation

pip install scrapegraph-py

🚀 Features

  • 🤖 AI-powered web scraping and search
  • 🔄 Both sync and async clients
  • 📊 Structured output with Pydantic schemas
  • 🔍 Detailed logging
  • ⚡ Automatic retries
  • 🔐 Secure authentication

🎯 Quick Start

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

Note: You can set the SGAI_API_KEY environment variable and initialize the client without parameters: client = Client()
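
The environment-variable setup can be made to fail early with a clear message when no key is configured. The sketch below is a minimal illustration: `resolve_api_key` is a hypothetical helper, not part of the SDK, and it only assumes the `SGAI_API_KEY` variable name mentioned above.

```python
import os


def resolve_api_key(explicit_key=None):
    """Return the explicit key if given, otherwise the SGAI_API_KEY variable.

    Hypothetical helper for illustration; the SDK does not provide it.
    """
    key = explicit_key or os.environ.get("SGAI_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass api_key=... or set SGAI_API_KEY"
        )
    return key


# Typical use (requires the SDK to be installed):
# client = Client(api_key=resolve_api_key())
```

Checking the key before constructing the client keeps configuration errors separate from request errors.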

📚 Available Endpoints

🤖 SmartScraper

Extract structured data from any webpage or HTML content using AI.

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Using a URL
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

# Or using HTML content
html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
    </body>
</html>
"""

response = client.smartscraper(
    website_html=html_content,
    user_prompt="Extract the company description"
)

print(response)

Output Schema (Optional)

from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=WebsiteData
)

🍪 Cookies Support

Use cookies for authentication and session management:

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Define cookies for authentication
cookies = {
    "session_id": "abc123def456",
    "auth_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
    "user_preferences": "dark_mode,usd"
}

response = client.smartscraper(
    website_url="https://example.com/dashboard",
    user_prompt="Extract user profile information",
    cookies=cookies
)

Common Use Cases:

  • E-commerce sites: User authentication, shopping cart persistence
  • Social media: Session management, user preferences
  • Banking/Financial: Secure authentication, transaction history
  • News sites: User preferences, subscription content
  • API endpoints: Authentication tokens, API keys
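
Cookies are often copied from a browser as a single `Cookie` header string rather than as a dictionary. As a rough sketch, the standard library's `http.cookies.SimpleCookie` can turn such a string into the dictionary form used in the example above. `cookies_from_header` is a hypothetical helper name, and the cookie values are placeholders.

```python
from http.cookies import SimpleCookie


def cookies_from_header(header: str) -> dict[str, str]:
    """Parse a raw Cookie header string into a name -> value dictionary."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


# Placeholder header, as it might be copied from browser developer tools.
cookies = cookies_from_header("session_id=abc123def456; user_preferences=dark_mode")
print(cookies)
```

The resulting dictionary can then be passed as `cookies=cookies` in the calls shown in this section.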

🔄 Advanced Features

Infinite Scrolling:

# `cookies` is the dictionary defined in the Cookies Support example above
response = client.smartscraper(
    website_url="https://example.com/feed",
    user_prompt="Extract all posts from the feed",
    cookies=cookies,
    number_of_scrolls=10  # Scroll 10 times to load more content
)

Pagination:

response = client.smartscraper(
    website_url="https://example.com/products",
    user_prompt="Extract all product information",
    cookies=cookies,
    total_pages=5  # Scrape 5 pages
)

Combined with Cookies:

response = client.smartscraper(
    website_url="https://example.com/dashboard",
    user_prompt="Extract user data from all pages",
    cookies=cookies,
    number_of_scrolls=5,
    total_pages=3
)

🔍 SearchScraper

Perform AI-powered web searches with structured results and reference URLs.

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?"
)

print(f"Answer: {response['result']}")
print(f"Sources: {response['reference_urls']}")

Output Schema (Optional)

from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class PythonVersionInfo(BaseModel):
    version: str = Field(description="The latest Python version number")
    release_date: str = Field(description="When this version was released")
    major_features: list[str] = Field(description="List of main features")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?",
    output_schema=PythonVersionInfo
)
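
The first SearchScraper example reads the `result` and `reference_urls` keys from the response. A small formatting helper along the following lines could turn that response into a readable report. `format_answer` is a hypothetical name, and the sample dictionary is made-up placeholder data rather than real API output.

```python
def format_answer(response: dict) -> str:
    """Render a SearchScraper-style response as plain text with numbered sources."""
    lines = [f"Answer: {response['result']}"]
    urls = response.get("reference_urls") or []
    if urls:
        lines.append("Sources:")
        lines.extend(f"  [{i}] {url}" for i, url in enumerate(urls, start=1))
    return "\n".join(lines)


# Placeholder data standing in for a real response.
sample_response = {
    "result": "placeholder answer",
    "reference_urls": ["https://example.com/a", "https://example.com/b"],
}
report = format_answer(sample_response)
print(report)
```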

📝 Markdownify

Convert any webpage into clean, formatted markdown.

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)
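
A common next step is writing the converted markdown to disk. The sketch below assumes the markdown text sits under a `result` key of the response, mirroring the SearchScraper example. That key is an assumption and is not confirmed in this section. `save_markdown` is a hypothetical helper, and the sample response is placeholder data.

```python
import tempfile
from pathlib import Path


def save_markdown(response: dict, path) -> Path:
    """Write the markdown from a Markdownify-style response to a file."""
    # ASSUMPTION: the markdown string is stored under the "result" key.
    markdown = response["result"]
    target = Path(path)
    target.write_text(markdown, encoding="utf-8")
    return target


# Placeholder response so the sketch runs without calling the API.
sample_response = {"result": "# Example Domain\n\nPlaceholder body text.\n"}
saved = save_markdown(sample_response, Path(tempfile.gettempdir()) / "example.md")
print(f"Saved to {saved}")
```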

⚡ Async Support

All endpoints support async operations:

import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
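
One reason to use the async client is to scrape several pages concurrently. The following sketch shows one possible pattern using `asyncio.gather` with a semaphore to cap the number of in-flight requests. `scrape_many` and `max_concurrent` are illustrative names, and `StubClient` is a stand-in that only mimics the `smartscraper` call signature shown above so the example runs offline. In real use you would pass an `AsyncClient` instance instead.

```python
import asyncio


async def scrape_many(client, urls, prompt, max_concurrent=3):
    """Run smartscraper on several URLs concurrently, at most max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def scrape_one(url):
        async with semaphore:
            return await client.smartscraper(website_url=url, user_prompt=prompt)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(scrape_one(url) for url in urls))


class StubClient:
    """Offline stand-in for AsyncClient; not part of the SDK."""

    async def smartscraper(self, website_url, user_prompt):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"website_url": website_url, "result": f"stub result for: {user_prompt}"}


results = asyncio.run(
    scrape_many(
        StubClient(),
        ["https://example.com/a", "https://example.com/b"],
        "Extract the main content",
    )
)
print(results)
```

Capping concurrency is a design choice here, intended to avoid sending a large burst of requests at once. A suitable limit depends on your plan and workload.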

📖 Documentation

For detailed documentation, visit docs.scrapegraphai.com

🛠️ Development

For information about setting up the development environment and contributing to the project, see our Contributing Guide.

💬 Support & Feedback

  • 📧 Email: support@scrapegraphai.com
  • 💻 GitHub Issues: Create an issue
  • 🌟 Feature Requests: Request a feature
  • ⭐ API Feedback: You can also submit feedback programmatically using the feedback endpoint:
    from scrapegraph_py import Client
    
    client = Client(api_key="your-api-key-here")
    
    client.submit_feedback(
        request_id="your-request-id",
        rating=5,
        feedback_text="Great results!"
    )
    

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ by ScrapeGraph AI
