
ScrapeGraphAI Python SDK


Official Python SDK for the ScrapeGraphAI API.

Install

pip install scrapegraph-py
# or
uv add scrapegraph-py

Quick Start

from scrapegraph_py import ScrapeGraphAI, ScrapeRequest

# reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
sgai = ScrapeGraphAI()

result = sgai.scrape(ScrapeRequest(
    url="https://example.com",
))

if result.status == "success":
    print(result.data["results"]["markdown"]["data"])
else:
    print(result.error)

Every method returns ApiResult[T] — no exceptions to catch:

@dataclass
class ApiResult(Generic[T]):
    status: Literal["success", "error"]
    data: T | None
    error: str | None
    elapsed_ms: int

API

scrape

Scrape a webpage in one or more formats (markdown, html, screenshot, json, etc.).

from scrapegraph_py import (
    ScrapeGraphAI, ScrapeRequest, FetchConfig,
    MarkdownFormatConfig, ScreenshotFormatConfig, JsonFormatConfig
)

sgai = ScrapeGraphAI()

res = sgai.scrape(ScrapeRequest(
    url="https://example.com",
    formats=[
        MarkdownFormatConfig(mode="reader"),
        ScreenshotFormatConfig(full_page=True, width=1440, height=900),
        JsonFormatConfig(prompt="Extract product info"),
    ],
    content_type="text/html",           # optional, auto-detected
    fetch_config=FetchConfig(           # optional
        mode="js",                      # "auto" | "fast" | "js"
        stealth=True,
        timeout=30000,
        wait=2000,
        scrolls=3,
        headers={"Accept-Language": "en"},
        cookies={"session": "abc"},
        country="us",
    ),
))

Formats:

  • markdown — Clean markdown (modes: normal, reader, prune)
  • html — Raw HTML (modes: normal, reader, prune)
  • links — All links on the page
  • images — All image URLs
  • summary — AI-generated summary
  • json — Structured extraction with prompt/schema
  • branding — Brand colors, typography, logos
  • screenshot — Page screenshot (full_page, width, height, quality)
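When several formats are requested, each lands under its own key in the response. A small accessor sketch, assuming the `results[<format>]["data"]` shape seen in the Quick Start (this helper is illustrative, not part of the SDK):

```python
def format_data(payload: dict, fmt: str):
    """Return one format's data from a scrape payload, or None if absent.

    Assumes the response shape from the Quick Start example:
    payload["results"][<format>]["data"].
    """
    return (payload.get("results") or {}).get(fmt, {}).get("data")
```

For the request above: `md = format_data(res.data, "markdown")`, `shot = format_data(res.data, "screenshot")`.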

extract

Extract structured data from a URL, HTML, or markdown using AI.

from scrapegraph_py import ScrapeGraphAI, ExtractRequest, FetchConfig

sgai = ScrapeGraphAI()

res = sgai.extract(ExtractRequest(
    url="https://example.com",
    prompt="Extract product names and prices",
    schema={"type": "object", "properties": {...}},  # optional
    mode="reader",                                    # optional
    fetch_config=FetchConfig(...),                   # optional
))
# Or pass html/markdown directly instead of url

search

Search the web and optionally extract structured data.

from scrapegraph_py import ScrapeGraphAI, SearchRequest, FetchConfig

sgai = ScrapeGraphAI()

res = sgai.search(SearchRequest(
    query="best programming languages 2024",
    num_results=5,                      # 1-20, default 3
    format="markdown",                  # "markdown" | "html"
    prompt="Extract key points",        # optional, for AI extraction
    schema={...},                       # optional
    time_range="past_week",             # optional
    location_geo_code="us",             # optional
    fetch_config=FetchConfig(...),      # optional
))

crawl

Crawl a website and its linked pages.

from scrapegraph_py import ScrapeGraphAI, CrawlRequest, MarkdownFormatConfig, FetchConfig

sgai = ScrapeGraphAI()

# Start a crawl
start = sgai.crawl.start(CrawlRequest(
    url="https://example.com",
    formats=[MarkdownFormatConfig()],
    max_pages=50,
    max_depth=2,
    max_links_per_page=10,
    include_patterns=["/blog/*"],
    exclude_patterns=["/admin/*"],
    fetch_config=FetchConfig(...),
))

# Check status
crawl_id = start.data["id"]
status = sgai.crawl.get(crawl_id)

# Control a running crawl
sgai.crawl.stop(crawl_id)
sgai.crawl.resume(crawl_id)
sgai.crawl.delete(crawl_id)
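Crawls run asynchronously server-side, so the usual pattern is to poll until the job finishes. A sketch of such a loop; the `status` field inside the payload and its terminal values ("completed", "failed", "stopped") are assumptions, not confirmed by this README:

```python
import time

def wait_for_crawl(get_status, crawl_id, poll_s=2.0, max_polls=150):
    """Poll a crawl until it reaches a terminal state.

    `get_status` is any callable returning an ApiResult-like object,
    e.g. sgai.crawl.get. Returns the final result; raises on timeout.
    """
    for _ in range(max_polls):
        res = get_status(crawl_id)
        if res.status == "error":
            return res
        state = (res.data or {}).get("status")
        if state in ("completed", "failed", "stopped"):  # assumed terminal states
            return res
        time.sleep(poll_s)
    raise TimeoutError(f"crawl {crawl_id} did not finish")
```

Usage: `final = wait_for_crawl(sgai.crawl.get, crawl_id)`.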

monitor

Monitor a webpage for changes on a schedule.

from scrapegraph_py import (
    ScrapeGraphAI, MonitorCreateRequest, MonitorUpdateRequest,
    MarkdownFormatConfig, FetchConfig,
)

sgai = ScrapeGraphAI()

# Create a monitor
mon = sgai.monitor.create(MonitorCreateRequest(
    url="https://example.com",
    name="Price Monitor",
    interval="0 * * * *",               # cron expression
    formats=[MarkdownFormatConfig()],
    webhook_url="https://...",          # optional
    fetch_config=FetchConfig(...),
))

# Manage monitors (assuming the new monitor's id is in mon.data["id"])
cron_id = mon.data["id"]
sgai.monitor.list()
sgai.monitor.get(cron_id)
sgai.monitor.update(cron_id, MonitorUpdateRequest(interval="0 */6 * * *"))
sgai.monitor.pause(cron_id)
sgai.monitor.resume(cron_id)
sgai.monitor.delete(cron_id)
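The `interval` is a standard five-field cron expression (minute, hour, day of month, month, day of week). A few common schedules, plus a cheap sanity check you might run before submitting one (the helper is illustrative, not part of the SDK):

```python
# Common cron schedules for MonitorCreateRequest.interval
SCHEDULES = {
    "every hour":        "0 * * * *",
    "every 6 hours":     "0 */6 * * *",
    "daily at midnight": "0 0 * * *",
    "weekdays at 9am":   "0 9 * * 1-5",
}

def looks_like_cron(expr: str) -> bool:
    """Cheap sanity check: a cron expression has five whitespace-separated fields."""
    return len(expr.split()) == 5
```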

history

Fetch request history.

from scrapegraph_py import ScrapeGraphAI, HistoryFilter

sgai = ScrapeGraphAI()

history = sgai.history.list(HistoryFilter(
    service="scrape",                   # optional filter
    page=1,
    limit=20,
))

entry = sgai.history.get("request-id")
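Since `HistoryFilter` is paginated, walking the full history means looping over pages. A sketch of that loop; the `"entries"` key in the payload is an assumption about the response shape, and `list_page` stands in for something like `lambda p: sgai.history.list(HistoryFilter(page=p, limit=20))`:

```python
def iter_history(list_page, limit=20, max_pages=100):
    """Yield history entries page by page until a short page or an error.

    `list_page` is any callable taking a page number and returning an
    ApiResult-like object; the payload's "entries" key is assumed.
    """
    for page in range(1, max_pages + 1):
        res = list_page(page)
        if res.status != "success":
            break
        entries = (res.data or {}).get("entries", [])
        yield from entries
        if len(entries) < limit:  # short page means we've reached the end
            break
```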

credits / health

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

credits = sgai.credits()
# credits.data → {"remaining": 1000, "used": 500, "plan": "pro", "jobs": {"crawl": {...}, "monitor": {...}}}

health = sgai.health()
# health.data → {"status": "ok", "uptime": 12345}
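Because both calls return `ApiResult`, a small pre-flight check can gate expensive jobs. A sketch (`has_credits` is not an SDK helper; the `"remaining"` key comes from the payload shown above):

```python
def has_credits(credits_result, minimum: int = 1) -> bool:
    """True if the account still has at least `minimum` credits.

    Expects an ApiResult-like object whose payload carries "remaining".
    """
    if credits_result.status != "success":
        return False
    return (credits_result.data or {}).get("remaining", 0) >= minimum
```

Usage: `if has_credits(sgai.credits(), minimum=10): ...`.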

Async Client

All methods have async equivalents via AsyncScrapeGraphAI:

import asyncio
from scrapegraph_py import AsyncScrapeGraphAI, ScrapeRequest

async def main():
    async with AsyncScrapeGraphAI() as sgai:
        result = await sgai.scrape(ScrapeRequest(url="https://example.com"))
        if result.status == "success":
            print(result.data["results"]["markdown"]["data"])
        else:
            print(result.error)

asyncio.run(main())
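The async client makes it easy to scrape many URLs concurrently. A sketch of a bounded fan-out; `scrape` is duck-typed here (pass `sgai.scrape` from an `AsyncScrapeGraphAI` context), and the semaphore cap is an illustrative choice, not an SDK limit:

```python
import asyncio

async def scrape_many(scrape, requests, concurrency: int = 5):
    """Run several scrape calls concurrently, capped by a semaphore.

    `scrape` is any async callable; `requests` is an iterable of
    request objects. Results come back in input order (asyncio.gather).
    """
    sem = asyncio.Semaphore(concurrency)

    async def one(req):
        async with sem:
            return await scrape(req)

    return await asyncio.gather(*(one(r) for r in requests))
```

Usage: `results = await scrape_many(sgai.scrape, [ScrapeRequest(url=u) for u in urls])`.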

Async Extract

async with AsyncScrapeGraphAI() as sgai:
    res = await sgai.extract(ExtractRequest(
        url="https://example.com",
        prompt="Extract product names and prices",
    ))

Async Search

async with AsyncScrapeGraphAI() as sgai:
    res = await sgai.search(SearchRequest(
        query="best programming languages 2024",
        num_results=5,
    ))

Async Crawl

async with AsyncScrapeGraphAI() as sgai:
    start = await sgai.crawl.start(CrawlRequest(
        url="https://example.com",
        max_pages=50,
    ))
    status = await sgai.crawl.get(start.data["id"])

Async Monitor

async with AsyncScrapeGraphAI() as sgai:
    mon = await sgai.monitor.create(MonitorCreateRequest(
        url="https://example.com",
        name="Price Monitor",
        interval="0 * * * *",
    ))

Examples

Sync Examples

| Service | Example | Description |
|---|---|---|
| scrape | scrape_basic.py | Basic markdown scraping |
| scrape | scrape_multi_format.py | Multiple formats |
| scrape | scrape_json_extraction.py | Structured JSON extraction |
| scrape | scrape_pdf.py | PDF document parsing |
| scrape | scrape_with_fetchconfig.py | JS rendering, stealth mode |
| extract | extract_basic.py | AI data extraction |
| extract | extract_with_schema.py | Extraction with JSON schema |
| search | search_basic.py | Web search |
| search | search_with_extraction.py | Search + AI extraction |
| crawl | crawl_basic.py | Start and monitor a crawl |
| crawl | crawl_with_formats.py | Crawl with formats |
| monitor | monitor_basic.py | Create a page monitor |
| monitor | monitor_with_webhook.py | Monitor with webhook |
| utilities | credits.py | Check credits and limits |
| utilities | health.py | API health check |
| utilities | history.py | Request history |

Async Examples

| Service | Example | Description |
|---|---|---|
| scrape | scrape_basic_async.py | Basic markdown scraping |
| scrape | scrape_multi_format_async.py | Multiple formats |
| scrape | scrape_json_extraction_async.py | Structured JSON extraction |
| scrape | scrape_pdf_async.py | PDF document parsing |
| scrape | scrape_with_fetchconfig_async.py | JS rendering, stealth mode |
| extract | extract_basic_async.py | AI data extraction |
| extract | extract_with_schema_async.py | Extraction with JSON schema |
| search | search_basic_async.py | Web search |
| search | search_with_extraction_async.py | Search + AI extraction |
| crawl | crawl_basic_async.py | Start and monitor a crawl |
| crawl | crawl_with_formats_async.py | Crawl with formats |
| monitor | monitor_basic_async.py | Create a page monitor |
| monitor | monitor_with_webhook_async.py | Monitor with webhook |
| utilities | credits_async.py | Check credits and limits |
| utilities | health_async.py | API health check |
| utilities | history_async.py | Request history |

Environment Variables

| Variable | Description | Default |
|---|---|---|
| SGAI_API_KEY | Your ScrapeGraphAI API key | (none) |
| SGAI_API_URL | Override API base URL | https://v2-api.scrapegraphai.com/api |
| SGAI_DEBUG | Enable debug logging ("1") | off |
| SGAI_TIMEOUT | Request timeout in seconds | 120 |
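A sketch of setting these from Python before constructing a client (values are illustrative; in production you would normally export them in the shell or a .env file instead):

```python
import os

# Configure the SDK via environment before constructing ScrapeGraphAI.
os.environ.setdefault("SGAI_DEBUG", "1")     # verbose logging
os.environ.setdefault("SGAI_TIMEOUT", "60")  # seconds
```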

Development

uv sync
uv run pytest tests/              # unit tests
uv run pytest tests/test_integration.py  # live API tests (requires SGAI_API_KEY)
uv run ruff check .               # lint

License

MIT - ScrapeGraphAI
