ScrapeGraphAI Python SDK

Official Python SDK for the ScrapeGraphAI API.

Install

pip install scrapegraph-py
# or
uv add scrapegraph-py

Quick Start

from scrapegraph_py import ScrapeGraphAI

# reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
sgai = ScrapeGraphAI()

result = sgai.scrape("https://example.com")

if result.status == "success":
    print(result.data["results"]["markdown"]["data"])
else:
    print(result.error)

Every method returns ApiResult[T] — no exceptions to catch:

from dataclasses import dataclass
from typing import Generic, Literal, TypeVar

T = TypeVar("T")

@dataclass
class ApiResult(Generic[T]):
    status: Literal["success", "error"]
    data: T | None
    error: str | None
    elapsed_ms: int

API

scrape

Scrape a webpage in one or more formats (markdown, html, screenshot, json, and more).

from scrapegraph_py import (
    ScrapeGraphAI, FetchConfig,
    MarkdownFormatConfig, ScreenshotFormatConfig, JsonFormatConfig
)

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://example.com",
    formats=[
        MarkdownFormatConfig(mode="reader"),
        ScreenshotFormatConfig(full_page=True, width=1440, height=900),
        JsonFormatConfig(prompt="Extract product info"),
    ],
    content_type="text/html",           # optional, auto-detected
    fetch_config=FetchConfig(           # optional
        mode="js",                      # "auto" | "fast" | "js"
        stealth=True,
        timeout=30000,
        wait=2000,
        scrolls=3,
        headers={"Accept-Language": "en"},
        cookies={"session": "abc"},
        country="us",
    ),
)

Formats:

  • markdown — Clean markdown (modes: normal, reader, prune)
  • html — Raw HTML (modes: normal, reader, prune)
  • links — All links on the page
  • images — All image URLs
  • summary — AI-generated summary
  • json — Structured extraction with prompt/schema
  • branding — Brand colors, typography, logos
  • screenshot — Page screenshot (full_page, width, height, quality)
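
When several formats are requested, each lands under its own key in the response. Assuming the shape shown in the Quick Start (`result.data["results"][<format>]["data"]`), a multi-format payload can be unpacked like this sketch, which uses a hand-built dict instead of a live call:

```python
# Simulated ApiResult.data for a scrape that requested markdown + links,
# mirroring the Quick Start's result.data["results"]["markdown"]["data"] shape.
data = {
    "results": {
        "markdown": {"data": "# Example Domain"},
        "links": {"data": ["https://www.iana.org/domains/example"]},
    }
}

for fmt, payload in data["results"].items():
    print(f"{fmt}: {payload['data']!r}")
```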

extract

Extract structured data from a URL, HTML, or markdown using AI.

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

res = sgai.extract(
    prompt="Extract product names and prices",
    url="https://example.com",
    schema={"type": "object", "properties": {...}},  # optional
    mode="reader",                                    # optional
    # Or pass html/markdown directly instead of url
)

search

Search the web and optionally extract structured data.

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

res = sgai.search(
    "best programming languages 2024",
    num_results=5,                      # 1-20, default 3
    format="markdown",                  # "markdown" | "html"
    prompt="Extract key points",        # optional, for AI extraction
    schema={...},                       # optional
    time_range="past_week",             # optional
    location_geo_code="us",             # optional
)

crawl

Crawl a website and its linked pages.

from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

sgai = ScrapeGraphAI()

# Start a crawl
start = sgai.crawl.start(
    "https://example.com",
    formats=[MarkdownFormatConfig()],
    max_pages=50,
    max_depth=2,
    max_links_per_page=10,
    include_patterns=["/blog/*"],
    exclude_patterns=["/admin/*"],
)

# Check status
status = sgai.crawl.get(start.data["id"])

# Control (crawl_id is the id returned by crawl.start)
crawl_id = start.data["id"]
sgai.crawl.stop(crawl_id)
sgai.crawl.resume(crawl_id)
sgai.crawl.delete(crawl_id)
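
Since crawl.start returns immediately, callers typically poll crawl.get until the job settles. A minimal polling sketch follows; the `"status"` field name and its terminal values are assumptions, not confirmed by this README, so the status getter is injected as a callable:

```python
import time
from typing import Callable

def wait_for_crawl(get_status: Callable[[], dict],
                   poll_seconds: float = 2.0,
                   max_polls: int = 150) -> dict:
    """Poll until the crawl payload reports a terminal state, then return it."""
    for _ in range(max_polls):
        payload = get_status()
        if payload.get("status") in ("completed", "failed", "stopped"):
            return payload
        time.sleep(poll_seconds)
    raise TimeoutError("crawl did not finish in time")

# With the real client this might look like:
#   wait_for_crawl(lambda: sgai.crawl.get(start.data["id"]).data)
```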

monitor

Monitor a webpage for changes on a schedule.

from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

sgai = ScrapeGraphAI()

# Create a monitor
mon = sgai.monitor.create(
    "https://example.com",
    "0 * * * *",                        # cron expression
    name="Price Monitor",
    formats=[MarkdownFormatConfig()],
    webhook_url="https://...",          # optional
)

# Manage monitors (cron_id comes from the create response)
sgai.monitor.list()
sgai.monitor.get(cron_id)
sgai.monitor.update(cron_id, interval="0 */6 * * *")
sgai.monitor.pause(cron_id)
sgai.monitor.resume(cron_id)
sgai.monitor.delete(cron_id)

history

Fetch request history.

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

history = sgai.history.list(
    service="scrape",                   # optional filter
    page=1,
    limit=20,
)

entry = sgai.history.get("request-id")

credits / health

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

credits = sgai.credits()
# { remaining: 1000, used: 500, plan: "pro", jobs: { crawl: {...}, monitor: {...} } }

health = sgai.health()
# { status: "ok", uptime: 12345 }

Async Client

All methods have async equivalents via AsyncScrapeGraphAI:

import asyncio
from scrapegraph_py import AsyncScrapeGraphAI

async def main():
    async with AsyncScrapeGraphAI() as sgai:
        result = await sgai.scrape("https://example.com")
        if result.status == "success":
            print(result.data["results"]["markdown"]["data"])
        else:
            print(result.error)

asyncio.run(main())

Async Extract

async with AsyncScrapeGraphAI() as sgai:
    res = await sgai.extract(
        prompt="Extract product names and prices",
        url="https://example.com",
    )

Async Search

async with AsyncScrapeGraphAI() as sgai:
    res = await sgai.search("best programming languages 2024", num_results=5)

Async Crawl

async with AsyncScrapeGraphAI() as sgai:
    start = await sgai.crawl.start("https://example.com", max_pages=50)
    status = await sgai.crawl.get(start.data["id"])

Async Monitor

async with AsyncScrapeGraphAI() as sgai:
    mon = await sgai.monitor.create(
        "https://example.com",
        "0 * * * *",
        name="Price Monitor",
    )

Examples

Sync Examples

Service Example Description
scrape scrape_basic.py Basic markdown scraping
scrape scrape_multi_format.py Multiple formats
scrape scrape_json_extraction.py Structured JSON extraction
scrape scrape_pdf.py PDF document parsing
scrape scrape_with_fetchconfig.py JS rendering, stealth mode
extract extract_basic.py AI data extraction
extract extract_with_schema.py Extraction with JSON schema
search search_basic.py Web search
search search_with_extraction.py Search + AI extraction
crawl crawl_basic.py Start and monitor a crawl
crawl crawl_with_formats.py Crawl with formats
monitor monitor_basic.py Create a page monitor
monitor monitor_with_webhook.py Monitor with webhook
utilities credits.py Check credits and limits
utilities health.py API health check
utilities history.py Request history

Async Examples

Service Example Description
scrape scrape_basic_async.py Basic markdown scraping
scrape scrape_multi_format_async.py Multiple formats
scrape scrape_json_extraction_async.py Structured JSON extraction
scrape scrape_pdf_async.py PDF document parsing
scrape scrape_with_fetchconfig_async.py JS rendering, stealth mode
extract extract_basic_async.py AI data extraction
extract extract_with_schema_async.py Extraction with JSON schema
search search_basic_async.py Web search
search search_with_extraction_async.py Search + AI extraction
crawl crawl_basic_async.py Start and monitor a crawl
crawl crawl_with_formats_async.py Crawl with formats
monitor monitor_basic_async.py Create a page monitor
monitor monitor_with_webhook_async.py Monitor with webhook
utilities credits_async.py Check credits and limits
utilities health_async.py API health check
utilities history_async.py Request history

Environment Variables

Variable Description Default
SGAI_API_KEY Your ScrapeGraphAI API key (required)
SGAI_API_URL Override API base URL https://v2-api.scrapegraphai.com/api
SGAI_DEBUG Enable debug logging ("1") off
SGAI_TIMEOUT Request timeout in seconds 120
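
The table above can be exercised without the SDK. This sketch mirrors how a client might resolve those variables with the documented defaults; the fallback logic is an illustration, not the SDK's actual code:

```python
def load_config(env: dict[str, str]) -> dict:
    """Resolve SDK settings from an environment-style mapping, applying documented defaults."""
    api_key = env.get("SGAI_API_KEY")
    if not api_key:
        raise ValueError("SGAI_API_KEY is required")
    return {
        "api_key": api_key,
        "api_url": env.get("SGAI_API_URL", "https://v2-api.scrapegraphai.com/api"),
        "debug": env.get("SGAI_DEBUG") == "1",
        "timeout": float(env.get("SGAI_TIMEOUT", "120")),
    }

# With real environment variables this would be load_config(dict(os.environ)).
cfg = load_config({"SGAI_API_KEY": "sgai-xxx", "SGAI_DEBUG": "1"})
```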

Development

uv sync
uv run pytest tests/              # unit tests
uv run pytest tests/test_integration.py  # live API tests (requires SGAI_API_KEY)
uv run ruff check .               # lint

License

MIT - ScrapeGraphAI
