
ScrapeGraph API Python SDK

The official Python SDK for the ScrapeGraphAI API, a powerful web scraping and data extraction service.

Installation

Install the package using pip:

pip install scrapegraph-py

Authentication

To use the ScrapeGraph API, you'll need an API key. You can manage this in two ways:

  1. Environment variable:
export SCRAPEGRAPH_API_KEY="your-api-key-here"
  2. .env file:
SCRAPEGRAPH_API_KEY="your-api-key-here"

Features

The SDK provides four main capabilities:

  1. Web Scraping (basic and structured)
  2. Credits checking
  3. Feedback submission
  4. API status checking

Usage

Basic Web Scraping

import os

from dotenv import load_dotenv

from scrapegraph_py import ScrapeGraphClient, smart_scraper

load_dotenv()
api_key = os.getenv("SCRAPEGRAPH_API_KEY")
client = ScrapeGraphClient(api_key)

url = "https://scrapegraphai.com/"
prompt = "What does the company do?"

result = smart_scraper(client, url, prompt)
print(result)
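Network calls like this one can fail transiently. A small retry wrapper keeps scripts robust; `call_with_retries` below is an illustrative helper, not part of the SDK:

```python
import time

def call_with_retries(fn, attempts: int = 3, backoff: float = 2.0):
    """Call fn(), retrying on any exception with linearly increasing delay.
    Illustrative helper, not part of scrapegraph-py."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            time.sleep(backoff * attempt)

# e.g. result = call_with_retries(lambda: smart_scraper(client, url, prompt))
```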

Local HTML Scraping

You can also scrape content from local HTML files:

import os

from bs4 import BeautifulSoup
from dotenv import load_dotenv

from scrapegraph_py import ScrapeGraphClient, scrape_text

# Load environment variables
load_dotenv()
api_key = os.getenv("SCRAPEGRAPH_API_KEY")

client = ScrapeGraphClient(api_key)

def scrape_local_html(client: ScrapeGraphClient, file_path: str, prompt: str):
    with open(file_path, 'r', encoding='utf-8') as file:
        html_content = file.read()
    
    # Use BeautifulSoup to extract text content
    soup = BeautifulSoup(html_content, 'html.parser')
    text_content = soup.get_text(separator='\n', strip=True)
    
    # Use ScrapeGraph AI to analyze the text
    return scrape_text(client, text_content, prompt)

# Usage
result = scrape_local_html(
    client,
    'sample.html',
    "Extract main content and important information"
)
print("Extracted Data:", result)
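If you would rather avoid the BeautifulSoup dependency, the same text-extraction step can be sketched with the standard library's `html.parser` (`TextExtractor` and `html_to_text` are illustrative names, not part of the SDK):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)
```

BeautifulSoup handles malformed markup more gracefully, so this sketch is best reserved for well-formed HTML.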

Structured Data Extraction

For more structured data extraction, you can define a Pydantic schema:

import os

from dotenv import load_dotenv
from pydantic import BaseModel, Field

from scrapegraph_py import scrape

# Load environment variables
load_dotenv()
api_key = os.getenv("SCRAPEGRAPH_API_KEY")

class CompanyInfoSchema(BaseModel):
    company_name: str = Field(description="The name of the company")
    description: str = Field(description="A description of the company")
    main_products: list[str] = Field(description="The main products of the company")

# Scrape with schema
result = scrape(
    api_key=api_key,
    url="https://scrapegraphai.com/",
    prompt="What does the company do?",
    schema=CompanyInfoSchema
)
print(result)
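Because the schema is a regular Pydantic model, you can also validate any response-shaped dictionary locally before using it; the `raw` payload below is illustrative data, not real API output:

```python
from typing import List

from pydantic import BaseModel, Field

class CompanyInfoSchema(BaseModel):
    company_name: str = Field(description="The name of the company")
    description: str = Field(description="A description of the company")
    main_products: List[str] = Field(description="The main products of the company")

# Illustrative payload in the shape the schema expects.
raw = {
    "company_name": "ScrapeGraphAI",
    "description": "AI-powered web scraping and data extraction",
    "main_products": ["SmartScraper"],
}

# Raises pydantic.ValidationError if a field is missing or has the wrong type.
info = CompanyInfoSchema(**raw)
```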

Check Credits

Monitor your API usage:

import os

from dotenv import load_dotenv

from scrapegraph_py import credits

# Load environment variables
load_dotenv()
api_key = os.getenv("SCRAPEGRAPH_API_KEY")

response = credits(api_key)
print(response)

Provide Feedback and Check Status

You can provide feedback on scraping results and check the API status:

import os

from dotenv import load_dotenv

from scrapegraph_py import feedback, status

# Load environment variables
load_dotenv()
api_key = os.getenv("SCRAPEGRAPH_API_KEY")

# Check API status
status_response = status(api_key)
print(f"API Status: {status_response}")

# Submit feedback
feedback_response = feedback(
    api_key=api_key,
    request_id="your-request-id",  # UUID from your scraping request
    rating=5,  # Rating from 1-5
    message="Great results!"
)
print(f"Feedback Response: {feedback_response}")
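A quick client-side check avoids a wasted API call when the rating is out of range; `validate_rating` is an illustrative helper, not part of the SDK:

```python
def validate_rating(rating: int) -> None:
    """Raise ValueError unless rating is an integer from 1 to 5,
    the range the feedback endpoint expects."""
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("rating must be an integer from 1 to 5")
```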

Expected Output Example

The following is an example of the expected output when scraping articles from a webpage:

{
  "articles": [
    {
      "title": "Thousands of People Are Cloning Their Dead Pets. This Is the Woman They Call First",
      "url": "https://www.wired.com/story/your-next-job-pet-cloner/"
    },
    {
      "title": "The Quantum Geometry That Exists Outside of Space and Time",
      "url": "https://www.wired.com/story/physicists-reveal-a-quantum-geometry-that-exists-outside-of-space-and-time/"
    },
    {
      "title": "How a PhD Student Discovered a Lost Mayan City From Hundreds of Miles Away",
      "url": "https://www.wired.com/story/lost-maya-city-valeriana-interview/"
    },
    {
      "title": "The Maker of Ozempic Is Trying to Block Compounded Versions of Its Blockbuster Drug",
      "url": "https://www.wired.com/story/novo-nordisk-ozempic-compounded-fda-block-pharmacies/"
    }
  ]
}
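A response of that shape is plain JSON, so iterating over the articles is straightforward (the payload below abbreviates the example above):

```python
import json

payload = """
{
  "articles": [
    {"title": "Thousands of People Are Cloning Their Dead Pets. This Is the Woman They Call First",
     "url": "https://www.wired.com/story/your-next-job-pet-cloner/"},
    {"title": "The Quantum Geometry That Exists Outside of Space and Time",
     "url": "https://www.wired.com/story/physicists-reveal-a-quantum-geometry-that-exists-outside-of-space-and-time/"}
  ]
}
"""

data = json.loads(payload)
for article in data["articles"]:
    print(f'{article["title"]} -> {article["url"]}')
```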

Development

Requirements

  • Python 3.9+
  • Rye for dependency management (optional)

Project Structure

scrapegraph_py/
├── __init__.py
├── credits.py     # Credits checking functionality
├── scrape.py      # Core scraping functionality
└── feedback.py    # Feedback submission functionality

examples/
├── credits_example.py
├── feedback_example.py
├── scrape_example.py
└── scrape_schema_example.py

tests/
├── test_credits.py
├── test_feedback.py
└── test_scrape.py

Setting up the Development Environment

  1. Clone the repository:
git clone https://github.com/yourusername/scrapegraph-py.git
cd scrapegraph-py
  2. Install dependencies:
# If using Rye
rye sync

# If using pip
pip install -r requirements-dev.lock
  3. Create a .env file in the root directory:
SCRAPEGRAPH_API_KEY="your-api-key-here"

License

This project is licensed under the MIT License.

Support

For support:

  • Visit ScrapeGraph AI
  • Contact our support team
  • Check the examples in the examples/ directory
