ScrapeGraph Python SDK

The official Python SDK for interacting with the ScrapeGraph AI API - a powerful web scraping and data extraction service.

Installation

Install the package using pip:

pip install scrapegraph-py

Authentication

To use the ScrapeGraph API, you'll need an API key. You can provide it in either of two ways:

  1. Environment variable:
export SCRAPEGRAPH_API_KEY="your-api-key-here"
  2. .env file:
SCRAPEGRAPH_API_KEY="your-api-key-here"
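
In code, the key can then be read from the environment. A minimal standard-library sketch (the helper name `resolve_api_key` is illustrative, not part of the SDK; if you use a .env file, load it first with python-dotenv as in the examples below):

from scrapegraph_py import ScrapeGraphClient, scrape

import os

def resolve_api_key(env_var: str = "SCRAPEGRAPH_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key

Failing fast here gives a clearer error than letting an unauthenticated request reach the API.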

Features

The SDK provides four main functionalities:

  1. Web Scraping (basic and structured)
  2. Credits checking
  3. Feedback submission
  4. API status checking

Usage

Basic Web Scraping

import os

from scrapegraph_py import ScrapeGraphClient, scrape
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("SCRAPEGRAPH_API_KEY")
client = ScrapeGraphClient(api_key)

url = "https://scrapegraphai.com/"
prompt = "What does the company do?"

result = scrape(client, url, prompt)
print(result)

Local HTML Scraping

You can also scrape content from local HTML files:

from scrapegraph_py import ScrapeGraphClient, scrape_text
from bs4 import BeautifulSoup

def scrape_local_html(client: ScrapeGraphClient, file_path: str, prompt: str):
    with open(file_path, 'r', encoding='utf-8') as file:
        html_content = file.read()
    
    # Use BeautifulSoup to extract text content
    soup = BeautifulSoup(html_content, 'html.parser')
    text_content = soup.get_text(separator='\n', strip=True)
    
    # Use ScrapeGraph AI to analyze the text
    return scrape_text(client, text_content, prompt)

# Usage (api_key resolved as in the previous example)
client = ScrapeGraphClient(api_key)
result = scrape_local_html(
    client,
    'sample.html',
    "Extract main content and important information"
)
print("Extracted Data:", result)
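
If BeautifulSoup isn't available, a rougher text extraction can be done with the standard library's html.parser. A sketch (the class and helper names are illustrative; it skips <script> and <style> contents but handles malformed markup less gracefully than BeautifulSoup):

from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

def html_to_text(html: str, separator: str = "\n") -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    return separator.join(parser._chunks)

The resulting text can be passed to scrape_text exactly as in the function above.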

Structured Data Extraction

For more structured data extraction, you can define a Pydantic schema:

from pydantic import BaseModel, Field
from scrapegraph_py import scrape

class CompanyInfoSchema(BaseModel):
    company_name: str = Field(description="The name of the company")
    description: str = Field(description="A description of the company")
    main_products: list[str] = Field(description="The main products of the company")

# Scrape with schema (client constructed as in the basic example)
result = scrape(
    client,
    url="https://scrapegraphai.com/",
    prompt="What does the company do?",
    schema=CompanyInfoSchema
)
print(result)
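
The same schema can also validate a response locally. A sketch, assuming the API returns data that decodes to a dict with the schema's fields (the `raw` dict below is a made-up stand-in, not real API output):

from pydantic import BaseModel, Field

class CompanyInfoSchema(BaseModel):
    company_name: str = Field(description="The name of the company")
    description: str = Field(description="A description of the company")
    main_products: list[str] = Field(description="The main products of the company")

# `raw` stands in for a hypothetical, already-JSON-decoded API response
raw = {
    "company_name": "ScrapeGraph AI",
    "description": "AI-powered web scraping",
    "main_products": ["SDK", "API"],
}

info = CompanyInfoSchema(**raw)  # raises pydantic.ValidationError on mismatch

Validating at the boundary means type errors surface immediately instead of deep inside downstream code.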

Check Credits

Monitor your API usage:

from scrapegraph_py import credits

response = credits(api_key)
print(response)
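
A credits check can double as a pre-flight guard before an expensive scraping run. A sketch, assuming the response is a dict with a numeric "remaining_credits" field (an assumption for illustration, not the documented response schema):

def has_credits(credits_response: dict, minimum: int = 1) -> bool:
    """Return True if the (assumed) 'remaining_credits' field meets the minimum."""
    return credits_response.get("remaining_credits", 0) >= minimum

For example, `has_credits(credits(api_key), minimum=10)` could gate a batch job.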

Provide Feedback and Check Status

You can provide feedback on scraping results and check the API status:

from scrapegraph_py import feedback, status

# Check API status
status_response = status(api_key)
print(f"API Status: {status_response}")

# Submit feedback
feedback_response = feedback(
    api_key=api_key,
    request_id="your-request-id",  # UUID from your scraping request
    rating=5,  # Rating from 1-5
    message="Great results!"
)
print(f"Feedback Response: {feedback_response}")
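
A light client-side check can catch malformed feedback before a request is sent. A sketch based on the constraints in the comments above (the helper name is illustrative, not part of the SDK):

import uuid

def validate_feedback(request_id: str, rating: int) -> None:
    """Raise ValueError if the feedback fields are obviously malformed."""
    try:
        uuid.UUID(request_id)  # request ids are UUIDs per the example above
    except ValueError:
        raise ValueError(f"request_id is not a valid UUID: {request_id!r}")
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")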

Development

Requirements

  • Python 3.9+
  • Rye for dependency management (optional)

Project Structure

scrapegraph_py/
├── __init__.py
├── credits.py      # Credits checking functionality
├── scrape.py      # Core scraping functionality
└── feedback.py    # Feedback submission functionality

examples/
├── credits_example.py
├── feedback_example.py
├── scrape_example.py
└── scrape_schema_example.py

tests/
├── test_credits.py
├── test_feedback.py
└── test_scrape.py

Setting up the Development Environment

  1. Clone the repository:
git clone https://github.com/yourusername/scrapegraph-py.git
cd scrapegraph-py
  2. Install dependencies:
# If using Rye
rye sync

# If using pip
pip install -r requirements-dev.lock
  3. Create a .env file in the root directory:
SCRAPEGRAPH_API_KEY="your-api-key-here"

License

This project is licensed under the MIT License.

Support

For support:

  • Visit ScrapeGraph AI
  • Contact our support team
  • Check the examples in the examples/ directory

Download files

Source distribution: scrapegraph_py-0.0.2.tar.gz (12.7 kB)
Built distribution: scrapegraph_py-0.0.2-py3-none-any.whl (9.8 kB)
