ScrapeGraph Python SDK

The official Python SDK for interacting with the ScrapeGraph AI API - a powerful web scraping and data extraction service.

Installation

Install the package using pip:

pip install scrapegraph-py

Authentication

To use the ScrapeGraph API, you'll need an API key. You can provide it in either of two ways:

  1. As an environment variable:
export SCRAPEGRAPH_API_KEY="your-api-key-here"
  2. In a .env file:
SCRAPEGRAPH_API_KEY="your-api-key-here"
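If you prefer not to depend on python-dotenv, the .env option above can also be handled with the standard library alone. This is a minimal sketch (the `load_env_file` helper is illustrative, not part of the SDK), assuming simple `KEY="value"` lines:

```python
import os

def load_env_file(path=".env"):
    """Minimal .env parser: copy KEY=value lines into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Existing environment variables take precedence
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

load_env_file()
api_key = os.environ.get("SCRAPEGRAPH_API_KEY")
```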

Features

The SDK provides four main functionalities:

  1. Web Scraping (basic and structured)
  2. Credits checking
  3. Feedback submission
  4. API status checking

Usage

Basic Web Scraping

import os

from scrapegraph_py import ScrapeGraphClient, scrape
from dotenv import load_dotenv

load_dotenv()  # load SCRAPEGRAPH_API_KEY from a .env file, if present
api_key = os.getenv("SCRAPEGRAPH_API_KEY")
client = ScrapeGraphClient(api_key)

url = "https://scrapegraphai.com/"
prompt = "What does the company do?"

result = scrape(client, url, prompt)
print(result)

Local HTML Scraping

You can also scrape content from local HTML files:

from scrapegraph_py import ScrapeGraphClient, scrape_text
from bs4 import BeautifulSoup

def scrape_local_html(client: ScrapeGraphClient, file_path: str, prompt: str):
    with open(file_path, 'r', encoding='utf-8') as file:
        html_content = file.read()
    
    # Use BeautifulSoup to extract text content
    soup = BeautifulSoup(html_content, 'html.parser')
    text_content = soup.get_text(separator='\n', strip=True)
    
    # Use ScrapeGraph AI to analyze the text
    return scrape_text(client, text_content, prompt)

# Usage (assumes api_key has already been loaded, as in the basic example)
client = ScrapeGraphClient(api_key)
result = scrape_local_html(
    client,
    'sample.html',
    "Extract main content and important information"
)
print("Extracted Data:", result)

Structured Data Extraction

For more structured data extraction, you can define a Pydantic schema:

from pydantic import BaseModel, Field
from scrapegraph_py import scrape

class CompanyInfoSchema(BaseModel):
    company_name: str = Field(description="The name of the company")
    description: str = Field(description="A description of the company")
    main_products: list[str] = Field(description="The main products of the company")

# Scrape with schema
result = scrape(
    api_key=api_key,
    url="https://scrapegraphai.com/",
    prompt="What does the company do?",
    schema=CompanyInfoSchema
)
print(result)
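Since the structured result comes back as JSON, it can be worth verifying that the expected fields are present before using them downstream. A minimal sketch (the `validate_result` helper is hypothetical, not an SDK function; the field names match the CompanyInfoSchema example above):

```python
import json

# Field names taken from the CompanyInfoSchema example above
REQUIRED_FIELDS = {"company_name", "description", "main_products"}

def validate_result(raw: str) -> dict:
    """Parse a raw JSON result and fail fast if schema fields are missing."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"result is missing fields: {sorted(missing)}")
    return data
```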

Check Credits

Monitor your API usage:

from scrapegraph_py import credits

response = credits(api_key)
print(response)
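The exact shape of the credits response isn't documented here, but a common pattern is to gate expensive calls on the remaining balance. A hedged sketch, assuming the response is a dict and that "credits" is the balance field (both are assumptions, not the documented contract):

```python
def has_credits(response: dict, minimum: int = 1) -> bool:
    """Return True if the (assumed) 'credits' field meets the minimum."""
    return response.get("credits", 0) >= minimum

# Example: only scrape when at least one credit remains
# if has_credits(credits(api_key)):
#     result = scrape(client, url, prompt)
```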

Provide Feedback and Check Status

You can provide feedback on scraping results and check the API status:

from scrapegraph_py import feedback, status

# Check API status
status_response = status(api_key)
print(f"API Status: {status_response}")

# Submit feedback
feedback_response = feedback(
    api_key=api_key,
    request_id="your-request-id",  # UUID from your scraping request
    rating=5,  # Rating from 1-5
    message="Great results!"
)
print(f"Feedback Response: {feedback_response}")
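Because feedback expects a UUID request_id and a rating from 1 to 5, a local pre-flight check can catch bad input before the network call. The `validate_feedback` helper below is illustrative, not part of the SDK:

```python
import uuid

def validate_feedback(request_id: str, rating: int) -> None:
    """Raise ValueError if the feedback arguments look invalid."""
    uuid.UUID(request_id)  # raises ValueError for a malformed UUID
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
```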

Development

Requirements

  • Python 3.9+
  • Rye for dependency management (optional)

Project Structure

scrapegraph_py/
├── __init__.py
├── credits.py      # Credits checking functionality
├── scrape.py      # Core scraping functionality
└── feedback.py    # Feedback submission functionality

examples/
├── credits_example.py
├── feedback_example.py
├── scrape_example.py
└── scrape_schema_example.py

tests/
├── test_credits.py
├── test_feedback.py
└── test_scrape.py

Setting up the Development Environment

  1. Clone the repository:
git clone https://github.com/yourusername/scrapegraph-py.git
cd scrapegraph-py
  2. Install dependencies:
# If using Rye
rye sync

# If using pip
pip install -r requirements-dev.lock
  3. Create a .env file in the root directory:
SCRAPEGRAPH_API_KEY="your-api-key-here"

License

This project is licensed under the MIT License.

Support

For support:

  • Visit ScrapeGraph AI
  • Contact our support team
  • Check the examples in the examples/ directory

Download files

Source Distribution

scrapegraph_py-0.0.3.tar.gz (11.7 kB)

Built Distribution

scrapegraph_py-0.0.3-py3-none-any.whl (8.7 kB)

