A powerful Playwright-based scraper helper with hot reload

🕷️ GA-Scrap

The Ultimate Web Scraping Library
Playwright-powered • Developer-friendly • Production-ready


✨ What Makes GA-Scrap Special?

🎯 Simple & Powerful

from ga_scrap import SyncGAScrap

with SyncGAScrap() as scraper:
    scraper.goto("https://example.com")
    title = scraper.get_text("h1")
    scraper.screenshot("page.png")

🏖️ Error-Resilient Development

# Sandbox mode - errors don't crash!
with SyncGAScrap(sandbox_mode=True) as scraper:
    scraper.click("#might-not-exist")  # Logs error, continues
    scraper.screenshot("still_works.png")  # Still works!

🚀 Quick Start

Installation

git clone https://github.com/GrandpaAcademy/GA-Scrap.git
cd GA-Scrap
pip install -r requirements.txt
playwright install
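
To confirm that the steps above worked, a quick check can test whether the two modules are importable. This is a minimal sketch, not part of GA-Scrap: the module name `ga_scrap` is taken from the import statements in this README, and the helper name `check_modules` is hypothetical. It does not verify that `playwright install` downloaded the browsers.

```python
import importlib.util


def check_modules(names=("ga_scrap", "playwright")):
    """Return {module_name: True/False} depending on whether each can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run this from the cloned GA-Scrap directory so ga_scrap is on the path.
    for name, found in check_modules().items():
        print(f"{name}: {'OK' if found else 'missing'}")
```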

Your First Scraper

from ga_scrap import SyncGAScrap

with SyncGAScrap() as scraper:
    scraper.goto("https://quotes.toscrape.com")
    quotes = scraper.get_all_text(".quote .text")
    print(f"Found {len(quotes)} quotes!")

That's it! No async/await, no complex setup - just simple Python code.


🎯 Core Features

| Feature | Description | Status |
|---|---|---|
| 🔄 Dual Interface | Both sync and async APIs | ✅ |
| 🏖️ Sandbox Mode | Error-resilient development | ✅ |
| 🎭 Full Playwright | Complete A-Z feature access | ✅ |
| 📱 Device Emulation | Mobile, tablet, desktop | ✅ |
| 🌐 Network Control | Request/response interception | ✅ |
| 📸 Media Capture | Screenshots, PDFs, videos | ✅ |
| 🔧 Developer Tools | Hot reload, debugging | ✅ |
| 🎨 Beautiful CLI | Colorful command interface | ✅ |

📚 Complete Web Documentation

🌐 Visit Our Interactive Documentation Site 🌐

Beautiful • Interactive • Complete

Documentation • Examples • API Reference

🎯 Choose Your Learning Path

👶 Beginner

New to web scraping?

  • 📖 Getting Started
  • 🎯 Basic Examples
  • 🔧 Installation Guide

🧪 Developer

Building scrapers?

  • 🏖️ Sandbox Mode
  • 🔄 Sync Interface
  • ⚡ Hot Reload

🚀 Advanced

Need full control?

  • 🎭 Playwright API
  • 🔧 Architecture
  • 🤝 Contributing


🎨 Interface Options

🔄 Synchronous (Recommended)

Perfect for beginners and most use cases

from ga_scrap import SyncGAScrap

with SyncGAScrap() as scraper:
    scraper.goto("https://example.com")
    data = scraper.get_text(".content")
    scraper.screenshot("result.png")

⚡ Asynchronous

For advanced users and high-performance scenarios

import asyncio
from ga_scrap import GAScrap

async def scrape():
    async with GAScrap() as scraper:
        await scraper.goto("https://example.com")
        data = await scraper.get_text(".content")
        await scraper.screenshot("result.png")

asyncio.run(scrape())
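
The async interface is most useful when several pages are fetched at once. The pattern below is a sketch of that idea using only the standard library: `fetch_title` is a hypothetical stand-in that sleeps instead of opening a browser, and `scrape_many` is an illustrative name. In real use the body of `fetch_title` would be the `async with GAScrap() as scraper:` block shown above; whether one scraper per task suits a given workload is an assumption to verify.

```python
import asyncio


async def fetch_title(url: str) -> str:
    # Stand-in for real work. In practice this body would open a scraper
    # (for example the `async with GAScrap() as scraper:` block shown above).
    await asyncio.sleep(0.01)
    return f"title of {url}"


async def scrape_many(urls, max_concurrent: int = 3):
    # The semaphore caps how many fetches run at the same time.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(url):
        async with semaphore:
            return await fetch_title(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(limited(u) for u in urls))


if __name__ == "__main__":
    pages = ["https://example.com/a", "https://example.com/b"]
    print(asyncio.run(scrape_many(pages)))
```

Capping concurrency keeps the number of simultaneously open pages bounded, which matters because each browser page consumes memory.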

🏖️ Sandbox Mode

The game-changer for development!

# Traditional scraping - one error stops everything
scraper.click("#button")  # ❌ Element not found → CRASH!

# GA-Scrap sandbox mode - errors are handled gracefully
with SyncGAScrap(sandbox_mode=True) as scraper:
    scraper.click("#button")  # ❌ Error logged, execution continues
    scraper.screenshot("debug.png")  # ✅ Still works perfectly!

Benefits:

  • 🛡️ Never crashes - Browser stays active during errors
  • 📝 Detailed logging - Know exactly what went wrong
  • 🔄 Instant recovery - Fix and continue immediately
  • 🧪 Perfect for testing - Try different approaches safely
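
The general idea behind this kind of error handling can be illustrated with a small decorator that logs an exception instead of letting it propagate. This is a generic sketch of the technique, not GA-Scrap's actual implementation; the names `sandboxed` and `click` are hypothetical.

```python
import functools
import logging

logger = logging.getLogger("sandbox_sketch")


def sandboxed(func):
    """Log any exception raised by func and return None instead of raising."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception records the message plus the full traceback.
            logger.exception("%s failed; continuing", func.__name__)
            return None
    return wrapper


@sandboxed
def click(selector: str) -> str:
    # Hypothetical action that fails for a missing element.
    if selector == "#might-not-exist":
        raise LookupError(f"no element matches {selector}")
    return f"clicked {selector}"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(click("#might-not-exist"))  # error is logged, None is returned
    print(click("#button"))
```

The trade-off is that a swallowed error can hide a real failure, so callers need to check for a `None` result when the outcome matters.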

🎭 Complete Playwright Access

Every Playwright feature from A-Z is available:

# High-level GA-Scrap methods
scraper.goto("https://example.com")
scraper.screenshot("page.png")

# Direct Playwright access when needed
# (the await below is only valid inside an async function)
page = scraper.get_playwright_page()
await page.evaluate("document.body.style.background = 'red'")

# Safe method execution with sandbox protection
result = scraper.execute_playwright_method('page', 'title')

🔤 A-Z Feature List
  • Accessibility testing
  • Browser management
  • Cookies & context
  • Downloads handling
  • Evaluate JavaScript
  • Form interactions
  • Geolocation control
  • Hover & interactions
  • Injection (CSS/JS)
  • JavaScript execution
  • Keyboard simulation
  • Locators & selectors
  • Mouse operations
  • Network monitoring
  • Offline mode
  • PDF generation
  • Query selectors
  • Recording (video/HAR)
  • Screenshots
  • Touch simulation
  • Upload files
  • Viewport control
  • Waiting strategies
  • XPath selectors
  • Yielding control
  • Zone/timezone settings
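
For comparison, here is roughly what a few of these operations look like when written directly against Playwright's own sync API, without GA-Scrap. The function name `title_and_screenshot` is hypothetical; the Playwright calls (`sync_playwright`, `chromium.launch`, `new_page`, `goto`, `evaluate`, `screenshot`, `title`) are the library's documented ones. How GA-Scrap maps its wrappers onto them is not described in this README.

```python
def title_and_screenshot(url: str, path: str = "page.png") -> str:
    """Open url in headless Chromium, save a screenshot, return the page title."""
    # Imported here so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            # evaluate() runs a JavaScript expression inside the page.
            page.evaluate("document.body.style.background = 'red'")
            page.screenshot(path=path)
            return page.title()
        finally:
            # Close the browser even if one of the steps above raised.
            browser.close()


# Example call (needs Playwright, its browsers, and network access):
# print(title_and_screenshot("https://example.com"))
```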

🛠️ CLI Tools

# Quick scraping
ga-scrap quick "https://example.com" "h1"

# Create new project
ga-scrap new my-scraper

# Development with hot reload
ga-scrap dev

# Run with auto-restart
ga-scrap run script.py

🎯 Examples

📰 News Scraper

with SyncGAScrap() as scraper:
    scraper.goto("https://news.ycombinator.com")
    
    titles = scraper.get_all_text(".titleline > a")
    scores = scraper.get_all_text(".score")
    
    for title, score in zip(titles, scores):
        print(f"{score}: {title}")
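
One caveat with pairing two separately scraped lists: `zip` stops at the shorter list, so if some rows have no score (an assumption about the page, not verified here) titles can be matched with the wrong score or dropped silently. A minimal guard, with the hypothetical helper name `pair_up`, works on plain lists:

```python
def pair_up(titles, scores):
    """Pair titles with scores, refusing to guess when the lengths differ."""
    if len(titles) != len(scores):
        raise ValueError(
            f"{len(titles)} titles but {len(scores)} scores; "
            "selectors probably matched different rows"
        )
    return list(zip(titles, scores))


if __name__ == "__main__":
    print(pair_up(["A", "B"], ["10 points", "7 points"]))
    try:
        pair_up(["A", "B", "C"], ["10 points"])
    except ValueError as err:
        print(f"mismatch detected: {err}")
```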

🛒 E-commerce Scraper

with SyncGAScrap(sandbox_mode=True) as scraper:
    scraper.goto("https://example-shop.com")
    
    # Handle potential popups gracefully
    scraper.click(".popup-close")  # Won't crash if not found
    
    products = scraper.get_all_text(".product-name")
    prices = scraper.get_all_text(".product-price")
    
    for product, price in zip(products, prices):
        print(f"{product}: {price}")

📱 Mobile Scraping

with SyncGAScrap(device="iPhone 12") as scraper:
    scraper.goto("https://mobile-site.com")
    scraper.simulate_touch(100, 200)
    scraper.screenshot("mobile-view.png")

🤝 Contributing

We love contributions! Check out our Contributing Guide to get started.


📄 License

MIT License - see LICENSE for details.

