A powerful Playwright-based scraper helper with hot reload
GA-Scrap
What Makes GA-Scrap Special?
Simple & Powerful
from ga_scrap import SyncGAScrap
with SyncGAScrap() as scraper:
    scraper.goto("https://example.com")
    title = scraper.get_text("h1")
    scraper.screenshot("page.png")
Error-Resilient Development
# Sandbox mode - errors don't crash!
from ga_scrap import SyncGAScrap
with SyncGAScrap(sandbox_mode=True) as scraper:
    scraper.click("#might-not-exist")  # Logs error, continues
    scraper.screenshot("still_works.png")  # Still works!
Quick Start
Installation
git clone https://github.com/GrandpaAcademy/GA-Scrap.git
cd GA-Scrap
pip install -r requirements.txt
playwright install
Your First Scraper
from ga_scrap import SyncGAScrap
with SyncGAScrap() as scraper:
    scraper.goto("https://quotes.toscrape.com")
    quotes = scraper.get_all_text(".quote .text")
    print(f"Found {len(quotes)} quotes!")
That's it! No async/await, no complex setup - just simple Python code.
Learn More
Comprehensive documentation and examples:
Core Features
| Feature | Description | Status |
|---|---|---|
| Dual Interface | Both sync and async APIs | ✅ |
| Sandbox Mode | Error-resilient development | ✅ |
| Full Playwright | Complete A-Z feature access | ✅ |
| Device Emulation | Mobile, tablet, desktop | ✅ |
| Network Control | Request/response interception | ✅ |
| Media Capture | Screenshots, PDFs, videos | ✅ |
| Developer Tools | Hot reload, debugging | ✅ |
| Beautiful CLI | Colorful command interface | ✅ |
Complete Web Documentation
Choose Your Learning Path
- Beginner (new to web scraping?): Getting Started, Basic Examples, Installation Guide
- Developer (building scrapers?): Sandbox Mode, Sync Interface, Hot Reload
- Advanced (need full control?): Playwright API, Architecture, Contributing
Interface Options
Synchronous (Recommended)
Perfect for beginners and most use cases
from ga_scrap import SyncGAScrap
with SyncGAScrap() as scraper:
    scraper.goto("https://example.com")
    data = scraper.get_text(".content")
    scraper.screenshot("result.png")
Asynchronous
For advanced users and high-performance scenarios
import asyncio
from ga_scrap import GAScrap
async def scrape():
    async with GAScrap() as scraper:
        await scraper.goto("https://example.com")
        data = await scraper.get_text(".content")
        await scraper.screenshot("result.png")
asyncio.run(scrape())
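The async interface pays off mainly when several pages are fetched at once. The sketch below shows the general `asyncio.gather` pattern only; `fetch_title` and `scrape_all` are hypothetical names, and `fetch_title` sleeps instead of driving a browser so that the example runs without GA-Scrap installed. In a real script its body would presumably be an `async with GAScrap() as scraper:` block like the one above.

```python
import asyncio

async def fetch_title(url: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for one scrape; sleeps instead of loading a page."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return f"title of {url}"

async def scrape_all(urls: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns the results
    # in the same order as the inputs.
    return await asyncio.gather(*(fetch_title(u) for u in urls))

urls = ["https://example.com/a", "https://example.com/b"]
titles = asyncio.run(scrape_all(urls))
print(titles)
```

Because the waits overlap, total time is close to the slowest single task rather than the sum of all of them.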
Sandbox Mode
The game-changer for development!
# Traditional scraping - one error stops everything
scraper.click("#button")  # ❌ Element not found → CRASH!
# GA-Scrap sandbox mode - errors are handled gracefully
with SyncGAScrap(sandbox_mode=True) as scraper:
    scraper.click("#button")  # ✅ Error logged, execution continues
    scraper.screenshot("debug.png")  # ✅ Still works perfectly!
Benefits:
- Never crashes - Browser stays active during errors
- Detailed logging - Know exactly what went wrong
- Instant recovery - Fix and continue immediately
- Perfect for testing - Try different approaches safely
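Conceptually, this behaviour amounts to catching the exception around each action, logging it, and handing control back to the caller. The following is a minimal sketch of that pattern in plain Python, not GA-Scrap's actual implementation; the `sandboxed` decorator and the `FakePage` class are hypothetical names used only for illustration.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sandbox-sketch")

def sandboxed(func):
    """Log any exception raised by func and return None instead of propagating."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            log.error("%s failed: %s", func.__name__, exc)
            return None
    return wrapper

class FakePage:
    """Hypothetical stand-in for a browser page with one known element."""
    elements = {"#exists"}

    @sandboxed
    def click(self, selector):
        if selector not in self.elements:
            raise LookupError(f"element not found: {selector}")
        return f"clicked {selector}"

page = FakePage()
missing = page.click("#might-not-exist")  # error is logged, nothing is raised
found = page.click("#exists")             # later calls still run
print(missing, found)
```

The trade-off in a design like this is that a failed call returns `None`, so code that depends on the result should check for it.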
Complete Playwright Access
Every Playwright feature from A-Z is available:
# High-level GA-Scrap methods
scraper.goto("https://example.com")
scraper.screenshot("page.png")
# Direct Playwright access when needed
page = scraper.get_playwright_page()
# Note: 'await' is only valid inside an async function (async interface)
await page.evaluate("document.body.style.background = 'red'")
# Safe method execution with sandbox protection
result = scraper.execute_playwright_method('page', 'title')
A-Z Feature List
- Accessibility testing
- Browser management
- Cookies & context
- Downloads handling
- Evaluate JavaScript
- Form interactions
- Geolocation control
- Hover & interactions
- Injection (CSS/JS)
- JavaScript execution
- Keyboard simulation
- Locators & selectors
- Mouse operations
- Network monitoring
- Offline mode
- PDF generation
- Query selectors
- Recording (video/HAR)
- Screenshots
- Touch simulation
- Upload files
- Viewport control
- Waiting strategies
- XPath selectors
- Yielding control
- Zone/timezone settings
CLI Tools
# Quick scraping
ga-scrap quick "https://example.com" "h1"
# Create new project
ga-scrap new my-scraper
# Development with hot reload
ga-scrap dev
# Run with auto-restart
ga-scrap run script.py
Examples
News Scraper
from ga_scrap import SyncGAScrap
with SyncGAScrap() as scraper:
    scraper.goto("https://news.ycombinator.com")
    titles = scraper.get_all_text(".titleline > a")
    scores = scraper.get_all_text(".score")
    for title, score in zip(titles, scores):
        print(f"{score}: {title}")
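One caveat with pairing two separately scraped lists: `zip` stops at the shorter list. If some entries on a page have no score element (an assumption about the page, not something stated above), items are dropped silently and the remaining pairs may be misaligned. A short sketch with made-up sample data shows the difference between `zip` and `itertools.zip_longest`:

```python
from itertools import zip_longest

# Made-up sample data: three titles but only two scores.
titles = ["Post A", "Post B", "Post C"]
scores = ["120 points", "45 points"]

paired = list(zip(titles, scores))
padded = list(zip_longest(titles, scores, fillvalue="(no score)"))

print(len(paired))  # zip silently drops the unmatched title
print(padded[-1])   # zip_longest keeps it and fills the gap

if len(titles) != len(scores):
    print("warning: title/score counts differ; pairs may be misaligned")
```

Padding only makes the mismatch visible; it cannot tell which title is missing its score. For reliable pairing, extract both fields from each item's own container element.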
E-commerce Scraper
from ga_scrap import SyncGAScrap
with SyncGAScrap(sandbox_mode=True) as scraper:
    scraper.goto("https://example-shop.com")
    # Handle potential popups gracefully
    scraper.click(".popup-close")  # Won't crash if not found
    products = scraper.get_all_text(".product-name")
    prices = scraper.get_all_text(".product-price")
    for product, price in zip(products, prices):
        print(f"{product}: {price}")
Mobile Scraping
from ga_scrap import SyncGAScrap
with SyncGAScrap(device="iPhone 12") as scraper:
    scraper.goto("https://mobile-site.com")
    scraper.simulate_touch(100, 200)
    scraper.screenshot("mobile-view.png")
Contributing
We love contributions! Check out our Contributing Guide to get started.
License
MIT License - see LICENSE for details.
Made with ❤️ by Grandpa Academy
⭐ Star us on GitHub • Read the Docs • See Examples • Report Issues
File details
Details for the file ga_scrap-1.0.0.tar.gz.
File metadata
- Download URL: ga_scrap-1.0.0.tar.gz
- Upload date:
- Size: 49.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a740269150b5ec76db6a7fa587515a551c50cf74dc006065ff10f482fe65dcf1 |
| MD5 | f074e2ac6ba20fe6e05dfa6903898b11 |
| BLAKE2b-256 | c1c90435d8a4ddf58f9982e1fa39253abcd64bb38e8cd8a74edc0545978b2956 |
File details
Details for the file ga_scrap-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ga_scrap-1.0.0-py3-none-any.whl
- Upload date:
- Size: 46.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fad744684c53227d3370c6bc777c7bb4e454ecbe4c3841df956bcc6e5b601afd |
| MD5 | 13065dac5e4ca1f991b2c8ba20698cdf |
| BLAKE2b-256 | 624654f58c15ad930bc53ac994e237a0be70ab0565fd28732e5054ef75608168 |
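To check a downloaded file against the SHA256 digests listed above, the standard library's `hashlib` is enough. This is a minimal sketch; `sha256_of` is a helper name chosen here, and the demonstration hashes a temporary file so that it runs without the real download.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on a temporary file standing in for the downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name

actual = sha256_of(tmp_path)
os.remove(tmp_path)
print(actual == hashlib.sha256(b"hello").hexdigest())
```

For the real file, compare the result of `sha256_of("ga_scrap-1.0.0.tar.gz")` with the SHA256 value listed for that file.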