
Fast, lightweight Google search scraper with stealth mode


Google Search Scraper

License: MIT · Python 3.8+

A fast, lightweight, and easy-to-use Python package for scraping Google search results with built-in stealth mode to avoid detection.

✨ Features

  • 🚀 Fast: Optimized for speed with minimal overhead
  • 🥷 Stealth Mode: Built-in anti-detection features
  • 🎯 Simple API: Easy to use for both beginners and experts
  • 📦 Zero Config: Playwright browser auto-installs on package installation
  • 🔧 Flexible: Highly configurable with sensible defaults
  • 💻 CLI Support: Use from command line or as a Python module
  • 🎨 Multiple Output Formats: JSON and text output supported
  • 📄 Content Extraction: Extract and analyze full page content from search results
  • 💾 Auto-Save: Automatically save results to file with full content
  • 🧹 Clean Text: Intelligent HTML parsing and text extraction

📦 Installation

pip install google-search-aj

The package will automatically install Playwright and download the Chromium browser during installation.

If automatic installation fails, manually install Playwright browsers:

playwright install chromium

🚀 Quick Start

Python API

from google_search_scraper import search

# Simple search
results = search("python tutorial")
print(results.urls)
# ['https://docs.python.org/3/tutorial/', 'https://www.w3schools.com/python/', ...]

# Access the direct answer (if available)
print(results.answer)
# 'Python is a high-level, general-purpose programming language...'

# Get more details
print(f"Found {results.total_results} results in {results.search_time:.2f} seconds")

# Extract full page content
results = search("machine learning", max_results=3, extract_content=True)
for content in results.contents:
    print(f"{content.title}: {content.word_count} words")
    print(f"Preview: {content.content[:200]}...")

# Auto-save to file with content extraction
results = search(
    "python tutorial", 
    extract_content=True, 
    save_to_file=True,  # Auto-save enabled
    output_file="search_results.txt"  # Custom filename
)
# Results automatically saved with full content!

Command Line

# Simple search
google-search "python tutorial"

# Limit results
google-search "best restaurants" --max-results 20

# Save to file
google-search "machine learning" --output results.txt

# JSON output
google-search "data science" --format json

# Run with visible browser (debugging)
google-search "web scraping" --visible

📖 Usage Examples

Basic Usage

from google_search_scraper import search

# Default: 10 results with answer extraction
results = search("artificial intelligence")

for i, url in enumerate(results.urls, 1):
    print(f"{i}. {url}")

Advanced Usage

from google_search_scraper import GoogleSearchScraper

# Create a scraper with custom settings
scraper = GoogleSearchScraper(
    max_results=20,          # Get more results
    timeout=60000,           # Increase timeout
    headless=False,          # Show browser (for debugging)
    stealth_mode=True,       # Enable anti-detection (default)
    extract_content=True     # Extract page content
)

# Perform search
results = scraper.search("python web scraping", extract_answer=True)

# Access results
print(f"Query: {results.query}")
print(f"Time: {results.search_time:.2f}s")
print(f"Answer: {results.answer}")
print(f"URLs: {len(results.urls)}")
print(f"Content extracted: {len(results.contents)} pages")

# Convert to dictionary
data = results.to_dict()

Content Extraction

from google_search_scraper import search

# Extract content from search results
results = search("machine learning tutorial", max_results=5, extract_content=True)

# Access extracted content
for content in results.contents:
    if not content.error:
        print(f"\nTitle: {content.title}")
        print(f"URL: {content.url}")
        print(f"Word Count: {content.word_count:,}")
        print(f"Content Preview: {content.content[:200]}...")
    else:
        print(f"Failed to extract: {content.url} - {content.error}")

# Auto-save to file
results = search(
    "python web scraping",
    extract_content=True,
    save_to_file=True,  # Auto-save enabled
    output_file="my_search.txt"
)
print("✓ Results saved with full content!")

# Or save manually
results = search("AI tutorial", extract_content=True)
results.save_to_file("ai_results.txt")

Multiple Searches

from google_search_scraper import search

queries = ["python", "javascript", "rust"]

for query in queries:
    results = search(query, max_results=5)
    print(f"\n{query}: {len(results.urls)} results")
    print(results.urls[0] if results.urls else "No results")

Error Handling

from google_search_scraper import search
from google_search_scraper.exceptions import (
    GoogleSearchError,
    RateLimitError,
    BrowserError,
    SearchTimeoutError,
    NoResultsError
)

try:
    results = search("test query", timeout=10000)
except RateLimitError:
    print("Being rate limited by Google. Try again later.")
except SearchTimeoutError:
    print("Search timed out. Try increasing the timeout.")
except BrowserError:
    print("Browser failed to launch.")
except NoResultsError:
    print("No results found for this query.")
except GoogleSearchError as e:
    print(f"Search failed: {e}")

Batch Processing

from google_search_scraper import search
import time

queries = [
    "machine learning",
    "deep learning",
    "neural networks"
]

all_results = []

for query in queries:
    print(f"Searching: {query}")
    results = search(query, max_results=15)
    all_results.append(results)
    
    # Be respectful - add delay between searches
    time.sleep(5)

# Process results
for result in all_results:
    print(f"\n{result.query}:")
    print(f"  - Answer: {result.answer[:100] if result.answer else 'N/A'}")
    print(f"  - URLs: {len(result.urls)}")

🎯 API Reference

Main Functions

search(query, max_results=10, extract_answer=True, extract_content=False, headless=True, timeout=30000, save_to_file=False, output_file="search_results.txt")

Convenience function for quick searches.

Parameters:

  • query (str): Search query string
  • max_results (int): Maximum URLs to return (default: 10)
  • extract_answer (bool): Extract Google's direct answer (default: True)
  • extract_content (bool): Extract page content from URLs (default: False)
  • headless (bool): Run browser in headless mode (default: True)
  • timeout (int): Page load timeout in milliseconds (default: 30000)
  • save_to_file (bool): Automatically save results to file (default: False)
  • output_file (str): Name of the output file (default: search_results.txt)

Returns: SearchResult object

Classes

GoogleSearchScraper

Main scraper class with configurable options.

scraper = GoogleSearchScraper(
    max_results=10,
    timeout=30000,
    headless=True,
    stealth_mode=True,
    user_agent=None,
    extract_content=False
)

Methods:

  • search(query, extract_answer=True): Perform a search

SearchResult

Container for search results.

Attributes:

  • query (str): The search query
  • answer (str | None): Google's direct answer if available
  • urls (List[str]): List of result URLs
  • total_results (int): Number of URLs returned
  • search_time (float): Time taken for search in seconds
  • timestamp (float): Unix timestamp of search
  • contents (List[PageContent]): Extracted page content (if extract_content=True)

Methods:

  • to_dict(): Convert to dictionary
  • save_to_file(filename="search_results.txt"): Save results to text file with full content
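Because to_dict() returns plain Python data, results serialize directly to JSON. A minimal sketch, using a hand-built dictionary whose keys are assumed to mirror the attributes listed above (in real use, data = results.to_dict()):

```python
import json

# Hand-built stand-in for results.to_dict(); the keys are assumed to
# mirror the SearchResult attributes documented above.
data = {
    "query": "python tutorial",
    "answer": None,
    "urls": ["https://docs.python.org/3/tutorial/"],
    "total_results": 1,
    "search_time": 1.42,
    "timestamp": 1730592000.0,
}

payload = json.dumps(data, indent=2)
print(payload)
```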

PageContent

Container for extracted page content.

Attributes:

  • url (str): The page URL
  • title (str | None): Page title
  • content (str): Extracted clean text content
  • word_count (int): Number of words in content
  • error (str | None): Error message if extraction failed
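word_count can be cross-checked against content with a simple whitespace split; this is only an assumption about how the package counts words, not a documented guarantee:

```python
def count_words(text: str) -> int:
    # Naive whitespace tokenization; assumed (not guaranteed) to
    # approximate how PageContent.word_count is computed.
    return len(text.split())

sample = "Machine learning is a field of artificial intelligence."
print(count_words(sample))  # 8
```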

Exceptions

  • GoogleSearchError: Base exception for all errors
  • RateLimitError: Raised when rate limited by Google
  • BrowserError: Raised when browser fails
  • SearchTimeoutError: Raised when search times out
  • NoResultsError: Raised when no results found

🎛️ CLI Reference

usage: google-search [-h] [-n N] [--no-answer] [--visible] [--timeout MS]
                     [-o FILE] [-f {text,json}] [-v] [-q]
                     [query]

positional arguments:
  query                 Search query (if not provided, enters interactive mode)

optional arguments:
  -h, --help            show this help message and exit
  -n N, --max-results N
                        Maximum number of results to return (default: 10)
  --no-answer           Skip extracting Google's direct answer
  --visible             Run browser in visible mode (for debugging)
  --timeout MS          Page load timeout in milliseconds (default: 30000)
  -o FILE, --output FILE
                        Save results to file
  -f {text,json}, --format {text,json}
                        Output format: text or json (default: text)
  -v, --version         show program's version number and exit
  -q, --quiet           Suppress all output except results

⚠️ Important Notes

Rate Limiting

Google may rate limit or block requests if you make too many searches too quickly. To avoid this:

  1. Add delays between searches (5-10 seconds recommended)
  2. Use residential proxies for large-scale scraping
  3. Respect robots.txt and Google's Terms of Service
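The recommended 5-10 second spacing works better with a little randomness, so requests don't arrive at a fixed interval. A small sketch (polite_delay and its jitter range are suggestions, not part of the package):

```python
import random
import time

def polite_delay(low: float = 5.0, high: float = 10.0) -> float:
    """Sleep for a random interval in the recommended 5-10 s window."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay

# Usage between searches:
# for query in queries:
#     results = search(query)
#     polite_delay()
```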

Legal Considerations

Web scraping may have legal implications. This tool is for educational and research purposes. Users are responsible for:

  • Complying with Google's Terms of Service
  • Respecting robots.txt
  • Following applicable laws and regulations
  • Not using for commercial purposes without permission

Detection

While this package includes stealth features, Google continuously updates its detection methods. If you're being blocked:

  1. Increase delays between requests
  2. Use headless=False to run in visible mode
  3. Consider using residential proxies
  4. Rotate user agents
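Tip 4 pairs naturally with the scraper's user_agent parameter. A sketch with a hypothetical pool of agent strings (the strings and the pick_user_agent helper are illustrative, not shipped with the package):

```python
import random

# Illustrative pool; in practice, use current real-browser UA strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]

def pick_user_agent() -> str:
    return random.choice(USER_AGENTS)

# scraper = GoogleSearchScraper(user_agent=pick_user_agent())
```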

🛠️ Development

Setup Development Environment

# Clone the repository
git clone https://github.com/Aaditya17032002/google-search.git
cd google-search

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Install Playwright browsers
playwright install chromium

Running Tests

pytest tests/

Code Formatting

black google_search_scraper/
flake8 google_search_scraper/

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with Playwright
  • Inspired by the need for a simple, reliable Google search scraper

📧 Support

If you encounter any issues or have questions:

  • Open an issue on GitHub
  • Check existing issues for solutions

🔄 Changelog

v1.0.4 (2024-11-03)

  • Auto-save to file: Save results with full content automatically
  • Manual save with save_to_file() method
  • Comprehensive file format with all data

v1.0.3 (2024-11-03)

  • Content extraction: Extract full page content from URLs
  • BeautifulSoup4 integration for HTML parsing
  • Async content extraction for better performance
  • PageContent dataclass with title, content, word count

v1.0.0 (2024-11-03)

  • Initial release
  • Fast Google search scraping
  • Stealth mode with anti-detection
  • CLI and Python API
  • Auto-installation of Playwright
  • JSON and text output formats

See CHANGELOG.md for full version history.


Made with ❤️ by developers, for developers


