Google Search Scraper
A fast, lightweight, and easy-to-use Python package for scraping Google search results with built-in stealth mode to avoid detection.
✨ Features
- 🚀 Fast: Optimized for speed with minimal overhead
- 🥷 Stealth Mode: Built-in anti-detection features
- 🎯 Simple API: Easy to use for both beginners and experts
- 📦 Zero Config: Playwright browser auto-installs on package installation
- 🔧 Flexible: Highly configurable with sensible defaults
- 💻 CLI Support: Use from command line or as a Python module
- 🎨 Multiple Output Formats: JSON and text output supported
📦 Installation
```bash
pip install google-search-scraper
```
The package will automatically install Playwright and download the Chromium browser during installation.
If automatic installation fails, manually install Playwright browsers:
```bash
playwright install chromium
```
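To verify the browser is actually available, you can launch Chromium once through Playwright's Python API (a quick sanity check using Playwright directly, not this package's API):

```python
from playwright.sync_api import sync_playwright

# If this completes without raising, Chromium is installed and launchable.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    browser.close()
print("Chromium is ready")
```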
🚀 Quick Start
Python API
```python
from google_search_scraper import search

# Simple search
results = search("python tutorial")
print(results.urls)
# ['https://docs.python.org/3/tutorial/', 'https://www.w3schools.com/python/', ...]

# Access the direct answer (if available)
print(results.answer)
# 'Python is a high-level, general-purpose programming language...'

# Get more details
print(f"Found {results.total_results} results in {results.search_time:.2f} seconds")
```
Command Line
```bash
# Simple search
google-search "python tutorial"

# Limit results
google-search "best restaurants" --max-results 20

# Save to file
google-search "machine learning" --output results.txt

# JSON output
google-search "data science" --format json

# Run with visible browser (debugging)
google-search "web scraping" --visible
```
📖 Usage Examples
Basic Usage
```python
from google_search_scraper import search

# Default: 10 results with answer extraction
results = search("artificial intelligence")

for i, url in enumerate(results.urls, 1):
    print(f"{i}. {url}")
```
Advanced Usage
```python
from google_search_scraper import GoogleSearchScraper

# Create a scraper with custom settings
scraper = GoogleSearchScraper(
    max_results=20,     # Get more results
    timeout=60000,      # Increase timeout
    headless=False,     # Show browser (for debugging)
    stealth_mode=True,  # Enable anti-detection (default)
)

# Perform search
results = scraper.search("python web scraping", extract_answer=True)

# Access results
print(f"Query: {results.query}")
print(f"Time: {results.search_time:.2f}s")
print(f"Answer: {results.answer}")
print(f"URLs: {len(results.urls)}")

# Convert to dictionary
data = results.to_dict()
```
Multiple Searches
```python
from google_search_scraper import search

queries = ["python", "javascript", "rust"]

for query in queries:
    results = search(query, max_results=5)
    print(f"\n{query}: {len(results.urls)} results")
    print(results.urls[0] if results.urls else "No results")
```
Error Handling
```python
from google_search_scraper import search
from google_search_scraper.exceptions import (
    GoogleSearchError,
    RateLimitError,
    BrowserError,
    SearchTimeoutError,
)

try:
    results = search("test query", timeout=10000)
except RateLimitError:
    print("Being rate limited by Google. Try again later.")
except SearchTimeoutError:
    print("Search timed out. Try increasing the timeout.")
except BrowserError:
    print("Browser failed to launch.")
except GoogleSearchError as e:
    print(f"Search failed: {e}")
```
Batch Processing
```python
import time

from google_search_scraper import search

queries = [
    "machine learning",
    "deep learning",
    "neural networks",
]

all_results = []
for query in queries:
    print(f"Searching: {query}")
    results = search(query, max_results=15)
    all_results.append(results)
    # Be respectful - add delay between searches
    time.sleep(5)

# Process results
for result in all_results:
    print(f"\n{result.query}:")
    print(f"  - Answer: {result.answer[:100] if result.answer else 'N/A'}")
    print(f"  - URLs: {len(result.urls)}")
```
🎯 API Reference
Main Functions
`search(query, max_results=10, extract_answer=True, headless=True, timeout=30000)`
Convenience function for quick searches.
Parameters:
- `query` (str): Search query string
- `max_results` (int): Maximum URLs to return (default: 10)
- `extract_answer` (bool): Extract Google's direct answer (default: True)
- `headless` (bool): Run browser in headless mode (default: True)
- `timeout` (int): Page load timeout in milliseconds (default: 30000)

Returns: `SearchResult` object
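For example, a call that overrides the defaults to fetch more URLs, skip the answer box, and allow a slower page load:

```python
from google_search_scraper import search

results = search(
    "python asyncio tutorial",  # query
    max_results=25,             # more than the default 10
    extract_answer=False,       # skip direct-answer extraction
    timeout=60000,              # allow up to 60s for the page to load
)
print(results.urls[:3])
```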
Classes
GoogleSearchScraper
Main scraper class with configurable options.
```python
scraper = GoogleSearchScraper(
    max_results=10,
    timeout=30000,
    headless=True,
    stealth_mode=True,
    user_agent=None,
)
```
Methods:
- `search(query, extract_answer=True)`: Perform a search
SearchResult
Container for search results.
Attributes:
- `query` (str): The search query
- `answer` (str | None): Google's direct answer if available
- `urls` (List[str]): List of result URLs
- `total_results` (int): Number of URLs returned
- `search_time` (float): Time taken for search in seconds
- `timestamp` (float): Unix timestamp of search
Methods:
- `to_dict()`: Convert to dictionary
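Since `to_dict()` produces a plain dictionary, a result can be exported as JSON in one line (a sketch; assumes the dictionary values are JSON-serializable):

```python
import json

from google_search_scraper import search

results = search("python packaging")
print(json.dumps(results.to_dict(), indent=2))
```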
Exceptions
- `GoogleSearchError`: Base exception for all errors
- `RateLimitError`: Raised when rate limited by Google
- `BrowserError`: Raised when browser fails
- `SearchTimeoutError`: Raised when search times out
- `NoResultsError`: Raised when no results found
🎛️ CLI Reference
```text
usage: google-search [-h] [-n N] [--no-answer] [--visible] [--timeout MS]
                     [-o FILE] [-f {text,json}] [-v] [-q]
                     [query]

positional arguments:
  query                 Search query (if not provided, enters interactive mode)

optional arguments:
  -h, --help            show this help message and exit
  -n N, --max-results N
                        Maximum number of results to return (default: 10)
  --no-answer           Skip extracting Google's direct answer
  --visible             Run browser in visible mode (for debugging)
  --timeout MS          Page load timeout in milliseconds (default: 30000)
  -o FILE, --output FILE
                        Save results to file
  -f {text,json}, --format {text,json}
                        Output format: text or json (default: text)
  -v, --version         show program's version number and exit
  -q, --quiet           Suppress all output except results
```
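The flags compose, so a quiet JSON export of five results looks like this:

```bash
google-search "python tutorial" -n 5 -f json -o results.json -q
```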
⚠️ Important Notes
Rate Limiting
Google may rate limit or block requests if you make too many searches too quickly. To avoid this:
- Add delays between searches (5-10 seconds recommended; see the backoff sketch below)
- Use residential proxies for large-scale scraping
- Respect robots.txt and Google's Terms of Service
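A simple way to follow the first rule in code is a retry loop with exponential backoff on `RateLimitError` (a minimal sketch; `search_with_backoff` is a hypothetical helper, not part of the package, and the delays are illustrative):

```python
import time

from google_search_scraper import search
from google_search_scraper.exceptions import RateLimitError

def search_with_backoff(query, retries=3, base_delay=10.0):
    """Retry a search, doubling the wait after each rate-limit hit."""
    for attempt in range(retries):
        try:
            return search(query)
        except RateLimitError:
            delay = base_delay * (2 ** attempt)
            print(f"Rate limited; sleeping {delay:.0f}s before retrying...")
            time.sleep(delay)
    raise RateLimitError(f"Still rate limited after {retries} attempts")

results = search_with_backoff("python tutorial")
```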
Legal Considerations
Web scraping may have legal implications. This tool is for educational and research purposes. Users are responsible for:
- Complying with Google's Terms of Service
- Respecting robots.txt
- Following applicable laws and regulations
- Not using for commercial purposes without permission
Detection
While this package includes stealth features, Google continuously updates its detection methods. If you're being blocked:
- Increase delays between requests
- Use `headless=False` to run in visible mode
- Consider using residential proxies
- Rotate user agents (see the sketch below)
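Because the constructor accepts a `user_agent` option, rotating agents can be as simple as cycling through a small pool (a sketch; the UA strings are illustrative and will go stale):

```python
import itertools

from google_search_scraper import GoogleSearchScraper

# Illustrative desktop user agents; substitute current strings for real use.
USER_AGENTS = itertools.cycle([
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    " (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
    " (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
])

for query in ["python tutorial", "rust tutorial"]:
    scraper = GoogleSearchScraper(user_agent=next(USER_AGENTS))
    results = scraper.search(query)
    print(query, len(results.urls))
```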
🛠️ Development
Setup Development Environment
```bash
# Clone the repository
git clone https://github.com/yourusername/google-search-scraper.git
cd google-search-scraper

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Install Playwright browsers
playwright install chromium
```
Running Tests
```bash
pytest tests/
```
Code Formatting
```bash
black google_search_scraper/
flake8 google_search_scraper/
```
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built with Playwright
- Inspired by the need for a simple, reliable Google search scraper
📧 Support
If you encounter any issues or have questions:
- Open an issue on GitHub
- Check existing issues for solutions
🔄 Changelog
v1.0.0 (2024-11-03)
- Initial release
- Fast Google search scraping
- Stealth mode with anti-detection
- CLI and Python API
- Auto-installation of Playwright
- JSON and text output formats
Made with ❤️ by developers, for developers