A Python package for scraping Bing search results
Scrape Bing
A robust Python package for scraping search results from Bing with built-in rate limiting, retry mechanisms, and result cleaning features.
Features
- 🔍 Clean and structured search results
- 🔄 Automatic retry mechanism for failed requests
- ⏱️ Built-in rate limiting to prevent blocking
- 🧹 URL cleaning and validation
- 🔄 User agent rotation
- 💪 Type hints and proper error handling
- 📝 Comprehensive documentation
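The user agent rotation listed above could work roughly as follows. This is a minimal sketch, not the package's actual implementation: `rotating_headers` is a hypothetical helper, and the user agent strings are placeholders rather than real browser identifiers.

```python
import itertools
from typing import Dict, Iterator, Sequence

# Placeholder values for illustration only; a real pool would hold
# genuine browser user agent strings.
EXAMPLE_USER_AGENTS = (
    "ExampleBrowser/1.0",
    "ExampleBrowser/2.0",
    "AnotherExampleBrowser/5.1",
)


def rotating_headers(user_agents: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Yield request headers forever, cycling through the given user agents."""
    for user_agent in itertools.cycle(user_agents):
        yield {"User-Agent": user_agent}


headers = rotating_headers(EXAMPLE_USER_AGENTS)
first = next(headers)   # headers for the first request
second = next(headers)  # headers for the second request, with the next agent
```

Each call to `next(headers)` returns a fresh dictionary that could be passed as the `headers` argument of an HTTP request.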
Installation
You can install the package using pip:
pip install scrape-bing
For development installation:
git clone https://github.com/affanshaikhsurab/scrape-bing.git
cd scrape-bing
pip install -e .
Quick Start
from scrape_bing import BingScraper
# Initialize the scraper
scraper = BingScraper(
    max_retries=3,
    delay_between_requests=1.0
)
# Perform a search
results = scraper.search("python programming", num_results=5)
# Process results
for result in results:
    print(f"\nTitle: {result.title}")
    print(f"URL: {result.url}")
    print(f"Description: {result.description}")
Advanced Usage
Custom Configuration
# Configure with custom parameters
scraper = BingScraper(
    max_retries=5,               # Maximum retry attempts
    delay_between_requests=2.0   # Delay between requests in seconds
)
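How the package applies `max_retries` and `delay_between_requests` internally is not documented here. As a rough mental model, the sketch below shows one common way to combine a minimum delay between requests with a bounded retry loop. `RateLimiter` and `call_with_retries` are hypothetical names, and the sketch assumes that `max_retries` counts retries after the first attempt and that only `ConnectionError` is retried.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_retries(func: Callable[[], T], max_retries: int,
                      limiter: RateLimiter) -> T:
    """Call func, retrying on ConnectionError up to max_retries extra times."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):
        limiter.wait()  # every attempt, including retries, is rate limited
        try:
            return func()
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError(f"gave up after {max_retries} retries") from last_error
```

Injecting `clock` and `sleep` keeps the limiter testable without real waiting; in normal use the defaults apply.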
Error Handling
from scrape_bing import BingScraper
scraper = BingScraper()
try:
    results = scraper.search("python programming")
except ValueError as e:
    print(f"Invalid input: {e}")
except ConnectionError as e:
    print(f"Network error: {e}")
except RuntimeError as e:
    print(f"Parsing error: {e}")
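If a caller prefers an empty result list over an exception for network or parsing failures, the pattern above can be wrapped in a small helper. This is only a sketch: `safe_search` is a hypothetical function, not part of the package, and it accepts any callable with the documented `search(query, num_results=...)` shape. It deliberately lets `ValueError` propagate, on the assumption that an invalid query is a caller bug rather than a transient failure.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger(__name__)


def safe_search(search: Callable[..., List[Any]], query: str,
                num_results: int = 10) -> List[Any]:
    """Run a search, returning [] on network or parsing failures."""
    try:
        return search(query, num_results=num_results)
    except (ConnectionError, RuntimeError) as exc:
        # Treat transient failures as "no results" and record why.
        logger.warning("search for %r failed: %s", query, exc)
        return []
```

With the package installed, a call would look like `safe_search(scraper.search, "python programming", num_results=5)`.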
Search Result Structure
Each search result contains:
- title: The title of the search result
- url: The cleaned and validated URL
- description: The description snippet (if available)
# Access result attributes
for result in results:
    print(result.title)        # Title of the page
    print(result.url)          # Clean URL
    print(result.description)  # Description (may be None)
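Because `description` may be `None`, code that stores or exports results usually needs to handle that case explicitly. The sketch below serializes results to JSON and replaces a missing description with an empty string. The `SearchResult` class here is a local stand-in that mirrors the documented fields so the example runs on its own, and `results_to_json` is a hypothetical helper.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SearchResult:
    """Stand-in mirroring the documented fields; not imported from the package."""
    title: str
    url: str
    description: Optional[str]


def results_to_json(results: List[SearchResult]) -> str:
    """Serialize results to a JSON array, mapping a missing description to ""."""
    rows = []
    for result in results:
        row = asdict(result)
        row["description"] = row["description"] or ""
        rows.append(row)
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

`asdict` works on any dataclass instance, so the same helper should apply to the package's own results as long as they are dataclasses, as the API reference indicates.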
API Reference
BingScraper Class
from typing import List
class BingScraper:
    def __init__(self, max_retries: int = 3, delay_between_requests: float = 1.0):
        """
        Initialize the BingScraper.
        Args:
            max_retries: Maximum number of retry attempts for failed requests
            delay_between_requests: Minimum delay between requests in seconds
        """
        pass
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """
        Perform a Bing search and return results.
        Args:
            query: Search query string
            num_results: Maximum number of results to return
        Returns:
            List of SearchResult objects
        Raises:
            ValueError: If query is empty
            ConnectionError: If network connection fails
            RuntimeError: If parsing fails
        """
        pass
SearchResult Class
from dataclasses import dataclass
from typing import Optional
@dataclass
class SearchResult:
    title: str                  # Title of the search result
    url: str                    # Cleaned URL
    description: Optional[str]  # Description (may be None)
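The `url` field is described as cleaned and validated, but the exact rules the package applies are not spelled out here. One plausible approach, sketched with the standard library only, is to accept only `http`/`https` URLs with a host, strip tracking parameters, and drop the fragment. `clean_url` and `TRACKING_PREFIXES` are hypothetical names, and the choice of `utm_` parameters is an assumption for illustration.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking-parameter prefixes; not taken from the package.
TRACKING_PREFIXES = ("utm_",)


def clean_url(raw: str) -> Optional[str]:
    """Return a normalized http(s) URL, or None if the input is not valid."""
    try:
        parts = urlsplit(raw.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host.
        return None
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    # Keep only query parameters that are not tracking parameters.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL without the fragment, which is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
```

Returning `None` for invalid input lets the caller decide whether to skip the result or raise an error.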
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Run tests
python -m pytest tests/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Beautiful Soup 4 for HTML parsing
- Requests library for HTTP requests
- Python typing for type hints
Support
If you encounter any issues or have questions, please file an issue on the GitHub repository.