A versatile web scraping library with multiple techniques
ScrapeMaster
ScrapeMaster is a comprehensive Python library for web scraping that handles both simple and complex websites, offering features like text and image extraction, session management, and anti-bot circumvention techniques.
Features
- Scrape text and images from websites
- Handle JavaScript-rendered content using Selenium
- Manage cookies and sessions for authenticated scraping
- Rotate user agents and use proxies to avoid detection
- Clean and format extracted data
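As a rough illustration of two of the features above (user-agent rotation and data cleaning), here is a minimal standard-library sketch. It is not ScrapeMaster's implementation or API: the names `USER_AGENTS`, `build_request`, and `clean_text` are hypothetical, and the user-agent strings are placeholders.

```python
import itertools
import re
import urllib.request

# Hypothetical pool of User-Agent strings (placeholders, not real browser values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0) ExampleBrowser/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/3.0",
]

# Cycle through the pool so consecutive requests carry different headers.
_ua_cycle = itertools.cycle(USER_AGENTS)


def build_request(url):
    """Return a Request object carrying the next User-Agent from the pool."""
    return urllib.request.Request(url, headers={"User-Agent": next(_ua_cycle)})


def clean_text(raw):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", raw).strip()
```

The request is only constructed here, not sent; passing it to `urllib.request.urlopen` would perform the actual fetch.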
Installation
You can install ScrapeMaster using pip:
pip install ScrapeMaster
Quick Start
Here's a simple example of how to use ScrapeMaster:
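from scrapemaster import ScrapeMaster

# Create a scraper bound to the target URL
scraper = ScrapeMaster('https://example.com')
# Scrape text from 'p' elements and images from 'img' elements;
# 'output_images' appears to be the directory where images are saved
results = scraper.scrape_all('p', 'img', 'output_images')
# The result is a dict holding the extracted texts and image URLs
print(results['texts'])
print(results['image_urls'])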
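To make the shape of the result concrete, the sketch below shows one way that collecting paragraph text and image URLs could work using only the standard library. It is an assumption-laden illustration, not how ScrapeMaster works internally: `TextAndImageParser` and `extract` are hypothetical names, and the returned keys simply mirror the `texts` and `image_urls` keys used in the example above.

```python
from html.parser import HTMLParser


class TextAndImageParser(HTMLParser):
    """Collect the text of <p> elements and the src of <img> elements."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.image_urls = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start buffering text for a new paragraph.
            self._in_p = True
            self._buf = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Paragraph closed: store its accumulated text.
            self.texts.append("".join(self._buf).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def extract(html):
    """Return a dict with the same keys as the Quick Start result (hypothetical helper)."""
    parser = TextAndImageParser()
    parser.feed(html)
    parser.close()
    return {"texts": parser.texts, "image_urls": parser.image_urls}
```

Calling `extract` on an HTML string returns the paragraph texts in document order alongside the raw `src` values; relative URLs are left unresolved in this sketch.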
Advanced Usage
For more advanced usage, including handling of JavaScript-rendered content and authenticated scraping, please refer to the documentation.
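For readers unfamiliar with authenticated scraping, the general idea is to keep cookies between requests so that a login persists. The following is a minimal standard-library sketch of that idea, not ScrapeMaster's API: `make_session`, `build_login_request`, and the `username`/`password` form field names are assumptions chosen for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a POST request with form-encoded credentials.

    The field names are assumptions; a real site defines its own form fields.
    """
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=form.encode("utf-8"))
```

Passing the login request to `opener.open(...)` would send it; any cookies the server sets would then be stored in the jar and attached to later requests made through the same opener.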
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Hashes for scrapemaster-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 44819bab1caa4f05bac74156c75938d4833bbdd0ffc69bd647821f46594fc9d5
MD5 | f99b8a5deb8b00dc83e428262dff7f40
BLAKE2b-256 | 61aa8b72ff507766a9e071304450cb4608121dd3ebfc4920f5590d427b78a402