A versatile web scraping library with multiple techniques

ScrapeMaster

ScrapeMaster is a Python library for web scraping that handles both simple static pages and more complex, JavaScript-rendered websites. It offers text and image extraction, session management, and anti-bot circumvention techniques.

Features

  • Scrape text and images from websites
  • Handle JavaScript-rendered content using Selenium
  • Manage cookies and sessions for authenticated scraping
  • Rotate user agents and use proxies to avoid detection
  • Clean and format extracted data
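To make the first and last features concrete, here is a minimal sketch of text and image extraction with basic cleanup. It uses only the Python standard library and is not ScrapeMaster's implementation; the class name TextAndImageParser and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TextAndImageParser(HTMLParser):
    """Collect the text of <p> elements and the src of <img> elements."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.image_urls = []
        self._in_p = False   # True while inside a <p> element
        self._buffer = []    # text fragments of the current <p>

    def _flush(self):
        # Collapse runs of whitespace and drop paragraphs that end up empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.texts.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:   # an unclosed <p> ends where the next one starts
                self._flush()
            self._in_p = True
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


html = '<p>Hello  <b>world</b></p><img src="/a.png"><p> </p><img alt="no src">'
parser = TextAndImageParser()
parser.feed(html)
print(parser.texts)       # paragraph text with whitespace normalised
print(parser.image_urls)  # src attributes exactly as written in the page
```

A real scraper would also need to fetch the page and download the images; this sketch covers only the parsing and cleaning steps.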

Installation

You can install ScrapeMaster using pip:

pip install ScrapeMaster

Quick Start

Here's a simple example of how to use ScrapeMaster:

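from scrapemaster import ScrapeMaster

# Create a scraper for the target page.
scraper = ScrapeMaster('https://example.com')

# Arguments as used in this example: a text selector ('p'), an image
# selector ('img'), and what appears to be an output directory for images.
results = scraper.scrape_all('p', 'img', 'output_images')

print(results['texts'])       # extracted text
print(results['image_urls'])  # URLs of the images found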
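The returned data may still need tidying before use. The following is a hedged sketch of one way to post-process it. It assumes that results is a dict whose 'texts' and 'image_urls' entries are lists of strings, which the example above suggests but does not confirm. The helper tidy_results and the sample data are hypothetical and not part of ScrapeMaster.

```python
from urllib.parse import urljoin


def tidy_results(results, base_url):
    """Normalise whitespace in texts; make image URLs absolute and unique."""
    texts = [" ".join(t.split()) for t in results.get("texts", [])]
    texts = [t for t in texts if t]  # drop entries that were only whitespace

    seen = set()
    image_urls = []
    for url in results.get("image_urls", []):
        absolute = urljoin(base_url, url)  # resolve relative paths
        if absolute not in seen:           # keep the first occurrence only
            seen.add(absolute)
            image_urls.append(absolute)

    return {"texts": texts, "image_urls": image_urls}


# Hypothetical stand-in for the value returned by scrape_all().
sample = {
    "texts": ["  First\nparagraph ", "", "Second"],
    "image_urls": ["/img/a.png", "https://example.com/img/a.png", "b.jpg"],
}
tidy = tidy_results(sample, "https://example.com/page/")
print(tidy["texts"])
print(tidy["image_urls"])
```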

Advanced Usage

For more advanced usage, including handling of JavaScript-rendered content and authenticated scraping, please refer to the documentation.
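As general background on authenticated scraping, the sketch below shows the underlying idea of keeping cookies across requests while cycling through user agents, using only the Python standard library. It does not use or describe ScrapeMaster's API; the Session class, its methods, and the user-agent strings are placeholders invented for this illustration.

```python
import itertools
import urllib.request
from http.cookiejar import CookieJar

# Placeholder user-agent strings, for illustration only.
USER_AGENTS = [
    "ExampleBot/1.0 (+https://example.com/bot)",
    "ExampleBot/1.1 (+https://example.com/bot)",
]


class Session:
    """Keep cookies across requests and cycle through user agents."""

    def __init__(self, user_agents):
        self.cookies = CookieJar()
        # The cookie processor stores cookies from responses and sends
        # them back on later requests made through the same opener.
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies)
        )
        self._agents = itertools.cycle(user_agents)

    def build_request(self, url, data=None):
        # Each request takes the next user agent in the cycle.
        headers = {"User-Agent": next(self._agents)}
        return urllib.request.Request(url, data=data, headers=headers)

    def fetch(self, url, data=None, timeout=10):
        # Passing bytes as data turns the request into a POST, for example
        # to submit a login form; the resulting cookies stay in self.cookies.
        request = self.build_request(url, data)
        with self._opener.open(request, timeout=timeout) as response:
            return response.read()


session = Session(USER_AGENTS)
first = session.build_request("https://example.com/")
second = session.build_request("https://example.com/")
print(first.get_header("User-agent"))
print(second.get_header("User-agent"))
```

Only build_request is exercised here, so the example runs without network access. Calling fetch would perform a real HTTP request.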

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

scrapemaster-0.1.4.tar.gz (10.7 kB), source

Built Distribution

scrapemaster-0.1.4-py3-none-any.whl (10.1 kB), Python 3
