Basic Web Scraper made with selenium and bs4
Basic Web Scraper
Project Description
This package can be used for simple automated web surfing / scraping.
Additionally, the included BasicSpider class is meant to be extended by inheritance.
Usage Example
from basic_web_scraper.BasicSpider import BasicSpider

class CustomSpider(BasicSpider):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def custom_operation(self, threshold):
        """
        Scroll to predefined threshold.
        If past threshold, scroll back up.
        """
        if self.get_page_y_offset() < threshold:
            self.mousewheel_vscroll(number_of_scrolls=2)
        else:
            y_difference = self.get_page_y_offset() - threshold
            self.smooth_vscroll_up_by(y_difference)
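The branching in custom_operation can be checked without launching a browser. The sketch below is only an illustration: FakeSpider is a hypothetical stand-in (not part of the package) that records calls to the three methods used in the example above, and it assumes the offsets and threshold are plain numbers such as pixels.

```python
class FakeSpider:
    """Hypothetical browser-free stand-in for the three methods used above."""

    def __init__(self, y_offset):
        self.y_offset = y_offset
        self.calls = []  # records (method name, argument) pairs

    def get_page_y_offset(self):
        return self.y_offset

    def mousewheel_vscroll(self, number_of_scrolls):
        self.calls.append(("mousewheel_vscroll", number_of_scrolls))

    def smooth_vscroll_up_by(self, distance):
        self.calls.append(("smooth_vscroll_up_by", distance))
        self.y_offset -= distance

    def custom_operation(self, threshold):
        # Same logic as CustomSpider.custom_operation in the usage example.
        if self.get_page_y_offset() < threshold:
            self.mousewheel_vscroll(number_of_scrolls=2)
        else:
            y_difference = self.get_page_y_offset() - threshold
            self.smooth_vscroll_up_by(y_difference)


# Page is above the threshold position: the spider scrolls down twice.
below = FakeSpider(y_offset=100)
below.custom_operation(threshold=500)
print(below.calls)  # [('mousewheel_vscroll', 2)]

# Page is past the threshold: the spider scrolls back up by the difference.
above = FakeSpider(y_offset=800)
above.custom_operation(threshold=500)
print(above.calls)  # [('smooth_vscroll_up_by', 300)]
```

Note that when the offset equals the threshold exactly, the else branch runs and requests an upward scroll of 0.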
Details about files
geckodriver.exe | geckodriver
Keep this file in the local directory when using the scraper on Windows. If you're using a Linux system, make sure geckodriver is included in your PATH variable. Downloaded from here
Note: geckodriver.exe is for Windows, and geckodriver (no extension) is for Linux.
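A quick way to confirm the driver is where the notes above expect it is a small lookup helper. This is a minimal sketch: locate_geckodriver and its parameters are hypothetical names, not something the package provides. It only uses the standard library (os.path.isfile and shutil.which).

```python
import os
import shutil
import sys


def locate_geckodriver(local_dir=".", platform_name=sys.platform):
    """Return a path to geckodriver, or None if it cannot be found.

    Hypothetical helper: the names and lookup order are assumptions.
    """
    if platform_name.startswith("win"):
        # Windows: look for geckodriver.exe in the local directory first.
        candidate = os.path.join(local_dir, "geckodriver.exe")
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    # Linux (or fallback): search the directories listed in PATH.
    return shutil.which("geckodriver")


print(locate_geckodriver() or "geckodriver not found")
```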
BasicSpider.py
Use this as the superclass for your own project's spider.
This spider can do basic things such as go to a URL, scroll down the page in different ways, and refresh the page.
It acts as an interface to selenium.webdriver to make setting up a project easier.
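To illustrate what "an interface to selenium.webdriver" can look like, here is a minimal sketch of the wrapper pattern. It is not the actual BasicSpider implementation: MiniSpider, goto, refresh_page, and scroll_down_by are hypothetical names, and only get_page_y_offset appears in the usage example. The wrapped calls (get, refresh, execute_script) are standard selenium WebDriver methods; RecordingDriver is a stand-in so the sketch runs without a browser.

```python
class MiniSpider:
    """Illustrative wrapper around a WebDriver-like object (hypothetical)."""

    def __init__(self, driver):
        # `driver` is expected to behave like a selenium WebDriver instance.
        self.driver = driver

    def goto(self, url):
        self.driver.get(url)

    def refresh_page(self):
        self.driver.refresh()

    def get_page_y_offset(self):
        # Ask the browser for the current vertical scroll position.
        return self.driver.execute_script("return window.pageYOffset;")

    def scroll_down_by(self, pixels):
        self.driver.execute_script("window.scrollBy(0, arguments[0]);", pixels)


class RecordingDriver:
    """Stand-in driver that logs calls instead of controlling a browser."""

    def __init__(self):
        self.log = []

    def get(self, url):
        self.log.append(("get", url))

    def refresh(self):
        self.log.append(("refresh",))

    def execute_script(self, script, *args):
        self.log.append(("execute_script", script, args))
        return 0  # pretend the page is scrolled to the top


spider = MiniSpider(RecordingDriver())
spider.goto("http://localhost:8000/")
spider.scroll_down_by(250)
spider.refresh_page()
print(spider.driver.log)
```

With selenium installed and geckodriver available, the same wrapper could be given a real driver, for example one created with selenium.webdriver.Firefox().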
custom_exceptions.py
Copy this file alongside BasicSpider.py and add any custom exceptions to it.
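As a sketch of what adding a custom exception could look like, the classes below use a shared base class so callers can catch every spider-related error at once. All names here (SpiderError, ScrollThresholdError, validate_threshold) are hypothetical and are not taken from the package's custom_exceptions.py.

```python
class SpiderError(Exception):
    """Hypothetical base class for errors raised by a project's spider."""


class ScrollThresholdError(SpiderError):
    """Hypothetical error for an invalid scroll threshold."""

    def __init__(self, threshold):
        super().__init__(
            f"scroll threshold must be non-negative, got {threshold}"
        )
        self.threshold = threshold


def validate_threshold(threshold):
    """Return the threshold unchanged, or raise if it is negative."""
    if threshold < 0:
        raise ScrollThresholdError(threshold)
    return threshold


try:
    validate_threshold(-10)
except SpiderError as error:
    # Catching the base class also catches the more specific subclass.
    print(f"caught: {error}")
```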
package.json | package-lock.json
Used to install dependencies for testing and its related workflows. Not needed for the spider to function.
start_local_server.py
Used for testing. Optional.
tests.py
Contains all the tests for the scraper. Optional.
mock_webpage
Used for testing. Optional. Can also be used as a simple demo.
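One way to try a local demo page is to serve its directory over HTTP with Python's standard library. The sketch below is an assumption about how such a setup could work and does not reproduce start_local_server.py; serve_directory and its parameters are hypothetical names.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=0):
    """Serve `directory` on 127.0.0.1 in a background thread.

    Hypothetical helper. With port=0 the operating system picks a free
    port, available afterwards as server.server_address[1].
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

For example, calling serve_directory("mock_webpage", port=8000) from the project root would make the demo page reachable at http://127.0.0.1:8000/ until server.shutdown() is called.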
Hashes for basic-web-scraper-0.12.0a3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 8b5e198678eac2cc0d1ed1bbb00987c546298b5c3ee9022fc33aa631d46bafd1
MD5 | dbd4863d05880e8b72bd83705a0ab14a
BLAKE2b-256 | a6a0f92887b2fa2d7e28937883b56eb972275c2272f45f4ff9ed1fbbab3f54e6
Hashes for basic_web_scraper-0.12.0a3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 54bd4f03ea64e87dd00b7c66093196ca1a1635316169a2cb21fed2477a67c339
MD5 | 474150146e993ea768d516085ae878a1
BLAKE2b-256 | 147fd24e8f18783c14e60ce2f006cda021a85291e1c9f78dda5644682ce80c4a