A simple web scraper base combining Beautiful Soup and Selenium
Project description
SouperScraper
A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.
Setup
- Install with pip
pip install souperscraper
- Download the appropriate ChromeDriver for your Chrome version using getchromedriver.py (command below) or manually from the ChromeDriver website. To find your Chrome version, go to chrome://settings/help in your browser.
getchromedriver
- Create a new SouperScraper object using the path to your ChromeDriver
from souperscraper import SouperScraper
scraper = SouperScraper('/path/to/your/chromedriver')
- Start scraping using BeautifulSoup and/or Selenium methods
scraper.goto('https://github.com/LucasFaudman')
# Use BeautifulSoup to search for and extract content
# by accessing the scraper's 'soup' attribute
# or with the 'soup_find' / 'soup_find_all' methods
repos = scraper.soup.find_all('span', class_='repo')
for repo in repos:
    repo_name = repo.text
    print(repo_name)
# Use Selenium to interact with the page such as clicking buttons
# or filling out forms by accessing the scraper's
# find_element_by_* / find_elements_by_* / wait_for_* methods
repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
repos_tab.click()
search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
search_input.send_keys('souper-scraper')
search_input.submit()
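The example above reads `scraper.soup` after Selenium has loaded and rendered the page. How SouperScraper wires this together internally is not shown here. As a rough sketch of the general pattern, the rendered HTML that a Selenium WebDriver exposes through its `page_source` attribute can be handed to BeautifulSoup. The `soup_from_driver` helper and the `FakeDriver` stand-in are hypothetical names used only for illustration, so the snippet runs without a browser.

```python
from bs4 import BeautifulSoup


def soup_from_driver(driver):
    """Parse whatever HTML the driver currently holds.

    `driver` is any object with a `page_source` attribute, which is
    what Selenium WebDriver instances expose for the rendered page.
    """
    return BeautifulSoup(driver.page_source, "html.parser")


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (no browser needed)."""

    page_source = """
    <html><body>
      <span class="repo">souper-scraper</span>
      <span class="repo">another-repo</span>
    </body></html>
    """


# Same search as in the example above, run against the stand-in page
soup = soup_from_driver(FakeDriver())
repo_names = [span.text for span in soup.find_all("span", class_="repo")]
print(repo_names)
```

With a real browser session, a `selenium.webdriver.Chrome` instance would take the place of `FakeDriver`.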
BeautifulSoup Reference
- Quick Start
- Types of Objects
- The BeautifulSoup object
- Navigating the HTML tree
- Searching for HTML Elements
- Modifying the tree
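As a quick orientation to the topics listed above, the following minimal sketch touches each of them using plain BeautifulSoup on an inline HTML string. The markup and the names in it (`repos`, `alpha`, `beta`) are made up for illustration and do not come from any real page.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

html = """
<ul id="repos">
  <li class="repo"><a href="/user/alpha">alpha</a></li>
  <li class="repo"><a href="/user/beta">beta</a></li>
</ul>
"""

# The BeautifulSoup object: the parsed document as a whole
soup = BeautifulSoup(html, "html.parser")

# Types of objects: tags, and the strings inside them
first_link = soup.find("a")
is_tag = isinstance(first_link, Tag)
is_string = isinstance(first_link.string, NavigableString)

# Navigating the tree: move from a tag to its parent and a sibling
first_item = first_link.parent                    # the enclosing <li>
second_item = first_item.find_next_sibling("li")  # the next <li>

# Searching: find_all by tag name, or select with a CSS selector
names = [a.text for a in soup.find_all("a")]
hrefs = [a["href"] for a in soup.select("li.repo > a")]

# Modifying the tree: change an attribute and the tag's text in place
first_link["href"] = "/user/alpha-renamed"
first_link.string = "alpha-renamed"

print(is_tag, is_string)
print(names)
print(hrefs)
print(second_item.a.text)
print(first_link)
```

Because `scraper.soup` is used like a BeautifulSoup object in the example earlier, the same calls should apply there as well, though that is an assumption based on the usage shown rather than on SouperScraper's source.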
Selenium Reference
Download files
Source Distribution
souperscraper-1.0.2.tar.gz (16.8 kB)
Built Distribution
souperscraper-1.0.2-py3-none-any.whl (11.6 kB)
File details
Details for the file souperscraper-1.0.2.tar.gz.
File metadata
- Download URL: souperscraper-1.0.2.tar.gz
- Upload date:
- Size: 16.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 317d1090f1cd3ca3bbe05405c4679149d978fbd7347461c70702d236da6ee2ca
MD5 | 88f40315cab4ac33fcd2c5380f42e6ca
BLAKE2b-256 | ad3223da4a7bb571b766dfecd8ba00ef9f86f6646698d6dfd148ffa229449f12
File details
Details for the file souperscraper-1.0.2-py3-none-any.whl.
File metadata
- Download URL: souperscraper-1.0.2-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 605572112f213de06804f32b23c82036741d831a3322844d972b2e6c8a12a03f
MD5 | 31477a846f140f5dfa2e5ec70f3c735f
BLAKE2b-256 | ea40edc8aeabfd6573d2e16cf489b77bcf0491981a78604c7b0330b7aa309e59