A simple web scraper base combining Beautiful Soup and Selenium
SouperScraper
A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.
Setup
- Install with pip
pip install souperscraper
- Download the appropriate ChromeDriver for your Chrome version using get_chrome_driver.py or manually from the ChromeDriver website. To find your Chrome version, go to chrome://settings/help in your browser.
getchromedriver
- Create a new SouperScraper object using the path to your ChromeDriver
from souper_scraper import SouperScraper
scraper = SouperScraper(executable_path='/path/to/your/chromedriver')
- Start scraping using BeautifulSoup and/or Selenium methods
scraper.goto('https://github.com/LucasFaudman')
# Use BeautifulSoup to search for and extract content
# by accessing the scraper's 'soup' attribute
# or with the 'soup_find' / 'soup_find_all' methods
repos = scraper.soup.find_all('span', class_='repo')
for repo in repos:
    repo_name = repo.text
    print(repo_name)
# Use Selenium to interact with the page, for example to click buttons
# or fill out forms, through the scraper's
# find_element_by_* / find_elements_by_* / wait_for_* methods
repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
repos_tab.click()
search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
search_input.send_keys('souper-scraper')
search_input.submit()
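The BeautifulSoup half of the example above can be tried without a browser or ChromeDriver. The following is a minimal sketch, assuming the `bs4` package is importable; the HTML snippet and the `repo_names` helper are hypothetical and exist only for illustration (real GitHub markup will differ). It runs the same `find_all('span', class_='repo')` search against a static string instead of the scraper's `soup` attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; a real page will look different.
HTML = """
<ul>
  <li><span class="repo">souper-scraper</span></li>
  <li><span class="repo">another-project</span></li>
  <li><span class="language">Python</span></li>
</ul>
"""


def repo_names(html: str) -> list[str]:
    """Return the text of every <span class="repo"> element in html."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with a trailing underscore) filters on the CSS class,
    # because "class" is a reserved word in Python.
    return [span.text.strip() for span in soup.find_all("span", class_="repo")]


if __name__ == "__main__":
    for name in repo_names(HTML):
        print(name)
```

Only the two spans carrying the `repo` class are returned; the `language` span is skipped because its class does not match.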
BeautifulSoup Reference
- Quick Start
- Types of Objects
- The BeautifulSoup object
- Navigating the HTML tree
- Searching for HTML Elements
- Modifying the tree
Selenium Reference
Download files
Source Distribution
souperscraper-1.0.0.tar.gz (16.5 kB)
Built Distribution
souperscraper-1.0.0-py3-none-any.whl (11.3 kB)
File details
Details for the file souperscraper-1.0.0.tar.gz.
File metadata
- Download URL: souperscraper-1.0.0.tar.gz
- Upload date:
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | c12b6c959385d892f20f52a9aa6c9dd3238b690cf6f983794f04a516f86a2949
MD5 | df55a652581058383e54cbe390dc3f4f
BLAKE2b-256 | a5c1858a32659877f27d480e0c85d0ab6333894ccbbc3a07e1af021bbed5822f
File details
Details for the file souperscraper-1.0.0-py3-none-any.whl.
File metadata
- Download URL: souperscraper-1.0.0-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3963d02a107256793f0d98bbb29f12aa7e42fa039e5f20d54d45d756025781d4
MD5 | c6e1eac4d1ec66782e6b2e23678b3330
BLAKE2b-256 | 6ceaaebba39076f907b16cb0bbc525857d008b5831706071c1600a423a5c41d5