A simple web scraper base combining Beautiful Soup and Selenium
SouperScraper
A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.
Setup
- Install with pip
pip install souperscraper
- Download the appropriate ChromeDriver for your Chrome version using getchromedriver.py (command below) or manually from the ChromeDriver website.
To find your Chrome version, go to chrome://settings/help in your browser.
getchromedriver
- Create a new SouperScraper object using the path to your ChromeDriver
from souperscraper import SouperScraper
scraper = SouperScraper('/path/to/your/chromedriver')
- Start scraping using BeautifulSoup and/or Selenium methods
scraper.goto('https://github.com/LucasFaudman')
# Use BeautifulSoup to search for and extract content
# by accessing the scraper's 'soup' attribute
# or with the 'soup_find' / 'soup_find_all' methods
repos = scraper.soup.find_all('span', class_='repo')
for repo in repos:
    repo_name = repo.text
    print(repo_name)
# Use Selenium to interact with the page such as clicking buttons
# or filling out forms by accessing the scraper's
# find_element_by_* / find_elements_by_* / wait_for_* methods
repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
repos_tab.click()
search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
search_input.send_keys('souper-scraper')
search_input.submit()
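The ChromeDriver step in the setup depends on picking a driver that fits your Chrome version. ChromeDriver releases are generally matched to Chrome by version number, and the major version is the part that usually has to agree. The sketch below shows one way to compare the two, assuming you have copied the version text shown at chrome://settings/help. The helpers `chrome_major_version` and `versions_match` are hypothetical names used for illustration and are not part of souperscraper.

```python
import re


def chrome_major_version(version_text: str) -> int:
    """Return the major version from a four-part Chrome-style version string.

    Hypothetical helper for illustration; not part of souperscraper.
    """
    # Look for a dotted four-part version such as 124.0.6367.91
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no Chrome version found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Return True when both version strings share the same major version."""
    return chrome_major_version(chrome_version) == chrome_major_version(driver_version)


# Example version strings (made up for this sketch)
print(chrome_major_version("Version 124.0.6367.91 (Official Build) (64-bit)"))  # 124
print(versions_match("124.0.6367.91", "124.0.6367.207"))  # True
print(versions_match("124.0.6367.91", "123.0.6312.122"))  # False
```

A mismatch here suggests downloading a different ChromeDriver build before creating the scraper.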
BeautifulSoup Reference
- Quick Start
- Types of Objects
- The BeautifulSoup object
- Navigating the HTML tree
- Searching for HTML Elements
- Modifying the tree
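As a small illustration of the searching and navigating topics listed above, the sketch below runs BeautifulSoup directly on a static HTML string, so it can be tried without a browser or ChromeDriver. The markup is made up for this example. The same calls should apply to the scraper's 'soup' attribute, assuming it holds a regular BeautifulSoup object as the usage above suggests.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page the scraper has loaded
html = """
<ul id="repos">
  <li><a href="/user/alpha"><span class="repo">alpha</span></a></li>
  <li><a href="/user/beta"><span class="repo">beta</span></a></li>
</ul>
"""

# Python's built-in parser avoids any extra parser dependency
soup = BeautifulSoup(html, "html.parser")

# Searching: collect every <span class="repo"> and read its text
spans = soup.find_all("span", class_="repo")
repo_names = [span.get_text(strip=True) for span in spans]

# Navigating: step from each span up to its enclosing <a> and read the href
repo_links = [span.parent["href"] for span in spans]

# Searching by id, then narrowing to the first link inside that element
first_link = soup.find("ul", id="repos").find("a")

print(repo_names)          # ['alpha', 'beta']
print(repo_links)          # ['/user/alpha', '/user/beta']
print(first_link["href"])  # /user/alpha
```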
Selenium Reference