
A simple web scraper base combining Beautiful Soup and Selenium


SouperScraper

A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.

Setup

  1. Install with pip
pip install souperscraper
  2. Download the ChromeDriver that matches your Chrome version, either with get_chrome_driver.py or manually from the ChromeDriver website. To find your Chrome version, go to chrome://settings/help in your browser.
getchromedriver
  3. Create a new SouperScraper object using the path to your ChromeDriver
from souper_scraper import SouperScraper

scraper = SouperScraper(executable_path='/path/to/your/chromedriver')
  4. Start scraping using BeautifulSoup and/or Selenium methods
scraper.goto('https://github.com/LucasFaudman')

# Use BeautifulSoup to search for and extract content
# by accessing the scraper's 'soup' attribute
# or with the 'soup_find' / 'soup_find_all' methods
repos = scraper.soup.find_all('span', class_='repo')
for repo in repos:
    repo_name = repo.text
    print(repo_name)

# Use Selenium to interact with the page such as clicking buttons
# or filling out forms by accessing the scraper's
# find_element_by_* / find_elements_by_* / wait_for_* methods
repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
repos_tab.click()

search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
search_input.send_keys('souper-scraper')
search_input.submit()
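The wait_for_* helpers presumably wrap Selenium's explicit waits (WebDriverWait(driver, timeout).until(...) with an expected condition such as visibility_of_element_located); that is an assumption, not something this page confirms. The polling idea behind an explicit wait is simple, as the stdlib-only sketch below shows. wait_until is a hypothetical helper, not part of souperscraper or Selenium. Selenium's real WebDriverWait also ignores NoSuchElementException between polls by default and raises TimeoutException rather than TimeoutError.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll_frequency: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value, then return that value.

    Raises TimeoutError if no truthy value appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


# Simulate an element that only becomes "visible" on the third check.
checks = {"count": 0}


def search_box_visible() -> Optional[str]:
    checks["count"] += 1
    return "your-repos-filter" if checks["count"] >= 3 else None


print(wait_until(search_box_visible, timeout=2.0, poll_frequency=0.05))  # your-repos-filter
```

Polling with a deadline, rather than a fixed time.sleep, returns as soon as the page is ready and fails loudly when it never becomes ready.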

BeautifulSoup Reference
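To make the matching rule of find_all('span', class_='repo') concrete: BeautifulSoup treats class as a multi-valued attribute, so a tag matches if repo is any one of its space-separated classes, and .text joins all of the tag's descendant strings. The stdlib-only sketch below mimics that behavior for this one case so it runs without bs4 installed. SpanClassCollector is a hypothetical helper, not part of BeautifulSoup or souperscraper; in real code the find_all call is all you need.

```python
from html.parser import HTMLParser


class SpanClassCollector(HTMLParser):
    """Collect the text of <span> elements whose class list includes target_class.

    Nested matching spans are folded into their outer match (find_all would
    return both), which is enough to illustrate the class-matching rule.
    """

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0                 # span nesting depth inside a match (0 = not in a match)
        self.current: list[str] = []   # text pieces of the match being read
        self.results: list[str] = []   # one joined string per matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:                 # a span nested inside a matching span
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:  # class is multi-valued: any token may match
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


html = ('<div><span class="repo">souper-scraper</span>'
        '<span class="repo private">other</span>'
        '<span class="repository">not matched</span></div>')
collector = SpanClassCollector("repo")
collector.feed(html)
collector.close()
print(collector.results)  # ['souper-scraper', 'other']
```

Note that class="repository" does not match: the comparison is per class token, not a substring search.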

Selenium Reference
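Combining the two libraries comes down to letting Selenium render the page (including content loaded by JavaScript) and handing the browser's current page_source to BeautifulSoup. The sketch below shows one way a soup attribute could do that; it is an assumption, not souperscraper's actual implementation. FakeDriver is a hypothetical stand-in for a Selenium WebDriver so the example runs without a browser, and parse would normally be something like lambda html: BeautifulSoup(html, "html.parser").

```python
from typing import Any, Callable, Optional


class FakeDriver:
    """Stand-in for a Selenium WebDriver; only the page_source attribute is modeled."""

    def __init__(self, html: str) -> None:
        self.page_source = html


class SoupScraperSketch:
    """Hypothetical wrapper exposing a `soup` attribute built from the live page.

    The page is re-parsed only when the browser's page source has changed
    since the last access, so repeated reads of `soup` are cheap.
    """

    def __init__(self, driver: Any, parse: Callable[[str], Any]) -> None:
        self.driver = driver
        self.parse = parse  # e.g. lambda html: BeautifulSoup(html, "html.parser")
        self._cached_source: Optional[str] = None
        self._cached_soup: Any = None

    @property
    def soup(self) -> Any:
        source = self.driver.page_source  # HTML as the browser currently has it
        if source != self._cached_source:
            self._cached_source = source
            self._cached_soup = self.parse(source)
        return self._cached_soup


# Usage with a stub parser; a real setup would pass a BeautifulSoup factory.
driver = FakeDriver("<span class='repo'>souper-scraper</span>")
scraper = SoupScraperSketch(driver, parse=lambda html: html.count("<span"))
print(scraper.soup)  # 1
driver.page_source += "<span class='repo'>another</span>"  # page changed (e.g. after a click)
print(scraper.soup)  # 2
```

Re-reading page_source on each access keeps the parsed tree in sync with clicks and form submissions made through Selenium; the cache only avoids re-parsing an unchanged page.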



Download files


Source Distribution

souperscraper-1.0.0.tar.gz (16.5 kB)

Uploaded Source

Built Distribution

souperscraper-1.0.0-py3-none-any.whl (11.3 kB)

Uploaded Python 3

File details

Details for the file souperscraper-1.0.0.tar.gz.

File metadata

  • Download URL: souperscraper-1.0.0.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.9

File hashes

Hashes for souperscraper-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       c12b6c959385d892f20f52a9aa6c9dd3238b690cf6f983794f04a516f86a2949
MD5          df55a652581058383e54cbe390dc3f4f
BLAKE2b-256  a5c1858a32659877f27d480e0c85d0ab6333894ccbbc3a07e1af021bbed5822f


File details

Details for the file souperscraper-1.0.0-py3-none-any.whl.


File hashes

Hashes for souperscraper-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       3963d02a107256793f0d98bbb29f12aa7e42fa039e5f20d54d45d756025781d4
MD5          c6e1eac4d1ec66782e6b2e23678b3330
BLAKE2b-256  6ceaaebba39076f907b16cb0bbc525857d008b5831706071c1600a423a5c41d5
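The digests listed here let you confirm that a downloaded file is the one that was published. Below is a minimal stdlib sketch, assuming the wheel has already been downloaded to the current directory; sha256_of and matches_published are hypothetical helpers. pip can perform the same check automatically in its --require-hashes mode.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Published SHA256 for souperscraper-1.0.0-py3-none-any.whl, as listed on this page.
WHEEL_SHA256 = "3963d02a107256793f0d98bbb29f12aa7e42fa039e5f20d54d45d756025781d4"
wheel = Path("souperscraper-1.0.0-py3-none-any.whl")
if wheel.exists():  # only check if the wheel has been downloaded here
    print("hash OK" if matches_published(wheel, WHEEL_SHA256) else "hash MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size.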

