
A simple web scraper base combining Beautiful Soup and Selenium

Project description

SouperScraper

A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.

Setup

  1. Install with pip
pip install souperscraper
  2. Download the ChromeDriver that matches your Chrome version, either with getchromedriver.py (run the command below) or manually from the ChromeDriver website.

To find your Chrome version, go to chrome://settings/help in your browser.

getchromedriver
  3. Create a new SouperScraper object using the path to your ChromeDriver:
from souperscraper import SouperScraper

scraper = SouperScraper('/path/to/your/chromedriver')
  4. Start scraping using BeautifulSoup and/or Selenium methods:
scraper.goto('https://github.com/LucasFaudman')

# Use BeautifulSoup to search for and extract content
# by accessing the scraper's 'soup' attribute
# or with the 'soup_find' / 'soup_find_all' methods
repos = scraper.soup.find_all('span', class_='repo')
for repo in repos:
    repo_name = repo.text
    print(repo_name)

# Use Selenium to interact with the page (e.g. clicking buttons
# or filling out forms) through the scraper's
# find_element_by_* / find_elements_by_* / wait_for_* methods
repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
repos_tab.click()

search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
search_input.send_keys('souper-scraper')
search_input.submit()
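The wait_for_* calls matter on dynamic pages, where an element may only appear after JavaScript has run. Selenium's WebDriverWait handles this by repeatedly checking a condition until it succeeds or a timeout expires. Below is a minimal, standard-library sketch of that polling pattern; wait_until, its parameters, and the simulated find_search_input function are hypothetical illustrations, not SouperScraper's or Selenium's actual implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    poll_interval: float = 0.5,
    ignored_exceptions: tuple = (),
) -> T:
    """Hypothetical helper: poll `condition` until it returns a truthy value.

    Follows the general idea behind Selenium's WebDriverWait.until but is
    not its implementation. Raises TimeoutError once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored_exceptions:
            # Treat these errors as "not ready yet" and keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the "element" only becomes available on the third poll.
attempts = {"n": 0}


def find_search_input() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("element not rendered yet")
    return "search-input-element"


element = wait_until(
    find_search_input,
    timeout=5,
    poll_interval=0.01,
    ignored_exceptions=(LookupError,),
)
print(element, attempts["n"])  # search-input-element 3
```

Polling with a deadline, rather than a fixed time.sleep before each lookup, lets the script continue as soon as the element exists while still failing loudly if it never appears.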

BeautifulSoup Reference

Selenium Reference


Download files


Source Distribution

souperscraper-1.0.2.tar.gz (16.8 kB)

Built Distribution

souperscraper-1.0.2-py3-none-any.whl (11.6 kB, Python 3)

File details

Details for the file souperscraper-1.0.2.tar.gz.

File metadata

  • Download URL: souperscraper-1.0.2.tar.gz
  • Upload date:
  • Size: 16.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.9

File hashes

Hashes for souperscraper-1.0.2.tar.gz
Algorithm Hash digest
SHA256 317d1090f1cd3ca3bbe05405c4679149d978fbd7347461c70702d236da6ee2ca
MD5 88f40315cab4ac33fcd2c5380f42e6ca
BLAKE2b-256 ad3223da4a7bb571b766dfecd8ba00ef9f86f6646698d6dfd148ffa229449f12
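These digests let you check a downloaded archive before installing it. A minimal sketch using only the standard library's hashlib, assuming the archive was saved as souperscraper-1.0.2.tar.gz in the current directory; sha256_of_file and verify_download are hypothetical helper names. (pip's own "pip hash <file>" command prints the same SHA256 digest.)

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for souperscraper-1.0.2.tar.gz
EXPECTED_SHA256 = "317d1090f1cd3ca3bbe05405c4679149d978fbd7347461c70702d236da6ee2ca"


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("souperscraper-1.0.2.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify_download(archive, EXPECTED_SHA256) else "MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.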


File details

Details for the file souperscraper-1.0.2-py3-none-any.whl.

File hashes

Hashes for souperscraper-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 605572112f213de06804f32b23c82036741d831a3322844d972b2e6c8a12a03f
MD5 31477a846f140f5dfa2e5ec70f3c735f
BLAKE2b-256 ea40edc8aeabfd6573d2e16cf489b77bcf0491981a78604c7b0330b7aa309e59
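pip can also enforce these digests itself in hash-checking mode: each requirement is pinned with == and given one --hash=sha256:... option per acceptable file, and "pip install --require-hashes -r requirements.txt" rejects any download whose digest does not match. The sketch below builds such a requirements line from the SHA256 values listed above for the sdist and the wheel; pinned_requirement is a hypothetical helper name.

```python
# SHA256 digests published above for the two 1.0.2 distribution files
SOURCE_SHA256 = "317d1090f1cd3ca3bbe05405c4679149d978fbd7347461c70702d236da6ee2ca"
WHEEL_SHA256 = "605572112f213de06804f32b23c82036741d831a3322844d972b2e6c8a12a03f"


def pinned_requirement(name: str, version: str, sha256_digests: list[str]) -> str:
    """Build a hash-pinned requirements.txt line for pip's hash-checking mode."""
    hashes = " ".join(f"--hash=sha256:{d}" for d in sha256_digests)
    return f"{name}=={version} {hashes}"


line = pinned_requirement("souperscraper", "1.0.2", [SOURCE_SHA256, WHEEL_SHA256])
print(line)
```

Listing both digests lets pip accept either the wheel or the source archive. Note that once hash-checking mode is active, every other requirement, including dependencies, must also be pinned with hashes.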

