
A simple web scraper base combining Beautiful Soup and Selenium

Project description

SouperScraper

A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.

Setup

  1. Install with pip
pip install souperscraper
  2. Download the appropriate ChromeDriver for your Chrome version using getchromedriver.py (command below) or manually from the ChromeDriver website.

To find your Chrome version, go to chrome://settings/help in your browser.

getchromedriver
  3. Create a new SouperScraper object using the path to your ChromeDriver.
from souperscraper import SouperScraper

scraper = SouperScraper('/path/to/your/chromedriver')
  4. Start scraping using BeautifulSoup and/or Selenium methods.
scraper.goto('https://github.com/LucasFaudman')

# Use BeautifulSoup to search for and extract content
# by accessing the scraper's 'soup' attribute
# or with the 'soup_find' / 'soup_find_all' methods
repos = scraper.soup.find_all('span', class_='repo')
for repo in repos:
    repo_name = repo.text
    print(repo_name)

# Use Selenium to interact with the page, such as clicking buttons
# or filling out forms, with the scraper's
# find_element_by_* / find_elements_by_* / wait_for_* methods
repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
repos_tab.click()

search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
search_input.send_keys('souper-scraper')
search_input.submit()
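Step 2 comes down to one rule: ChromeDriver's major version should match Chrome's major version (for example, ChromeDriver 124.x for Chrome 124.x). The sketch below compares the two using version strings like the output of `chromedriver --version`. The helpers `major_version` and `driver_matches_browser` are hypothetical and not part of SouperScraper. The sketch also assumes the four-part version format used by current Chrome releases.

```python
import re

# Matches a four-part version such as "124.0.6367.91" and captures the major part.
_VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output: str) -> int:
    """Hypothetical helper: extract the major version from strings such as
    'Google Chrome 124.0.6367.91' or 'ChromeDriver 124.0.6367.91 (abcdef)'."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def driver_matches_browser(chrome_output: str, driver_output: str) -> bool:
    """Hypothetical helper: True if Chrome and ChromeDriver share a major version."""
    return major_version(chrome_output) == major_version(driver_output)


# Illustrative version strings, not output captured from a real machine.
print(driver_matches_browser("Google Chrome 124.0.6367.91",
                             "ChromeDriver 124.0.6367.60 (abcdef)"))
```

If this prints `False` for your actual versions, download the ChromeDriver release that matches your Chrome major version.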
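If `scraper.soup` is an ordinary `BeautifulSoup` object, you can try the `find_all` search from step 4 offline against a small HTML string. The markup below is made up for illustration. The `repo` class is copied from the example above and may not match what GitHub actually serves.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a page loaded by scraper.goto(...)
html = """
<div>
  <span class="repo">souper-scraper</span>
  <span class="repo">another-repo</span>
  <span class="description">not a repo</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ (with a trailing underscore, since 'class' is a Python keyword)
# matches elements whose class list contains 'repo'
repo_names = [span.get_text(strip=True)
              for span in soup.find_all("span", class_="repo")]
print(repo_names)  # ['souper-scraper', 'another-repo']
```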
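Methods like `wait_for_visibility_of_element_located_by_id` appear to follow Selenium's explicit-wait pattern. On a dynamic page, an element may render after the initial load, so the scraper polls for it until it is ready or a timeout expires. The stdlib-only sketch below shows that polling idea with a hypothetical `wait_until` helper. It is not SouperScraper's implementation. Selenium's own `WebDriverWait(driver, timeout).until(condition)` follows the same call-until-truthy-or-timeout pattern.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Hypothetical helper: call `condition` until it returns a truthy value,
    then return that value. Raises TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Usage: a fake element lookup that only succeeds on the third poll,
# standing in for a page element that renders late.
calls = {"n": 0}


def fake_find_search_input():
    calls["n"] += 1
    return "<input id='your-repos-filter'>" if calls["n"] >= 3 else None


print(wait_until(fake_find_search_input, timeout=2.0, poll_interval=0.01))
```

Polling against a deadline returns as soon as the element is ready. A single fixed `time.sleep` cannot do that. The deadline also ensures the wait fails loudly if the element never appears.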

BeautifulSoup Reference

Selenium Reference

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

souperscraper-1.0.1.tar.gz (16.8 kB)

Uploaded Source

Built Distribution

souperscraper-1.0.1-py3-none-any.whl (11.6 kB)

Uploaded Python 3

File details

Details for the file souperscraper-1.0.1.tar.gz.

File metadata

  • Download URL: souperscraper-1.0.1.tar.gz
  • Upload date:
  • Size: 16.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.9

File hashes

Hashes for souperscraper-1.0.1.tar.gz
Algorithm Hash digest
SHA256 c8411375020fc87041946dc89e823188d9795e5ea87fc470bc10e7026a085a9b
MD5 5a9f250b059a708f285fa0a5a5440bb1
BLAKE2b-256 c9deacfd39573076b58ce43fe2681a0ac16c8f1f83812b66cebd4607195b0d17

File details

Details for the file souperscraper-1.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for souperscraper-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6b06c0736b703555244fc3d142a3f3ecf62de2e95416e3c1917b973743b09c2c
MD5 97001d440a32ba756d8bb1e9b019df0f
BLAKE2b-256 59153a54351a08107955aa0d09b8acad4777431413e0f082b96f3491d7b3edb0
