A collection of tools to aid in web scraping.
scrapetools
Install using:
pip install scrapeTools
scrapeTools contains four modules: emailScraper, linkScraper, phoneScraper, and inputScraper.
Only linkScraper contains a class (LinkScraper); the other three modules expose plain functions.
Basic usage:
from scrapeTools.emailScraper import scrapeEmails
from scrapeTools.phoneScraper import scrapePhoneNumbers
from scrapeTools.linkScraper import LinkScraper
from scrapeTools.inputScraper import scrapeInputs
import requests

url = 'https://somewebsite.com'
source = requests.get(url).text

emails = scrapeEmails(source)
phoneNumbers = scrapePhoneNumbers(source)

linkScraper = LinkScraper(source, url)
linkScraper.scrapePage()
# links can be accessed and filtered via the getLinks() function
sameSiteLinks = linkScraper.getLinks(sameSiteOnly=True)
sameSiteImageLinks = linkScraper.getLinks(linkType='img', sameSiteOnly=True)
externalImageLinks = linkScraper.getLinks(linkType='img', excludedLinks=sameSiteImageLinks)

# scrapeInputs() returns a tuple of BeautifulSoup Tag elements for various user input elements
forms, inputs, buttons, selects, textAreas = scrapeInputs(source)
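The getLinks() filters can be easier to reason about with a concrete model. Below is a hypothetical, standard-library-only sketch of how that kind of filtering could work; it is not scrapeTools' implementation. It assumes that linkType corresponds to an HTML tag name such as img or a, that "same site" means the resolved link has the same host as the page URL, and it uses made-up names (LinkCollector, filter_links) that are not part of the library.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: records (tag, absolute URL) for every href/src attribute."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references such as "/about" against the page URL.
                self.links.append((tag, urljoin(self.base_url, value)))


def filter_links(
    source: str,
    base_url: str,
    link_type: str | None = None,
    same_site_only: bool = False,
    excluded: tuple[str, ...] | list[str] = (),
) -> list[str]:
    """Hypothetical analogue of getLinks(): filter by tag, host, and exclusions."""
    collector = LinkCollector(base_url)
    collector.feed(source)
    site = urlparse(base_url).netloc
    results: list[str] = []
    for tag, link in collector.links:
        if link_type is not None and tag != link_type:
            continue  # assumption: linkType names an HTML tag such as "img"
        if same_site_only and urlparse(link).netloc != site:
            continue  # assumption: "same site" means the same host (netloc)
        if link in excluded or link in results:
            continue  # drop explicitly excluded links and duplicates
        results.append(link)
    return results


if __name__ == "__main__":
    html = '<a href="/about">About</a><img src="logo.png"><img src="https://cdn.example.org/x.png">'
    url = "https://somewebsite.com"
    same_site_images = filter_links(html, url, link_type="img", same_site_only=True)
    external_images = filter_links(html, url, link_type="img", excluded=same_site_images)
    print(same_site_images)  # ['https://somewebsite.com/logo.png']
    print(external_images)   # ['https://cdn.example.org/x.png']
```

Excluding the same-site image links from the full set of image links leaves only the external ones, which mirrors how externalImageLinks is built in the usage example.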
Download files
Source Distribution
scrapetools-0.2.1.tar.gz (21.1 kB)
Built Distribution
Hashes for scrapetools-0.2.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0d2906b18390b0f32e6a721d12847eb7be6bbf111d99509cbd642c0a61cae904
MD5 | f8b4bb7167ebd3439b25a33eaecdf969
BLAKE2b-256 | 29997cfad4973e871e77527c9f50a045b8de6c6b3c83f1dfce10ee76ae18fa75
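The published digests can be used to check a downloaded wheel before installing it. The following is a minimal standard-library sketch. It assumes the wheel was saved in the current directory under its published name, and the helper sha256_of is hypothetical, not part of scrapeTools.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for scrapetools-0.2.1-py3-none-any.whl
EXPECTED_SHA256 = "0d2906b18390b0f32e6a721d12847eb7be6bbf111d99509cbd642c0a61cae904"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumption: the wheel was downloaded into the current working directory.
    wheel = Path("scrapetools-0.2.1-py3-none-any.whl")
    if wheel.exists():
        ok = sha256_of(wheel) == EXPECTED_SHA256
        print("hash OK" if ok else "hash MISMATCH")
    else:
        print(f"{wheel} not found")
```

Reading in chunks keeps memory use flat regardless of file size; for a 21 kB package it makes no practical difference, but the same helper works for larger downloads.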