scrapetools
A collection of tools to aid in web scraping.
Install using:
pip install scrapeTools
scrapeTools contains four modules: emailScraper, linkScraper, phoneScraper, and inputScraper.
Only linkScraper contains a class, LinkScraper; the other three modules are used through plain functions.
Basic usage:
    from scrapeTools.emailScraper import scrapeEmails
    from scrapeTools.phoneScraper import scrapePhoneNumbers
    from scrapeTools.linkScraper import LinkScraper
    from scrapeTools.inputScraper import scrapeInputs
    import requests

    url = 'https://somewebsite.com'
    source = requests.get(url).text

    emails = scrapeEmails(source)

    phoneNumbers = scrapePhoneNumbers(source)

    linkScraper = LinkScraper(source, url)
    linkScraper.scrapePage()
    # links can be accessed and filtered via the getLinks() function
    sameSiteLinks = linkScraper.getLinks(sameSiteOnly=True)
    sameSiteImageLinks = linkScraper.getLinks(linkType='img', sameSiteOnly=True)
    externalImageLinks = linkScraper.getLinks(linkType='img', excludedLinks=sameSiteImageLinks)

    # scrapeInputs() returns a tuple of BeautifulSoup Tag elements for various user input elements
    forms, inputs, buttons, selects, textAreas = scrapeInputs(source)
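To make the idea behind the sameSiteOnly and excludedLinks filters more concrete, here is a minimal standard-library sketch of same-site filtering. It is an illustration only and is not taken from scrapeTools: the helper names is_same_site and split_links are hypothetical, and the real getLinks() may compare hosts differently (for example, in how it treats subdomains).

```python
from urllib.parse import urljoin, urlparse


def is_same_site(link: str, page_url: str) -> bool:
    """Return True if `link` resolves to the same host as `page_url`.

    Hypothetical helper for illustration; not part of scrapeTools.
    """
    # Resolve relative links (e.g. "/about") against the page URL first.
    absolute = urljoin(page_url, link)
    return urlparse(absolute).netloc.lower() == urlparse(page_url).netloc.lower()


def split_links(links, page_url):
    """Partition links into (same_site, external), preserving order."""
    same_site = [link for link in links if is_same_site(link, page_url)]
    # Everything not classified as same-site is treated as external,
    # mirroring the "exclude these links" pattern in the usage above.
    excluded = set(same_site)
    external = [link for link in links if link not in excluded]
    return same_site, external


page_url = "https://somewebsite.com/index.html"
links = [
    "/about",
    "https://somewebsite.com/img/logo.png",
    "https://cdn.example.org/img/banner.png",
]
same_site, external = split_links(links, page_url)
print(same_site)
print(external)
```

Resolving each link with urljoin before comparing hosts is what lets relative links such as "/about" count as same-site; links with no host at all (such as mailto: links) fall into the external group under this sketch.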