Scrapetools
A collection of tools to aid in web scraping.
Install using:
pip install scrapetools
Scrapetools contains three functions (scrape_emails, scrape_phone_numbers, scrape_inputs)
and one class (LinkScraper).
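One common way to tackle the same job as scrape_emails and scrape_phone_numbers is regular-expression matching over the page source. The sketch below assumes that approach and is not scrapetools' own implementation: the helper names find_emails and find_phone_numbers and both patterns are hypothetical, and the phone pattern only covers North American formats such as (555) 123-4567, 555-123-4567 and 555.123.4567.

```python
import re

# Hypothetical patterns for illustration; scrapetools may use different ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Either "(555) " or "555-" / "555." / "555 ", then "123-4567" style digits.
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")


def find_emails(text: str) -> list[str]:
    """Return unique email-like strings in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def find_phone_numbers(text: str) -> list[str]:
    """Return unique phone-number-like strings in order of first appearance."""
    return list(dict.fromkeys(PHONE_RE.findall(text)))


if __name__ == "__main__":
    page = "<p>Mail info@example.com or call (555) 123-4567.</p>"
    print(find_emails(page), find_phone_numbers(page))
```

Deduplicating with dict.fromkeys keeps the first-seen order, which is handy when the same address appears in both a header and a footer.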
Basic usage
import requests
import scrapetools

url = 'https://somewebsite.com'
source = requests.get(url, timeout=10).text

emails = scrapetools.scrape_emails(source)
phone_numbers = scrapetools.scrape_phone_numbers(source)

scraper = scrapetools.LinkScraper(source, url)
scraper.scrape_page()

# Links can be accessed and filtered via the get_links() method
same_site_links = scraper.get_links(same_site_only=True)
same_site_image_links = scraper.get_links(link_type='img', same_site_only=True)
external_image_links = scraper.get_links(link_type='img', excluded_links=same_site_image_links)

# scrape_inputs() returns a tuple of BeautifulSoup Tag elements for various user input elements
forms, inputs, buttons, selects, text_areas = scrapetools.scrape_inputs(source)
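The filtering options passed to get_links() can be approximated with the Python standard library alone. The sketch below assumes that link_type names the HTML tag a URL was found on (such as 'img' or 'a'), that same_site_only compares the host of each URL with the page URL, and that excluded_links is a collection of URLs to drop. These semantics, and the names LinkCollector and filter_links, are hypothetical and are not taken from scrapetools.

```python
from collections.abc import Iterable
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Record (tag, absolute URL) for every href/src attribute in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative URLs (e.g. "/about") against the page URL.
                self.links.append((tag, urljoin(self.base_url, value)))


def filter_links(
    html: str,
    base_url: str,
    link_type: Optional[str] = None,
    same_site_only: bool = False,
    excluded_links: Iterable[str] = (),
) -> list[str]:
    """Hypothetical stand-in for LinkScraper.get_links(); semantics are assumed."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()

    site = urlparse(base_url).netloc
    excluded = set(excluded_links)
    seen: set[str] = set()
    result: list[str] = []
    for tag, url in collector.links:
        if link_type is not None and tag != link_type:
            continue  # keep only links found on the requested tag, e.g. "img"
        if same_site_only and urlparse(url).netloc != site:
            continue  # drop links pointing at another host
        if url in excluded or url in seen:
            continue  # drop explicitly excluded URLs and duplicates
        seen.add(url)
        result.append(url)
    return result
```

Because relative URLs are resolved with urljoin before comparison, an image at "/logo.png" on https://somewebsite.com counts as a same-site link, while one served from a CDN host does not.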
Download files
Source distribution: scrapetools-1.1.1.tar.gz (105.3 kB)
Built distribution: scrapetools-1.1.1-py3-none-any.whl

Hashes for scrapetools-1.1.1-py3-none-any.whl:
Algorithm | Hash digest
---|---
SHA256 | 41779c3220fbb08ea2ec81c2d7efb7578d38abd8218f7f337f9271de67fcede7
MD5 | 6e4d48e4eaed4c9b6bf3c881ecce9ce2
BLAKE2b-256 | 268fe9b35bdcd29d2b982c894821b5caac60af9f4dbc463bac90e719a1df3512