Scrapetools
A collection of tools to aid in web scraping.
Install using:
pip install scrapetools
Scrapetools contains three functions (scrape_emails, scrape_phone_numbers, scrape_inputs) and one class (LinkScraper).
Basic usage
import scrapetools
import requests
url = 'https://somewebsite.com'
source = requests.get(url).text
emails = scrapetools.scrape_emails(source)
phone_numbers = scrapetools.scrape_phone_numbers(source)
scraper = scrapetools.LinkScraper(source, url)
scraper.scrape_page()
# Links can be accessed and filtered via the get_links() method
same_site_links = scraper.get_links(same_site_only=True)
same_site_image_links = scraper.get_links(link_type='img', same_site_only=True)
external_image_links = scraper.get_links(link_type='img', excluded_links=same_site_image_links)
# scrape_inputs() returns a tuple of BeautifulSoup Tag elements for various user input elements
forms, inputs, buttons, selects, text_areas = scrapetools.scrape_inputs(source)
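To make the idea behind `same_site_only=True` more concrete, here is a minimal sketch of same-site link filtering using only the Python standard library. It is an illustration, not how `LinkScraper` is implemented: the names `AnchorCollector` and `same_site_links` are hypothetical, and the sketch assumes "same site" means an exact host match against the page URL, which may differ from the rule the library applies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collect raw href values from <a> tags (hypothetical helper, not part of scrapetools)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(source, page_url):
    """Return absolute links found in `source` whose host matches the host of `page_url`.

    Assumption: "same site" is treated as an exact, case-insensitive host match.
    """
    collector = AnchorCollector()
    collector.feed(source)
    page_host = urlparse(page_url).netloc.lower()
    links = []
    for href in collector.hrefs:
        # Resolve relative links (e.g. "/about") against the page URL.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc.lower() == page_host:
            links.append(absolute)
    return links


sample = '<a href="/about">About</a> <a href="https://other.example/x">Other</a>'
print(same_site_links(sample, "https://somewebsite.com"))
# prints ['https://somewebsite.com/about']
```

Relative links are resolved before comparison, so `/about` counts as same-site while the absolute link to another host is dropped.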
File details
Details for the file scrapetools-1.1.9.tar.gz.
File metadata
- Download URL: scrapetools-1.1.9.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11d9a694466f7054a2f87916f55c562348b9753ed45a4129ab10447fe5453dcc
MD5 | 9285fa4a2f8ce64f173a1d712552852f
BLAKE2b-256 | b8c1d69cce44217659f00270a67df065c013ddf4601b4976685a395b89e389e7
File details
Details for the file scrapetools-1.1.9-py3-none-any.whl.
File metadata
- Download URL: scrapetools-1.1.9-py3-none-any.whl
- Upload date:
- Size: 9.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 03a4ff71507be0f5b402299120913777f6914c33347008367cf5c441b0ba05fa
MD5 | 3da01baff04de7189ea77b8086dc9a1b
BLAKE2b-256 | 551a3ce12edacb8df6ec1ca1d0f6921944dff2832fdf9a4870560e42b4f05d3d