Scrapetools
A collection of tools to aid in web scraping.
Install using:
pip install scrapetools
Scrapetools contains three functions (scrape_emails, scrape_phone_numbers, scrape_inputs) and one class (LinkScraper).
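scrape_emails and scrape_phone_numbers both take raw page source. The patterns scrapetools actually uses are not documented here, so the following is only a standalone sketch of regex-based extraction with Python's re module. The names EMAIL_RE, PHONE_RE, extract_emails and extract_phone_numbers are hypothetical, and the patterns are deliberately simple assumptions (the phone pattern only covers 10-digit North American formats).

```python
import re

# Hypothetical, simplified patterns (assumptions, not scrapetools' own).
# They will miss some valid addresses/numbers and can match some invalid ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches forms like (555) 123-4567, 555-123-4567 and 555.123.4567
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")


def _unique(matches: list) -> list:
    """Drop duplicates while keeping the order of first appearance."""
    return list(dict.fromkeys(matches))


def extract_emails(source: str) -> list:
    """Return unique email-like strings found anywhere in the source."""
    return _unique(EMAIL_RE.findall(source))


def extract_phone_numbers(source: str) -> list:
    """Return unique phone-number-like strings found anywhere in the source."""
    return _unique(PHONE_RE.findall(source))


if __name__ == "__main__":
    html = ('<p>Mail <a href="mailto:info@example.com">info@example.com</a> '
            'or call (555) 123-4567 / 555.987.6543.</p>')
    print(extract_emails(html))         # ['info@example.com']
    print(extract_phone_numbers(html))  # ['(555) 123-4567', '555.987.6543']
```

Because the patterns run over the raw source rather than visible text, matches inside attributes (such as the mailto: link above) are found too, which is why duplicates are removed.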
Basic usage
import requests
import scrapetools

url = 'https://somewebsite.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
source = response.text

# Extract email addresses and phone numbers from the page source
emails = scrapetools.scrape_emails(source)
phone_numbers = scrapetools.scrape_phone_numbers(source)

# LinkScraper takes the page source and the URL it was fetched from
scraper = scrapetools.LinkScraper(source, url)
scraper.scrape_page()

# Links can be accessed and filtered via the get_links() method
same_site_links = scraper.get_links(same_site_only=True)
same_site_image_links = scraper.get_links(link_type='img', same_site_only=True)
external_image_links = scraper.get_links(link_type='img', excluded_links=same_site_image_links)

# scrape_inputs() returns a tuple of BeautifulSoup Tag elements for various user input elements
forms, inputs, buttons, selects, text_areas = scrapetools.scrape_inputs(source)
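The README does not say how LinkScraper decides what counts as a same-site link. One natural rule, assumed here, is to resolve each href/src against the page URL and compare host names. The standalone sketch below does that with only the standard library; it is not LinkScraper's implementation, and split_links and _LinkCollector are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect (tag, absolute URL) pairs from href and src attributes."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links such as "/about" against the page URL
                self.links.append((tag, urljoin(self.base_url, value)))


def split_links(source: str, page_url: str, tag: str | None = None):
    """Return (same_site, external) URL lists, optionally limited to one tag type."""
    collector = _LinkCollector(page_url)
    collector.feed(source)
    host = urlparse(page_url).netloc.lower()
    same_site, external = [], []
    for link_tag, url in collector.links:
        if tag is not None and link_tag != tag:
            continue  # e.g. keep only <img> links when tag="img"
        target = same_site if urlparse(url).netloc.lower() == host else external
        target.append(url)
    return same_site, external
```

For example, split_links(source, url, tag="img") plays roughly the role of the image-link filters above. Comparing netloc strings exactly means www.example.com and example.com (or different ports) count as different sites under this sketch.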
Download files
Source distribution: scrapetools-1.1.9.tar.gz (7.8 kB)
Built distribution: scrapetools-1.1.9-py3-none-any.whl (9.7 kB)
File details
Details for the file scrapetools-1.1.9.tar.gz.
File metadata
- Download URL: scrapetools-1.1.9.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11d9a694466f7054a2f87916f55c562348b9753ed45a4129ab10447fe5453dcc |
| MD5 | 9285fa4a2f8ce64f173a1d712552852f |
| BLAKE2b-256 | b8c1d69cce44217659f00270a67df065c013ddf4601b4976685a395b89e389e7 |
File details
Details for the file scrapetools-1.1.9-py3-none-any.whl.
File metadata
- Download URL: scrapetools-1.1.9-py3-none-any.whl
- Upload date:
- Size: 9.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03a4ff71507be0f5b402299120913777f6914c33347008367cf5c441b0ba05fa |
| MD5 | 3da01baff04de7189ea77b8086dc9a1b |
| BLAKE2b-256 | 551a3ce12edacb8df6ec1ca1d0f6921944dff2832fdf9a4870560e42b4f05d3d |
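The published digests let you check that a downloaded file was not corrupted or swapped in transit. A minimal sketch using Python's hashlib module follows; sha256_of, matches_published_hash and SDIST_SHA256 are hypothetical helper names, not part of scrapetools.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digest published for scrapetools-1.1.9.tar.gz
SDIST_SHA256 = "11d9a694466f7054a2f87916f55c562348b9753ed45a4129ab10447fe5453dcc"


def sha256_of(path: str | Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str | Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

After downloading the source distribution, matches_published_hash("scrapetools-1.1.9.tar.gz", SDIST_SHA256) should return True. pip can enforce the same check itself in its hash-checking mode, by listing the digests above with --hash=sha256:... next to scrapetools==1.1.9 in a requirements file.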