Web crawling tool
pywebcrwl
pywebcrwl is a simple Python web crawler that extracts links, email addresses, phone numbers, keywords, and other information from websites.
Features
- Crawl and extract all pages from a given URL
- Extract email addresses (with optional domain filtering)
- Extract phone numbers (including international formats)
- Detect cities mentioned in the text
- Find matches for a given regular expression
- Extract all image URLs
- Extract all websites/domains mentioned on a page
- Extract downloadable documents (optionally by file extension)
- Extract raw HTML code of pages
- Identify keywords from the content
- Extract all sentences containing a specific word
- Extract website favicons
- Extract social media links
- Generate a summary of a page
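The package's own API is not documented here, so the following is only a standard-library sketch of how two of the listed features (email extraction with optional domain filtering, and link/image URL extraction) could work. The names `extract_emails`, `LinkAndImageParser`, and `extract_links_and_images` are hypothetical and are not pywebcrwl functions.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Deliberately simple pattern (an assumption); it will miss some valid addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text, domain=None):
    """Return unique, lowercased email addresses in order of first appearance.

    If `domain` is given, keep only addresses at exactly that domain.
    """
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if domain and not email.endswith("@" + domain.lower()):
            continue
        if email not in found:
            found.append(email)
    return found


class LinkAndImageParser(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href> and <img src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_links_and_images(html, base_url):
    """Parse `html` and return (links, images) resolved against `base_url`."""
    parser = LinkAndImageParser(base_url)
    parser.feed(html)
    return parser.links, parser.images
```

Calling `extract_emails(page_text, domain="example.com")` would keep only addresses at that domain, while omitting `domain` returns every address the pattern matches.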
Installation
pip install pywebcrwl
Download files
Source Distribution
pywebcrwl-0.1.0.tar.gz (5.7 kB)
Built Distribution
pywebcrwl-0.1.0-py3-none-any.whl (6.1 kB)
File details
Details for the file pywebcrwl-0.1.0.tar.gz.
File metadata
- Download URL: pywebcrwl-0.1.0.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 864ef879cecbce5c0627bfb091c92da0005410cc1764fd3b68ef15962bb1be55 |
| MD5 | ad7472c140173681cef7362c96a19aa3 |
| BLAKE2b-256 | 090dbfd468b4ec7cbe85a9725db3309529c4b7e32ee0602f7ce8fa961e05dd09 |
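To check a downloaded copy of the source distribution against the SHA256 digest listed above, one option is a short script using Python's `hashlib`. This is only a sketch: the helper names `sha256_of_file` and `verify_download` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed for pywebcrwl-0.1.0.tar.gz in the table above.
EXPECTED_SHA256 = "864ef879cecbce5c0627bfb091c92da0005410cc1764fd3b68ef15962bb1be55"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_download("pywebcrwl-0.1.0.tar.gz")` returns `True` only when the local file's digest matches the listed value.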
File details
Details for the file pywebcrwl-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pywebcrwl-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 585503a6aafc2186ac33320202f3abd4abf2dd3608fc371c55c30d57a7c4f412 |
| MD5 | 08a0d1189ebb66142390c759e722f677 |
| BLAKE2b-256 | ec52f47ba9bb0c5101a1702364c3cebdfd21664705aaac89cb4c7432bf302460 |