Web crawling tool
Project description
pywebcrwl
pywebcrwl is a simple Python web crawler that extracts links, email addresses, phone numbers, keywords, and other information from websites.
Features
- Crawl and extract all pages from a given URL
- Extract email addresses (with optional domain filtering)
- Extract phone numbers (including international formats)
- Detect cities mentioned in the text
- Find matches for a given regular expression
- Extract all image URLs
- Extract all websites/domains mentioned on a page
- Extract downloadable documents (optionally by file extension)
- Extract raw HTML code of pages
- Identify keywords from the content
- Extract all sentences containing a specific word
- Extract website favicons
- Extract social media links
- Generate a summary of a page
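To make two of the features above concrete (email extraction with optional domain filtering, and extracting sentences that contain a specific word), here is a minimal sketch using only the standard library. It does not use pywebcrwl and does not describe its API: the function names `extract_emails` and `sentences_containing`, their parameters, and the regular expressions are all assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately simple pattern; it does not cover every valid address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text, domain=None):
    """Return unique email addresses in order of first appearance.

    If `domain` is given, keep only addresses whose domain part equals it
    (compared case-insensitively).
    """
    found = []
    for address in EMAIL_RE.findall(text):
        address_domain = address.split("@", 1)[1]
        if domain is not None and address_domain.lower() != domain.lower():
            continue
        if address not in found:
            found.append(address)
    return found


def sentences_containing(text, word):
    """Return the sentences of `text` that contain `word` as a whole word."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    word_re = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if word_re.search(s)]
```

Both helpers work on plain text that has already been fetched, so calling `extract_emails(page_text, domain="example.com")` would keep only addresses at that domain. A real crawler would also need to download pages and strip HTML first, which this sketch leaves out.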
Installation
pip install pywebcrwl
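One way to confirm that the installation worked is to ask Python's packaging metadata for the installed version. This is a small sketch using the standard-library `importlib.metadata` module. The helper name `installed_version` is an assumption for illustration and is not part of pywebcrwl.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("pywebcrwl")
print(found if found is not None else "pywebcrwl is not installed")
```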
Download files
Source Distribution
pywebcrwl-0.1.2.tar.gz (5.5 kB)
Built Distribution
File details
Details for the file pywebcrwl-0.1.2.tar.gz.
File metadata
- Download URL: pywebcrwl-0.1.2.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0c09b3c57b976400d6efbd3817dbafd8d81a37366b773b7fc3f3276ff0a2615 |
| MD5 | 261c6b039c66f9d399fa25f1d1d41e65 |
| BLAKE2b-256 | 504b8b66f0bfff18ad38807a0b1cc1f944d3c12d481eea80971e24691d1d305e |
File details
Details for the file pywebcrwl-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pywebcrwl-0.1.2-py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be97da40ca2dad95d3259d24d2ffad6690d339eb2a4cf755083b167394685513 |
| MD5 | dc2c9d2182586b789a8d01b9bf6fd107 |
| BLAKE2b-256 | b3b42fdf83ec7e68a5d86fe08efb12b0dd7f8736df1d977aca8c57421ba92bb2 |