Web tools and interfaces for Internet data processing.
webtoolkit
Provides classes and tools for Internet data processing.
- URL parsing
- HTTP status code identification
- Page definitions: HtmlPage, RssPage, OpmlPage, Content interfaces
- Means of calling crawling systems, Crawling interfaces
Remote crawling interfaces are implemented by crawler-buddy.
Available on PyPI.
URL parsing
Remove trackers from a link and sanitize it
UrlLocation.get_cleaned_link()
Obtain the domain
UrlLocation(link).get_domain()
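To illustrate what cleaning a link and extracting its domain involve, here is a minimal standard-library sketch. It is not webtoolkit's implementation: `strip_trackers`, `domain_of`, and the list of tracking parameters are assumptions made for this example, and the values `UrlLocation` actually returns may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the set that webtoolkit removes may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_trackers(link: str) -> str:
    """Return the link without the tracking query parameters listed above."""
    parts = urlsplit(link.strip())
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def domain_of(link: str) -> str:
    """Return the lower-cased host name without port, or '' if there is none."""
    return urlsplit(link).hostname or ""


cleaned = strip_trackers("https://example.com/page?id=7&utm_source=news&fbclid=abc")
host = domain_of("https://Example.com:8080/a")
```

Query parameters that are not on the tracking list keep their original order.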
HTTP processing
Identification of valid codes
PageResponseObject().is_valid()
Identification of invalid codes
PageResponseObject().is_invalid()
Some codes indicate neither that the page is valid nor that it is invalid. For example, if our crawler is throttled because of too many requests, we do not yet know whether the page is valid.
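The three-way outcome described above (valid, invalid, not yet known) can be sketched as follows. The `Verdict` enum, the `classify` function, and the choice of which codes count as inconclusive are assumptions for illustration; the rules inside `PageResponseObject` may differ.

```python
from enum import Enum


class Verdict(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNKNOWN = "unknown"


# Assumption: these codes describe throttling or temporary unavailability,
# so they say nothing definite about the page itself.
INCONCLUSIVE_CODES = {429, 503}


def classify(status_code: int) -> Verdict:
    """Map an HTTP status code to one of three verdicts."""
    if status_code in INCONCLUSIVE_CODES:
        return Verdict.UNKNOWN
    if 200 <= status_code < 400:
        return Verdict.VALID
    return Verdict.INVALID
```

With a separate "unknown" verdict, a caller can retry a throttled request later instead of discarding the page.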
Page definitions
Easy access to HTML properties
page = HtmlPage(url, contents)
page.get_title()
page.get_description()
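For comparison, this is roughly what extracting the same two properties by hand looks like with the standard library. It is a sketch only: `MetaExtractor` and `title_and_description` are names invented for this example, and it says nothing about how `HtmlPage` works internally.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the description meta tag of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text is only collected while the parser is inside <title>.
        if self._in_title:
            self.title += data


def title_and_description(contents: str) -> tuple[str, str]:
    parser = MetaExtractor()
    parser.feed(contents)
    return parser.title.strip(), parser.description
```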
Easy access to RSS properties
page = RssPage(url, contents)
page.get_title()
page.get_description()
page.get_entries()
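The sketch below shows where these three properties live in an RSS 2.0 document, using `xml.etree.ElementTree`. The function `read_rss` and the shape of the returned dictionary are assumptions for illustration, not the structure `RssPage` returns. For untrusted feeds, a hardened XML parser is advisable.

```python
import xml.etree.ElementTree as ET


def read_rss(contents: str) -> dict:
    """Return the channel title, description, and item list of an RSS 2.0 feed."""
    channel = ET.fromstring(contents).find("channel")
    if channel is None:
        return {"title": "", "description": "", "entries": []}
    entries = [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]
    return {
        "title": channel.findtext("title", default=""),
        "description": channel.findtext("description", default=""),
        "entries": entries,
    }
```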
Easy access to OPML properties
page = OpmlPage(url, contents)
page.get_entries()
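An OPML file typically stores feed subscriptions as `outline` elements with an `xmlUrl` attribute, possibly nested in folders. A minimal sketch of collecting them is shown below; `read_opml_feeds` is a name invented for this example, and the entries returned by `OpmlPage` may carry more information than the URL.

```python
import xml.etree.ElementTree as ET


def read_opml_feeds(contents: str) -> list[str]:
    """Return the xmlUrl of every outline element, in document order."""
    root = ET.fromstring(contents)
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    ]
```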
Interfaces
- RemoteServer - provides means of calling remote crawling systems
- RemoteUrl - wrapper for RemoteServer that returns ready-to-use data
- CrawlerInterface - Interface for crawlers
- HandlerInterface - Allows implementing your own handler
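The text above does not list the methods of these interfaces, so the following is only a generic sketch of the handler pattern, not webtoolkit's API. Every name in it (`ExampleHandler`, `can_handle`, `describe`, `pick_handler`, the host `video.example.com`) is hypothetical.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlsplit


class ExampleHandler(ABC):
    """Hypothetical base class; it does not mirror HandlerInterface."""

    @abstractmethod
    def can_handle(self, url: str) -> bool:
        """Tell whether this handler knows how to treat the URL."""

    @abstractmethod
    def describe(self, url: str) -> str:
        """Return a short label for the kind of page behind the URL."""


class VideoHandler(ExampleHandler):
    def can_handle(self, url: str) -> bool:
        return urlsplit(url).hostname == "video.example.com"

    def describe(self, url: str) -> str:
        return "video page"


class DefaultHandler(ExampleHandler):
    def can_handle(self, url: str) -> bool:
        return True  # fallback that accepts every URL

    def describe(self, url: str) -> str:
        return "generic page"


def pick_handler(url: str, handlers: list[ExampleHandler]) -> ExampleHandler:
    """Return the first handler that accepts the URL."""
    return next(handler for handler in handlers if handler.can_handle(url))
```

Placing the fallback handler last in the list ensures that a specialised handler is tried first.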
File details
Details for the file webtoolkit-0.0.16.tar.gz.
File metadata
- Download URL: webtoolkit-0.0.16.tar.gz
- Upload date:
- Size: 45.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.2 Linux/6.12.20+rpt-rpi-v8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d60cfb49e158298f8cd2c91bb30fd38d9ba737bd479208977c18baa66331bff |
| MD5 | 42fce4fa166be45ff4c18f3ecf8c5792 |
| BLAKE2b-256 | e6691cf2ed4a337391a19b2695d311c0b2ea1b549546ada04c100f19b207fd30 |
File details
Details for the file webtoolkit-0.0.16-py3-none-any.whl.
File metadata
- Download URL: webtoolkit-0.0.16-py3-none-any.whl
- Upload date:
- Size: 53.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.2 Linux/6.12.20+rpt-rpi-v8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c1a6d7a00f36c2f110bd6a603f3c4976148366048a564b68e04787a84e67f8b |
| MD5 | 0a3df9855123878687015de3570f67dc |
| BLAKE2b-256 | e5616b3fec5b0216bba8b5aa1e69162f764aed58d9eb465220749620ebac7fd7 |