Web tools and interfaces for Internet data processing.
webtoolkit
Provides classes and tools for Internet data processing.
- Url parsing
- HTTP status code identification
- Page definitions: HtmlPage, RssPage, OpmlPage, Content interfaces
- Means of calling crawling systems, Crawling interfaces
Remote crawling interfaces are implemented by crawler-buddy.
Available on PyPI.
Url parsing
Remove trackers from a link and sanitize it
UrlLocation.get_cleaned_link
Obtain the domain
UrlLocation(link).get_domain()
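To make the idea of link cleaning concrete, here is a minimal standard-library sketch. It is not webtoolkit's implementation: the helper names `strip_trackers` and `domain_of` and the list of tracker prefixes are assumptions chosen for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracker prefixes; this list is not taken from webtoolkit.
TRACKER_PREFIXES = ("utm_", "fbclid", "gclid")


def strip_trackers(link: str) -> str:
    """Return the link without query parameters that look like trackers."""
    parts = urlsplit(link.strip())
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKER_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def domain_of(link: str) -> str:
    """Return the lowercased host of the link, without the port."""
    return urlsplit(link.strip()).hostname or ""
```

For instance, `strip_trackers("https://example.com/a?id=1&utm_source=x")` keeps only the `id` parameter. The library may apply different or additional rules.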
HTTP processing
Identification of valid codes
PageResponseObject().is_valid
Identification of invalid codes
PageResponseObject().is_invalid
Some codes indicate neither that the page is valid nor that it is invalid. For example, if the crawler is throttled because of too many requests, we do not yet know whether the page is valid.
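The three-way outcome described above (valid, invalid, not yet known) can be sketched as follows. This is only an illustration: the `Verdict` enum, the `classify` function and the choice of inconclusive codes are assumptions, and the exact codes webtoolkit treats as valid or invalid are not stated here.

```python
from enum import Enum


class Verdict(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNKNOWN = "unknown"


# Assumption: these codes say more about the crawler or the server load
# than about the page itself (429 Too Many Requests, 503 Service Unavailable).
INCONCLUSIVE_CODES = {429, 503}


def classify(status_code: int) -> Verdict:
    """Map an HTTP status code to a three-way verdict."""
    if status_code in INCONCLUSIVE_CODES:
        return Verdict.UNKNOWN
    if 200 <= status_code < 400:
        return Verdict.VALID
    if 400 <= status_code < 600:
        return Verdict.INVALID
    # Anything outside the usual ranges is left undecided.
    return Verdict.UNKNOWN
```

Keeping "unknown" as a separate outcome lets a caller retry later instead of discarding a page that was merely throttled.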
Page definitions
Easy access to HTML properties
page = HtmlPage(url, contents)
page.get_title()
page.get_description()
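For readers curious what reading a title and description from HTML involves, here is a rough standard-library sketch. It does not show how HtmlPage works internally; `TitleDescriptionParser` and `title_and_description` are hypothetical names used only in this example.

```python
from html.parser import HTMLParser


class TitleDescriptionParser(HTMLParser):
    """Collects the <title> text and the meta description of a document."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title = (self.title or "") + data


def title_and_description(contents: str):
    """Return (title, description); either may be None if absent."""
    parser = TitleDescriptionParser()
    parser.feed(contents)
    parser.close()
    title = parser.title.strip() if parser.title else None
    return title, parser.description
```

A real page class would likely also consult other sources such as Open Graph tags; that is outside this sketch.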
Easy access to RSS properties
page = RssPage(url, contents)
page.get_title()
page.get_description()
page.get_entries()
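As an illustration of the data such a page object exposes, the sketch below reads the channel title, description and items from an RSS 2.0 document with the standard library. It is an assumption-laden example, not RssPage itself: `read_rss` and the shape of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_rss(contents: str):
    """Return channel title, description and entries, or None if no channel exists."""
    channel = ET.fromstring(contents).find("channel")
    if channel is None:
        return None
    return {
        "title": channel.findtext("title"),
        "description": channel.findtext("description"),
        "entries": [
            {"title": item.findtext("title"), "link": item.findtext("link")}
            for item in channel.findall("item")
        ],
    }
```

This handles only plain RSS 2.0; Atom feeds and namespaced extensions would need extra handling.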
Easy access to OPML properties
page = OpmlPage(url, contents)
page.get_entries()
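An OPML file is a tree of `outline` elements, and feed subscriptions are usually the ones carrying an `xmlUrl` attribute. The following sketch lists them with the standard library; `read_opml_feeds` is a hypothetical helper, and the entries returned by OpmlPage may be shaped differently.

```python
import xml.etree.ElementTree as ET


def read_opml_feeds(contents: str):
    """Return title and feed URL for every outline that points at a feed."""
    root = ET.fromstring(contents)
    return [
        {
            "title": outline.get("title") or outline.get("text"),
            "url": outline.get("xmlUrl"),
        }
        for outline in root.iter("outline")
        if outline.get("xmlUrl")
    ]
```

Outlines without `xmlUrl` are typically folders, so the sketch skips them while still descending into their children.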
Interfaces
- RemoteServer - provides a means of calling remote crawling systems
- Url - wrapper around RemoteServer that returns ready-to-use data
File details
Details for the file webtoolkit-0.0.5.tar.gz.
File metadata
- Download URL: webtoolkit-0.0.5.tar.gz
- Upload date:
- Size: 40.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.2 Linux/6.12.20+rpt-rpi-v8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5117188c772cdc5a3248033bfa8272dc6cc9ebc4b9064be7e236d1b6beef835c |
| MD5 | a12fa33f15e8f37aade5df07ad3d6acc |
| BLAKE2b-256 | b9043773b5d79cc0a23ec1eeb703a2132f8c95e388f670b60c7194056c0d38da |
File details
Details for the file webtoolkit-0.0.5-py3-none-any.whl.
File metadata
- Download URL: webtoolkit-0.0.5-py3-none-any.whl
- Upload date:
- Size: 44.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.2 Linux/6.12.20+rpt-rpi-v8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1d1a20e9abedf84d35d49a67901c31f865bd8bd29c45cb7427e0ec349c089550 |
| MD5 | 95140dc73ee1ff4f45b89ff56218a5df |
| BLAKE2b-256 | 29115852bf954526213c7cc8ad575bd1977d11000ba8d029b5f9554421407db6 |