Webdiver: fetch URI content as text or HTML, download files, or filter DOM tags, with scheduling
⛵ Web Diver - Web Crawling
Version: 8.0 Alpha
Status: Under Development
Author: #asytrick
Website: github.com/ssmool/webdiver
Contact: eusmool@gmail.com
⛵ Web Diver
Web Diver is a simple and powerful web crawling and search automation tool for the World Wide Web, developed in Python. Web Diver's main purpose is to read and archive content from specific URLs, saving the data in a structured format in a local SQLite database.
🚀 Installation
First of all, make sure you're using Python 3.6+ and have pip updated.
Install via pip:
pip install webdiver
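To confirm that the install succeeded, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or newer, which is stricter than the 3.6+ stated above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if webdiver is installed, otherwise None.
    print(installed_version("webdiver"))
```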
⚙️ Basic Functionality
www_diver_add_task(url: str)
Sets the URL that will be used in the web content reading and search task.
Example:
www_diver_add_task('https://electronics.howstuffworks.com/tv.htm')
www_diver_start(db: str, type: str)
Reads the URL set with www_diver_add_task and stores the extracted content in the SQLite database db. The type argument selects what is stored; the values used in this README are '_text', '_html', '_filter', and '_download'.
Example:
_type = ['_text','_html','_filter']
www_diver_start('db.sqlite',_type[0])
Webdiver with lists:
from webdiver.web_diver import *

# Each entry has the form "<url>#<type>"; the part after '#' selects the content type.
tasks = ['http://www.github.com/ssmool/webdiver#_html','http://www.github.com/ssmool/radgram#_text','http://www.github.com/ssmool/cinewiz/raw/main/assets/cinewiz_cover.gif#_download']
for entry in tasks:
    # Split on the last '#' so the URL and the type suffix are separated.
    url, content_type = entry.rsplit('#', 1)
    www_diver_add_task(url)
    www_diver_start('db_plugwarez_0x1.sqlite', content_type)
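An entry without a `#` suffix would make the loop above fail, and a URL with an ordinary fragment such as `#top` would pass a meaningless type. The sketch below shows one way to parse the `url#type` convention defensively. `split_task`, `KNOWN_TYPES`, and the `'_text'` default are assumptions made for illustration and are not part of the webdiver API; the type names are only those that appear in this README.

```python
from typing import Tuple

# Content types that appear in this README (assumed, not an official list).
KNOWN_TYPES = ('_text', '_html', '_filter', '_download')


def split_task(entry: str, default_type: str = '_text') -> Tuple[str, str]:
    """Split 'url#type' into (url, type), falling back to default_type."""
    url, sep, content_type = entry.rpartition('#')
    if not sep or content_type not in KNOWN_TYPES:
        # No '#' at all, or the fragment is not a known type: keep the entry whole.
        return entry, default_type
    return url, content_type


if __name__ == "__main__":
    for entry in ['http://www.github.com/ssmool/webdiver#_html',
                  'https://example.com/news']:
        print(split_task(entry))
```

The returned pair can then be passed to www_diver_add_task and www_diver_start in the same order as in the loop above.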
set_task(uri: str, hour: int, minute: int)
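Schedules a web crawling task for the given URL to run automatically at the given hour and minute. Note that the example below passes the hour and minute as strings and adds a content type ('_text') as a fourth argument, which the signature above does not list.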
Example:
set_task("https://example.com/news", '14', '30', '_text')
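How set_task schedules work internally is not documented here. The sketch below only illustrates the arithmetic behind "run at the next HH:MM": if the requested time has already passed today, the run moves to tomorrow. `next_run` is a hypothetical helper, not a webdiver function; it accepts the hour and minute as integers or numeric strings because the example above passes strings.

```python
from datetime import datetime, timedelta
from typing import Union


def next_run(now: datetime, hour: Union[int, str], minute: Union[int, str]) -> datetime:
    """Return the next datetime strictly after `now` that falls on hour:minute."""
    candidate = now.replace(hour=int(hour), minute=int(minute),
                            second=0, microsecond=0)
    if candidate <= now:
        # The requested time has already passed today, so schedule it for tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_run(datetime.now(), '14', '30'))
```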
💡 Usage Examples
from webdiver.web_diver import *
www_diver_add_task('https://electronics.howstuffworks.com/tv.htm')
www_diver_start('db.sqlite','_text')
🗃️ Database
The captured data is automatically stored in an SQLite database, with information such as:
URL accessed
Simplified HTML content
Collection timestamp
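The table and column names that Web Diver creates are not listed here, so the sketch below makes no assumption about them. It uses only the standard sqlite3 module to list every table in a database file together with its columns, which is a reasonable first step before writing queries. `describe_database` is a hypothetical helper name.

```python
import sqlite3
from typing import Dict, List


def describe_database(path: str) -> Dict[str, List[str]]:
    """Map each table name in the SQLite file at `path` to its column names."""
    conn = sqlite3.connect(path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

Calling `describe_database('db.sqlite')` after a crawl returns a dictionary of tables and columns. Note that sqlite3.connect creates an empty file if the path does not exist, in which case the result is an empty dictionary.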
👨‍💻 Developed by #asytrick
Project available at: github.com/ssmool/webdiver
🤝 Contributions
Contributions are welcome! Feel free to open issues, submit pull requests, or reach out by email.
📫 Contact
- Author: #asytrick
- Repository: github.com/ssmool/webdiver
- Email: eusmool@gmail.com
📦 CineOS Barsotti @buskplay - RAG PARTS:
Webdiver is part of the CineOS Barsotti @buskplay Unix-like project and is aligned with global goals for decentralized, AI-assisted creative ORM development, generative software assembly, research, work suites, and other purposes in software deployment and support (AI orchestrators built on LLMs and generative AI, and OS support documentation) through generative creativity.
File details
Details for the file webdiver-8.0.tar.gz.
File metadata
- Download URL: webdiver-8.0.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 055d4378d44d36e11c3daf76991c8fcb87286c822682abb3662dcb6026fa5b5c |
| MD5 | 0526ac43bfa8e0f2c6a7fb84707d8633 |
| BLAKE2b-256 | ece71a88eed61943be8249c6b369e6fd173471c82b33d6ff5967dcadc36d8ba0 |
File details
Details for the file webdiver-8.0-py3-none-any.whl.
File metadata
- Download URL: webdiver-8.0-py3-none-any.whl
- Upload date:
- Size: 9.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bba9046f6e47e4cdaf901fb43def2ec60ac3ddf0c67f7f42b0c17bb1f8f5deb7 |
| MD5 | 87b92100f5218b6a88334c8a9e7bd030 |
| BLAKE2b-256 | 17446a12f17b214c5d4aaaaf618bb719a7fee9f6d7d4ae9fdf57a8168aa10e18 |
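The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using the standard hashlib module; `sha256_of` and `matches` are hypothetical helper names, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches('webdiver-8.0.tar.gz', '055d4378d44d36e11c3daf76991c8fcb87286c822682abb3662dcb6026fa5b5c')` should return True for an intact copy of the source distribution saved in the current directory.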