Webdiver fetches URI content as text, HTML, file downloads, or filtered DOM tags, with optional scheduling.

Project description

⛵ Web Diver - Web Crawling

Version: 8.0 Alpha

Status: Under Development

Author: #asytrick

Website: github.com/ssmool/webdiver

Contact: eusmool@gmail.com

⛵ Web Diver

Web Diver is a simple and powerful web crawling and search automation tool for the World Wide Web, developed in Python. Web Diver's main purpose is to read and archive content from specific URLs, saving the data in a structured format in a local SQLite database.


🚀 Installation

First, make sure you are using Python 3.6+ and that pip is up to date.

Install via pip:

pip install webdiver

⚙️ Basic Functionality

www_diver_add_task(url: str)

Sets the URL that will be used in the web content reading and search task.

Example:

www_diver_add_task('https://electronics.howstuffworks.com/tv.htm')

www_diver_start(db: str, type: str)

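Reads the URL set with www_diver_add_task and stores the extracted content in the SQLite database named by db. The type argument selects what is stored; the examples in this document use '_text', '_html', '_filter', and '_download'.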

Example:

_type = ['_text','_html','_filter']
www_diver_start('db.sqlite',_type[0])

Webdiver with lists:

from webdiver.web_diver import *

_v0x = ['http://www.github.com/ssmool/webdiver#_html','http://www.github.com/ssmool/radgram#_text','http://www.github.com/ssmool/cinewiz/raw/main/assets/cinewiz_cover.gif#_download']

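for _c0x in _v0x:
    # Each entry has the form '<url>#<type>'; split it into its two parts.
    _c0x_x01 = _c0x.split('#')
    _c0x_x0 = _c0x_x01[0]  # the URL to read
    _c0x_x1 = _c0x_x01[1]  # the content type, e.g. '_html'
    www_diver_add_task(_c0x_x0)
    www_diver_start('db_plugwarez_0x1.sqlite', _c0x_x1)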
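
The loop above relies on every entry containing exactly one '#'. The following is a minimal sketch of a more defensive version, assuming the same 'url#type' convention. The helper names split_entry and run_entries, the KNOWN_MODES set, and the '_text' default are hypothetical and not part of webdiver; the two webdiver functions are passed in as arguments so the sketch does not depend on how the package is imported.

```python
from typing import Callable, Iterable, List, Tuple

# Modes that appear in this README's examples; the library may accept others.
KNOWN_MODES = {"_text", "_html", "_filter", "_download"}


def split_entry(entry: str, default_mode: str = "_text") -> Tuple[str, str]:
    """Split 'url#_mode' into (url, mode), falling back to default_mode."""
    url, sep, mode = entry.rpartition("#")
    if not sep or mode not in KNOWN_MODES:
        # No '#' at all, or the part after '#' is an ordinary URL fragment:
        # keep the entry intact and use the default mode.
        return entry, default_mode
    return url, mode


def run_entries(
    entries: Iterable[str],
    add_task: Callable[[str], object],
    start: Callable[[str, str], object],
    db_path: str,
) -> List[Tuple[str, str]]:
    """Call add_task(url) then start(db_path, mode) for each entry."""
    processed = []
    for entry in entries:
        url, mode = split_entry(entry)
        add_task(url)
        start(db_path, mode)
        processed.append((url, mode))
    return processed
```

With webdiver imported as in the example above, this could be called as run_entries(_v0x, www_diver_add_task, www_diver_start, 'db_plugwarez_0x1.sqlite').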

set_task(uri: str, hour: int, minute: int)

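Schedules a web crawling task for automatic execution at the given hour and minute. The signature above lists three parameters, while the example below gives the hour and minute as strings and also passes a content type such as '_text' as a fourth argument.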

Example:

set_task("https://example.com/news", '14', '30', '_text')
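
Because the hour and minute can arrive as either strings or integers, it may help to check them before scheduling. The sketch below shows only the clock arithmetic, assuming a 24-hour clock and a run at the next occurrence of that time. The names normalize_time and next_run are hypothetical and not part of webdiver, and nothing here describes how set_task itself schedules work.

```python
from datetime import datetime, timedelta
from typing import Tuple, Union

TimePart = Union[int, str]


def normalize_time(hour: TimePart, minute: TimePart) -> Tuple[int, int]:
    """Return (hour, minute) as ints; raise ValueError if out of range."""
    h, m = int(hour), int(minute)
    if not 0 <= h <= 23:
        raise ValueError(f"hour must be in 0..23, got {h}")
    if not 0 <= m <= 59:
        raise ValueError(f"minute must be in 0..59, got {m}")
    return h, m


def next_run(now: datetime, hour: TimePart, minute: TimePart) -> datetime:
    """Return the first moment after `now` that falls on hour:minute."""
    h, m = normalize_time(hour, minute)
    candidate = now.replace(hour=h, minute=m, second=0, microsecond=0)
    if candidate <= now:
        # That time has already passed today, so use tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For instance, next_run(datetime.now(), '14', '30') gives the next 14:30, today or tomorrow.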

💡 Usage Examples

from webdiver.web_diver import *

www_diver_add_task('https://electronics.howstuffworks.com/tv.htm')
www_diver_start('db.sqlite', '_text')

🗃️ Database

The captured data is automatically stored in an SQLite database, including:

URL accessed

Simplified HTML content

Collection timestamp
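
The table and column names that webdiver uses are not documented here, so one way to explore a database it has written is to ask SQLite for its schema. The sketch below uses only Python's standard sqlite3 module; the function name describe_database is hypothetical and not part of webdiver.

```python
import sqlite3
from typing import Dict, List


def describe_database(path: str) -> Dict[str, List[str]]:
    """Map each table name in the SQLite file to its column names."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the name.
            rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [row[1] for row in rows]
        return schema
    finally:
        conn.close()
```

Calling describe_database('db.sqlite') after a crawl lists whatever tables were created. Note that sqlite3.connect creates an empty file if the path does not exist yet.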

👨‍💻 Developed by #asytrick

Project available at: github.com/ssmool/webdiver

🤝 Contributions

Contributions are welcome! Feel free to open issues, submit pull requests, or reach out by email.

📫 Contact

eusmool@gmail.com

📦 CineOS Barsotti @buskplay - RAG PARTS:

Webdiver is part of the CineOS Barsotti @buskplay Unix-like project. It is aligned with that project's goals of decentralized, AI-assisted creative ORM development, generative software assembly, research, work suites, and other purposes in software deployment and support (AI orchestrators built on LLMs and generative AI, and OS support documentation).
