
A very simple web crawler and checker


ChkWeb

ChkWeb is a very simple web crawler that checks the public pages of a web server.

To use it, call the subcommand start with the URL to crawl:

chkweb start http://localhost/

This creates a sqlite3 database, pages.db, with the URLs detected by the spider. It also checks this first page and adds all of its local links to the database as pending URLs to be checked. Now you can run:

chkweb advance

to continue the crawling process. This takes at most 10 pending URLs and repeats the process with each of them. You can define the maximum number of new URLs to be checked by setting the environment variable CHKWEB_ADVANCE_LIMIT or the --limit command line option, as in this example:

chkweb advance --limit 1000
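The start/advance cycle described above can be sketched roughly as follows. This is only an illustration, not ChkWeb's actual code: the table layout, the function names (local_links, open_db, add_pending, next_pending) and the way the limit is resolved are all assumptions. It extracts the local links of a page, stores them as pending, and then reads back a batch limited by CHKWEB_ADVANCE_LIMIT (defaulting to 10), treating 0 as "no limit".

```python
import os
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(base_url, html):
    """Return absolute, de-duplicated URLs in html on the same host as base_url."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links and drop the #fragment part.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(url).netloc == host and url not in found:
            found.append(url)
    return found


def open_db(path=":memory:"):
    # Hypothetical schema; the real pages.db layout may differ.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS page ("
        "url TEXT PRIMARY KEY, status TEXT NOT NULL DEFAULT 'pending')"
    )
    return db


def add_pending(db, urls):
    # INSERT OR IGNORE leaves URLs that are already in the table untouched.
    db.executemany(
        "INSERT OR IGNORE INTO page (url) VALUES (?)", [(u,) for u in urls]
    )
    db.commit()


def next_pending(db, limit=None):
    """Return up to `limit` pending URLs; 0 means all of them."""
    if limit is None:
        # Assumed fallback order: explicit argument, then environment, then 10.
        limit = int(os.environ.get("CHKWEB_ADVANCE_LIMIT", "10"))
    sql = "SELECT url FROM page WHERE status = 'pending' ORDER BY rowid"
    if limit > 0:
        rows = db.execute(sql + " LIMIT ?", (limit,))
    else:
        rows = db.execute(sql)
    return [row[0] for row in rows]
```

In this sketch an explicit limit plays the role of the --limit option and takes priority over the environment variable; whether ChkWeb itself resolves the two in that order is not stated here.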

Checking process status

You can check the current process status with the subcommand status, like this:

chkweb status

Logs

A log file is stored in logs/chklog.log. You can change the log level either in the settings file or by setting an environment variable named CHKWEB_LOG_LEVEL to the desired level. It is set to ERROR by default.
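As a rough illustration of how such a setting could be resolved, the sketch below reads CHKWEB_LOG_LEVEL and falls back to a value that would come from the settings file, then to ERROR. The function names and the rule that the environment variable wins over the settings file are assumptions, not ChkWeb's documented behaviour.

```python
import logging
import os

# Level names accepted by this sketch, mapped to the standard logging constants.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(settings_level="ERROR"):
    """Return the numeric log level, preferring the environment variable."""
    name = os.environ.get("CHKWEB_LOG_LEVEL", settings_level).upper()
    if name not in LEVELS:
        raise ValueError(f"Unknown log level: {name!r}")
    return LEVELS[name]


def configure_logging(path="logs/chklog.log"):
    """Send log records to the given file, creating its directory if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    logging.basicConfig(filename=path, level=resolve_log_level())
```

With this sketch, running a command with CHKWEB_LOG_LEVEL=DEBUG in the environment would record debug messages in the log file as well.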

TODO things

  • Add a plugin system to perform custom checks

  • Add a new subcommand to make the tests from a given list of urls

  • Add an option to select the name and path of the database file. Also include it in the settings.py file.

DONE things

  • Add an option to the advance command to set the number of pages analyzed in each call. Set it to 0 to continue until all the pages are analyzed [DONE 0.1.4]

  • Logs stored in some other location [DONE 0.1.2]

  • Subcommand list to list the URLs in the database [DONE 0.1.2]

  • Subcommand init to delete the database and start a new crawl process [DONE 0.1.2]

  • Subcommand run to get a URL from the pending list and check it [DONE 0.1.2]
