A very simple web crawler and checker
ChkWeb
ChkWeb is a very simple web crawler that checks the public web pages of a web server.
To use it, call the subcommand start with the URL to crawl:
chkweb start http://localhost/
This will create a sqlite3 database, pages.db, with the URLs detected by the spider. It also checks this first page and adds all its local links to the database as pending URLs to be checked. Now you can run:
chkweb advance
to continue the crawling process. This takes at most 10 pending URLs and repeats the process with each of them. You can define the maximum number of new URLs to be checked per call by setting the environment variable CHKWEB_ADVANCE_LIMIT or the --limit command line option, as in this example:
chkweb advance --limit 1000
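As an illustration only, the limit could be resolved along the following lines. This is a minimal sketch, not ChkWeb's actual code: the function name resolve_limit and the precedence (command line option first, then environment variable, then the default of 10) are assumptions.

```python
import os

# The README states that advance takes at most 10 pending URLs by default.
DEFAULT_LIMIT = 10


def resolve_limit(cli_limit=None, environ=None):
    """Return how many pending URLs one advance call should process.

    Hypothetical helper. Assumed precedence: --limit option, then the
    CHKWEB_ADVANCE_LIMIT environment variable, then the default.
    According to the DONE list, 0 means "continue until all pages are done".
    """
    environ = os.environ if environ is None else environ
    if cli_limit is not None:
        return int(cli_limit)
    raw = environ.get("CHKWEB_ADVANCE_LIMIT")
    if raw is not None and raw.strip():
        return int(raw)
    return DEFAULT_LIMIT
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.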
Checking process status
You can check the current process status with the subcommand status, like this:
chkweb status
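The README does not document the layout of pages.db, so the following is only a sketch of how a status summary over a pending queue might be computed with sqlite3. The table name page, its columns, and all function names are hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database with a hypothetical one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page ("
        " url TEXT PRIMARY KEY,"
        " checked INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_pending(conn, urls):
    """Register URLs as pending; URLs already known are ignored."""
    conn.executemany(
        "INSERT OR IGNORE INTO page (url) VALUES (?)",
        [(url,) for url in urls],
    )
    conn.commit()


def mark_checked(conn, url):
    """Flag a URL as already checked."""
    conn.execute("UPDATE page SET checked = 1 WHERE url = ?", (url,))
    conn.commit()


def status(conn):
    """Return total, checked and pending URL counts."""
    total, checked = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(checked), 0) FROM page"
    ).fetchone()
    return {"total": total, "checked": checked, "pending": total - checked}
```

Using the URL as primary key with INSERT OR IGNORE means a link found on several pages is queued only once.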
Logs
A log file is stored in logs/chklog.log. You can change the log level either in the settings file or by declaring an environment variable named CHKWEB_LOG_LEVEL set to the desired level. It is set to ERROR by default.
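A minimal sketch of reading the level from the environment, assuming the variable holds a standard Python logging level name. The helper name resolve_log_level and the fallback to ERROR for unknown values are assumptions, not ChkWeb's documented behavior.

```python
import logging
import os

# Standard level names from the logging module.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(environ=None):
    """Map CHKWEB_LOG_LEVEL to a logging constant (hypothetical helper).

    Falls back to ERROR, the default stated in the README, when the
    variable is missing or holds an unknown name (assumed behavior).
    """
    environ = os.environ if environ is None else environ
    name = environ.get("CHKWEB_LOG_LEVEL", "ERROR").strip().upper()
    return LEVELS.get(name, logging.ERROR)
```

The result could then be passed to logging.basicConfig(level=...), for example together with filename="logs/chklog.log".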
TODO things
- Add a plugin system to perform custom checks
- Add a new subcommand to run the tests from a given list of URLs
- Add an option to select the name and path of the database file. Also include it in the settings.py file.
DONE things
- Add an option in the advance command to set the number of pages analyzed in every call. Set it to 0 to continue until all the pages are analyzed [DONE 0.1.4]
- Logs stored in some other location [DONE 0.1.2]
- Subcommand list to list the URLs in the database [DONE 0.1.2]
- Subcommand init to delete the database and start a new crawl process [DONE 0.1.2]
- Subcommand run to get a URL from the pending list and check it [DONE 0.1.2]