
Web Spider to retrieve links

Project description

The program "arsespyder" is a web crawler that crawls a URL and recursively inspects the links it contains: for a given URL, the crawler downloads every link of the form <a href="http://whatever">whatever</a> found on that page.
The operation is then repeated for each of those links until the search depth given by parameter is reached. If not provided, the crawling depth defaults to 3.
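At each level the crawler has to extract the <a href="..."> targets from a downloaded page. A minimal sketch of that step using Python's standard html.parser module (the actual arsespyder implementation may use a different parsing approach):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, as the crawler does per page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<p><a href="http://example.com/a">a</a> '
               '<a href="http://example.com/b">b</a></p>')
print(extractor.links)
```

Each page's extracted links then become the inputs for the next crawling level.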

Usage of the application is as follows:

$ ./ --help
usage: [-h] [-v] [-n NUMBER_OF_LEVELS] url

Internet Crawler

positional arguments:
  url                   URL to crawl

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -n NUMBER_OF_LEVELS, --number-of-levels NUMBER_OF_LEVELS
                        Crawling depth

NOTE 1 - The only mandatory parameter is the URL to crawl.
NOTE 2 - If the number of levels is not provided, NUMBER_OF_LEVELS defaults to 3.
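The help text above maps directly onto an argparse configuration. A hypothetical reconstruction of the option parsing implied by that output (option names and defaults are taken from the help text; the real source may differ):

```python
import argparse

# Sketch of the CLI implied by the --help output above.
parser = argparse.ArgumentParser(description="Internet Crawler")
parser.add_argument("url", help="URL to crawl")
parser.add_argument("-v", "--version", action="version",
                    version="%(prog)s 0.0.3")
parser.add_argument("-n", "--number-of-levels", type=int, default=3,
                    metavar="NUMBER_OF_LEVELS", help="Crawling depth")

# Example invocation equivalent to: ./ -n 2 http://example.com
args = parser.parse_args(["-n", "2", "http://example.com"])
print(args.url, args.number_of_levels)
```

Omitting -n leaves args.number_of_levels at its default of 3, matching NOTE 2.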

The arsespyder web crawler dumps those links, up to the specified crawling depth, in the following format:

$ ./ -n3

* http://... are level 1 links (found in the HTML of the URL given as parameter)
** http://... are level 2 links (found in the HTML of the level 1 links)
*** http://... are level 3 links (found in the HTML of the level 2 links)
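That leveled output falls out of a depth-limited recursive walk. A sketch of the traversal, under the assumption of an injected fetch_links(url) helper (hypothetical, used here so the logic can be shown without network access):

```python
def crawl(url, fetch_links, max_depth=3, level=1, seen=None, out=None):
    """Record each newly seen link with one '*' per level, then recurse
    into it, stopping once max_depth is exceeded."""
    if seen is None:
        seen = set()
    if out is None:
        out = []
    if level > max_depth:
        return out
    for link in fetch_links(url):
        if link in seen:          # skip already-visited links
            continue
        seen.add(link)
        out.append("*" * level + " " + link)
        crawl(link, fetch_links, max_depth, level + 1, seen, out)
    return out

# A tiny in-memory "web" standing in for real pages (hypothetical URLs):
pages = {
    "http://root": ["http://a", "http://b"],
    "http://a": ["http://c"],
}
for line in crawl("http://root", lambda u: pages.get(u, []), max_depth=2):
    print(line)
```

With max_depth=2 this prints the level 1 links with one asterisk and the level 2 links with two, in discovery order.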

The code documentation is contained in the "doc" folder; the main HTML file is pyarsespyder.html:

$ tree doc/
doc/
├── pyarsespyder.geturl.html
├── pyarsespyder.html
└── pyarsespyder.validateurl.html

For installation instructions, check the INSTALL file.

Project details

Download files

Download the file for your platform.

Files for arsespyder, version 0.0.3

Filename, size                                 File type     Python version
arsespyder-0.0.3.linux-i686.tar.gz (4.5 kB)    Dumb Binary   any
arsespyder-0.0.3.tar.gz (3.9 kB)               Source        None
