Web Spider to retrieve links
1 - INTRODUCTION
----------------
The program "arsespyder" is a web crawler that fetches a URL and recursively inspects the links it contains: for a given URL, the crawler collects the links of the form <a href="http://whatever">whatever</a> found in that page.
The same operation is then repeated for each link found, until the crawling depth passed as a parameter is reached. If no depth is provided, it defaults to 3.
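The recursion described above can be sketched as follows. This is only an illustrative sketch, not the actual arsespyder code: the names AnchorParser, extract_links and crawl are hypothetical, the page download is replaced by a caller-supplied fetch function, and skipping links that were already seen is an assumption made here to avoid loops.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return the absolute URLs of the <a href> links found in html."""
    parser = AnchorParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def crawl(start_url, depth, fetch):
    """Return one list of links per level, for at most `depth` levels.

    `fetch` is a hypothetical callable mapping a URL to its HTML text;
    a real crawler would download the page at this point.
    """
    levels = []
    seen = {start_url}
    current = [start_url]
    for _ in range(depth):
        found = []
        for url in current:
            for link in extract_links(url, fetch(url)):
                if link not in seen:  # assumption: list each link only once
                    seen.add(link)
                    found.append(link)
        if not found:
            break  # nothing new to visit, stop early
        levels.append(found)
        current = found
    return levels
```

Passing fetch as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to try against an in-memory dictionary of pages.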
2 - APPLICATION USAGE
---------------------
Usage of the application is as follows:
$ ./arsespyder.py --help
usage: arsespyder.py [-h] [-v] [-n NUMBER_OF_LEVELS] url

Internet Crawler

positional arguments:
  url                   URL to crawl

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -n NUMBER_OF_LEVELS, --number-of-levels NUMBER_OF_LEVELS
                        Crawling depth
NOTE 1 - The only mandatory parameter is the URL to crawl.
NOTE 2 - If the number of levels is not provided, NUMBER_OF_LEVELS defaults to 3.
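A command line like the one above could be declared with the standard argparse module roughly as follows. This is a sketch under assumptions, not the program's actual source: the function name build_parser and the VERSION placeholder are hypothetical, while the option names, help texts and the default of 3 are taken from the help output.

```python
import argparse

VERSION = "0.0.0"  # placeholder, not the real version string


def build_parser():
    """Build a parser mirroring the usage text shown above."""
    parser = argparse.ArgumentParser(
        prog="arsespyder.py", description="Internet Crawler"
    )
    parser.add_argument(
        "-v", "--version", action="version", version="%(prog)s " + VERSION
    )
    parser.add_argument(
        "-n", "--number-of-levels", type=int, default=3, help="Crawling depth"
    )
    parser.add_argument("url", help="URL to crawl")  # the only mandatory argument
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.number_of_levels)
```

With this declaration, omitting -n leaves args.number_of_levels at 3, and both "-n 2" and "-n2" are accepted.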
3 - OUTPUT FORMAT
-----------------
The arsespyder web crawler prints the links it finds, up to the specified crawling depth, in the following format:
$ ./arsespyder.py -n3 http://arsespyder.dyndns.org/index.html
* http://arsespyder.dyndns.org/test/l1_p1.html
* http://arsespyder.dyndns.org/test/l1_p2.html
* http://arsespyder.dyndns.org/test/l1_p3.html
** http://arsespyder.dyndns.org/test/l2_p1_p1.html
** http://arsespyder.dyndns.org/test/l2_p1_p2.html
** http://arsespyder.dyndns.org/test/l2_p2_p1.html
** http://arsespyder.dyndns.org/test/l2_p2_p2.html
*** http://arsespyder.dyndns.org/l3_p1_p1_p1.html
*** http://arsespyder.dyndns.org/l3_p1_p1_p2.html
*** http://arsespyder.dyndns.org/l3_p1_p1_p3.html
*** http://arsespyder.dyndns.org/l3_p1_p2_p1.html
*** http://arsespyder.dyndns.org/l3_p1_p2_p2.html
*** http://arsespyder.dyndns.org/l3_p1_p2_p3.html
Where:
* http://... are level 1 links (found in the HTML of the URL passed as a parameter)
** http://... are level 2 links (found in the HTML of the level 1 links)
*** http://... are level 3 links (found in the HTML of the level 2 links)
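The asterisk convention can be reproduced with a few lines of Python. The function name format_levels and its input shape (one list of links per level) are assumptions made for this sketch. It only illustrates the output format and is not the program's own code.

```python
def format_levels(levels):
    """Render one line per link, prefixed with as many '*' as its level."""
    lines = []
    for depth, links in enumerate(levels, start=1):
        for link in links:
            lines.append("*" * depth + " " + link)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_levels([
        ["http://example.test/a.html"],
        ["http://example.test/b.html", "http://example.test/c.html"],
    ]))
```

Links are printed level by level, which matches the grouping shown in the sample output above.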
4 - CODE DOCUMENTATION
----------------------
The code documentation is in the "doc" folder. The main HTML file is pyarsespyder.html:
$ tree doc/
doc/
├── pyarsespyder.geturl.html
├── pyarsespyder.html
└── pyarsespyder.validateurl.html
5 - INSTALLATION
----------------
See the INSTALL file.