Web Spider to retrieve links

Project description

1 - INTRODUCTION
----------------
The program "arsespyder" is a web crawler that takes a URL and recursively inspects the links found on that page: for a given URL, the crawler retrieves every link of the form <a href="http://whatever">whatever</a> contained in it.
The operation is then repeated for each of those links until the crawling depth passed as a parameter is reached. If no depth is provided, the default is 3 levels.
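
The depth-limited recursion described above can be sketched as follows. This is a minimal illustration and not the arsespyder source: the names LinkParser, fetch_html and crawl are hypothetical, relative links are resolved with urljoin as an assumption, and the page fetcher is passed in as a parameter so the logic can be tried without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(url, levels=3, fetch=fetch_html, level=1, found=None):
    """Return (level, link) pairs for links reachable within `levels` hops."""
    if found is None:
        found = []
    if level > levels:
        return found  # requested depth reached, stop recursing
    parser = LinkParser()
    parser.feed(fetch(url))
    for href in parser.links:
        link = urljoin(url, href)  # assumption: resolve relative links
        found.append((level, link))
        crawl(link, levels, fetch, level + 1, found)  # repeat for each link
    return found
```

The sketch visits links depth-first and does not skip pages it has already seen; the depth limit alone keeps it from looping forever on pages that link to each other.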

2 - APPLICATION USAGE
---------------------
Usage of the application is as follows:

$ ./arsespyder.py --help
usage: arsespyder.py [-h] [-v] [-n NUMBER_OF_LEVELS] url

Internet Crawler

positional arguments:
  url                   URL to crawl

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -n NUMBER_OF_LEVELS, --number-of-levels NUMBER_OF_LEVELS
                        Crawling depth

NOTE 1 - The only mandatory parameter is the URL to crawl.
NOTE 2 - If the number of levels is not provided, NUMBER_OF_LEVELS defaults to 3.
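
For reference, a command line with this shape can be declared with Python's argparse module. The sketch below is an assumption about how such a parser might look, not the program's actual code: build_parser is a hypothetical name, and the version string format is a guess based on the 0.0.3 release number.

```python
import argparse


def build_parser():
    """Build a parser with one mandatory URL and an optional depth."""
    parser = argparse.ArgumentParser(
        prog="arsespyder.py", description="Internet Crawler"
    )
    # Assumption: the exact version string printed by -v is not documented.
    parser.add_argument(
        "-v", "--version", action="version", version="%(prog)s 0.0.3"
    )
    # The depth is optional and falls back to 3, as stated in NOTE 2.
    parser.add_argument(
        "-n", "--number-of-levels", type=int, default=3, help="Crawling depth"
    )
    # The URL is the only mandatory parameter, as stated in NOTE 1.
    parser.add_argument("url", help="URL to crawl")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.number_of_levels)
```

Declaring the depth with type=int lets argparse reject non-numeric values before any crawling starts.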

3 - OUTPUT FORMAT
-----------------
The arsespyder web crawler prints the links it finds, down to the specified crawling depth, in the following format:

$ ./arsespyder.py -n3 http://arsespyder.dyndns.org/index.html
* http://arsespyder.dyndns.org/test/l1_p1.html
* http://arsespyder.dyndns.org/test/l1_p2.html
* http://arsespyder.dyndns.org/test/l1_p3.html
** http://arsespyder.dyndns.org/test/l2_p1_p1.html
** http://arsespyder.dyndns.org/test/l2_p1_p2.html
** http://arsespyder.dyndns.org/test/l2_p2_p1.html
** http://arsespyder.dyndns.org/test/l2_p2_p2.html
*** http://arsespyder.dyndns.org/l3_p1_p1_p1.html
*** http://arsespyder.dyndns.org/l3_p1_p1_p2.html
*** http://arsespyder.dyndns.org/l3_p1_p1_p3.html
*** http://arsespyder.dyndns.org/l3_p1_p2_p1.html
*** http://arsespyder.dyndns.org/l3_p1_p2_p2.html
*** http://arsespyder.dyndns.org/l3_p1_p2_p3.html

Where:
* http://... are level 1 links (found in the HTML of the URL passed as a parameter)
** http://... are level 2 links (found in the HTML of level 1 links)
*** http://... are level 3 links (found in the HTML of level 2 links)
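
The asterisk prefix can be produced by repeating "*" once per level. The following is a small sketch that assumes the crawl results are held as (level, link) pairs; that representation and the name format_links are hypothetical and not taken from the arsespyder source.

```python
def format_links(found):
    """Render (level, link) pairs as asterisk-prefixed lines grouped by level."""
    lines = []
    # sorted() is stable, so links keep their discovery order within a level.
    for level, link in sorted(found, key=lambda pair: pair[0]):
        lines.append("*" * level + " " + link)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        (1, "http://example.test/a.html"),
        (2, "http://example.test/c.html"),
        (1, "http://example.test/b.html"),
    ]
    print(format_links(sample))
```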

4 - CODE DOCUMENTATION
----------------------
Code documentation is located in the "doc" folder. The main HTML file is pyarsespyder.html:

$ tree doc/
doc/
├── pyarsespyder.geturl.html
├── pyarsespyder.html
└── pyarsespyder.validateurl.html

5 - INSTALLATION
----------------
See the INSTALL file.

Download files

Source Distribution

arsespyder-0.0.3.tar.gz (3.9 kB)

Uploaded Source

Built Distribution

arsespyder-0.0.3.linux-i686.tar.gz (4.5 kB)

Uploaded Source

File details

Details for the file arsespyder-0.0.3.tar.gz.

File metadata

  • Download URL: arsespyder-0.0.3.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for arsespyder-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       0eb350cdfb269ae2fa8194dc670af1c9861c3728e002ad60f4a9ad2925125e47
MD5          7bfbd2e3babd2f727756a60db0c73cf0
BLAKE2b-256  ddb7886449d124d143ea74ef0a17ac0d2e769f2148aa78add72d5254d9c5fc02

File details

Details for the file arsespyder-0.0.3.linux-i686.tar.gz.

File hashes

Hashes for arsespyder-0.0.3.linux-i686.tar.gz
Algorithm    Hash digest
SHA256       ea199dbfcf29bf4a8cb47cdc9d978ecbd2d2b3c25884ff8a2e4279d4d3e11ed8
MD5          f078c8570d418d9ba7062d3df3e5663e
BLAKE2b-256  db37531bff83dc44607da52963b2aaebeec79116653b6167d94e4320871b81e0
