A web crawler that gathers more than you can imagine.

Not Your Average Web Crawler

A web crawler for vulnerability scanning. Not Your Average Web Crawler (N.Y.A.W.C) is a Python application that crawls web applications for requests instead of URLs. It crawls every GET and POST request on the specified domain and keeps track of the request and response data. Its main purpose is to serve as the crawling component of web application vulnerability scanners.

Crawls:

  • Links: URLs in HTML, XML, etc.

  • Forms: GET & POST forms and their request data.
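To illustrate what crawling for requests rather than URLs means, here is a minimal sketch that turns one HTML page into a list of GET and POST requests. It uses only the Python 3 standard library and is not N.Y.A.W.C's own scraper; the names `RequestExtractor` and `extract_requests` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RequestExtractor(HTMLParser):
    """Collects (method, url, data) tuples from the links and forms on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.requests = []   # every request found so far
        self._form = None    # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # A link is a GET request without form data.
            self.requests.append(("GET", urljoin(self.base_url, attrs["href"]), {}))
        elif tag == "form":
            # A form without a method defaults to GET; without an action it
            # submits to the page it was found on.
            self._form = {
                "method": (attrs.get("method") or "get").upper(),
                "url": urljoin(self.base_url, attrs.get("action") or ""),
                "data": {},
            }
        elif tag == "input" and self._form is not None and attrs.get("name"):
            # Named inputs become the request data of the enclosing form.
            self._form["data"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._form is not None:
            form = self._form
            self.requests.append((form["method"], form["url"], form["data"]))
            self._form = None


def extract_requests(base_url, html):
    """Return all link and form requests found in the given HTML string."""
    parser = RequestExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.requests
```

A crawler that only collects URLs would miss the POST request and its field values, which are usually the interesting part for a vulnerability scanner.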

Future development:

  • Support rate limiting.

  • Support XHR/JS scraping.

  • Add other scrapers.

Installation

First make sure you're on Python 2.7, or on Python 3.3 or higher. Then run the command below to install N.Y.A.W.C.

$ pip install --upgrade nyawc

Crawling flow

  1. Add the start request to the queue.

  2. Start the first request in the queue (repeat until the number of running requests reaches the max threads option).

  3. Add all requests found in the response to the queue (except duplicates).

  4. Go to step #2 again to spawn new requests.

Please note that if the queue is empty and all crawler threads are finished, the crawler will stop.
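The flow above can be sketched as a small queue loop. This is a simplified Python 3 illustration under stated assumptions, not N.Y.A.W.C's actual implementation: `crawl` and `fetch` are hypothetical names, and `fetch(request)` stands in for "send the request and return the requests found in its response".

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def crawl(start, fetch, max_threads=4):
    """Run the queue loop and return every request that was finished."""
    queued = deque([start])   # step 1: the start request goes into the queue
    seen = {start}            # every request ever queued, for duplicate detection
    in_progress = {}          # running future -> the request it belongs to
    finished = []

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # The crawler stops once the queue is empty and no request is running.
        while queued or in_progress:
            # Step 2: start queued requests until max_threads are running.
            while queued and len(in_progress) < max_threads:
                request = queued.popleft()
                in_progress[pool.submit(fetch, request)] = request

            # Block until at least one running request has finished.
            done, _ = wait(in_progress, return_when=FIRST_COMPLETED)
            for future in done:
                finished.append(in_progress.pop(future))
                # Step 3: queue what the response contained, except duplicates.
                for found in future.result():
                    if found not in seen:
                        seen.add(found)
                        queued.append(found)
            # Step 4: loop back to step 2 to start the newly queued requests.
    return finished
```

For example, with a fake site described by a dict such as `site = {"/": ["/a"], "/a": ["/"]}`, calling `crawl("/", lambda r: site[r])` visits each page once and then stops, even though the two pages link to each other.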

Documentation

Please refer to the documentation or the API for all the information about N.Y.A.W.C.

Minimal implementation

You can use the callbacks in example_minimal.py to run your own exploit against the requests. If you want an example of automated exploit scanning, please take a look at Angular CSTI scanner (it uses N.Y.A.W.C to scan for the AngularJS sandbox escape vulnerability).

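The code below is a minimal implementation of N.Y.A.W.C. If you need more than that, use the kitchen sink example instead, which demonstrates all of N.Y.A.W.C's functionality. Run the minimal example with either of the following commands (the second writes unbuffered output to a log file).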

  • $ python example_minimal.py

  • $ python -u example_minimal.py > output.log

# example_minimal.py

from nyawc.Options import Options
from nyawc.QueueItem import QueueItem # Needed for QueueItem.STATUS_FINISHED below.
from nyawc.Crawler import Crawler
from nyawc.CrawlerActions import CrawlerActions
from nyawc.http.Request import Request

def cb_crawler_before_start():
    print("Crawler started.")

def cb_crawler_after_finish(queue):
    print("Crawler finished.")
    print("Found " + str(len(queue.get_all(QueueItem.STATUS_FINISHED))) + " requests.")

def cb_request_before_start(queue, queue_item):
    print("Starting: {}".format(queue_item.request.url))
    return CrawlerActions.DO_CONTINUE_CRAWLING

def cb_request_after_finish(queue, queue_item, new_queue_items):
    print("Finished: {}".format(queue_item.request.url))
    return CrawlerActions.DO_CONTINUE_CRAWLING

options = Options()

options.callbacks.crawler_before_start = cb_crawler_before_start # Called before the crawler starts crawling. Default is a null route.
options.callbacks.crawler_after_finish = cb_crawler_after_finish # Called after the crawler finished crawling. Default is a null route.
options.callbacks.request_before_start = cb_request_before_start # Called before the crawler starts a new request. Default is a null route.
options.callbacks.request_after_finish = cb_request_after_finish # Called after the crawler finishes a request. Default is a null route.

crawler = Crawler(options)
crawler.start_with(Request("https://finnwea.com/"))
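As a hedged illustration of running your own check from a callback, the sketch below builds a `request_after_finish` callback that records every URL whose response body contains a given marker string. The helper name `make_marker_check` is hypothetical, and the code assumes that `queue_item.response` exposes the body as a `.text` string; check the API documentation before relying on that.

```python
def make_marker_check(marker, continue_action, hits):
    """Build a request_after_finish callback that records responses containing marker."""

    def cb_request_after_finish(queue, queue_item, new_queue_items):
        # Assumption: the response object has a `.text` attribute with the body.
        # A missing response or body is treated as an empty string.
        body = getattr(queue_item.response, "text", None) or ""
        if marker in body:
            hits.append(queue_item.request.url)
        # Always hand back the action the caller asked for, so crawling goes on.
        return continue_action

    return cb_request_after_finish
```

In the minimal implementation this could be wired up as `options.callbacks.request_after_finish = make_marker_check("probe-1234", CrawlerActions.DO_CONTINUE_CRAWLING, hits)`, where `hits` is a list you create beforehand and inspect in `crawler_after_finish`.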

Testing

Travis CI runs the unit tests automatically on every push to the master branch. To run them manually, use the command below.

$ python -m unittest discover

Issues

Issues or new features can be reported via the GitHub issue tracker. Please make sure your issue or feature has not yet been reported by anyone else before submitting a new one.

License

Not Your Average Web Crawler (N.Y.A.W.C) is open-sourced software licensed under the MIT license.


Source Distribution

nyawc-1.7.0.tar.gz (24.1 kB)
