Python Testing Crawler

A crawler for automated functional testing of a web application

Crawling a server-side-rendered web application is a low-cost way to get low-quality test coverage of your JavaScript-light web application.

If you have only partial test coverage of your routes, but still want to protect against silly mistakes, then this is for you. It follows links and can post forms.

Works with the test clients for Flask (including Flask-WebTest), Django and Zope/WebTest.

Installation

$ pip install python-testing-crawler

Usage

Create a crawler using your framework's existing test client, tell it where to start and what rules to obey, then set it off:

from python_testing_crawler import Crawler
from python_testing_crawler import Rule, Request

def test_crawl_all():
    client = ...  # your framework's existing test client
    # ... any setup ...
    crawler = Crawler(
        client=client,
        initial_paths=['/'],
        rules=[
            Rule("a", '/.*', "GET", Request()),
        ]
    )
    crawler.crawl()

This will crawl all anchor links to relative addresses beginning with "/". Any exceptions encountered will be collected and presented at the end of the crawl. For more power, see the Rules section below.

If you need to authorise the client's session, e.g. log in, then you should do that before creating the Crawler.

It is also a good idea to create enough data, via fixtures or otherwise, to expose enough endpoints.
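
For instance, with a Flask test client and the imports from the Usage example above, a login-then-crawl test might look like the following sketch; the /login path, the form field names and the client fixture are placeholders for whatever your application and test setup actually use:

def test_crawl_all_as_logged_in_user(client):
    # Hypothetical login step -- adjust the URL and form fields to your app.
    client.post('/login', data={'username': 'test', 'password': 'secret'})

    # ... create fixture data here so more endpoints have something to render ...

    crawler = Crawler(
        client=client,
        initial_paths=['/'],
        rules=[
            Rule("a", '/.*', "GET", Request()),
        ]
    )
    crawler.crawl()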

Crawler Options

  • initial_paths: list of paths/URLs to start from
  • rules: list of rules
  • path_attrs: list of attribute names to get paths/URLs from; defaults to "href", but include "src" if you want to check e.g. <link>, <script> or even <img> elements
  • ignore_css_selectors: any elements matching this list of CSS selectors will be ignored
  • ignore_form_fields: list of form input names to ignore when determining the identity/uniqueness of a form; include CSRF token field names here
  • max_requests: the crawler will raise an exception if this limit is exceeded
  • capture_exceptions: keep going on any exception and fail at the end of the crawl instead of during it (default True)
  • should_process_handlers: list of "should process" handlers; see the Handlers section
  • check_response_handlers: list of "check response" handlers; see the Handlers section
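
A sketch tying several of these options together; the CSS selector and CSRF field name are illustrative, and HYPERLINKS_ONLY_RULE_SET is the rule set defined under Example Rules below:

crawler = Crawler(
    client=client,                        # your existing test client
    initial_paths=['/'],
    rules=HYPERLINKS_ONLY_RULE_SET,
    ignore_css_selectors=[".third-party-widget"],   # illustrative selector
    ignore_form_fields=["csrf_token"],              # illustrative CSRF field name
    max_requests=500,
    capture_exceptions=True,
)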

Rules

The crawler has to be told what URLs to follow, what forms to post and what to ignore.

Rules are four-tuples:

(source element, URL/path, HTTP method, action to take)

These are matched against every link or form that the crawler encounters, in reverse priority order.

Supported actions:

  1. Request(only=False, params=None) -- follow a link or submit a form
     • only=True will retrieve a page but not spider its links.
     • the dict params allows you to specify overrides for a form's default values.
  2. Ignore -- do nothing
  3. Allow -- allow an HTTP status code, i.e. do not consider it to be an error.

Example Rules

Follow all local/relative links

HYPERLINKS_ONLY_RULE_SET = [
    Rule(ANCHOR, '/.*', GET, Request()),
    Rule(AREA, '/.*', GET, Request()),
]

Request but do not spider all links

REQUEST_ONLY_EXTERNAL_RULE_SET = [
    Rule(ANCHOR, '.*', GET, Request(only=True)),
    Rule(AREA, '.*', GET, Request(only=True)),
]

This is useful for finding broken links. You can also check <link> tags from the <head> if you include the following rule and set the Crawler's path_attrs to ("HREF", "SRC").

Rule(LINK, '.*', GET, Request())
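
Putting that together, a broken-link checking crawl might be configured like this sketch (reusing the rule set above):

crawler = Crawler(
    client=client,
    initial_paths=['/'],
    rules=REQUEST_ONLY_EXTERNAL_RULE_SET + [
        Rule(LINK, '.*', GET, Request()),
    ],
    path_attrs=("HREF", "SRC"),
)
crawler.crawl()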

Submit forms with GET or POST

SUBMIT_GET_FORMS_RULE_SET = [
    Rule(FORM, '.*', GET, Request())
]

SUBMIT_POST_FORMS_RULE_SET = [
    Rule(FORM, '.*', POST, Request())
]

Setting Request(params={...}) on the rule for a specific form lets you specify what values to submit.
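
For example, to override a couple of fields on a signup form (the path and field names here are placeholders):

SIGNUP_FORM_RULE_SET = [
    Rule(FORM, '/signup', POST, Request(params={
        "username": "crawler-test",
        "email": "crawler@example.com",
    }))
]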

Allow some routes to fail

PERMISSIVE_RULE_SET = [
    Rule('.*', '.*', GET, Allow([*range(400, 600)])),
    Rule('.*', '.*', POST, Allow([*range(400, 600)]))
]
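
Ignore certain links

A sketch using the Ignore action, e.g. to stop the crawler from logging the test session out mid-crawl; the /logout path is illustrative, and Ignore is assumed to be importable from python_testing_crawler alongside Request and Allow:

IGNORE_LOGOUT_RULE_SET = [
    Rule(ANCHOR, '/logout', GET, Ignore())
]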

Crawl Graph

The crawler builds up a graph of your web application. It can be interrogated via crawler.graph when the crawl is finished.

See Node in docs (TODO).

Handlers

Two hook points are provided:

Whether to process a Node

Using should_process_handlers, you can register functions that take a Node and return a bool indicating whether the Crawler should "process" it -- follow the link or submit the form.
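
A minimal sketch of such a handler; it assumes the Node exposes a path attribute (check the Node documentation for the exact attribute names):

def skip_admin_pages(node):
    # Hypothetical: do not follow anything under /admin.
    return not node.path.startswith("/admin")

crawler = Crawler(
    client=client,
    initial_paths=['/'],
    rules=HYPERLINKS_ONLY_RULE_SET,
    should_process_handlers=[skip_admin_pages],
)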

Whether a response is acceptable

Using check_response_handlers, you can register functions that take a Node and a response object (specific to your test client) and return a bool indicating whether the response should constitute an error.

If your function returns True, the Crawler will raise an exception.
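
A sketch of such a check, assuming a Flask/Werkzeug test client response with status_code and data attributes, and taking the Node and response in that order; the marker string is illustrative:

def flag_error_pages(node, response):
    # Treat server errors or a rendered error template as failures.
    if response.status_code >= 500:
        return True
    return b"Oops, something went wrong" in response.data

crawler = Crawler(
    client=client,
    initial_paths=['/'],
    rules=HYPERLINKS_ONLY_RULE_SET,
    check_response_handlers=[flag_error_pages],
)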

