Patu is a small spider

Patu

A small spider, useful for checking a site for 404s and 500s. Patu requires httplib2 and lxml:

pip install -U httplib2 lxml

Quick Usage

To see available options:

patu.py --help

To spider an entire site using 5 workers, only showing errors:

patu.py --spiders=5 www.example.com

To spider, stopping after the first level of links:

patu.py --depth=1 www.example.com

To get a list of every linked page on a site:

patu.py --generate www.example.com > urls.txt

To read URLs from a file instead of spidering for them, and show all responses:

patu.py --input=urls.txt --verbose www.example.com

Format of URLs File

The output produced by <code>--generate</code> is formatted like so:

FIRST_URL<TAB>None
LINK1<TAB>REFERER
LINK2<TAB>REFERER

<code>--input</code> can take a file of that format, or one URL per line with no referer. <code>--input=-</code> reads from stdin.
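
For anyone post-processing a URLs file in their own scripts, the sketch below shows one way to read both accepted layouts. It is not Patu's own parser: the function name parse_urls_file is hypothetical, and treating a literal None in the second column as "no referer" is an assumption based on the first line of the generated output.

```python
import io


def parse_urls_file(lines):
    """Yield (url, referer) pairs from an iterable of text lines.

    Hypothetical helper, not part of Patu. Accepts either
    URL<TAB>REFERER lines or bare one-URL-per-line input.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        url, sep, referer = line.partition("\t")
        # Assumption: a missing second column, or the literal string
        # "None", both mean that the URL has no referer.
        if not sep or referer.strip() in ("", "None"):
            yield url.strip(), None
        else:
            yield url.strip(), referer.strip()


# Example input mixing both layouts (URLs are placeholders).
sample = io.StringIO(
    "http://www.example.com/\tNone\n"
    "http://www.example.com/about\thttp://www.example.com/\n"
    "http://www.example.com/contact\n"
)
pairs = list(parse_urls_file(sample))
for url, referer in pairs:
    print(url, referer)
```

Any iterable of lines works here, so an open file or sys.stdin can be passed in the same way.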

Testing

Patu uses Nose for testing. To install Nose and test:

pip install -U nose
nosetests

This version

0.1
