Palu is a small spider, a fork of Patu.

Project description

Palu

A small spider, useful for checking a site for 404s and 500s. It’s a fork of [Patu][1].

Palu requires httplib2 and lxml:

pip install -U httplib2 lxml

Is it safe? [![Build Status](https://secure.travis-ci.org/akrito/palu.png?branch=master)](http://travis-ci.org/akrito/palu)

Quick Usage

To see available options:

palu.py --help

To spider an entire site using 5 workers, only showing errors:

palu.py --spiders=5 www.example.com

To spider, stopping after the first level of links:

palu.py --depth=1 www.example.com

To get a list of every linked page on a site:

palu.py --generate www.example.com > urls.txt

To check URLs listed in a file instead of spidering for them, and show all responses:

palu.py --input=urls.txt --verbose www.example.com

Format of URLs File

The output produced by <code>--generate</code> is formatted like so:

FIRST_URL<TAB>None
LINK1<TAB>REFERER
LINK2<TAB>REFERER

<code>--input</code> can take a file of that format, or one URL per line with no referer. <code>--input=-</code> reads from stdin.
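
To illustrate the format, here is a minimal Python sketch that reads such a file into `(url, referer)` pairs. It is not Palu's own parser: the function name `parse_url_lines` is hypothetical, and treating the literal string `None` (or a missing second column) as "no referer" is an assumption based on the description above.

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_url_lines(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (url, referer) pairs from tab-separated URL lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first tab; referer is "" when the line has no tab.
        url, _, referer = line.partition("\t")
        referer = referer.strip()
        # Assumption: an absent referer or the literal "None" means no referer.
        if referer in ("", "None"):
            yield url.strip(), None
        else:
            yield url.strip(), referer


# Sample input covering both accepted shapes: URL<TAB>REFERER and a bare URL.
sample = [
    "http://www.example.com/\tNone\n",
    "http://www.example.com/about\thttp://www.example.com/\n",
    "http://www.example.com/contact\n",
]
pairs = list(parse_url_lines(sample))
```

The function accepts any iterable of lines, so an open file or `sys.stdin` can be passed in directly.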

Testing

Palu uses Nose for testing. To install Nose and run the tests:

pip install -U nose
nosetests

[1]: https://pypi.python.org/pypi/patu

This version

0.1

File hashes

Hashes for palu-0.1.tar.gz
Algorithm Hash digest
SHA256 1173921168bf427495e2d431ec50522bf80240d42c6872c3213d08abb7defa74
MD5 27e0d848f3f1fa580f1c1158236820e1
BLAKE2b-256 42e0dee0cd6f7486c0adef86a8db0558b6aa70943b06e6487d46376a62737bba
