Skip to main content

Statsd integration middleware for scrapy

Project description

Usage

pip install scrapy-statsd-middleware

DOWNLOADER_MIDDLEWARES = {
  'statsd_middleware.StatsdMiddleware': 543,
}

SPIDER_MIDDLEWARES = {
  'statsd_middleware.StatsdMiddleware': 543,
}

There’s also a few settings that you can use:

  • STATSD_HOSTNAME - Defaults to the current machine’s hostname

  • STATSD_PREFIX - Defaults to “hostname.spider-name.”

  • STATSD_HOST_IP - Defaults to “0.0.0.0”

This will increment statsd with the following: * requests (spider_reqs_issued) * response (spider_resps_received) * errors (error_KeyError, where KeyError is whatever the error name is) * items processed (processed_Product, where Product is whatever the item class name is)

Example Implementation

An example implementation of this middleware is in /example It includes a docker-compose file that describes how to use this middleware with statsd & graphite

Example Installation & Usage

  • Build the docker images docker-compose build

  • Start the statsd container docker-compose up -d

  • Run the example spider: docker-compose -f ./example/docker-compose.yml run spider bash -c “cd ./opt/scrapy/dirbot/ && scrapy crawl dmoz”

You can see a live graphite dashboard at http://0.0.0.0/dashboard You should see stats show up under something like “stats.Z-MacBook-Pro.local.dmoz.spider_reqs_issued”

Development

You can run the tests via make test

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapy-statsd-middleware-0.0.8.tar.gz (3.9 kB view hashes)

Uploaded Source

Built Distribution

scrapy_statsd_middleware-0.0.8-py2.py3-none-any.whl (5.3 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page