Scrapy extension for outputting scraped items to an Amazon SQS instance

scrapy-sqs-exporter

This is a Scrapy extension for exporting scraped items to an Amazon SQS queue.

Setup

After installing the package, add the two classes defined in the library to the relevant sections of the Scrapy settings file:

FEED_EXPORTERS = {
  'sqs': 'sqsfeedexport.SQSExporter'
}

FEED_STORAGES = {
  'sqs': 'sqsfeedexport.SQSFeedStorage'
}

The key in FEED_STORAGES is a URI scheme: a feed URI prefixed with sqs:// is handled by SQSFeedStorage, which differentiates it from the other URI-based storage options.

The following keys also need to be defined in the environment:

AWS_DEFAULT_REGION=eu-central-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
FEED_URI=sqs://foo
FEED_FORMAT=sqs

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the AWS credentials to use, and AWS_DEFAULT_REGION is the default region for the SQS queue. FEED_URI gives the name of the SQS queue in the AWS_DEFAULT_REGION region. For example:

AWS_DEFAULT_REGION=us-east-1
FEED_URI=sqs://bar
FEED_FORMAT=sqs

would refer to a queue named bar in the us-east-1 region.
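
The mapping from a feed URI to a queue name can be illustrated with a short sketch. It is not taken from the library's source: queue_name_from_uri is a hypothetical helper, and it assumes the queue name is simply the host part of the sqs:// URI, as the example above suggests.

```python
from urllib.parse import urlparse


def queue_name_from_uri(uri):
    """Hypothetical helper: extract the queue name from an sqs:// feed URI.

    Assumes the queue name is the part right after "sqs://".
    """
    parsed = urlparse(uri)
    if parsed.scheme != "sqs":
        raise ValueError("expected an sqs:// URI, got %r" % uri)
    # For "sqs://bar", urlparse places "bar" in the netloc component.
    return parsed.netloc


if __name__ == "__main__":
    print(queue_name_from_uri("sqs://bar"))
```

The region is not part of the URI in this scheme; it comes from AWS_DEFAULT_REGION.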

Finally, the FEED_FORMAT option makes the Scrapy spiders use the SQSExporter class.
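
How FEED_URI and FEED_FORMAT travel from the environment into Scrapy's settings is not spelled out here. One possible arrangement, shown only as a sketch, is to read them in the project's settings module. The helper name feed_settings_from_env is hypothetical; the AWS_* keys are left out because the AWS SDK normally reads those from the environment itself.

```python
import os


def feed_settings_from_env(env=None):
    """Hypothetical helper: build the feed-related Scrapy settings.

    Assumption: FEED_URI and FEED_FORMAT are plain environment variables,
    as in the example above. A KeyError is raised if either is missing.
    """
    if env is None:
        env = os.environ
    return {
        "FEED_URI": env["FEED_URI"],        # e.g. "sqs://bar"
        "FEED_FORMAT": env["FEED_FORMAT"],  # "sqs" selects SQSExporter
        "FEED_EXPORTERS": {"sqs": "sqsfeedexport.SQSExporter"},
        "FEED_STORAGES": {"sqs": "sqsfeedexport.SQSFeedStorage"},
    }


if __name__ == "__main__":
    example_env = {"FEED_URI": "sqs://bar", "FEED_FORMAT": "sqs"}
    for key, value in feed_settings_from_env(example_env).items():
        print(key, "=", value)
```

In a settings file, the returned values could be assigned to module-level names of the same spelling, for instance with globals().update(feed_settings_from_env()).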

Download files

scrapy_sqs_exporter-1.1.0-py2.py3-none-any.whl (3.4 kB), file type: Wheel, Python version: py2.py3, uploaded Jun 11, 2018
scrapy-sqs-exporter-1.1.0.tar.gz (3.3 kB), file type: Source, Python version: None, uploaded Jun 11, 2018
