An RSS Exporter for Scrapy

Generate an RSS feed using the Scrapy framework.

Installation

  • Install scrapy-rss-exporter using pip:

    pip install scrapy-rss-exporter
  • or using setuptools:

    python setup.py install

Usage

Feed Items

The most convenient way to use the exporter is to return RssItem objects from your spiders. This class derives from scrapy.Item, so it works with other exporters as well.

You will need to set the following keys:

from scrapy_rss_exporter.items import RssItem, Enclosure

rss_item = RssItem()
rss_item['title'] = 'Item title'
rss_item['link'] = 'Item url'
rss_item['guid'] = 'Item ID'
rss_item['description'] = 'Item Description'
rss_item['pub_date'] = None
rss_item['enclosure'] = [Enclosure(url='Image url', type='image/jpeg')]

The pub_date field should contain a date in the RFC 822 format. If you set it to None, the exporter inserts the current date in the appropriate format. The enclosure field is optional and should contain a (possibly empty) list of Enclosure objects.
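If you would rather set pub_date explicitly than pass None, the Python 3 standard library can produce a date string in this style. The following is a minimal sketch, assuming the exporter accepts a pre-formatted string as described above; the helper name rfc822_date is hypothetical and not part of scrapy-rss-exporter:

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rfc822_date(dt):
    """Return dt as an RFC 822 style date string (hypothetical helper)."""
    # Treat naive datetimes as UTC so the output always carries an offset.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return format_datetime(dt)


example = rfc822_date(datetime(2024, 1, 15, 10, 30, 0))
```

Here example is 'Mon, 15 Jan 2024 10:30:00 +0000', which could then be assigned to rss_item['pub_date'].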

Global Exporter

To set the exporter up globally, you need to declare it in the FEED_EXPORTERS dictionary in the settings.py file:

FEED_EXPORTERS = {
  'rss': 'scrapy_rss_exporter.exporters.RssItemExporter'
}

You can then select it with FEED_FORMAT and specify the output location in FEED_URI:

FEED_FORMAT = 'rss'
FEED_URI = 's3://my-feeds/my-feed.rss'

Note: If you use a local file as output, scrapy appends to an existing file, which results in invalid RSS. You should, therefore, delete any existing output file before running the spider. The s3 storage does not have this problem because scrapy uploads using the S3 PutObject method.
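One way to handle the local-file case is to remove the old feed from the script that launches the crawl. This is only a sketch: the function name remove_stale_feed and the file name in the usage note are hypothetical, and nothing here is provided by scrapy or scrapy-rss-exporter:

```python
import errno
import os


def remove_stale_feed(path):
    """Delete path if it exists; return True when a file was removed."""
    try:
        os.remove(path)
        return True
    except OSError as exc:
        # A missing file is fine; any other error (e.g. permissions) is re-raised.
        if exc.errno != errno.ENOENT:
            raise
        return False
```

Calling remove_stale_feed('my-feed.rss') before starting the spider ensures the exporter writes to a fresh file.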

scrapy does not seem to provide a way to pass configuration options to an exporter. Therefore, if you want to customize the feed title and other metadata, you need to create a subclass and update the FEED_EXPORTERS dictionary with the new class name:

from scrapy_rss_exporter.exporters import RssItemExporter


class MyRssExporter(RssItemExporter):
    def __init__(self, *args, **kwargs):
        kwargs['title'] = 'My RSS'
        kwargs['link'] = 'https://www.mywebsite.com'
        kwargs['description'] = 'My RSS Items'
        super(MyRssExporter, self).__init__(*args, **kwargs)
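A small variation on this pattern uses dict.setdefault, so that a value passed by the caller is kept and only missing metadata is filled in. The sketch below is illustrative only: StandInExporter is a hypothetical stand-in that merely records its keyword arguments so the example runs without scrapy installed, and MyDefaultsExporter is likewise a made-up name. In a real project the base class would be RssItemExporter.

```python
class StandInExporter(object):
    """Hypothetical stand-in for RssItemExporter; it only records metadata."""

    def __init__(self, *args, **kwargs):
        self.title = kwargs.get('title')
        self.link = kwargs.get('link')
        self.description = kwargs.get('description')


class MyDefaultsExporter(StandInExporter):
    def __init__(self, *args, **kwargs):
        # setdefault keeps a caller-supplied value and fills in the rest.
        kwargs.setdefault('title', 'My RSS')
        kwargs.setdefault('link', 'https://www.mywebsite.com')
        kwargs.setdefault('description', 'My RSS Items')
        super(MyDefaultsExporter, self).__init__(*args, **kwargs)
```

With this variant, MyDefaultsExporter() gets the title 'My RSS', while MyDefaultsExporter(title='Other') keeps 'Other'.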

Per Spider Exporter

You can, of course, specify a different exporter with different settings for each spider. Just use the custom_settings field to override the global configuration fields:

import scrapy


class MySpider(scrapy.Spider):
    name = "my"
    start_urls = ['https://www.mywebsite.com']
    custom_settings = {
        'FEED_EXPORTERS': {'rss': 'project.spiders.my_spider.MyExporter'},
        'FEED_FORMAT': 'rss',
        'FEED_URI': 's3://my-feeds/my-feed.rss',
    }

    def parse(self, response):
        pass
