An RSS Exporter for Scrapy

Project description

Generate an RSS feed using the Scrapy framework.

Installation

  • Install scrapy-rss-exporter using pip:

    pip install scrapy-rss-exporter
    
  • or using setuptools:

    python setup.py install
    

Usage

Feed Items

The most convenient way to use the exporter is to return RssItem objects from your spiders. RssItem derives from scrapy.Item, so it works with other exporters as well.

You will need to set the following keys:

from scrapy_rss_exporter.items import RssItem, Enclosure

rss_item = RssItem()
rss_item['title'] = 'Item title'
rss_item['link'] = 'Item url'
rss_item['guid'] = 'Item ID'
rss_item['description'] = 'Item Description'
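rss_item['pub_date'] = None  # None inserts the current date
# 'Image url' is a placeholder, like the other values above
rss_item['enclosure'] = [Enclosure(url='Image url', type='image/jpeg')]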

The pub_date field should contain a date in the RFC 822 format. If you set it to None, the exporter inserts the current date in the appropriate format. The enclosure field is optional and should contain a (possibly empty) list of Enclosure objects.
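
If you would rather set pub_date explicitly than pass None, the Python standard library can produce a date string in the RFC 822 style used by RSS (with a four-digit year). The following is a minimal sketch, assuming the exporter accepts a preformatted string in this field, as the paragraph above suggests. The helper name rfc822_date is hypothetical and not part of scrapy-rss-exporter, and treating naive datetimes as UTC is an assumption of this sketch:

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rfc822_date(dt):
    """Format a datetime as an RFC 822 style date string for RSS."""
    # Assumption of this sketch: naive datetimes are interpreted as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return format_datetime(dt)


published = rfc822_date(datetime(2019, 11, 5, 14, 30, 0))
print(published)  # Tue, 05 Nov 2019 14:30:00 +0000
```

The resulting string could then be assigned to rss_item['pub_date'] in place of None.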

Global Exporter

To set the exporter up globally, you need to declare it in the FEED_EXPORTERS dictionary in the settings.py file:

FEED_EXPORTERS = {
  'rss': 'scrapy_rss_exporter.exporters.RssItemExporter'
}

You can then use it as a FEED_FORMAT and specify the output file in the FEED_URI:

FEED_FORMAT = 'rss'
FEED_URI = 's3://my-feeds/my-feed.rss'

Note: Bear in mind that, if you use a local file as output, Scrapy will append to an existing file, resulting in invalid RSS. You should, therefore, make sure to delete any existing output file before running the spider. The S3 storage does not have this problem because Scrapy uploads use the S3 PutObject method.
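
One way to follow this advice for a local output file is to delete the previous feed just before starting the crawl. The following is a minimal sketch for Python 3. The helper name remove_stale_feed and the file name my-feed.rss are placeholders chosen for illustration; neither comes from scrapy-rss-exporter:

```python
import os


def remove_stale_feed(path):
    """Delete a previous feed file, if any; return True if one was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing to clean up: the spider will create a fresh file.
        return False


# 'my-feed.rss' is a placeholder for the local path used in FEED_URI.
remove_stale_feed('my-feed.rss')
```

Calling the helper from the script that launches the spider keeps the cleanup and the crawl in one place, so the feed is always written to a fresh file.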

Scrapy does not seem to provide a way to pass configuration options to an exporter. Therefore, if you want to customize the feed title and other metadata, you need to create a subclass and update the FEED_EXPORTERS dictionary with the new class name:

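from scrapy_rss_exporter.exporters import RssItemExporter

class MyRssExporter(RssItemExporter):
    def __init__(self, *args, **kwargs):
        # Set the feed metadata before the base class is initialized
        kwargs['title'] = 'My RSS'
        kwargs['link'] = 'https://www.mywebsite.com'
        kwargs['description'] = 'My RSS Items'
        super(MyRssExporter, self).__init__(*args, **kwargs)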

Per Spider Exporter

You can, of course, specify a different exporter with different settings for each spider. Just use the custom_settings field to override the global configuration fields:

import scrapy

class MySpider(scrapy.Spider):
    name = "my"
    start_urls = ['https://www.mywebsite.com']
    custom_settings = {
        # Dotted path to your exporter subclass (here, MyRssExporter from above)
        'FEED_EXPORTERS': {'rss': 'project.spiders.my_spider.MyRssExporter'},
        'FEED_FORMAT': 'rss',
        'FEED_URI': 's3://my-feeds/my-feed.rss',
    }

    def parse(self, response):
        pass

Files for scrapy-rss-exporter, version 0.1
Filename                                      Size    File type  Python version
scrapy_rss_exporter-0.1-py2.py3-none-any.whl  6.7 kB  Wheel      py2.py3
scrapy-rss-exporter-0.1.tar.gz                3.9 kB  Source     None
