RSS Tools for Scrapy Framework

Tools to easily generate an RSS feed that contains each scraped item, using the Scrapy framework.

Installation

  • Install scrapy_rss using pip

    pip install scrapy_rss

    or using pip for a specific interpreter, e.g.:

    pip3 install scrapy_rss
  • or using setuptools directly:

    cd path/to/root/of/scrapy_rss
    python setup.py install

    or using setuptools for a specific interpreter, e.g.:

    cd path/to/root/of/scrapy_rss
    python3 setup.py install

How To Use

Configuration

Add parameters to the Scrapy project settings (settings.py file) or to the custom_settings attribute of the spider:

  1. Add the item pipeline that exports items to the RSS feed:

    ITEM_PIPELINES = {
        # ...
        'scrapy_rss.pipelines.FeedExportPipeline': 900,  # or another priority
        # ...
    }
  2. Add required feed parameters:

    FEED_FILE

    the absolute or relative file path where the resulting RSS feed will be saved, for example feed.rss or output/feed.rss,

    FEED_TITLE

    the name of the channel (feed),

    FEED_DESCRIPTION

    the phrase or sentence that describes the channel (feed),

    FEED_LINK

    the URL to the HTML website corresponding to the channel (feed)

    FEED_FILE = 'path/to/feed.rss'
    FEED_TITLE = 'Some title of the channel'
    FEED_LINK = 'http://example.com/rss'
    FEED_DESCRIPTION = 'About channel'
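
As a minimal sketch, the same parameters can be collected in one dictionary, which is the shape that the custom_settings attribute of a spider expects. The names RSS_SETTINGS and missing_rss_settings are hypothetical and introduced here only for illustration; they are not part of scrapy_rss.

```python
# Illustrative only: the parameters described above, gathered in a dict.
# In a spider class this dict could be assigned as: custom_settings = RSS_SETTINGS
RSS_SETTINGS = {
    'ITEM_PIPELINES': {
        'scrapy_rss.pipelines.FeedExportPipeline': 900,  # or another priority
    },
    'FEED_FILE': 'output/feed.rss',
    'FEED_TITLE': 'Some title of the channel',
    'FEED_LINK': 'http://example.com/rss',
    'FEED_DESCRIPTION': 'About channel',
}

REQUIRED_FEED_KEYS = ('FEED_FILE', 'FEED_TITLE', 'FEED_LINK', 'FEED_DESCRIPTION')
PIPELINE_PATH = 'scrapy_rss.pipelines.FeedExportPipeline'


def missing_rss_settings(settings):
    """Return the names of required RSS settings that are absent or empty.

    Hypothetical sanity check, not a scrapy_rss function.
    """
    missing = [key for key in REQUIRED_FEED_KEYS if not settings.get(key)]
    if PIPELINE_PATH not in settings.get('ITEM_PIPELINES', {}):
        missing.append('ITEM_PIPELINES[%r]' % PIPELINE_PATH)
    return missing
```

Calling missing_rss_settings(RSS_SETTINGS) before starting a crawl is one way to catch a forgotten parameter early.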

Usage

Basic usage

Declare your item directly as RssItem():

import scrapy_rss

item1 = scrapy_rss.RssItem()

Or use the predefined item class RssedItem, which has an RSS field named rss that is an instance of RssItem:

import scrapy
import scrapy_rss

class MyItem(scrapy_rss.RssedItem):
    field1 = scrapy.Field()
    field2 = scrapy.Field()
    # ...

item2 = MyItem()

Set and get item fields. The case-sensitive attributes of RssItem() correspond to RSS elements, and the attributes of RSS elements are case-sensitive too. If your editor supports autocompletion, it suggests attributes for instances of RssedItem and RssItem. You may set any subset of RSS elements (e.g. title only). For example:

from datetime import datetime

item1.title = 'RSS item title'  # set value of <title> element
title = item1.title.value  # get value of <title> element
item1.description = 'description'

item1.guid = 'item identifier'
item1.guid.isPermaLink = False  # set value of attribute isPermaLink of <guid> element,
                                # isPermaLink is True by default
is_permalink = item1.guid.isPermaLink  # get value of attribute isPermaLink of <guid> element
guid = item1.guid.value  # get value of element <guid>

item1.category = 'single category'
category = item1.category
item1.category = ['first category', 'second category']
first_category = item1.category[0].value # get value of the element <category> with multiple values
all_categories = [cat.value for cat in item1.category]

# direct attributes setting
item1.enclosure.url = 'http://example.com/file'
item1.enclosure.length = 0
item1.enclosure.type = 'text/plain'

# or dict based attributes setting
item1.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}
item1.guid = {'value': 'item identifier', 'isPermaLink': True}

item1.pubDate = datetime.now()  # works correctly with Python datetime objects


item2.rss.title = 'Item title'
item2.rss.guid = 'identifier'
item2.rss.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}

All allowed elements are listed in scrapy_rss/items.py. All allowed attributes of each element, with their constraints and default values, are listed in scrapy_rss/elements.py. You can also read the RSS specification for more details.
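
To make the mapping between item attributes and RSS markup concrete, here is a standard-library sketch that builds an <item> element by hand. It is not how scrapy_rss exports items, and the exact serialization produced by the library may differ; build_item_xml is a hypothetical helper. It only illustrates that a content value becomes the element text, that isPermaLink becomes an XML attribute of <guid>, that multiple categories become repeated <category> elements, and that RSS 2.0 dates follow the RFC 822 format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_item_xml(title, guid, is_permalink, categories, pub_date):
    """Build an RSS <item> by hand to show where each value ends up."""
    item = ET.Element('item')
    # Content values become the text of the element.
    ET.SubElement(item, 'title').text = title
    # isPermaLink is an XML attribute of <guid>, not a child element.
    guid_el = ET.SubElement(
        item, 'guid', {'isPermaLink': 'true' if is_permalink else 'false'}
    )
    guid_el.text = guid
    # Several categories become several <category> elements.
    for name in categories:
        ET.SubElement(item, 'category').text = name
    # RSS 2.0 dates use the RFC 822 format.
    ET.SubElement(item, 'pubDate').text = format_datetime(pub_date)
    return ET.tostring(item, encoding='unicode')


xml_text = build_item_xml(
    title='RSS item title',
    guid='item identifier',
    is_permalink=False,
    categories=['first category', 'second category'],
    pub_date=datetime(2025, 9, 10, 13, 0, 0, tzinfo=timezone.utc),
)
print(xml_text)
```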

RssItem derivation and namespaces

You can extend RssItem to add new XML fields, which may or may not be namespaced. Namespaces can be specified in attribute and/or element constructors. The namespace prefix can be given either in the attribute/element name, using a double underscore as the delimiter (prefix__name), or in the attribute/element constructor, using the ns_prefix argument. The namespace URI is given using the ns_uri argument of the constructor.

from scrapy_rss.meta import ElementAttribute, Element
from scrapy_rss.items import RssItem

class Element0(Element):
    # attributes without special namespace
    attr0 = ElementAttribute(is_content=True, required=True)
    attr1 = ElementAttribute()

class Element1(Element):
    # attribute "prefix2:attr2" with namespace xmlns:prefix2="id2"
    attr2 = ElementAttribute(ns_prefix="prefix2", ns_uri="id2")

    # attribute "prefix3:attr3" with namespace xmlns:prefix3="id3"
    prefix3__attr3 = ElementAttribute(ns_uri="id3")

    # attribute "prefix4:attr4" with namespace xmlns:prefix4="id4"
    fake_prefix__attr4 = ElementAttribute(ns_prefix="prefix4", ns_uri="id4")

    # attribute "attr5" with default namespace xmlns="id5"
    attr5 = ElementAttribute(ns_uri="id5")

class MyXMLItem(RssItem):
    # element <elem1> without namespace
    elem1 = Element0()

    # element <elem_prefix2:elem2> with namespace xmlns:elem_prefix2="id2e"
    elem2 = Element0(ns_prefix="elem_prefix2", ns_uri="id2e")

    # element <elem_prefix3:elem3> with namespace xmlns:elem_prefix3="id3e"
    elem_prefix3__elem3 = Element1(ns_uri="id3e")

    # yet another element <elem_prefix4:elem3> with namespace xmlns:elem_prefix4="id4e"
    # (does not conflict with previous one)
    fake_prefix__elem3 = Element0(ns_prefix="elem_prefix4", ns_uri="id4e")

    # element <elem5> with default namespace xmlns="id5e"
    elem5 = Element0(ns_uri="id5e")
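
The naming rules used in the comments above can be summarized with a small standard-library sketch. The function qualified_name is hypothetical and is not part of scrapy_rss; it only restates the convention that an explicit ns_prefix takes precedence over a prefix embedded in the Python identifier, and that a name with neither has no prefix.

```python
def qualified_name(python_name, ns_prefix=None):
    """Return the XML name implied by a Python attribute name.

    Hypothetical helper restating the convention described above:
    an explicit ns_prefix wins, otherwise a "prefix__name" identifier
    supplies the prefix, otherwise the name is left unprefixed.
    """
    prefix_from_name, sep, local_name = python_name.partition('__')
    if not sep:
        # No double underscore: the whole identifier is the local name.
        prefix_from_name, local_name = None, python_name
    prefix = ns_prefix if ns_prefix is not None else prefix_from_name
    return '%s:%s' % (prefix, local_name) if prefix else local_name


for name, prefix in [
    ('attr2', 'prefix2'),
    ('prefix3__attr3', None),
    ('fake_prefix__attr4', 'prefix4'),
    ('attr5', None),
]:
    print(name, '->', qualified_name(name, ns_prefix=prefix))
```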

Access to elements and their attributes is the same as with simple items:

item = MyXMLItem()
item.title = 'Some title'
item.elem1.attr0 = 'Required content value'
item.elem1 = 'Another way to set content value'
item.elem1.attr1 = 'Some attribute value'
item.elem_prefix3__elem3.prefix3__attr3 = 'Yet another attribute value'
item.elem_prefix3__elem3.fake_prefix__attr4 = '' # non-None value is interpreted as assigned
item.fake_prefix__elem3.attr1 = 42

Several optional settings are allowed for namespaced items:

FEED_NAMESPACES

list of tuples [(prefix, URI), ...] or dictionary {prefix: URI, ...} of namespaces that must be defined in the root XML element

FEED_ITEM_CLASS or FEED_ITEM_CLS

main class of feed items (class object MyXMLItem or path to class "path.to.MyXMLItem"). Default value: RssItem. It’s used in order to extract all possible namespaces that will be declared in the root XML element.

Feed items do NOT have to be instances of this class or its subclass.

If these settings are not defined, or only some of the namespaces are defined, then the other namespaces in use will be declared either in the <item> element or in its subelements when these namespaces are not unique. Each <item> element and its subelements always contain only the namespace declarations of non-None attributes (including ones that are interpreted as element content).
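
A settings fragment along these lines might look as follows. The prefixes and URIs are taken from the example classes above, the import path myproject.items.MyXMLItem is an assumption about the project layout, and FEED_NAMESPACES_AS_DICT is a hypothetical name used here only to show the alternative dictionary form next to the list form.

```python
# settings.py (or the custom_settings attribute of a spider)

# Namespaces to declare in the root XML element, as (prefix, URI) tuples ...
FEED_NAMESPACES = [
    ('prefix2', 'id2'),
    ('elem_prefix2', 'id2e'),
]

# ... or the same namespaces in dictionary form. Only one form is needed;
# this second name exists solely to show both shapes side by side.
FEED_NAMESPACES_AS_DICT = {
    'prefix2': 'id2',
    'elem_prefix2': 'id2e',
}

# Main item class, given as an import path (assumed project layout).
FEED_ITEM_CLASS = 'myproject.items.MyXMLItem'
```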

Feed (Channel) Elements Customization [optional]

If you want to change other channel parameters (such as language, copyright, managingEditor, webMaster, pubDate, lastBuildDate, category, generator, docs, cloud, ttl, image, rating, textInput, skipHours, skipDays), define your own exporter that inherits from the FeedItemExporter class and, for example, modify one or more children of the self.channel Element (camelCase attribute naming):

from datetime import datetime
from scrapy_rss.rss import channel_elements
from scrapy_rss.exporters import FeedItemExporter

class MyRssItemExporter(FeedItemExporter):
   def __init__(self, *args, **kwargs):
      super(MyRssItemExporter, self).__init__(*args, **kwargs)
      self.channel.generator = 'Special generator'
      self.channel.language = 'en-us'
      self.channel.managingEditor = 'editor@example.com'
      self.channel.webMaster = 'webmaster@example.com'
      self.channel.copyright = 'Copyright 2025'
      self.channel.pubDate = datetime(2025, 9, 10, 13, 0, 0)

      self.channel.category = ['category 1', 'category 2']
      self.channel.category.append('category 3')
      self.channel.category.extend(['category 4', 'category 5'])

      # initialize image from dict
      self.channel.image = {
          'url': 'https://example.com/img.jpg',
          'description': 'Image link hover text',
      }
      # or initialize image from ImageElement
      self.channel.image = channel_elements.ImageElement(url='https://example.com/img.jpg')
      # or initialize image by each attribute
      self.channel.image.url = 'https://example.com/img.jpg' # required attribute of image
      self.channel.image.title = 'Image title' # optional
      self.channel.image.link = 'https://example.com/page' # optional
      self.channel.image.description = 'Image link hover text' # optional
      self.channel.image.width = 140 # optional
      self.channel.image.height = 350 # optional

      self.channel.docs = 'https://example.com/rss_docs'
      self.channel.cloud = {
          'domain': 'rpc.sys.com',
          'port': '80',
          'path': '/RPC2',
          'registerProcedure': 'myCloud.rssPleaseNotify',
          'protocol': 'xml-rpc'
      }
      self.channel.ttl = 60
      self.channel.rating = 4.0
      self.channel.textInput = channel_elements.TextInputElement(
          title='Input title',
          description='Description of input',
          name='Input name',
          link='http://example.com/cgi.py'
      )

      self.channel.skipHours = (0, 1, 3, 7, 23) # initialize list from iterable
      self.channel.skipHours = 12 # or initialize list with single value

      self.channel.skipDays = 14 # initialize list with single value
      self.channel.skipDays = [1, 14] # or initialize list from list

or modify the kwargs arguments (snake_case argument naming):

from scrapy_rss.exporters import FeedItemExporter

class MyRssItemExporter(FeedItemExporter):
   def __init__(self, *args, **kwargs):
      kwargs['generator'] = kwargs.get('generator', 'Special generator')
      kwargs['language'] = kwargs.get('language', 'en-us')
      kwargs['managing_editor'] = kwargs.get('managing_editor', 'editor@example.com')
      kwargs['category'] = kwargs.get('category', ('category 1', 'category 2'))
      kwargs['image'] = kwargs.get('image', {'url': 'https://example.com/img.jpg'})
      # etc.
      super(MyRssItemExporter, self).__init__(*args, **kwargs)

Then add the FEED_EXPORTER parameter to the Scrapy project settings or to the custom_settings attribute of the spider:

FEED_EXPORTER = 'myproject.exporters.MyRssItemExporter'

Backward compatibility notices

Since version 1.0.0 some classes have been renamed, but old-named classes have been kept and marked as deprecated for backward compatibility, so they can still be used.

However, some attributes of RssItem elements have been renamed in a backward-incompatible way: almost all content attributes (the text content of the XML tag after exporting) are now named value to improve code readability.

So if you do not want to update your code expressions (such as changing an old-style item.title.title to a new-style item.title.value, or item.guid.guid to item.guid.value), you can simply import the old-style classes

# old-style classes
from scrapy_rss.rss.old.items import RssItem, RssedItem

instead of new-style ones

# new-style classes
from scrapy_rss.items import RssItem, RssedItem

respectively.

Scrapy Project Examples

The Examples directory contains several Scrapy projects that demonstrate scrapy_rss usage. They crawl this website, whose source code is here.

Just go to the Scrapy project directory and run the commands

scrapy crawl first_spider
scrapy crawl second_spider

Afterwards, the feed.rss and feed2.rss files will be created in the same directory.
