
RSS Tools for Scrapy Framework


Tools to easily generate an RSS feed that contains each scraped item, using the Scrapy framework.


Installation

  • Install scrapy_rss using pip

    pip install scrapy_rss

    or using pip for a specific interpreter, e.g.:

    pip3 install scrapy_rss
  • or using setuptools directly:

    cd path/to/root/of/scrapy_rss
    python setup.py install

    or using setuptools for a specific interpreter, e.g.:

    cd path/to/root/of/scrapy_rss
    python3 setup.py install

How To Use

Configuration

Add parameters to the Scrapy project settings (settings.py file) or to the custom_settings attribute of the spider:

  1. Add the item pipeline that exports items to the RSS feed:

    ITEM_PIPELINES = {
        # ...
        'scrapy_rss.pipelines.FeedExportPipeline': 900,  # or another priority
        # ...
    }
  2. Add required feed parameters:

    FEED_FILE

    the absolute or relative file path where the resulting RSS feed will be saved, for example feed.rss or output/feed.rss

    FEED_TITLE

    the name of the channel (feed)

    FEED_DESCRIPTION

    the phrase or sentence that describes the channel (feed)

    FEED_LINK

    the URL of the HTML website corresponding to the channel (feed)

    FEED_FILE = 'path/to/feed.rss'
    FEED_TITLE = 'Some title of the channel'
    FEED_LINK = 'http://example.com/rss'
    FEED_DESCRIPTION = 'About channel'
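
The same parameters can also be supplied per spider. As a minimal sketch, the dictionary below is what would be assigned to a spider's custom_settings attribute; it reuses the placeholder values above, and the priority 900 is only an example. The `required` and `missing` names are hypothetical helpers for a quick sanity check, not part of scrapy_rss.

```python
# Hypothetical per-spider settings; assign this dict to the
# `custom_settings` class attribute of your spider.
custom_settings = {
    'ITEM_PIPELINES': {
        'scrapy_rss.pipelines.FeedExportPipeline': 900,  # or another priority
    },
    'FEED_FILE': 'path/to/feed.rss',
    'FEED_TITLE': 'Some title of the channel',
    'FEED_LINK': 'http://example.com/rss',
    'FEED_DESCRIPTION': 'About channel',
}

# The four feed parameters are required, so list any that were forgotten.
required = ('FEED_FILE', 'FEED_TITLE', 'FEED_LINK', 'FEED_DESCRIPTION')
missing = [name for name in required if name not in custom_settings]
```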

Usage

Basic usage

Declare your item directly as RssItem():

import scrapy_rss

item1 = scrapy_rss.RssItem()

Or use the predefined item class RssedItem, which has an RSS field named rss that is an instance of RssItem:

import scrapy
import scrapy_rss

class MyItem(scrapy_rss.RssedItem):
    field1 = scrapy.Field()
    field2 = scrapy.Field()
    # ...

item2 = MyItem()

Set and get item fields. The attributes of RssItem() are case sensitive and correspond to RSS elements; the attributes of RSS elements are case sensitive too. If your editor supports autocompletion, it suggests attributes for instances of RssedItem and RssItem. Any subset of RSS elements may be set (e.g. title only). For example:

from datetime import datetime

item1.title = 'RSS item title'  # set value of <title> element
title = item1.title.value  # get value of <title> element
item1.description = 'description'

item1.guid = 'item identifier'
item1.guid.isPermaLink = True  # set value of attribute isPermaLink of <guid> element,
                               # isPermaLink is False by default
is_permalink = item1.guid.isPermaLink  # get value of attribute isPermaLink of <guid> element
guid = item1.guid.value  # get value of element <guid>

item1.category = 'single category'
category = item1.category
item1.category = ['first category', 'second category']
first_category = item1.category[0].value # get value of the element <category> with multiple values
all_categories = [cat.value for cat in item1.category]

# direct attributes setting
item1.enclosure.url = 'http://example.com/file'
item1.enclosure.length = 0
item1.enclosure.type = 'text/plain'

# or dict based attributes setting
item1.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}
item1.guid = {'value': 'item identifier', 'isPermaLink': True}

item1.pubDate = datetime.now()  # works correctly with Python datetimes


item2.rss.title = 'Item title'
item2.rss.guid = 'identifier'
item2.rss.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}

All allowed elements are listed in scrapy_rss/items.py. All allowed attributes of each element, with their constraints and default values, are listed in scrapy_rss/elements.py. You can also read the RSS specification for more details.
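
To make the mapping between item fields and RSS elements concrete, the sketch below builds by hand the kind of `<item>` element that the assignments above describe, using only the Python standard library. It is not how scrapy_rss exports items; it only illustrates the target RSS 2.0 shape (element content versus XML attributes, repeated `<category>` elements, and an RFC 822 style date). The fixed date is an arbitrary example value.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Build the <item> manually; scrapy_rss does this for you during export.
item = ET.Element('item')
ET.SubElement(item, 'title').text = 'RSS item title'
ET.SubElement(item, 'description').text = 'description'

# The element content is the identifier; isPermaLink is an XML attribute.
guid = ET.SubElement(item, 'guid', isPermaLink='true')
guid.text = 'item identifier'

# Multiple categories become repeated <category> elements.
for name in ('first category', 'second category'):
    ET.SubElement(item, 'category').text = name

# <enclosure> carries its data in attributes and has no text content.
ET.SubElement(item, 'enclosure', url='http://example.com/file',
              length='0', type='text/plain')

# RSS 2.0 dates follow RFC 822; format_datetime produces a compatible string.
published = datetime(2025, 9, 10, 13, 0, 0, tzinfo=timezone.utc)
ET.SubElement(item, 'pubDate').text = format_datetime(published)

xml_text = ET.tostring(item, encoding='unicode')
print(xml_text)
```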

RssItem derivation and namespaces

You can extend RssItem to add new XML fields, namespaced or not. Namespaces can be specified in attribute and/or element constructors. A namespace prefix can be specified either in the attribute/element name, using a double underscore as the delimiter (prefix__name), or in the attribute/element constructor, using the ns_prefix argument. A namespace URI can be specified using the ns_uri argument of the constructor.

from scrapy_rss.meta import ItemElementAttribute, ItemElement
from scrapy_rss.items import RssItem

class Element0(ItemElement):
    # attributes without special namespace
    attr0 = ItemElementAttribute(is_content=True, required=True)
    attr1 = ItemElementAttribute()

class Element1(ItemElement):
    # attribute "prefix2:attr2" with namespace xmlns:prefix2="id2"
    attr2 = ItemElementAttribute(ns_prefix="prefix2", ns_uri="id2")

    # attribute "prefix3:attr3" with namespace xmlns:prefix3="id3"
    prefix3__attr3 = ItemElementAttribute(ns_uri="id3")

    # attribute "prefix4:attr4" with namespace xmlns:prefix4="id4"
    fake_prefix__attr4 = ItemElementAttribute(ns_prefix="prefix4", ns_uri="id4")

    # attribute "attr5" with default namespace xmlns="id5"
    attr5 = ItemElementAttribute(ns_uri="id5")

class MyXMLItem(RssItem):
    # element <elem1> without namespace
    elem1 = Element0()

    # element <elem_prefix2:elem2> with namespace xmlns:elem_prefix2="id2e"
    elem2 = Element0(ns_prefix="elem_prefix2", ns_uri="id2e")

    # element <elem_prefix3:elem3> with namespace xmlns:elem_prefix3="id3e"
    elem_prefix3__elem3 = Element1(ns_uri="id3e")

    # yet another element <elem_prefix4:elem3> with namespace xmlns:elem_prefix4="id4e"
    # (does not conflict with previous one)
    fake_prefix__elem3 = Element0(ns_prefix="elem_prefix4", ns_uri="id4e")

    # element <elem5> with default namespace xmlns="id5e"
    elem5 = Element0(ns_uri="id5e")
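
The naming rules shown in the comments above can be summarized in a small standalone sketch. The function `qualified_name` is a hypothetical helper written for illustration only; it is not part of scrapy_rss and does not reflect its internals. It assumes a single double-underscore delimiter and that an explicit ns_prefix takes precedence over a prefix embedded in the field name, as the fake_prefix examples suggest.

```python
def qualified_name(field_name, ns_prefix=None):
    """Return the XML name for a field, following the rules described above.

    Hypothetical helper for illustration; it is not part of scrapy_rss.
    """
    # A double underscore separates an optional prefix from the local name.
    prefix_in_name, sep, local_name = field_name.partition('__')
    if not sep:
        # No delimiter: the whole field name is the local name.
        prefix_in_name, local_name = None, field_name
    # An explicit ns_prefix argument takes precedence over the name prefix.
    prefix = ns_prefix if ns_prefix is not None else prefix_in_name
    return '{}:{}'.format(prefix, local_name) if prefix else local_name


names = [
    qualified_name('attr2', ns_prefix='prefix2'),
    qualified_name('prefix3__attr3'),
    qualified_name('fake_prefix__attr4', ns_prefix='prefix4'),
    qualified_name('attr5'),
]
print(names)
```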

Elements and their attributes are accessed the same way as with simple items:

item = MyXMLItem()
item.title = 'Some title'
item.elem1.attr0 = 'Required content value'
item.elem1 = 'Another way to set content value'
item.elem1.attr1 = 'Some attribute value'
item.elem_prefix3__elem3.prefix3__attr3 = 'Yet another attribute value'
item.elem_prefix3__elem3.fake_prefix__attr4 = '' # non-None value is interpreted as assigned
item.fake_prefix__elem3.attr1 = 42

Several optional settings are allowed for namespaced items:

FEED_NAMESPACES

list of tuples [(prefix, URI), ...] or dictionary {prefix: URI, ...} of namespaces that must be defined in the root XML element

FEED_ITEM_CLASS or FEED_ITEM_CLS

main class of feed items (class object MyXMLItem or path to class "path.to.MyXMLItem"). Default value: RssItem. It’s used in order to extract all possible namespaces that will be declared in the root XML element.

Feed items do NOT have to be instances of this class or its subclass.

If these settings are not defined, or only some of the namespaces are defined, then the other namespaces in use will be declared either in the <item> element or in its subelements when these namespaces are not unique. Each <item> element and its subelements always contain only the namespace declarations of non-None attributes (including ones that are interpreted as element content).
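
As a sketch, these optional settings might look as follows in settings.py. The prefixes and URIs are taken from the MyXMLItem example above, while the module path myproject.items is a hypothetical location for that class. FEED_NAMESPACES_AS_LIST is not a real setting name; it only shows the alternative list-of-tuples form that FEED_NAMESPACES accepts.

```python
# Dictionary form: {prefix: URI, ...}
FEED_NAMESPACES = {
    'elem_prefix2': 'id2e',
    'prefix3': 'id3',
}

# Equivalent list-of-tuples form: [(prefix, URI), ...]
# (shown under a different name only so both forms fit in one snippet)
FEED_NAMESPACES_AS_LIST = [
    ('elem_prefix2', 'id2e'),
    ('prefix3', 'id3'),
]

# Dotted path to the main item class; 'myproject.items' is a hypothetical module.
FEED_ITEM_CLASS = 'myproject.items.MyXMLItem'
```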

Feed (Channel) Elements Customization (optional)

If you want to change other channel parameters (such as language, copyright, managingEditor, webMaster, pubDate, lastBuildDate, category, generator, docs, cloud, ttl, image, rating, textInput, skipHours, skipDays), define your own exporter that inherits from the FeedItemExporter class and, for example, modify one or more children of the self.channel element (camelCase attribute naming):

from datetime import datetime
from scrapy_rss.rss import channel_elements
from scrapy_rss.exporters import FeedItemExporter

class MyRssItemExporter(FeedItemExporter):
   def __init__(self, *args, **kwargs):
      super(MyRssItemExporter, self).__init__(*args, **kwargs)
      self.channel.generator = 'Special generator'
      self.channel.language = 'en-us'
      self.channel.managingEditor = 'editor@example.com'
      self.channel.webMaster = 'webmaster@example.com'
      self.channel.copyright = 'Copyright 2025'
      self.channel.pubDate = datetime(2025, 9, 10, 13, 0, 0)

      self.channel.category = ['category 1', 'category 2']
      self.channel.category.append('category 3')
      self.channel.category.extend(['category 4', 'category 5'])

      # initialize image from dict
      self.channel.image = {
          'url': 'https://example.com/img.jpg',
          'description': 'Image link hover text',
      }
      # or initialize image from ImageElement
      self.channel.image = channel_elements.ImageElement(url='https://example.com/img.jpg')
      # or initialize image by each attribute
      self.channel.image.url = 'https://example.com/img.jpg' # required attribute of image
      self.channel.image.title = 'Image title' # optional
      self.channel.image.link = 'https://example.com/page' # optional
      self.channel.image.description = 'Image link hover text' # optional
      self.channel.image.width = 140 # optional
      self.channel.image.height = 350 # optional

      self.channel.docs = 'https://example.com/rss_docs'
      self.channel.cloud = {
          'domain': 'rpc.sys.com',
          'port': '80',
          'path': '/RPC2',
          'registerProcedure': 'myCloud.rssPleaseNotify',
          'protocol': 'xml-rpc'
      }
      self.channel.ttl = 60
      self.channel.rating = 4.0
      self.channel.textInput = channel_elements.TextInputElement(
          title='Input title',
          description='Description of input',
          name='Input name',
          link='http://example.com/cgi.py'
      )

      self.channel.skipHours = (0, 1, 3, 7, 23) # initialize list from iterable
      self.channel.skipHours = 12 # or initialize list with single value

      self.channel.skipDays = 14 # initialize list with single value
      self.channel.skipDays = [1, 14] # or initialize list from list

or modify the kwargs arguments (snake_case argument naming):

from scrapy_rss.exporters import FeedItemExporter

class MyRssItemExporter(FeedItemExporter):
   def __init__(self, *args, **kwargs):
      kwargs['generator'] = kwargs.get('generator', 'Special generator')
      kwargs['language'] = kwargs.get('language', 'en-us')
      kwargs['managing_editor'] = kwargs.get('managing_editor', 'editor@example.com')
      kwargs['category'] = kwargs.get('category', ('category 1', 'category 2'))
      kwargs['image'] = kwargs.get('image', {'url': 'https://example.com/img.jpg'})
      # etc.
      super(MyRssItemExporter, self).__init__(*args, **kwargs)

Then add the FEED_EXPORTER parameter to the Scrapy project settings or to the custom_settings attribute of the spider:

FEED_EXPORTER = 'myproject.exporters.MyRssItemExporter'

Backward compatibility notices

Since version 1.0.0 some classes have been renamed, but the old-named classes have been kept and marked as deprecated for backward compatibility, so they can still be used.

However, some elements of RssItem have had attributes renamed in a backward-incompatible way: almost all content attributes (the text content of the XML tag after exporting) have been renamed to value to improve code readability.

So if you do not want to update your code (for example, changing an old-style item.title.title to a new-style item.title.value, or item.guid.guid to item.guid.value), you can simply import the old-style classes

# old-style classes
from scrapy_rss.rss.old.items import RssItem, RssedItem

instead of new-style ones

# new-style classes
from scrapy_rss.items import RssItem, RssedItem

respectively.

Scrapy Project Examples

The examples directory contains several Scrapy projects that demonstrate scrapy_rss usage. They crawl this website, whose source code is here.

Go to the Scrapy project directory and run the commands

scrapy crawl first_spider
scrapy crawl second_spider

Thereafter feed.rss and feed2.rss files will be created in the same directory.
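
One way to check the result is to parse the generated file with the standard library and list the item titles. The sketch below uses an inline sample string as a stand-in for the file contents; the sample data is hypothetical and is not the actual output of the example spiders.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of a generated feed.rss (hypothetical sample data).
sample_feed = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Some title of the channel</title>
    <link>http://example.com/rss</link>
    <description>About channel</description>
    <item><title>First item</title></item>
    <item><title>Second item</title></item>
  </channel>
</rss>"""

# For a real file, ET.parse('feed.rss').getroot() would be used instead.
root = ET.fromstring(sample_feed.encode('utf-8'))
channel = root.find('channel')
item_titles = [item.findtext('title') for item in channel.findall('item')]
print(channel.findtext('title'), item_titles)
```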
