
RSS Tools for Scrapy Framework

Tools to easily generate an RSS feed that contains each scraped item, using the Scrapy framework.

Installation

  • Install scrapy_rss using pip

    pip install scrapy_rss

    or using pip for a specific interpreter, e.g.:

    pip3 install scrapy_rss
  • or using setuptools directly:

    cd path/to/root/of/scrapy_rss
    python setup.py install

    or using setuptools for a specific interpreter, e.g.:

    cd path/to/root/of/scrapy_rss
    python3 setup.py install

How To Use

Configuration

Add parameters to the Scrapy project settings (settings.py file) or to the custom_settings attribute of the spider:

  1. Add the item pipeline that exports items to the RSS feed:

    ITEM_PIPELINES = {
        # ...
        'scrapy_rss.pipelines.FeedExportPipeline': 900,  # or another priority
        # ...
    }
  2. Add required feed parameters:

    FEED_FILE

    the absolute or relative file path where the resulting RSS feed will be saved, for example feed.rss or output/feed.rss,

    FEED_TITLE

    the name of the channel (feed),

    FEED_DESCRIPTION

    the phrase or sentence that describes the channel (feed),

    FEED_LINK

    the URL of the HTML website corresponding to the channel (feed).

    FEED_FILE = 'path/to/feed.rss'
    FEED_TITLE = 'Some title of the channel'
    FEED_LINK = 'http://example.com/rss'
    FEED_DESCRIPTION = 'About channel'
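
As a minimal sketch, the same parameters can be collected in a dictionary of the kind assigned to a spider's custom_settings attribute. The helper missing_feed_settings and the constant REQUIRED_FEED_SETTINGS are hypothetical names introduced here for illustration only; they are not part of scrapy_rss.

```python
# Hypothetical helper, not part of scrapy_rss: it checks that the four
# required feed parameters described above are present in a settings dict.
REQUIRED_FEED_SETTINGS = ('FEED_FILE', 'FEED_TITLE', 'FEED_LINK', 'FEED_DESCRIPTION')


def missing_feed_settings(settings):
    """Return the required feed parameters that are absent or empty."""
    return [name for name in REQUIRED_FEED_SETTINGS if not settings.get(name)]


# The same values as above, written as a dictionary for custom_settings.
custom_settings = {
    'ITEM_PIPELINES': {'scrapy_rss.pipelines.FeedExportPipeline': 900},
    'FEED_FILE': 'path/to/feed.rss',
    'FEED_TITLE': 'Some title of the channel',
    'FEED_LINK': 'http://example.com/rss',
    'FEED_DESCRIPTION': 'About channel',
}

# An empty list means every required parameter is set.
print(missing_feed_settings(custom_settings))
```

Running such a check before a crawl is one way to catch a forgotten parameter early.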

Usage

Basic usage

Declare your item directly as RssItem():

import scrapy_rss

item1 = scrapy_rss.RssItem()

Or use the predefined item class RssedItem, which has an RSS field named rss that is an instance of RssItem:

import scrapy
import scrapy_rss

class MyItem(scrapy_rss.RssedItem):
    field1 = scrapy.Field()
    field2 = scrapy.Field()
    # ...

item2 = MyItem()

Set and get item fields. The attributes of RssItem() are case sensitive and correspond to RSS elements. Attributes of RSS elements are case sensitive too. If your editor supports autocompletion, it suggests attributes for instances of RssedItem and RssItem. Any subset of RSS elements may be set (e.g. title only). For example:

from datetime import datetime

item1.title = 'RSS item title'  # set value of <title> element
title = item1.title.value  # get value of <title> element
item1.description = 'description'

item1.guid = 'item identifier'
item1.guid.isPermaLink = False  # set value of attribute isPermaLink of <guid> element,
                                # isPermaLink is True by default
is_permalink = item1.guid.isPermaLink  # get value of attribute isPermaLink of <guid> element
guid = item1.guid.value  # get value of element <guid>

item1.category = 'single category'
category = item1.category
item1.category = ['first category', 'second category']
first_category = item1.category[0].value # get value of the element <category> with multiple values
all_categories = [cat.value for cat in item1.category]

# direct attributes setting
item1.enclosure.url = 'http://example.com/file'
item1.enclosure.length = 0
item1.enclosure.type = 'text/plain'

# or dict based attributes setting
item1.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}
item1.guid = {'value': 'item identifier', 'isPermaLink': True}

item1.pubDate = datetime.now()  # works correctly with Python datetime objects


item2.rss.title = 'Item title'
item2.rss.guid = 'identifier'
item2.rss.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}

All allowed elements are listed in scrapy_rss/items.py. All allowed attributes of each element, with their constraints and default values, are listed in scrapy_rss/elements.py. You can also read the RSS specification for more details.
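
For instance, the RSS 2.0 specification expresses pubDate in the RFC 822 date format. The sketch below uses only the Python standard library to show what such a value looks like for a timezone-aware datetime. It illustrates the target format and is not a description of how scrapy_rss serializes dates internally.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# A timezone-aware datetime in UTC (usegmt=True requires a UTC datetime).
pub_date = datetime(2025, 9, 10, 13, 0, 0, tzinfo=timezone.utc)

# Render it in the RFC 822 style used by RSS date elements.
rss_date = format_datetime(pub_date, usegmt=True)
print(rss_date)  # Wed, 10 Sep 2025 13:00:00 GMT
```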

RssItem derivation and namespaces

You can extend RssItem to add new XML fields, which may or may not be namespaced. Namespaces can be specified in attribute and/or element constructors. A namespace prefix can be given either in the attribute/element name, using a double underscore as the delimiter (prefix__name), or in the attribute/element constructor via the ns_prefix argument. The namespace URI is given via the ns_uri argument of the constructor.

from scrapy_rss.meta import ElementAttribute, Element
from scrapy_rss.items import RssItem

class Element0(Element):
    # attributes without special namespace
    attr0 = ElementAttribute(is_content=True, required=True)
    attr1 = ElementAttribute()

class Element1(Element):
    # attribute "prefix2:attr2" with namespace xmlns:prefix2="id2"
    attr2 = ElementAttribute(ns_prefix="prefix2", ns_uri="id2")

    # attribute "prefix3:attr3" with namespace xmlns:prefix3="id3"
    prefix3__attr3 = ElementAttribute(ns_uri="id3")

    # attribute "prefix4:attr4" with namespace xmlns:prefix4="id4"
    fake_prefix__attr4 = ElementAttribute(ns_prefix="prefix4", ns_uri="id4")

    # attribute "attr5" with default namespace xmlns="id5"
    attr5 = ElementAttribute(ns_uri="id5")

class MyXMLItem(RssItem):
    # element <elem1> without namespace
    elem1 = Element0()

    # element <elem_prefix2:elem2> with namespace xmlns:elem_prefix2="id2e"
    elem2 = Element0(ns_prefix="elem_prefix2", ns_uri="id2e")

    # element <elem_prefix3:elem3> with namespace xmlns:elem_prefix3="id3e"
    elem_prefix3__elem3 = Element1(ns_uri="id3e")

    # yet another element <elem_prefix4:elem3> with namespace xmlns:elem_prefix4="id4e"
    # (does not conflict with previous one)
    fake_prefix__elem3 = Element0(ns_prefix="elem_prefix4", ns_uri="id4e")

    # element <elem5> with default namespace xmlns="id5e"
    elem5 = Element0(ns_uri="id5e")

Elements and their attributes are accessed in the same way as with simple items:

item = MyXMLItem()
item.title = 'Some title'
item.elem1.attr0 = 'Required content value'
item.elem1 = 'Another way to set content value'
item.elem1.attr1 = 'Some attribute value'
item.elem_prefix3__elem3.prefix3__attr3 = 'Yet another attribute value'
item.elem_prefix3__elem3.fake_prefix__attr4 = '' # non-None value is interpreted as assigned
item.fake_prefix__elem3.attr1 = 42
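
The naming convention used in the classes above (a prefix taken from the part before the double underscore, overridden by an explicit ns_prefix) can be sketched in plain Python. The function resolve_name is a hypothetical helper written for this illustration; it is not the library's implementation, which may treat edge cases differently.

```python
def resolve_name(attr_name, ns_prefix=None):
    """Hypothetical helper: derive (prefix, local name) from a Python attribute name."""
    inferred_prefix, sep, local_name = attr_name.partition('__')
    if not sep:
        # No double underscore: the whole attribute name is the local name.
        inferred_prefix, local_name = None, attr_name
    # An explicit ns_prefix argument takes precedence over the inferred prefix.
    prefix = ns_prefix if ns_prefix is not None else inferred_prefix
    return prefix, local_name


# Names taken from the Element1 and MyXMLItem examples above.
print(resolve_name('prefix3__attr3'))                          # ('prefix3', 'attr3')
print(resolve_name('fake_prefix__attr4', ns_prefix='prefix4'))  # ('prefix4', 'attr4')
print(resolve_name('attr5'))                                    # (None, 'attr5')
```

This mirrors the comments in the example classes: fake_prefix__attr4 is exported under prefix4 because the constructor argument wins over the name.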

Several optional settings are allowed for namespaced items:

FEED_NAMESPACES

list of tuples [(prefix, URI), ...] or dictionary {prefix: URI, ...} of namespaces that must be defined in the root XML element

FEED_ITEM_CLASS or FEED_ITEM_CLS

main class of feed items (class object MyXMLItem or path to class "path.to.MyXMLItem"). Default value: RssItem. It’s used in order to extract all possible namespaces that will be declared in the root XML element.

Feed items do NOT have to be instances of this class or its subclass.

If these settings are not defined, or only some of the namespaces are defined, then the other namespaces in use will be declared either in the <item> element or in its subelements when these namespaces are not unique. Each <item> element and its subelements always contain namespace declarations only for non-None attributes (including ones that are interpreted as element content).
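
A minimal settings sketch for the namespaced item shown earlier might look as follows. The two constants NAMESPACES_AS_LIST and NAMESPACES_AS_DICT are illustrative names showing both accepted forms, and the import path myproject.items.MyXMLItem is a hypothetical location for the item class.

```python
# Two element namespaces from the MyXMLItem example, written in both accepted forms.
NAMESPACES_AS_LIST = [('elem_prefix2', 'id2e'), ('elem_prefix3', 'id3e')]
NAMESPACES_AS_DICT = {'elem_prefix2': 'id2e', 'elem_prefix3': 'id3e'}

# Either form can be assigned to the setting; the dictionary is used here.
FEED_NAMESPACES = NAMESPACES_AS_DICT

# Hypothetical import path; the class object itself can be given instead.
FEED_ITEM_CLASS = 'myproject.items.MyXMLItem'

# Both forms describe the same prefix-to-URI mapping.
print(dict(NAMESPACES_AS_LIST) == NAMESPACES_AS_DICT)  # True
```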

Feed (Channel) Elements Customization [optionally]

If you want to change other channel parameters (such as language, copyright, managingEditor, webMaster, pubDate, lastBuildDate, category, generator, docs, cloud, ttl, image, rating, textInput, skipHours, skipDays), define your own exporter that inherits from the FeedItemExporter class and, for example, modify one or more children of the self.channel Element (camelCase attribute naming):

from datetime import datetime
from scrapy_rss.rss import channel_elements
from scrapy_rss.exporters import FeedItemExporter

class MyRssItemExporter(FeedItemExporter):
   def __init__(self, *args, **kwargs):
      super(MyRssItemExporter, self).__init__(*args, **kwargs)
      self.channel.generator = 'Special generator'
      self.channel.language = 'en-us'
      self.channel.managingEditor = 'editor@example.com'
      self.channel.webMaster = 'webmaster@example.com'
      self.channel.copyright = 'Copyright 2025'
      self.channel.pubDate = datetime(2025, 9, 10, 13, 0, 0)

      self.channel.category = ['category 1', 'category 2']
      self.channel.category.append('category 3')
      self.channel.category.extend(['category 4', 'category 5'])

      # initialize image from dict
      self.channel.image = {
          'url': 'https://example.com/img.jpg',
          'description': 'Image link hover text',
      }
      # or initialize image from ImageElement
      self.channel.image = channel_elements.ImageElement(url='https://example.com/img.jpg')
      # or initialize image by each attribute
      self.channel.image.url = 'https://example.com/img.jpg' # required attribute of image
      self.channel.image.title = 'Image title' # optional
      self.channel.image.link = 'https://example.com/page' # optional
      self.channel.image.description = 'Image link hover text' # optional
      self.channel.image.width = 140 # optional
      self.channel.image.height = 350 # optional

      self.channel.docs = 'https://example.com/rss_docs'
      self.channel.cloud = {
          'domain': 'rpc.sys.com',
          'port': '80',
          'path': '/RPC2',
          'registerProcedure': 'myCloud.rssPleaseNotify',
          'protocol': 'xml-rpc'
      }
      self.channel.ttl = 60
      self.channel.rating = 4.0
      self.channel.textInput = channel_elements.TextInputElement(
          title='Input title',
          description='Description of input',
          name='Input name',
          link='http://example.com/cgi.py'
      )

      self.channel.skipHours = (0, 1, 3, 7, 23) # initialize list from iterable
      self.channel.skipHours = 12 # or initialize list with single value

      self.channel.skipDays = 14 # initialize list with single value
      self.channel.skipDays = [1, 14] # or initialize list from list

Alternatively, modify the keyword arguments (snake_case argument naming):

from scrapy_rss.exporters import FeedItemExporter

class MyRssItemExporter(FeedItemExporter):
   def __init__(self, *args, **kwargs):
      kwargs['generator'] = kwargs.get('generator', 'Special generator')
      kwargs['language'] = kwargs.get('language', 'en-us')
      kwargs['managing_editor'] = kwargs.get('managing_editor', 'editor@example.com')
      kwargs['category'] = kwargs.get('category', ('category 1', 'category 2'))
      kwargs['image'] = kwargs.get('image', {'url': 'https://example.com/img.jpg'})
      # etc.
      super(MyRssItemExporter, self).__init__(*args, **kwargs)

Then add the FEED_EXPORTER parameter to the Scrapy project settings or to the custom_settings attribute of the spider:

FEED_EXPORTER = 'myproject.exporters.MyRssItemExporter'
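
The keyword-argument variant above fills in a default only when the caller has not already supplied a value. The sketch below shows the same behaviour with dict.setdefault, using a stand-in base class so that it runs without Scrapy installed. StandInExporter and DefaultingExporter are hypothetical names, and the stand-in only records its keyword arguments; it does not reproduce FeedItemExporter.

```python
class StandInExporter(object):
    """Hypothetical stand-in for FeedItemExporter that only records its kwargs."""

    def __init__(self, *args, **kwargs):
        self.options = kwargs


class DefaultingExporter(StandInExporter):
    def __init__(self, *args, **kwargs):
        # setdefault keeps a caller-supplied value and fills in the default
        # otherwise, which matches kwargs[key] = kwargs.get(key, default).
        kwargs.setdefault('generator', 'Special generator')
        kwargs.setdefault('language', 'en-us')
        super(DefaultingExporter, self).__init__(*args, **kwargs)


# The explicit language is kept; the missing generator receives its default.
exporter = DefaultingExporter(language='de-de')
print(exporter.options)
```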

Backward compatibility notices

Since version 1.0.0 some classes have been renamed, but the classes with the old names have been kept and marked as deprecated for backward compatibility, so they can still be used.

However, some elements of RssItem have had attributes renamed in a backward-incompatible way: almost all content attributes (the text content of the XML tag after exporting) are renamed to value to improve code readability.

So if you do not want to update your code (for example, changing an old-style item.title.title to a new-style item.title.value, or item.guid.guid to item.guid.value), you can import the old-style classes

# old-style classes
from scrapy_rss.rss.old.items import RssItem, RssedItem

instead of new-style ones

# new-style classes
from scrapy_rss.items import RssItem, RssedItem

respectively.

Scrapy Project Examples

The examples directory contains several Scrapy projects that demonstrate scrapy_rss usage. They crawl this website, whose source code is here.

Go to the Scrapy project directory and run the commands:

scrapy crawl first_spider
scrapy crawl second_spider

Afterwards, the feed.rss and feed2.rss files will be created in the same directory.
