RSS Tools for Scrapy Framework

Tools for easily generating an RSS feed that contains each scraped item, using the Scrapy framework.

The package works with Python 2.7, 3.3, 3.4, 3.5 and 3.6.

Installation

  • Install scrapy_rss using pip

    pip install scrapy_rss

    or using pip for a specific interpreter, e.g.:

    pip3 install scrapy_rss
  • or using setuptools directly:

    cd path/to/root/of/scrapy_rss
    python setup.py install

    or using setuptools for a specific interpreter, e.g.:

    cd path/to/root/of/scrapy_rss
    python3 setup.py install

How To Use

Add parameters to the Scrapy project settings (settings.py file) or to the custom_settings attribute of the spider:

  1. Add the item pipeline that exports items to the RSS feed:

    ITEM_PIPELINES = {
        # ...
        'scrapy_rss.pipelines.RssExportPipeline': 900,  # or another priority
        # ...
    }
  2. Add the required feed parameters:

    FEED_FILE

    the absolute or relative file path where the resulting RSS feed will be saved, for example feed.rss or output/feed.rss,

    FEED_TITLE

    the name of the channel (feed),

    FEED_DESCRIPTION

    a phrase or sentence describing the channel (feed),

    FEED_LINK

    the URL of the HTML website corresponding to the channel (feed).

    FEED_FILE = 'path/to/feed.rss'
    FEED_TITLE = 'Some title of the channel'
    FEED_LINK = 'http://example.com/rss'
    FEED_DESCRIPTION = 'About channel'
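
The same parameters can be collected in a dict of the shape used for a spider's custom_settings attribute. The sketch below checks that no required feed parameter was left out before a crawl is started. The names REQUIRED_FEED_KEYS and missing_feed_settings are hypothetical helpers introduced here for illustration; they are not part of scrapy_rss or Scrapy.

```python
# Hypothetical helper names: neither is provided by scrapy_rss or Scrapy.
REQUIRED_FEED_KEYS = ('FEED_FILE', 'FEED_TITLE', 'FEED_LINK', 'FEED_DESCRIPTION')

# The values from the steps above, in the form a spider's custom_settings takes.
custom_settings = {
    'ITEM_PIPELINES': {'scrapy_rss.pipelines.RssExportPipeline': 900},
    'FEED_FILE': 'path/to/feed.rss',
    'FEED_TITLE': 'Some title of the channel',
    'FEED_LINK': 'http://example.com/rss',
    'FEED_DESCRIPTION': 'About channel',
}


def missing_feed_settings(settings):
    """Return the required feed keys that are absent or empty in settings."""
    return [key for key in REQUIRED_FEED_KEYS if not settings.get(key)]
```

Calling missing_feed_settings(custom_settings) on the dict above returns an empty list; a dict that only sets FEED_FILE would report the other three keys.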

Declare your item directly as RssItem():

import scrapy_rss

item1 = scrapy_rss.RssItem()

Or use the predefined item class RssedItem, which has an RSS field named rss that is an instance of RssItem:

import scrapy  # needed for scrapy.Field()
import scrapy_rss

class MyItem(scrapy_rss.RssedItem):
    # scrapy.Field() and/or other field definitions
    # ...
    field1 = scrapy.Field()
    field2 = scrapy.Field()

item2 = MyItem()

Set and get item fields. The attributes of RssItem() correspond to RSS elements and are case sensitive, as are the attributes of the RSS elements themselves. If your editor supports autocompletion, it suggests attributes for instances of RssItem. Any subset of RSS elements may be set (e.g. only title). For example:

from datetime import datetime

item1.title = 'RSS item title'  # set value of <title> element
title = item1.title.title  # get value of <title> element
item1.description = 'description'

item1.guid = 'item identifier'
item1.guid.isPermaLink = True  # set value of attribute isPermaLink of <guid> element,
                               # isPermaLink is False by default
is_permalink = item1.guid.isPermaLink  # get value of attribute isPermaLink of <guid> element
guid = item1.guid.guid  # get value of <guid> element

item1.category = 'single category'
category = item1.category
item1.category = ['first category', 'second category']
first_category = item1.category[0].category  # get value of the first <category> element when several are set
all_categories = [cat.category for cat in item1.category]

# direct attributes setting
item1.enclosure.url = 'http://example.com/file'
item1.enclosure.length = 0
item1.enclosure.type = 'text/plain'

# or dict based attributes setting
item1.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}
item1.guid = {'guid': 'item identifier', 'isPermaLink': True}

item1.pubDate = datetime.now()  # works correctly with Python's datetime objects


item2.rss.title = 'Item title'
item2.rss.guid = 'identifier'
item2.rss.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}

All allowed elements are listed in scrapy_rss/items.py. All allowed attributes of each element, with their constraints and default values, are listed in scrapy_rss/elements.py.
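
To make the mapping between item attributes and RSS elements more concrete, here is a minimal sketch of the RSS 2.0 <item> structure that values like the ones above correspond to. It uses only the Python 3 standard library and is an illustration of the RSS format, not the export code of scrapy_rss; the function build_rss_item is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss_item(title, guid, is_permalink, categories, enclosure, pub_date):
    """Build an RSS 2.0 <item> element from plain Python values (illustration only)."""
    item = ET.Element('item')
    ET.SubElement(item, 'title').text = title

    # <guid> carries its value as text and isPermaLink as an XML attribute.
    guid_el = ET.SubElement(item, 'guid', isPermaLink='true' if is_permalink else 'false')
    guid_el.text = guid

    # Several categories become several sibling <category> elements.
    for name in categories:
        ET.SubElement(item, 'category').text = name

    # <enclosure> has no text, only the attributes url, length and type.
    ET.SubElement(item, 'enclosure', {key: str(value) for key, value in enclosure.items()})

    # RSS dates use the RFC 822 style format, e.g. "Thu, 01 Jan 2015 12:00:00 +0000".
    ET.SubElement(item, 'pubDate').text = format_datetime(pub_date)
    return item


example = build_rss_item(
    title='RSS item title',
    guid='item identifier',
    is_permalink=True,
    categories=['first category', 'second category'],
    enclosure={'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'},
    pub_date=datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(ET.tostring(example, encoding='unicode'))
```

The element and attribute names in the sketch (title, guid, isPermaLink, category, enclosure, pubDate) match the case-sensitive names used on RssItem in the examples above.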


