RSS Tools for Scrapy Framework
Project description
Tools for easily generating an RSS feed that contains each scraped item, using the Scrapy framework.
The package works with Python 2.7, 3.3, 3.4, 3.5 and 3.6.
Installation
Install scrapy_rss using pip:
pip install scrapy_rss
or using pip for a specific interpreter, e.g.:
pip3 install scrapy_rss
or using setuptools directly:
cd path/to/root/of/scrapy_rss
python setup.py install
or using setuptools for a specific interpreter, e.g.:
cd path/to/root/of/scrapy_rss
python3 setup.py install
How To Use
Add parameters to the Scrapy project settings (the settings.py file) or to the custom_settings attribute of the spider:
Add the item pipeline that exports items to the RSS feed:
ITEM_PIPELINES = {
    # ...
    'scrapy_rss.pipelines.RssExportPipeline': 900,  # or another priority
    # ...
}
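Scrapy runs item pipelines in ascending order of their integer values (conventionally chosen from the 0-1000 range), so a value such as 900 places the RSS export after pipelines with lower values, for instance ones that clean or filter items. The sketch below only illustrates that ordering in plain Python; `myproject.pipelines.CleanupPipeline` and the helper `pipeline_order` are hypothetical names, not part of scrapy_rss or Scrapy.

```python
# Illustrative only: how priority values determine the order of pipelines.
ITEM_PIPELINES = {
    'myproject.pipelines.CleanupPipeline': 300,  # hypothetical project pipeline
    'scrapy_rss.pipelines.RssExportPipeline': 900,
}


def pipeline_order(pipelines):
    """Return pipeline paths sorted by ascending priority (lower runs first)."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


execution_order = pipeline_order(ITEM_PIPELINES)
print(execution_order)
```

With these values the RSS export pipeline comes last, so it would receive items only after the hypothetical cleanup step has processed them.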
Add required feed parameters:
- FEED_FILE: absolute or relative file path where the resulting RSS feed will be saved, for example feed.rss or output/feed.rss,
- FEED_TITLE: the name of the channel (feed),
- FEED_DESCRIPTION: a phrase or sentence describing the channel (feed),
- FEED_LINK: the URL of the HTML website corresponding to the channel (feed).
FEED_FILE = 'path/to/feed.rss'
FEED_TITLE = 'Some title of the channel'
FEED_LINK = 'http://example.com/rss'
FEED_DESCRIPTION = 'About channel'
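When the parameters go into a spider's custom_settings attribute instead of settings.py, they take the form of a dictionary. The following is a minimal sketch of such a dictionary together with a small sanity check; the helper `missing_feed_settings` and the constant `REQUIRED_FEED_SETTINGS` are hypothetical names introduced for illustration and are not provided by scrapy_rss.

```python
# The four feed parameters described above as required.
REQUIRED_FEED_SETTINGS = ('FEED_FILE', 'FEED_TITLE', 'FEED_LINK', 'FEED_DESCRIPTION')


def missing_feed_settings(settings):
    """Return the required feed parameters that are absent or empty."""
    return [name for name in REQUIRED_FEED_SETTINGS if not settings.get(name)]


# A dictionary of this shape could be assigned to a spider's custom_settings.
custom_settings = {
    'ITEM_PIPELINES': {'scrapy_rss.pipelines.RssExportPipeline': 900},
    'FEED_FILE': 'output/feed.rss',
    'FEED_TITLE': 'Some title of the channel',
    'FEED_LINK': 'http://example.com/rss',
    'FEED_DESCRIPTION': 'About channel',
}

complete = missing_feed_settings(custom_settings)
incomplete = missing_feed_settings({'FEED_FILE': 'feed.rss'})
print(complete, incomplete)
```

A check like this can be run before starting a crawl to catch a forgotten parameter early.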
Declare your item directly as RssItem():
import scrapy_rss
item1 = scrapy_rss.RssItem()
Or use the predefined item class RssedItem, which has an RSS field named rss that is an instance of RssItem:
import scrapy
import scrapy_rss

class MyItem(scrapy_rss.RssedItem):
    # scrapy.Field() and/or other field definitions
    # ...
    field1 = scrapy.Field()
    field2 = scrapy.Field()

item2 = MyItem()
Set/get item fields. The case-sensitive attributes of RssItem() correspond to RSS elements. Attributes of RSS elements are case sensitive too.
If your editor supports autocompletion, it suggests attributes for instances of RssItem.
Any subset of RSS elements may be set (e.g. only title). For example:
from datetime import datetime
item1.title = 'RSS item title' # set value of <title> element
title = item1.title.title # get value of <title> element
item1.description = 'description'
item1.guid = 'item identifier'
item1.guid.isPermaLink = True # set value of attribute isPermaLink of <guid> element,
# isPermaLink is False by default
is_permalink = item1.guid.isPermaLink # get value of attribute isPermaLink of <guid> element
guid = item1.guid.guid # get value of element <guid>
item1.category = 'single category'
category = item1.category
item1.category = ['first category', 'second category']
first_category = item1.category[0].category # get value of the element <category> with multiple values
all_categories = [cat.category for cat in item1.category]
# direct attributes setting
item1.enclosure.url = 'http://example.com/file'
item1.enclosure.length = 0
item1.enclosure.type = 'text/plain'
# or dict based attributes setting
item1.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}
item1.guid = {'guid': 'item identifier', 'isPermaLink': True}
item1.pubDate = datetime.now() # works correctly with Python datetimes
item2.rss.title = 'Item title'
item2.rss.guid = 'identifier'
item2.rss.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}
All allowed elements are listed in scrapy_rss/items.py. All allowed attributes of each element, with their constraints and default values, are listed in scrapy_rss/elements.py.
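To make the correspondence between item attributes and RSS elements concrete, the sketch below builds a single RSS 2.0 <item> by hand with the standard library. It is not how scrapy_rss produces its output; the function `build_item_xml` and the fixed date are assumptions chosen for illustration, and the markup written by the package may differ in detail.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_item_xml(title, guid, is_permalink, categories, enclosure, pub_date):
    """Serialize one RSS 2.0 <item> with the elements used in the examples above."""
    item = ET.Element('item')
    ET.SubElement(item, 'title').text = title
    # <guid> carries its value as text and isPermaLink as an XML attribute.
    guid_el = ET.SubElement(item, 'guid', isPermaLink='true' if is_permalink else 'false')
    guid_el.text = guid
    # Multiple categories become repeated <category> elements.
    for category in categories:
        ET.SubElement(item, 'category').text = category
    # <enclosure> has no text; url, length and type are attributes.
    ET.SubElement(item, 'enclosure', {key: str(value) for key, value in enclosure.items()})
    # RSS 2.0 dates use the RFC 822 format.
    ET.SubElement(item, 'pubDate').text = format_datetime(pub_date)
    return ET.tostring(item, encoding='unicode')


item_xml = build_item_xml(
    title='RSS item title',
    guid='item identifier',
    is_permalink=True,
    categories=['first category', 'second category'],
    enclosure={'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'},
    pub_date=datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(item_xml)
```

The names title, guid, isPermaLink, category, enclosure and pubDate here match the attribute names used on RssItem, which is why their spelling and case matter.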