Turn external feed entries into content items
Feedfeeder has just a few things it needs to do:
- Read in a few ATOM feeds (not too many).
- Create FeedFeederItems out of the entries pulled from the ATOM feeds. Any feed items that contain enclosures will have the enclosures pulled down and added as File items to the feed item.
- This means figuring out which items are new, which also means having a good ID generating mechanism.
Wait, no existing product?
There’s a whole slew of RSS/ATOM reading products for Zope and Plone. None of them seemed to be a good fit. Only one product actually stored the entries in the Zope database, but it was aimed at many users individually adding many feeds, so it needed either a separate ZEO process (old version) or a standalone MySQL database (new version).
The other products either didn’t store the entries in the database or were old or unmaintained.
In a sense, we’re using an existing product as we use Mark Pilgrim’s excellent feedparser (http://feedparser.org) that’ll do the actual ATOM reading for us.
The product feeds the content of ATOM feeds to plone as document/file content types. So “feedfeeder” sort of suggested itself as a funny name. Fun is important :-)
I’m using archgenxml to generate the boilerplate. There’s a ‘generate.sh’ shell script that’ll call archgenxml for you. Nothing fancy.
- The feedfeeder’s content types are:
How it works
A feedfeeder is a folder which contains all the previously-added feed entries as documents or files. It has a ‘feeds’ attribute that contains a list of feeds to read.
Feedparser is called periodically (through a cron job?) to parse the feeds. The UID of each item in the feed is converted to a suitable filename (the md5 hex hash of the entry’s atom id); that way you can detect whether there are new items.
New items are turned into feed items. Feed data are filled into feed items (see field named objectInfo).
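The id scheme described above can be sketched roughly as follows. This is only an illustration, not the product's actual code: the function names `entry_uid` and `new_entries` are hypothetical, entries are modelled as plain dicts, and the fallback from id to link to title is an assumption based on the history notes further down.

```python
import hashlib


def entry_uid(entry):
    """Return an md5 hex digest usable as an object id, or None.

    Assumption: fall back from the atom id to the link and then to the
    title, and give up on entries that have none of them.
    """
    raw = entry.get("id") or entry.get("link") or entry.get("title")
    if not raw:
        return None
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def new_entries(entries, existing_ids):
    """Return (uid, entry) pairs for entries not yet present in the folder."""
    seen = set(existing_ids)
    fresh = []
    for entry in entries:
        uid = entry_uid(entry)
        if uid is None or uid in seen:
            # Unidentifiable or already stored: skip it.
            continue
        seen.add(uid)
        fresh.append((uid, entry))
    return fresh
```

Here `existing_ids` stands for the ids of the items already stored in the feed folder; only the returned pairs would need to be turned into new feed items.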
Scheduled updates for feed folders
Zope can be configured to periodically trigger a url call. In zope.conf you can use the <clock-server> directive to define a schedule and url with the following data:
<clock-server>
    method /path_to_feedfolder/update_feed_items
    period 3600 # seconds
    user admin
    password 123
    host localhost:8080
</clock-server>
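If you would rather trigger the update from an external cron job than from the clock server, a small script could request the same url. The following is a minimal sketch, assuming the site accepts HTTP basic authentication; the helper names `build_update_request` and `trigger` are made up for this example, and the host, path and credentials are the placeholder values from the configuration above.

```python
import base64
import urllib.request


def build_update_request(base_url, method_path, user, password):
    """Build a request for the url that the clock server would call."""
    url = base_url.rstrip("/") + "/" + method_path.lstrip("/")
    request = urllib.request.Request(url)
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    # Assumption: plain HTTP basic auth is enough to authenticate.
    request.add_header("Authorization", "Basic " + token)
    return request


def trigger(request, timeout=60):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call from a cron script (placeholder values):
# trigger(build_update_request(
#     "http://localhost:8080",
#     "/path_to_feedfolder/update_feed_items",
#     "admin", "123"))
```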
Updating all feeds at once
If your site has several feed folders and you want to update them all at once, you can do:
<clock-server>
    method /yoursiteid/feed-mega-update
    period 3600 # seconds
    user admin
    password 123
    host localhost:8080
</clock-server>
Removing old feed items
You can periodically remove feed items older than a specific number of days. For example, to remove feed items older than 90 days once a week, you can do:
<clock-server>
    method /yoursiteid/feed-mega-cleanup?days=90
    period 604800 # seconds
    user admin
    password 123
    host localhost:8080
</clock-server>
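The two numbers in this example can be checked with a little arithmetic. The sketch below only illustrates the idea: `period_for_days` and `is_expired` are hypothetical helpers, and which date the real cleanup view compares against is not stated here, so `item_date` is an assumption.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def period_for_days(days):
    """Convert a schedule in days to the seconds the period setting expects."""
    return days * SECONDS_PER_DAY


def is_expired(item_date, days, now=None):
    """Return True if item_date lies more than `days` days before `now`."""
    now = now or datetime.now()
    return item_date < now - timedelta(days=days)
```

With these helpers, `period_for_days(7)` gives the 604800 seconds used for the weekly schedule, and `is_expired(some_date, 90)` mirrors the `days=90` parameter.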
Since version 3 we need Plone 4.3 or 5.0.
Plone 5: in the add-ons control panel you also need to install ‘Archetypes Content Types for Plone’. Otherwise, if you try to add a FeedfeederFolder, you will get a 404 Not Found error because the createObject script is not found.
For earlier Plone 4 versions, use version 2.x. The current latest is 2.8.
If you use Plone 3, please use a Products.feedfeeder version from the 2.0 line. The current latest is 2.0.9.
If you have installed Products.feedfeeder 2.1.x in Plone 4.0 or 4.1 and you upgrade to Plone 4.2 or higher, then you will be missing some functionality for listing or ordering feedfeeder items in new style collections. To solve this, you should go to portal_setup in the Zope Management Interface, visit the Import tab, select the “Feedfeeder registry” profile and import all steps.
History of feedfeeder
- Bug fix: really call getObjectInfo() when checking if an entry was updated. This avoids unnecessary updates to FeedFeederItems. [tiberiuichim]
- Compatible with Plone 4.3 and 5.0. [maurits]
- Removed separate registry profile that was only needed for compatibility with Plone 4.1 and lower. Moved registry.xml to the default profile. [maurits]
- Disabled CSRF protection on our update/clean feed views. Otherwise you would have to add ?_authenticator=user_specific_authentication_string to the urls in your cronjobs. Fixes issue https://github.com/collective/Products.feedfeeder/issues/13 [maurits]
- Use main_template/macros/master, instead of strange old @@standard-macros/view which would show only the core content on Plone 5. [maurits]
- Prevent UnicodeEncodeError in logging messages. [ulisdd]
- Updated Spanish translations. [Manuel Gualda Caballero]
- Add option to prefix feed link titles using a pipe | as separator (My place: |http://myplace/feed) [jbofill]
- Reindex feed item when setting the description. [jbofill]
- Fix KeyError u'+0000' in some DateTime objects. Related to https://github.com/collective/Products.feedfeeder/issues/7 [jbofill]
- Update to beautifulsoup4 and use python’s built-in HTML parser. [jbofill]
- Depend on feedparser instead of FeedParser. Issue #6. [maurits]
- Add maximum size to 10 MB for enclosures. This avoids downloading gigabytes of iso files, for example. [jbofill]
- Take the title as basis for the uid of an item if both guid and link are not found. They are optional in rss. [maurits]
- Update permissions. Protect updating a feed with the “feedfeeder: Update feed” permission. Protect updating all feeds in a mega update with the “feedfeeder: Update all feeds” permission. We give these to the Manager and Site Administrator roles in an upgrade step. Fixes https://github.com/collective/Products.feedfeeder/issues/4 [maurits]
- Use locales instead of an i18n directory. [maurits]
- Support our criterion in new style collections. Add new profile for this. Make sure not to fail on Plone 4.0 or 4.1 where this is not needed at all. [maurits]
- Update feed folder after its creation. i18n for untranslated strings. Added div#content in feed folder template. Fixed tests. Lots of cleanup (old content type definitions in content/folder.py and content/item.py). Removed double for “update feed items” action. French translations. [cedricmessiant]
- Source is open in a new page. [thomasdesvenain]
- Use png icons. Use icon_expr instead of content_icon. [thomasdesvenain]
- Support only Plone 4. [maurits]
- Fixed possible TypeError when updating feed items. Fixes https://plone.org/products/feedfeeder/issues/42 [maurits]
- Moved to https://github.com/collective/Products.feedfeeder [maurits]
- Avoid BadRequest error when an entry has two enclosures with the same href; we ignore all subsequent ones. Fixes http://plone.org/products/feedfeeder/issues/41 [maurits]
- Try to avoid possible ExpatError for some feeds. Fixes http://plone.org/products/feedfeeder/issues/40 [maurits]
- Cleaned up our type info, removing some cruft from Plone 2.5. Added upgrade step for this. [maurits]
- Protect against UnicodeDecode errors in getting the UID of an entry. [vangheem]
- Guard against links (enclosures) not having a type. Fixes http://plone.org/products/feedfeeder/issues/39 [maurits]
- Use feed-item.pt on Plone 4, filling the content-core slot, and feed-item3.pt on Plone 3, filling the body slot as before. Fixes http://plone.org/products/feedfeeder/issues/36 [maurits]
- Register our own documentbyline viewlet for feed items, which displays the feed item author as creator. Refs http://plone.org/products/feedfeeder/issues/36 [maurits]
- Fixed possible UnicodeDecodeError when updating feed items. Refs http://plone.org/products/feedfeeder/issues/37 [maurits]
- Fixed Plone 4.1 compatibility [iElectric]
- Avoid DeprecationWarning on python2.6 by preferring hashlib over md5 when available. [maurits]
- Do not reindex the feed item when nothing has changed. Only update the objectInfo field when there has been a change. Fixes http://plone.org/products/feedfeeder/issues/34 [maurits]
- Respect the Plone setting on the ‘about’ information: only show the document byline if the user is logged in or anonymous users are allowed to view the about information. [markvl]
- Modified RSS import and added a new field on feed items named objectInfo. All feed data will be stored in this field, as a python dict. Just by changing the remote RSS template, you will be able to store additional info without having to modify the feed item schema. [dmoro]
- Added an option on the feed folder that lets you choose to redirect automatically to remote resources. If you have modify permissions on feed items, there will not be any redirect. [dmoro]
- Added new tests [sithmel]
- Added @@feed-mega-update view so you can update all feed folders at once, for example in a clock server. [miohtoma]
- Import HTMLParseError from the standard python HTMLParser instead of BeautifulSoup. This makes feedfeeder compatible with BeautifulSoup 3.0.x again. [maurits]
- Solve some Plone 4 compatibility issues. [sureshvv]
- Ignore unidentifiable entries without id or link, instead of throwing an AttributeError. Fixes http://plone.org/products/feedfeeder/issues/26 [maurits]
- Fix errors when viewing a folder or item on Plone 4, while still keeping Plone 2.5 and Plone 3 compatibility. Refs http://plone.org/products/feedfeeder/issues/25 [maurits]
- Some summaries are a snippet from the full content and can therefore contain broken html; in this case we now save the raw broken html, parsing it only when possible. [lucmult]
- Improved the translations. [lucmult]
- Changed the way xml/html entities in the summary are translated, now using BeautifulSoup. The old way broke with some non-ASCII characters. [lucmult]
- When setting the text of a feed item during updating, store the mimetype as well if it is a supported one. Refs http://plone.org/products/feedfeeder/issues/24 [maurits]
- Bug fix: curly quotes getting mangled when Descriptions are built. Fixes http://plone.org/products/feedfeeder/issues/7 (Merged branch maurits-cleaner-entityrefs-in-description.) [maurits]
- Do not add our skin layer to Plone Default and certainly not to Plone Tableless, but just to all (*). [maurits]
- When neither the updated nor the published date of an item is known, take today as the date when first adding it. When updating, do not change the original item. Fixes http://plone.org/products/feedfeeder/issues/21 [maurits]
- Read tags/categories/keywords of feed items and store them on the created content item. No Archetypes field, just a simple getter and setter called feed_tags. Idea: Robin Harms Oredsson. [maurits]
- DateTime.SyntaxError is thrown with some very common US Daylight Saving zones, such as EDT. We now wrap the DateTime parsing of feeds, to try to recognise those zones before politely giving up, using maurits’ fix, below. [russf]
- Catch DateTime.SyntaxError when parsing the updated and published dates of an entry and continue with the next entry. Fixes http://plone.org/products/feedfeeder/issues/18 [maurits]
- Avoid swallowing too many exceptions when applying our GenericSetup profile. Fixes http://plone.org/products/feedfeeder/issues/19 [maurits]
- Moved profile definition from python to GenericSetup. Profile is now not ‘profile-feedfeeder:default’ but ‘profile-Products.feedfeeder:default’. [maurits]
- In the Extensions/ dir: removed Install.py and renamed AppInstall.py to install.py. [maurits]
- Made feed item updated date available for Collections/Smart Folders. [maurits]
- Extensions/AppInstall.py: first try installing our own profile in the Plone 3 way and when that fails try the Plone 2.5 way. [maurits]
- Removed own feedparser.py. Instead added an install_requires dependency on FeedParser in setup.py. [maurits]
- Moved fix for feeds starting with ‘feed:’ instead of ‘http:’ from feedparser.py to utilities.py, so we use an unchanged feedparser.py again. [maurits]
1.0 rc 2 (2008-07-23)
- Re-release of rc1: rc1 was missing all .txt files, making install impossible as setup.py reads version.txt. [reinout]
1.0 rc 1 (2008-07-15)
- Accept entries without a title, which is allowed in rss. See http://cyber.law.harvard.edu/rss/rss.html#hrelementsOfLtitemgt [maurits]
1.0 beta 4 (2008-05-20)
- Eggification: you can now install it as the Products.feedfeeder egg. [maurits]
1.0 beta 3 (2008-05-13)
- In the tests, use plone_workflow explicitly, so it is easier to test on both Plone 2.5 and 3.0. [maurits]
- Make update_feed_items available in the object_buttons for Plone 3, using new small @@is_feedcontainer as condition. [maurits]
- Avoid deprecation warnings for events and interfaces. [maurits]
- Remove semicolon in page template that broke in Plone 3. [maurits]
- Fix imports so they work in Plone 3 as well, without deprecation warnings. [derstappenit]
1.0 beta 2 (2008-01-02)
- History begins.