Project Description

What is it?

Feedfiller is intended to work alongside Zest Software’s FeedFeeder package and to fill each news feed item with the clean body content of the page it refers to. Feedfiller can be taught the structure of a site's content, which helps it pick out the most interesting page elements. If it does not yet ‘know’ the structure of the target page, all it can do is include the whole page. We will improve this as the project develops.

Clearly there are potential copyright issues with the re-publishing of copyrighted works. But for research and analysis purposes, these may not be an issue for your organization. Our own purpose is to use collected text for classification and analysis for internal use. You should seek your own legal advice on this topic.

Dependencies

BeautifulSoup, Products.feedfeeder. If you use the egg package, these dependencies will be managed for you.

How does it work?

Feedfiller subscribes to the event fired after FeedFeeder stores each news feed item, and fetches the target page of that item. This means that all items will be filled with the content of the page they refer to. Fetched pages are flayed (“Flay: Verb: to strip off the skin or surface of”) by a Flayer looked up in a FlayerRegistry by URL.

Flayers are easy to write to accommodate new pages. Flayers can be created and registered for different sections of a site, in case the HTML structure varies between sub-trees of the site.

If no flayer is registered for the URL, a default flayer is used that returns the whole body of the page.
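
The lookup described above can be sketched as follows. This is a minimal illustration in modern Python using only the standard library, not Feedfiller's actual code: the names `FlayerRegistry`, `register`, `lookup`, `default_flayer`, `site_flayer`, and `sport_flayer` are hypothetical, and longest-prefix matching is an assumed way of letting a flayer for a sub-tree override the one for the whole site.

```python
import re


def default_flayer(html):
    """Fallback: return what is inside <body>, or the whole page if there is none."""
    match = re.search(r"<body[^>]*>(.*?)</body>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else html


class FlayerRegistry:
    """Maps URL prefixes to flayers (hypothetical API, for illustration only)."""

    def __init__(self, default):
        self._default = default
        self._flayers = {}

    def register(self, url_prefix, flayer):
        self._flayers[url_prefix] = flayer

    def lookup(self, url):
        # Prefer the longest registered prefix, so that a flayer registered
        # for a sub-tree of a site wins over one registered for the whole site.
        matches = [prefix for prefix in self._flayers if url.startswith(prefix)]
        if not matches:
            return self._default
        return self._flayers[max(matches, key=len)]


def site_flayer(html):
    return "flayed with the site-wide rules"


def sport_flayer(html):
    return "flayed with the sport-section rules"


registry = FlayerRegistry(default_flayer)
registry.register("http://example.com/", site_flayer)
registry.register("http://example.com/sport/", sport_flayer)
```

With this registry, a URL under `http://example.com/sport/` resolves to `sport_flayer`, any other URL on the same site resolves to `site_flayer`, and a URL on an unregistered site falls back to `default_flayer`.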

Currently site-specific flayers try to reveal author, copyright, and body, but the default flayer returns only the whole body of the page.

The flayer base class currently stores the original page fetched from the server, to facilitate further development and refinement of flayers without repeatedly fetching content.
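
The benefit of keeping the original page can be illustrated with a small sketch. It assumes the page is kept on the flayer instance and that fetching is done by a callable passed in by the caller. Both are simplifications for the example, and the names `Flayer`, `fetch_and_flay`, `reflay`, `TitleFlayer`, and `fake_fetch` are hypothetical rather than taken from the package.

```python
import re


class Flayer:
    """Base class that keeps the raw page so it can be re-flayed offline."""

    def __init__(self):
        self.original = None  # raw HTML as fetched from the server

    def fetch_and_flay(self, url, fetch):
        # `fetch` is any callable that takes a URL and returns HTML text;
        # passing it in makes the network access easy to stub out.
        self.original = fetch(url)
        return self.flay(self.original)

    def reflay(self):
        # Re-run the extraction on the stored page without touching the network.
        if self.original is None:
            raise ValueError("no page has been fetched yet")
        return self.flay(self.original)

    def flay(self, html):
        # Default behaviour: return the page unchanged. Subclasses override this.
        return html


class TitleFlayer(Flayer):
    """Example subclass that extracts only the page title."""

    def flay(self, html):
        match = re.search(r"<title>(.*?)</title>", html, re.DOTALL)
        return match.group(1) if match else ""


calls = []


def fake_fetch(url):
    # Stand-in for a real HTTP request; records each call.
    calls.append(url)
    return "<html><title>Hello</title><body>text</body></html>"


flayer = TitleFlayer()
first = flayer.fetch_and_flay("http://example.com/item", fake_fetch)
again = flayer.reflay()
```

Here `reflay` produces the same result as the first pass while `fake_fetch` is called only once, which is the point of storing the page: a flayer's extraction logic can be changed and re-run against the stored content.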

TODO

The next step is to develop a table-driven flayer, for which table entries can be generated interactively by clicking on an enhanced version of the default flay: a bit like a basic Firebug view of the structure of a page, with buttons to manually select the body area of the page. This will require a new view for this purpose, available to managers.

There is no reason why the table-driven flayer should not be able to handle the complexity of the BBC news page, leaving only the trickiest pages to the custom class approach used currently.
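
As a rough idea of what a table-driven flayer might look like, the sketch below uses one assumed table format: each entry maps a URL prefix to a `(tag, attribute, value)` triple that marks the body area of the page. The table layout and the names `FLAY_TABLE`, `ElementText`, and `table_flay` are hypothetical, since the format has not been designed yet. The standard library parser is used here only to keep the example self-contained; the package itself depends on BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical table: URL prefix -> (tag, attribute, value) marking the body area.
FLAY_TABLE = {
    "http://example.com/news/": ("div", "id", "story"),
}


class ElementText(HTMLParser):
    """Collects the text inside the first element matching (tag, attr, value)."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.target = (tag, attr, value)
        self.depth = 0      # nesting depth of `tag` within the matched element
        self.done = False   # set once the matched element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        wanted_tag, attr, value = self.target
        if self.done or tag != wanted_tag:
            return
        if self.depth:
            self.depth += 1   # nested element with the same tag name
        elif dict(attrs).get(attr) == value:
            self.depth = 1    # entering the matched element

    def handle_endtag(self, tag):
        if self.depth and tag == self.target[0]:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def table_flay(url, html, table=FLAY_TABLE):
    """Return the text of the table-selected element, or None if no entry matches."""
    for prefix, selector in table.items():
        if url.startswith(prefix):
            parser = ElementText(*selector)
            parser.feed(html)
            parser.close()
            # Join the text chunks and normalise runs of whitespace.
            return " ".join(" ".join(parser.chunks).split())
    return None


page = (
    '<body><div id="nav">Menu</div>'
    '<div id="story"><p>First.</p><div>Second.</div></div>'
    "<p>Footer</p></body>"
)
text = table_flay("http://example.com/news/1", page)
```

In this form, supporting a new site means adding a table row rather than writing a new class, which is what would make interactively generated entries practical.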

Table items should eventually be replicated across all other Feedfiller users, perhaps via bi-directional rsync against a central repository, or perhaps using svn.

CREDITS

The project was initiated by Russ Ferriday, Topia Systems Ltd, in November, 2008.

Thanks to Business Across Borders (http://businessacrossborders.com/) for sponsorship of this work.

Thanks to Zest Software and the van Rees brothers for FeedFeeder.

Contributions are welcome, and contributors are listed below:

Changelog

0.1 - Unreleased

  • Initial release

Release History

  • 0.1dev-r77946 (this version)
  • 0.1dev-r77944
  • 0.1dev-r77077
  • 0.1dev-r77076
  • 0.1dev-r77074
  • 0.1dev-r77073
  • 0.1dev-r77064

Download Files

  • File: collective.feedfiller-0.1dev-r77946.tar.gz (51.7 kB)
  • File type: Source
  • Upload date: Dec 23, 2008
