
Standalone Django-based feed aggregator.

What it does

Newspeak is a feed aggregator with advanced features for keyword filtering and link content extraction, implemented as a standalone Django application.

Architecture

Newspeak performs the following tasks (in order):

  1. Fetch the specified RSS/Atom feeds as per the Feed model (in parallel).

  2. Parse the feeds using feedparser.

  3. (Optionally) apply per-feed inclusive/exclusive keyword filters on the title and/or summary, based on the KeywordFilter model.

  4. (Optionally) extract summary data using an XPath expression from the feed entry’s link URL, using lxml.

  5. (Optionally) extract enclosure information using XPath expressions from the feed entry’s link URL, using lxml.

  6. Store the resulting feed information locally in a database.

  7. Serve the aggregate of all the feed entries in a single RSS/Atom feed.
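
The keyword filtering in step 3 can be illustrated with a minimal sketch. It is not Newspeak's actual code: `Entry` and `KeywordRule` are hypothetical stand-ins (the real KeywordFilter model may have different fields), matching is assumed to be a case-insensitive substring test, and an entry is assumed to be kept only if it passes every rule in turn.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a parsed feed entry."""
    title: str
    summary: str


@dataclass
class KeywordRule:
    """Hypothetical stand-in for one keyword filter (not the real KeywordFilter model)."""
    keyword: str
    inclusive: bool = True      # True: keep only matches; False: drop matches
    match_title: bool = True
    match_summary: bool = True


def matches(rule, entry):
    """Return True if the keyword occurs in any field the rule covers."""
    needle = rule.keyword.lower()
    fields = []
    if rule.match_title:
        fields.append(entry.title)
    if rule.match_summary:
        fields.append(entry.summary)
    return any(needle in field.lower() for field in fields)


def apply_filters(entries, rules):
    """Keep the entries that pass every rule (assumed semantics: rules are chained)."""
    return [
        entry for entry in entries
        if all(matches(rule, entry) == rule.inclusive for rule in rules)
    ]


entries = [
    Entry("Privacy bill passes", "Parliament voted today."),
    Entry("Sports roundup", "Privacy was not discussed."),
    Entry("Privacy advertorial", "Sponsored content."),
]
rules = [
    KeywordRule("privacy", inclusive=True, match_summary=False),
    KeywordRule("sponsored", inclusive=False),
]
print([entry.title for entry in apply_filters(entries, rules)])
# prints ['Privacy bill passes']
```

The first rule keeps only entries with "privacy" in the title; the second drops anything mentioning "sponsored" in the title or summary.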

The flow of feed data through the application is roughly as follows (given some example feeds and keyword filters):

[Feed 1]-[Keyword filter 1]-[Keyword filter 2]-[XPath content extraction]-----------------------------`\
[Feed 2]--------------------[Keyword filter 3]-[XPath summary extraction]-[XPath content extraction ] -+--[Aggregate output feed]
[Feed 3]-[Keyword filter 3]-[Keyword filter 4]---------------------------------------------------------/
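
The diagram above can be read as one chain of stages per feed, with all chains feeding a single output. The sketch below models that reading under loudly stated assumptions: entries are plain dicts with hypothetical `title`, `link` and `published` keys, `fill_summary` only stands in for XPath extraction (it looks text up in a dict instead of fetching pages with lxml), and the newest-first ordering of the aggregate is a guess, not something the source states.

```python
def keep_if(predicate):
    """Stage factory: keep only the entries for which predicate(entry) is true."""
    def stage(entries):
        return [entry for entry in entries if predicate(entry)]
    return stage


def fill_summary(pages):
    """Stage factory standing in for XPath extraction.

    `pages` maps a link URL to already-extracted text, so this sketch
    needs neither network access nor lxml.
    """
    def stage(entries):
        return [
            dict(entry, summary=pages.get(entry["link"], entry.get("summary", "")))
            for entry in entries
        ]
    return stage


def run_chain(entries, stages):
    """Pass the entries of one feed through its stages, left to right."""
    for stage in stages:
        entries = stage(entries)
    return entries


def aggregate(feeds):
    """Run each feed's own chain, then merge the results (assumed: newest first)."""
    merged = []
    for entries, stages in feeds:
        merged.extend(run_chain(entries, stages))
    return sorted(merged, key=lambda entry: entry["published"], reverse=True)


feed_1 = [
    {"title": "Data retention ruling", "link": "http://example.org/a", "published": "2012-05-02"},
    {"title": "Weather report", "link": "http://example.org/b", "published": "2012-05-03"},
]
feed_2 = [
    {"title": "Retention law update", "link": "http://example.org/c", "published": "2012-05-04"},
]
pages = {"http://example.org/c": "Full text taken from the linked page."}

is_about_retention = keep_if(lambda entry: "retention" in entry["title"].lower())

output = aggregate([
    (feed_1, [is_about_retention]),
    (feed_2, [is_about_retention, fill_summary(pages)]),
])
for entry in output:
    print(entry["published"], entry["title"], "|", entry.get("summary", ""))
```

Keeping each stage a plain function from a list of entries to a list of entries means a feed's chain can be any length, including empty, which matches the varying rows in the diagram.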

Installing

Getting started with Newspeak is really easy thanks to logan, David Cramer’s awesome toolkit for making standalone Django apps. Simply perform the following steps:

  1. Install in editable mode, so that you can easily code along:

    pip install -e \
      git+https://github.com/bitsoffreedom/newspeak.git#egg=newspeak

    If you’re smart and like to keep your Python environment clean, do this in a virtualenv.

  2. Initialize configuration in ~/.newspeak/newspeak.conf.py:

    newspeak init

  3. Perform (optional) configuration by editing the settings file. Because Newspeak is based on Django, all available Django settings can be used. Furthermore, there are some Newspeak-specific settings:

    • NEWSPEAK_THREADS: The number of (lightweight) threads used for crawling feed data.

    • NEWSPEAK_METADATA: Metadata used in the generated output feed.

    For a more thorough description and an example of these settings, please have a look at the initial settings file generated in the previous step.

  4. (Optionally) Run the tests:

    newspeak test newspeak

    This might take a while, so go fetch a cup of coffee. If something fails, please supply the output of the command newspeak test newspeak --traceback in an issue on GitHub.

  5. Create an admin user and the SQLite database (a proper database is optional):

    newspeak syncdb --migrate

  6. Start the local webserver:

    newspeak run_gunicorn

  7. Open http://127.0.0.1:8000/admin/ in your browser and add some feeds. Only the URL is required; the description and title are fetched automatically, as is the first set of entries.

  8. (Optionally) Configure one or more keyword-based filters for your feed(s).

  9. Make sure the following command gets executed to update the feeds:

    newspeak update_feeds

    (Optionally, add -v <1|2|3> to get more feedback on the process.)

  10. Look at the pretty feeds: open http://127.0.0.1:8000/all/rss/ or http://127.0.0.1:8000/all/atom/ in your favorite feed reader. All input feeds will be aggregated there.

    Alternatively, the original feeds, keywords and XPath expressions as used by Bits of Freedom are contained in a fixture called feeds_bof.json. This fixture can be loaded using:

    newspeak loaddata feeds_bof

  11. Set up a cron job to automatically update the feed data using the newspeak update_feeds command. For example, a cron job updating the feeds every hour could look as follows:

    0 * * * *  <full_path_to_>/newspeak update_feeds
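
To check from a script that the aggregate feed of step 10 contains entries, something like the following sketch may help. It uses only the Python standard library and assumes the RSS output is a regular RSS 2.0 document with `<item>` elements; `entry_titles` and `fetch` are hypothetical helper names, not part of Newspeak.

```python
import urllib.request
import xml.etree.ElementTree as ET


def entry_titles(rss_xml):
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


def fetch(url, timeout=10):
    """Download a feed and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

With the local webserver from step 6 running, `entry_titles(fetch("http://127.0.0.1:8000/all/rss/"))` should return the titles of the aggregated entries, or an empty list if update_feeds has not stored anything yet.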

Upgrading

  1. Run the pip installation command again:

    pip install -e \
      git+https://github.com/bitsoffreedom/newspeak.git#egg=newspeak

  2. (Optionally) Run the tests:

    newspeak test newspeak

  3. Apply any database migrations:

    newspeak migrate
