Project Description

What it does

Newspeak is a feed aggregator with advanced features for keyword filtering and link content extraction, implemented as a standalone Django application.

Architecture

Newspeak performs the following tasks (in order):

  1. Fetches the RSS/Atom feeds specified by the Feed model (in parallel).
  2. Parses the feeds using feedparser.
  3. (Optionally) applies per-feed inclusive/exclusive keyword filters on the title and/or summary, based on the KeywordFilter model.
  4. (Optionally) extracts summary data using an XPath expression from the page at the feed entry’s link URL, using lxml.
  5. (Optionally) extracts enclosure information using XPath expressions from the page at the feed entry’s link URL, using lxml.
  6. Stores the resulting feed information locally in a database.
  7. Serves the aggregate of all the feed entries in a single RSS/Atom feed.
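
Steps 4 and 5 can be illustrated with a small sketch. Newspeak itself uses lxml, which copes with real-world HTML and full XPath; the example below instead uses the standard library's xml.etree.ElementTree (a limited XPath subset, well-formed markup only) so that it runs without extra dependencies. The page content, the function names and the XPath expressions are all made up for illustration and are not taken from Newspeak's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page behind an entry's link URL.
PAGE = """<html><body>
  <div class="article">
    <p class="intro">Parliament votes on the data retention bill.</p>
    <a class="media" href="http://example.org/audio.mp3" type="audio/mpeg">Listen</a>
  </div>
</body></html>"""


def extract_summary(page, xpath):
    """Return the text of the first element matching xpath, or None."""
    root = ET.fromstring(page)
    element = root.find(xpath)
    if element is None:
        return None
    # Join the text of the element and all of its children.
    return "".join(element.itertext()).strip()


def extract_enclosures(page, xpath):
    """Return (url, mime_type) pairs for every element matching xpath."""
    root = ET.fromstring(page)
    return [(el.get("href"), el.get("type")) for el in root.findall(xpath)]


summary = extract_summary(PAGE, ".//p[@class='intro']")
enclosures = extract_enclosures(PAGE, ".//a[@class='media']")
print(summary)
print(enclosures)
```

With lxml, the equivalent calls would presumably be lxml.html.fromstring(page) followed by .xpath(expression) on the resulting element.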

The flow of feed data through the application is roughly as follows (given some example feeds and keyword filters):

[Feed 1]-[Keyword filter 1]-[Keyword filter 2]-[XPath content extraction]-----------------------------`\
[Feed 2]--------------------[Keyword filter 3]-[XPath summary extraction]-[XPath content extraction ] -+--[Aggregate output feed]
[Feed 3]-[Keyword filter 3]-[Keyword filter 4]---------------------------------------------------------/
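
The flow in the diagram can be approximated in a few lines of plain Python: each feed has its own chain of stages, and the outputs are merged at the end. This is only a sketch under assumptions. Entries are modelled as dictionaries, keyword_filter, run_pipeline and aggregate are hypothetical names, chained filters are assumed to apply one after the other, and sorting the aggregate newest-first is a guess at sensible behaviour rather than a description of Newspeak's implementation.

```python
from datetime import datetime


def keyword_filter(keyword, inclusive=True, fields=("title", "summary")):
    """Build a stage that keeps (inclusive) or drops (exclusive) matching entries."""
    needle = keyword.lower()

    def stage(entries):
        kept = []
        for entry in entries:
            matches = any(needle in entry.get(field, "").lower() for field in fields)
            # Inclusive filters keep matches; exclusive filters keep non-matches.
            if matches == inclusive:
                kept.append(entry)
        return kept

    return stage


def run_pipeline(entries, stages):
    """Pass one feed's entries through its stages, in order."""
    for stage in stages:
        entries = stage(entries)
    return entries


def aggregate(*feeds):
    """Merge the processed feeds into one list, newest entry first."""
    merged = [entry for feed in feeds for entry in feed]
    return sorted(merged, key=lambda entry: entry["published"], reverse=True)


feed_1 = [
    {"title": "Privacy bill passes", "summary": "Vote held today.", "published": datetime(2013, 4, 1)},
    {"title": "Sports roundup", "summary": "Weekend results.", "published": datetime(2013, 4, 2)},
    {"title": "Privacy in sports", "summary": "Athlete data leaked.", "published": datetime(2013, 4, 4)},
]
feed_2 = [
    {"title": "Open data weekly", "summary": "New datasets.", "published": datetime(2013, 4, 3)},
]

# Feed 1 passes through two filters; feed 2 passes through none.
out_1 = run_pipeline(feed_1, [keyword_filter("privacy"), keyword_filter("sports", inclusive=False)])
out_2 = run_pipeline(feed_2, [])

titles = [entry["title"] for entry in aggregate(out_1, out_2)]
print(titles)
```

An XPath extraction step would simply be one more stage in a feed's list, taking and returning the same list of entries.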

Installing

Getting started with Newspeak is easy thanks to logan, David Cramer’s toolkit for building standalone Django apps. Simply perform the following steps:

  1. Install Newspeak in editable mode, so that you can easily code along:

    pip install -e \
      git+https://github.com/bitsoffreedom/newspeak.git#egg=newspeak
    

    If you like to keep your Python environment clean, do this in a virtualenv.

  2. Initialize configuration in ~/.newspeak/newspeak.conf.py:

    newspeak init
    
  3. Perform (optional) configuration by editing the settings file. Because Newspeak is based on Django, all available Django settings can be used. Furthermore, there are some Newspeak-specific settings:

    • NEWSPEAK_THREADS: The number of (lightweight) threads used for crawling feed data.
    • NEWSPEAK_METADATA: Metadata used in the generated output feed.

    For a more thorough description and an example of these settings, please have a look at the initial settings file generated in the previous step.

  4. (Optionally) Run the tests:

    newspeak test newspeak
    

    This might take a while, so go fetch a cup of coffee. If something fails, please supply the output of the command newspeak test newspeak --traceback in an issue on GitHub.

  5. Create the SQLite database and an admin user (configuring a different database is optional):

    newspeak syncdb --migrate
    
  6. Start the local webserver:

    newspeak run_gunicorn
    
  7. Open http://127.0.0.1:8000/admin/ in your browser and add a feed. Only the URL is required; the title and description are fetched automatically, as is the first set of entries.

  8. (Optionally) Configure one or more keyword-based filters for your feed(s).

  9. Make sure the following command gets executed to update the feeds:

    newspeak update_feeds
    

    (Optionally, add -v <1|2|3> to get more feedback on the process.)

  10. Look at the pretty feeds: open http://127.0.0.1:8000/all/rss/ or http://127.0.0.1:8000/all/atom/ in your favorite feed reader. All input feeds will be aggregated there.

    Alternatively, the original feeds, keywords and XPath expressions as used by Bits of Freedom are contained in a fixture called feeds_bof.json. This fixture can be loaded using:

    newspeak loaddata feeds_bof
    
  11. Set up a cron job to automatically update the feed data using the newspeak update_feeds command. For example, a cron job updating the feeds every hour could look as follows:

    0 * * * *  <full_path_to_>/newspeak update_feeds
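
As an illustration of step 3, the two Newspeak-specific settings might appear in the settings file roughly as follows. Only the setting names come from this README. The value of NEWSPEAK_THREADS, the assumption that NEWSPEAK_METADATA is a dictionary, and every key and value inside it are guesses made for the sake of an example; the settings file generated by newspeak init remains the authoritative reference.

```python
# Excerpt in the style of ~/.newspeak/newspeak.conf.py.
# All values below are illustrative assumptions, not Newspeak defaults.

# Number of (lightweight) threads used for crawling feed data.
NEWSPEAK_THREADS = 8

# Metadata used in the generated output feed. The dictionary shape and the
# key names are hypothetical; use the ones documented in the generated file.
NEWSPEAK_METADATA = {
    "title": "Aggregated news",
    "link": "http://127.0.0.1:8000/",
    "description": "All input feeds combined into one.",
}
```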
    

Upgrading

  1. Run the pip installation command again:

    pip install -e \
      git+https://github.com/bitsoffreedom/newspeak.git#egg=newspeak
    
  2. (Optionally) Run the tests:

    newspeak test newspeak
    
  3. Apply any database migrations:

    newspeak migrate
    
Release History

0.1

Download Files

File Name                        File Type   Upload Date
newspeak-0.1.tar.gz (29.9 kB)    Source      Apr 2, 2013
