RSS parsing with batteries included

Project description

superss

feedparser is great, but it does not always put the same data in the same place from feed to feed. superss fixes this by collecting all known candidates for URLs, content, images, tags, dates, and authors and picking the best one. It also parses authors with lauteur, reconciles URLs with siegfried, and pulls links and images out of the article HTML.
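As a rough illustration of the candidate idea, the sketch below checks several possible keys in priority order and keeps the first non-empty value. It is not superss's actual selection logic: `pick_best`, `URL_CANDIDATES`, and the key names are assumptions made for this example, and "first non-empty wins" is only one simple strategy.

```python
# Hypothetical sketch: none of these names come from superss itself.
# Candidate keys are checked in priority order (the order is an assumption).
URL_CANDIDATES = ("feedburner_origlink", "link", "id")


def pick_best(entry, candidates):
    """Return the first non-empty value among the candidate keys, else None."""
    for key in candidates:
        value = entry.get(key)
        if isinstance(value, str):
            value = value.strip()  # treat whitespace-only strings as empty
        if value:
            return value
    return None


# A parsed entry where the usual "link" field is empty but "id" holds a URL.
entry = {"link": "", "id": "http://example.com/story-1"}
best_url = pick_best(entry, URL_CANDIDATES)
print(best_url)
```

The same helper could be reused with a different candidate tuple for content, dates, or authors.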

Another problem with RSS parsing is that some feeds include only a summary of each article. superss can extract the article’s full text from the page itself with particle and merge it with the data from the RSS feed.
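One way to picture the merge step is sketched below. The merge rule is an assumption for illustration, not the rule superss applies: feed values win unless they are missing, except for fields listed in the hypothetical `prefer_longer` parameter, where the longer text (typically the full article rather than the summary) is kept.

```python
def merge_entry(feed_entry, page_data, prefer_longer=("content",)):
    """Merge page-extracted fields into a feed entry (hypothetical strategy)."""
    merged = dict(feed_entry)  # copy so the caller's entry is not mutated
    for key, value in page_data.items():
        if not value:
            continue  # nothing was extracted from the page for this field
        current = merged.get(key)
        if not current:
            # The feed had no value, so take the page's value.
            merged[key] = value
        elif key in prefer_longer and len(value) > len(current):
            # Both exist; keep the longer text for fields such as content.
            merged[key] = value
    return merged


feed_entry = {"title": "Story", "content": "A short summary."}
page_data = {
    "title": "Story | Example Site",
    "content": "The full article text, several paragraphs long.",
    "authors": ["A. Writer"],
}
merged = merge_entry(feed_entry, page_data)
print(merged)
```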

Finally, some sites have no RSS feed at all. In this case superss combines pageone and particle to build a feed of articles from the article URLs on a site’s homepage.
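The first half of that idea, collecting candidate URLs from a homepage, can be sketched with the standard library alone. This is a stand-in written for this example and not how pageone works: `LinkCollector` is a hypothetical name, it only gathers same-host links, and it makes no attempt to decide which links are actually articles.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute, same-host link URLs from a page's anchor tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        # Keep each same-host URL once, in the order it appears on the page.
        if urlparse(url).netloc == self.host and url not in self.urls:
            self.urls.append(url)


html = (
    '<a href="/2014/story.html">Story</a> '
    '<a href="http://other.example/x">Ad</a> '
    '<a href="/2014/story.html">Again</a>'
)
collector = LinkCollector("http://example.com/")
collector.feed(html)
print(collector.urls)
```

Each collected URL would then be handed to an article extractor to produce one feed entry.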

Install

pip install superss

Test

Requires nose. The suite currently covers only full-text RSS feeds.

nosetests

Usage

Grab full-text feeds:

from superss import SupeRSS

# Parse a feed whose entries already carry the full article text.
s = SupeRSS('http://feeds.feedburner.com/publici_rss')
for entry in s.run():
    print(entry)

Grab non-full-text feeds. You must install particle to run this.

from superss import SupeRSS

# is_full_text=False: fetch each article page and extract its full text.
s = SupeRSS('http://feeds.feedburner.com/publici_rss', is_full_text=False)
for entry in s.run():
    print(entry)

Experimental: Build a feed from a homepage. You must install pageone and particle to run this.

from superss import SupeRSS

# No feed URL: build entries from article links found on the homepage.
s = SupeRSS(homepage='http://nytimes.com/')
for entry in s.run():
    print(entry)

Optionally pass in a list of URLs to ignore. This option is for deduplicating when polling a feed on an ongoing basis.

# ignore_urls: URLs already processed in earlier runs (empty on the first run).
s = SupeRSS('http://feeds.feedburner.com/publici_rss', ignore_urls=[])
for entry in s.run():
    print(entry)
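For ongoing polling, the list of URLs to ignore has to be carried from one run to the next. The sketch below shows one way that bookkeeping might look. `poll_new` and `fake_fetch` are hypothetical helpers, and the assumption that each entry is dict-like with a "url" key is made for this example and not confirmed by the superss documentation.

```python
def poll_new(fetch, seen_urls):
    """Call fetch(ignore_urls) and return entries whose URL is not yet seen.

    fetch is any callable that takes a list of URLs to ignore and returns
    entries. With superss it might be (untested assumption):
        lambda ignore: SupeRSS(FEED_URL, ignore_urls=ignore).run()
    """
    new_entries = []
    for entry in fetch(sorted(seen_urls)):
        url = entry.get("url")  # the "url" key is an assumption
        if url is None or url in seen_urls:
            continue  # skip entries without a URL or already processed
        seen_urls.add(url)
        new_entries.append(entry)
    return new_entries


def fake_fetch(ignore_urls):
    """Stand-in for a real feed so the sketch runs without network access."""
    items = [{"url": "http://example.com/a"}, {"url": "http://example.com/b"}]
    return [e for e in items if e["url"] not in ignore_urls]


seen = set()
first = poll_new(fake_fetch, seen)   # first poll: everything is new
second = poll_new(fake_fetch, seen)  # second poll: nothing is new
print(len(first), len(second))
```

In a long-running poller, `seen` would usually be persisted between runs, for example in a file or database.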

TODO

  • [ ] Add optional concurrency with gevent

Download files

Source Distribution

superss-0.1.6.tar.gz (100.5 kB view details)

Uploaded Source

Built Distribution

superss-0.1.6.macosx-10.9-intel.exe (165.8 kB view details)

Uploaded Source

File details

Details for the file superss-0.1.6.tar.gz.

File metadata

  • Download URL: superss-0.1.6.tar.gz
  • Upload date:
  • Size: 100.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for superss-0.1.6.tar.gz
Algorithm Hash digest
SHA256 d81a18a20645aee3edf81c385cd9df9559442c4e265177c0a66f7a60924dd95b
MD5 9507b151faa04c9af5600f4cd80ea9f6
BLAKE2b-256 c80ab977924242ed255ff5facf3adefcd3e978f8b14feb6fa87e7f198d3aa2a5

File details

Details for the file superss-0.1.6.macosx-10.9-intel.exe.

File hashes

Hashes for superss-0.1.6.macosx-10.9-intel.exe
Algorithm Hash digest
SHA256 eb0a0e4bff6ace6325ef42c885bd4e225720c6b3239e2e8ffe0c8d0d7252d250
MD5 6cb929007c37e4b36558dd25f851e8b8
BLAKE2b-256 bc21123ab813dcd47f45a4ec366eaae46655d9f8c9c2c6dbee6cb765121921b6
