Search a site for RSS feeds

Project description

Feedsearch is a Python library for searching websites for RSS feeds.

It was originally based on Feedfinder2 by Dan Foreman-Mackey, which is in turn based on feedfinder, originally written by Mark Pilgrim and subsequently maintained by Aaron Swartz until his untimely death.

The main difference from Feedfinder2 is that Feedsearch can optionally fetch Feed and Site metadata.

Usage

Feedsearch is called through a single function, search:

>>> from feedsearch import search
>>> feeds = search('xkcd.com')
>>> feeds
[FeedInfo: <http://xkcd.com/atom.xml>, FeedInfo: <http://xkcd.com/rss.xml>]
>>> feeds[0].url
'http://xkcd.com/atom.xml'

To get Feed and Site metadata:

>>> feeds = search('propublica.org', info=True)
>>> feeds
[FeedInfo: http://feeds.propublica.org/propublica/main]
>>> from pprint import pprint
>>> pprint(vars(feeds[0]))
{'description': 'Latest Articles and Investigations from ProPublica, an '
                'independent, non-profit newsroom that produces investigative '
                'journalism in the public interest.',
 'hub': 'http://feedpress.superfeedr.com/',
 'is_push': True,
 'score': 4,
 'site_icon_url': 'https://assets.propublica.org/prod/v3/images/favicon.ico',
 'site_name': 'ProPublica',
 'site_url': 'https://www.propublica.org/',
 'title': 'Articles and Investigations - ProPublica',
 'url': 'http://feeds.propublica.org/propublica/main'}

Search always returns a list of FeedInfo objects, each of which always has a url property. Feeds are sorted by their score value from highest to lowest. A higher score theoretically indicates a more relevant feed, but treat it as a rough heuristic and don't rely on it.
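Because the list is ordered by score, the first element is the library's best guess. The sketch below illustrates picking it safely. It does not call Feedsearch: FakeFeedInfo is a hypothetical stand-in that models only url and score, the score values are made up, and the empty-list guard assumes that a search with no matches returns an empty list.

```python
from dataclasses import dataclass


@dataclass
class FakeFeedInfo:
    """Hypothetical stand-in for FeedInfo; only url and score are modelled."""
    url: str
    score: int = 0


def best_feed_url(feeds):
    """Return the url of the first (highest-scored) feed, or None if the list is empty."""
    return feeds[0].url if feeds else None


# Sample data shaped like a search() result, already sorted by descending score.
# The scores here are invented for illustration.
feeds = [
    FakeFeedInfo("http://xkcd.com/atom.xml", score=4),
    FakeFeedInfo("http://xkcd.com/rss.xml", score=2),
]

print(best_feed_url(feeds))
print(best_feed_url([]))
```

With a real result, the same helper could be applied directly to the list returned by search, since it only relies on the url property.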

If you only want the raw URLs, use a list comprehension on the result:

>>> feeds
[FeedInfo: http://xkcd.com/atom.xml, FeedInfo: http://xkcd.com/rss.xml]
>>> urls = [f.url for f in feeds]
>>> urls
['http://xkcd.com/atom.xml', 'http://xkcd.com/rss.xml']

In addition to the URL, the search function takes the following optional keyword arguments:

  • info: bool: Get Feed and Site metadata. Defaults to False.

  • check_all: bool: Check all <link> and <a> tags on the page. Defaults to False.

  • user_agent: str: User-Agent header string. Defaults to the package name.

  • timeout: int or tuple: Timeout for each request in the search (not a timeout for the search method itself). Defaults to 30 seconds.

  • max_redirects: int: Maximum number of redirects for each request. Defaults to 30.

  • parser: str: BeautifulSoup parser for HTML parsing. Defaults to 'html.parser'.
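The keyword arguments listed above can be collected in one place and passed through on each call. This is a minimal sketch rather than a recommended configuration: find_feeds and SEARCH_OPTIONS are hypothetical names, the option values (including the user agent string) are illustrative, and only the argument names come from the list above.

```python
# Option names come from the keyword-argument list; the values are illustrative.
SEARCH_OPTIONS = {
    "info": True,                      # also fetch Feed and Site metadata
    "check_all": False,                # do not check every <link> and <a> tag
    "user_agent": "MyFeedReader/1.0",  # hypothetical User-Agent string
    "timeout": 10,                     # per-request timeout, in seconds
    "max_redirects": 5,                # per-request redirect limit
    "parser": "html.parser",           # BeautifulSoup parser name
}


def find_feeds(url, **overrides):
    """Call feedsearch.search with the shared options, allowing per-call overrides."""
    # Imported inside the function so this snippet can be loaded
    # even where the feedsearch package is not installed.
    from feedsearch import search

    options = {**SEARCH_OPTIONS, **overrides}
    return search(url, **options)
```

A call such as find_feeds('xkcd.com', info=False) would then reuse the shared settings while overriding a single option for that request.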
