
Async, simple and fast RSS parser module.

Project description

Ultra FAST RSS



A simple async RSS parser, built to make retrieving RSS feeds ultra fast.


Status

[!CAUTION] This module works and is in use, but it should be used with care: extensive exception handling is intentionally left out of the code for now.

Installation

pip install ultrafastrss

Usage

In your Python program, call the async function process_rssfeeds(urls) with a (large) list of URLs.

To check that everything works, create a simple demo program: save the following as ultrafastrss_demo.py (just copy and paste):

import asyncio

from ultrafastrss.ultrafastrss import process_rssfeeds

urls = [
    'https://nocomplexity.com/rss',
    'https://www.eff.org/rss/updates.xml',
    'https://www.freebsd.org/news/rss.xml',
    'https://ubuntu.com/blog/feed',
    'https://nlnet.nl/feed.atom',
    'https://j3s.sh/feed.atom',
    'https://blog.research.google/atom.xml'
]

# Run the async RSS feed processing; all results are stored as JSON in the variable results
results = asyncio.run(process_rssfeeds(urls))
print(results)

Run the demo in a terminal with:

python ultrafastrss_demo.py

You should immediately see all the RSS information gathered!

You can do all kinds of fun things with the returned JSON object, which contains the crucial parts of the parsed RSS or Atom feeds!
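As one illustration of such post-processing, the sketch below sorts items newest first and filters them by tag. It assumes the results are a JSON array of items with the fields shown under Architecture (title, link, date, tags); the exact shape returned by process_rssfeeds is an assumption here, so check it against your own output first. The second sample item and its example.org URL are hypothetical.

```python
import json

# Hypothetical sample in the shape of the fields listed under Architecture;
# the real structure returned by process_rssfeeds may differ.
results_json = """
[
  {"title": "Exphormer: Scaling transformers for graph-structured data",
   "link": "http://blog.research.google/2024/01/exphormer-scaling-transformers-for.html",
   "date": "2024-01-23",
   "tags": ["Deep Learning", "Graphs"]},
  {"title": "Example post without tags",
   "link": "https://example.org/post",
   "date": "2024-01-10",
   "tags": []}
]
"""


def newest_first(items):
    """Sort items by their 'YYYY-MM-DD' date string, newest first."""
    # ISO dates sort correctly as plain strings.
    return sorted(items, key=lambda item: item.get("date", ""), reverse=True)


def with_tag(items, tag):
    """Keep only items carrying the given tag (case-insensitive)."""
    wanted = tag.lower()
    return [i for i in items if wanted in (t.lower() for t in i.get("tags", []))]


# If process_rssfeeds already returns Python objects, skip json.loads.
items = json.loads(results_json)
for item in with_tag(newest_first(items), "graphs"):
    print(item["date"], item["title"], item["link"])
```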

Benchmarking

I ran some benchmark tests against feedparser, a module widely used in Python programs for parsing RSS feeds. With 500 URLs, ultrafastrss was at least 10 times faster.
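The benchmark code itself is not included here. The sketch below only illustrates why fetching feeds concurrently pays off, using an assumed fixed latency per request: asyncio.sleep stands in for real HTTP requests, and the fake_fetch helper and example.org URLs are hypothetical. No real feeds are fetched and feedparser is not involved.

```python
import asyncio
import time

LATENCY = 0.05  # assumed per-request latency in seconds (simulated)


async def fake_fetch(url):
    """Hypothetical stand-in for an HTTP GET: waits, then returns a placeholder body."""
    await asyncio.sleep(LATENCY)
    return f"<rss><!-- {url} --></rss>"


async def fetch_sequential(urls):
    # One request at a time: total time grows linearly with len(urls).
    return [await fake_fetch(u) for u in urls]


async def fetch_concurrent(urls):
    # All requests in flight at once: total time is roughly one latency.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def timed(coro_fn, urls):
    """Run coro_fn(urls) in a fresh event loop and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(urls))
    return result, time.perf_counter() - start


urls = [f"https://example.org/feed/{i}" for i in range(20)]
seq_result, seq_time = timed(fetch_sequential, urls)
con_result, con_time = timed(fetch_concurrent, urls)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s, "
      f"speedup: {seq_time / con_time:.1f}x")
```

The speedup you see with real feeds depends on network latency, server response times and parsing cost, so this sketch does not reproduce the measured figure for 500 URLs.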

License

ultrafastrss is distributed under the terms of the GPL-3.0-or-later license.

Architecture

The ultrafastrss parser is designed to parse RSS or Atom feeds and return JSON data. Parsing is delegated to format-specific parser functions (rss_parser, atom_parser or rdf_parser). Only the attributes title, link and date are retrieved.
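The delegation step can be sketched with the standard library's xml.etree.ElementTree by dispatching on the document's root element. The names rss_parser, atom_parser and rdf_parser come from the description above, but their bodies and the parse_feed dispatcher here are illustrative assumptions, not the module's actual code.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
RSS1 = "{http://purl.org/rss/1.0/}"        # RSS 1.0 (RDF) item namespace
DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core, used for RDF dates


def rss_parser(root):
    # RSS 2.0: <rss><channel><item><title/><link/><pubDate/></item>...
    return [{"title": i.findtext("title"), "link": i.findtext("link"),
             "date": i.findtext("pubDate")} for i in root.iter("item")]


def atom_parser(root):
    # Atom: <feed><entry><title/><link href="..."/><updated/></entry>...
    items = []
    for e in root.iter(f"{ATOM}entry"):
        link = e.find(f"{ATOM}link")
        items.append({"title": e.findtext(f"{ATOM}title"),
                      "link": link.get("href") if link is not None else None,
                      "date": e.findtext(f"{ATOM}updated")})
    return items


def rdf_parser(root):
    # RSS 1.0: <rdf:RDF><item><title/><link/><dc:date/></item>...
    return [{"title": i.findtext(f"{RSS1}title"), "link": i.findtext(f"{RSS1}link"),
             "date": i.findtext(f"{DC}date")} for i in root.iter(f"{RSS1}item")]


def parse_feed(xml_text):
    """Pick a parser from the root element's tag and return a list of dicts."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        return rss_parser(root)
    if root.tag == f"{ATOM}feed":
        return atom_parser(root)
    if root.tag.endswith("RDF"):
        return rdf_parser(root)
    raise ValueError(f"unsupported feed type: {root.tag}")


sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Hello</title><link href="https://example.org/hello"/>
  <updated>2024-01-23T10:00:00Z</updated></entry>
</feed>"""
print(parse_feed(sample))
```

Dispatching on the root tag keeps each format's quirks (namespaces, where the link lives) inside its own small function.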

The key design principle for this parser is to retrieve data from a large number of feeds asynchronously and return it as JSON.
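A common way to fetch many feeds concurrently without opening hundreds of connections at once is to cap in-flight requests with asyncio.Semaphore and collect the results with asyncio.gather. Whether ultrafastrss does this is not stated here; the sketch below is an illustration only, with a hypothetical fetch_feed that simulates network access and a hypothetical max_concurrency parameter. It also isolates per-feed failures, which goes beyond the minimal exception handling described under Status.

```python
import asyncio
import json


async def fetch_feed(url):
    """Hypothetical stand-in for an HTTP fetch + parse; fails for 'bad' URLs."""
    await asyncio.sleep(0.01)  # simulated network latency
    if "bad" in url:
        raise ConnectionError(f"could not reach {url}")
    return [{"title": f"Post from {url}", "link": url, "date": "2024-01-23"}]


async def process_many(urls, max_concurrency=10):
    """Fetch all URLs concurrently, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:  # wait for a free slot before starting the request
            return await fetch_feed(url)

    # return_exceptions=True keeps one failing feed from cancelling the batch.
    outcomes = await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)
    items, errors = [], {}
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, Exception):
            errors[url] = str(outcome)
        else:
            items.extend(outcome)
    return json.dumps({"items": items, "errors": errors})


urls = [f"https://example.org/feed/{i}" for i in range(25)] + ["https://bad.example.org/rss"]
print(json.loads(asyncio.run(process_many(urls, max_concurrency=5)))["errors"])
```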

Feed content and summaries are intentionally not parsed, and this is not an RSS reader. However, each retrieved item can be viewed in a browser by following its link.

The fields that are parsed and stored for every feed item look like this:

{
  "title": "Exphormer: Scaling transformers for graph-structured data",
  "link": "http://blog.research.google/2024/01/exphormer-scaling-transformers-for.html",
  "date": "2024-01-23",
  "tags": [
    "Deep Learning",
    "Graphs"
  ]
}
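The date field in the example uses the YYYY-MM-DD form, while RSS 2.0 pubDate values use RFC 822 dates and Atom uses ISO 8601 timestamps, which suggests some normalization step. How ultrafastrss does this is not documented here; the following is a minimal sketch using only the standard library, and normalize_date is a hypothetical helper name.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def normalize_date(raw):
    """Convert an RSS (RFC 822) or Atom (ISO 8601) date to 'YYYY-MM-DD'.

    Returns None when the value is missing or cannot be parsed.
    """
    if not raw:
        return None
    raw = raw.strip()
    try:
        # Atom <updated>/<published> and Dublin Core dates: ISO 8601.
        # Python < 3.11 does not accept a trailing 'Z', so map it to +00:00.
        return datetime.fromisoformat(raw.replace("Z", "+00:00")).date().isoformat()
    except ValueError:
        pass
    try:
        # RSS 2.0 <pubDate>: RFC 822, e.g. 'Tue, 23 Jan 2024 10:00:00 GMT'.
        return parsedate_to_datetime(raw).date().isoformat()
    except (TypeError, ValueError):
        return None


print(normalize_date("Tue, 23 Jan 2024 10:00:00 GMT"))
print(normalize_date("2024-01-23T10:00:00Z"))
```

The date is taken in the feed's own time zone, without converting to local time.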

All kinds of edge cases are intentionally left out to keep the code simple. Creating a generic module for RSS/Atom parsing is not simple, so the design decision is to incorporate only the logic I need for the URLs I use.

Contribute

Great you want to contribute!

Simple Guidelines:

  • For questions, feature requests and bug reports, please use the GitHub issue tracker.

[!NOTE] This module works and is in use, but it is in pre-release status. Extensive exception handling is intentionally left out of the code for now. Feature requests are possible, but will probably only be implemented after a quote and a paid invoice. That said, the code is so simple that I encourage everyone to write their own specialized parser and share the code, so we all benefit!



Download files


Source Distribution

ultrafastrss-0.1.19.tar.gz (54.8 kB)

Uploaded Source

Built Distribution


ultrafastrss-0.1.19-py3-none-any.whl (19.4 kB)

Uploaded Python 3

File details

Details for the file ultrafastrss-0.1.19.tar.gz.

File metadata

  • Download URL: ultrafastrss-0.1.19.tar.gz
  • Upload date:
  • Size: 54.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.28.1

File hashes

Hashes for ultrafastrss-0.1.19.tar.gz
Algorithm Hash digest
SHA256 4d4d083aecc8c98b8798225fc50fd2aa9a241a9b350f5beaf2c0d651ae341064
MD5 4af6b3fb98529796ed66f0b2c158cc84
BLAKE2b-256 f8b00a626e8b1078dbeb29d1058ad240fa5efe103d1fecef00e4d9a7d7c8503b


File details

Details for the file ultrafastrss-0.1.19-py3-none-any.whl.

File metadata

File hashes

Hashes for ultrafastrss-0.1.19-py3-none-any.whl
Algorithm Hash digest
SHA256 80760f33f60f4dc10b5b1c6e71eac24786a67a90e57e64ca1e638d7c047c8de0
MD5 584365022302648d8db84aaac08dd868
BLAKE2b-256 9da9157e33f7ab48113ebc5d3867cdfc61ceb13bdbb23b29aa0d6e99589dbc6f

