Typed pythonic RSS/Atom parser

Rss parser

About

rss-parser is a typed Python RSS/Atom parsing module built using pydantic and xmltodict.

Installation

pip install rss-parser

or

git clone https://github.com/dhvcc/rss-parser.git
cd rss-parser
poetry build
pip install dist/*.whl

V1 -> V2 migration

  • The Parser class was renamed to RSSParser
  • Models for RSS-specific schemas were moved from rss_parser.models to rss_parser.models.rss. Generic types were not touched
  • Date parsing changed: it now uses pydantic's validator instead of email.utils, so fields that previously fell back to str are more likely to be parsed into datetime objects

Usage

Quickstart

NOTE: For parsing Atom, use AtomParser

from rss_parser import RSSParser
from requests import get  # noqa

rss_url = "https://rss.art19.com/apology-line"
response = get(rss_url)

rss = RSSParser.parse(response.text)

# Print out rss meta data
print("Language", rss.channel.language)
print("RSS", rss.version)

# Iteratively print feed items
for item in rss.channel.items:
    print(item.title)
    print(item.description[:50])

# Language en
# RSS 2.0
# Wondery Presents - Flipping The Bird: Elon vs Twitter
# <p>When Elon Musk posted a video of himself arrivi
# Introducing: The Apology Line
# <p>If you could call a number and say you’re sorry

Here we can see that the description still contains <p> tags. This is because it is placed as CDATA, like so

<![CDATA[<p>If you could call ...</p>]]>
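
If plain text is needed, the markup can be stripped after parsing. Below is a minimal sketch that uses only the standard library's html.parser. The helper name strip_tags is hypothetical and is not part of rss-parser.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are ignored
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Hypothetical helper: return the text content of an HTML fragment."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "".join(extractor.parts)


print(strip_tags("<p>If you could call ...</p>"))
```

In the quickstart loop this could be applied as strip_tags(item.description), assuming the description is a plain string at that point.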

Overriding schema

To customize the schema or provide your own, pass it via the schema keyword argument of the parser

from rss_parser import RSSParser
from rss_parser.models import XMLBaseModel
from rss_parser.models.rss import RSS
from rss_parser.models.types import Tag


class CustomSchema(RSS, XMLBaseModel):
    channel: None = None  # Removing previous channel field
    custom: Tag[str]


with open("tests/samples/custom.xml") as f:
    data = f.read()

rss = RSSParser.parse(data, schema=CustomSchema)

print("RSS", rss.version)
print("Custom", rss.custom)

# RSS 2.0
# Custom Custom tag data

xmltodict

This library uses xmltodict to parse XML data. See the xmltodict documentation for details

The basic thing you should know is that your data is processed into dictionaries

For example, this data

<tag>content</tag>

will result in the following

{
    "tag": "content"
}

But when a tag has attributes, its content will also be a dictionary

<tag attr="1" data-value="data">data</tag>

Turns into

{
    "tag": {
        "@attr": "1",
        "@data-value": "data",
        "#text": "data"
    }
}

Multiple children with the same name will be put into a list

<div>
    <tag>content</tag>
    <tag>content2</tag>
</div>

Results in a list under the repeated key

{
    "div": {
        "tag": ["content", "content2"]
    }
}

If you would rather not handle both cases and want a value always parsed as a list, use rss_parser.models.types.only_list.OnlyList, as is done in Channel

from typing import Optional

from rss_parser.models.rss.item import Item
from rss_parser.models.types.only_list import OnlyList
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic

pydantic = import_v1_pydantic()
...


class OptionalChannelElementsMixin(...):
    ...
    items: Optional[OnlyList[Tag[Item]]] = pydantic.Field(alias="item", default=[])
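
To make the problem concrete, here is a rough stdlib-only sketch of the single-value-versus-list normalization that OnlyList saves you from writing by hand. The helper ensure_list and the sample dictionaries are hypothetical; this is not how the library implements OnlyList.

```python
from typing import Any, List


def ensure_list(value: Any) -> List[Any]:
    """Hypothetical helper: wrap a lone value in a list, pass lists through."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# One <item> parses to a dict, several parse to a list of dicts
single = {"channel": {"item": {"title": "Only post"}}}
several = {"channel": {"item": [{"title": "First"}, {"title": "Second"}]}}

for parsed in (single, several):
    items = ensure_list(parsed["channel"].get("item"))
    print([item["title"] for item in items])
```

With such a step in place, code that iterates over items does not need to branch on whether the feed had one entry or many.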

Tag field

This is a generic field that handles a tag given either as raw data or as a dictionary with attributes

Example

from rss_parser.models import XMLBaseModel
from rss_parser.models.types.tag import Tag


class Model(XMLBaseModel):
    width: Tag[int]
    category: Tag[str]


m = Model(
    width=48,
    category={"@someAttribute": "https://example.com", "#text": "valid string"},
)

# Content value is an integer, as per the generic type
assert m.width.content == 48

assert (type(m.width), type(m.width.content)) == (Tag[int], int)

# The attributes are empty by default
assert m.width.attributes == {}

# But are populated when provided.
# Note that the @ symbol is trimmed from the beginning and the name is converted to snake_case
assert m.category.attributes == {'some_attribute': 'https://example.com'}
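
The behaviour described above can be approximated in plain Python. The following is only an illustrative sketch: split_tag and to_snake_case are hypothetical names, and the snake_case rules (camelCase and hyphens) are assumptions rather than the library's actual implementation.

```python
import re
from typing import Any, Dict, Tuple


def to_snake_case(name: str) -> str:
    """Convert camelCase or kebab-case to snake_case (assumed rules)."""
    name = name.replace("-", "_")
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def split_tag(raw: Any) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical helper: split an xmltodict-style value into (content, attributes)."""
    if not isinstance(raw, dict):
        # Raw data: the value itself is the content, no attributes
        return raw, {}
    content = raw.get("#text")
    attributes = {
        to_snake_case(key[1:]): value  # drop the leading "@"
        for key, value in raw.items()
        if key.startswith("@")
    }
    return content, attributes


print(split_tag(48))
print(split_tag({"@someAttribute": "https://example.com", "#text": "valid string"}))
```

Unlike Tag, this sketch does no type validation of the content; it only shows how the "@" and "#text" keys map to attributes and content.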

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Install dependencies with poetry install (poetry itself can be installed with pip install poetry)

pre-commit usage is highly recommended. To install hooks run

poetry run pre-commit install -t=pre-commit -t=pre-push

License

GPLv3
