High-performance RSS/Atom/JSON Feed parser (drop-in feedparser replacement)

feedparser-rs-py

High-performance RSS/Atom/JSON Feed parser for Python — drop-in replacement for feedparser.

Features

  • 🚀 10-100x faster than feedparser (Rust core)
  • 🔄 Aims for 100% API compatibility with feedparser 6.x (URL fetching is not implemented yet)
  • Tolerant parsing with bozo flag for malformed feeds
  • 📦 Zero dependencies (pure Rust + PyO3)
  • 🎯 Supports all formats: RSS 0.9x/1.0/2.0, Atom 0.3/1.0, JSON Feed 1.0/1.1
  • 🎙️ Podcast metadata: iTunes tags, Podcast 2.0 namespace
  • 🛡️ DoS protection: Built-in resource limits

Installation

pip install feedparser-rs

Usage

Same API as feedparser:

import feedparser_rs

# From string
d = feedparser_rs.parse('<rss>...</rss>')

# From bytes
d = feedparser_rs.parse(b'<rss>...</rss>')

# From file
with open('feed.xml', 'rb') as f:
    d = feedparser_rs.parse(f.read())

# Access data (feedparser-compatible)
print(d.feed.title)
print(d.version)  # "rss20", "atom10", etc.
print(d.bozo)     # True if parsing errors occurred

for entry in d.entries:
    print(entry.title)
    print(entry.published_parsed)  # time.struct_time
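
Because malformed feeds set the bozo flag instead of raising, callers usually check it before trusting the data. The following is a minimal sketch, assuming the result exposes the attributes listed in this README (version, entries, bozo, bozo_exception). The helper name describe_result is hypothetical and not part of feedparser-rs, and the stand-in object only exists so the snippet runs without the extension installed.

```python
from types import SimpleNamespace


def describe_result(d):
    """Return a one-line summary of a parsed feed result.

    `d` is any object with feedparser-style attributes; missing
    attributes fall back to neutral defaults.
    """
    version = getattr(d, "version", "") or "unknown"
    entries = getattr(d, "entries", None) or []
    summary = f"{version}: {len(entries)} entries"
    if getattr(d, "bozo", False):
        # bozo_exception carries the error description for malformed feeds.
        summary += f" (bozo: {getattr(d, 'bozo_exception', None)})"
    return summary


# Stand-in object (hypothetical); in real code pass the value
# returned by feedparser_rs.parse(...).
fake = SimpleNamespace(
    version="rss20",
    entries=[1, 2],
    bozo=True,
    bozo_exception="unexpected end of input",
)
print(describe_result(fake))
```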

Migration from feedparser

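For code that passes feed content (str or bytes), only the import changes. URL fetching is not implemented yet, so a URL has to be fetched separately before parsing:

# Before
import feedparser
d = feedparser.parse(feed_content)

# After - just change the import!
import feedparser_rs as feedparser
d = feedparser.parse(feed_content)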

Or use it directly:

import feedparser_rs
d = feedparser_rs.parse(feed_content)
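
During a gradual migration it can help to prefer feedparser_rs when it is installed and fall back to feedparser otherwise. This is a sketch under that assumption; pick_backend is a hypothetical helper, not something either library provides.

```python
import importlib
import importlib.util


def pick_backend(candidates=("feedparser_rs", "feedparser")):
    """Return the first importable module from `candidates`, or None."""
    for name in candidates:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


feedparser = pick_backend()
if feedparser is None:
    print("no feed parser installed")
else:
    print("using", feedparser.__name__)
```

Note that the two backends are not interchangeable for URLs: feedparser accepts them, while this package expects content that has already been fetched.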

Performance

Benchmark: parsing a 1000-entry RSS feed (10 iterations):

Library              Time    Speedup
feedparser 6.0.11    2.45s   1x
feedparser-rs 0.1.0  0.12s   20x
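
To get comparable numbers on your own machine, a small timing harness is enough. The sketch below is not the benchmark used for the figures above. It assumes a synthetic feed built by a hypothetical make_rss helper and times any parse callable passed to a hypothetical time_parse function.

```python
import time


def make_rss(n_entries):
    """Build a synthetic RSS 2.0 document with `n_entries` items."""
    items = "".join(
        f"<item><title>Entry {i}</title>"
        f"<link>https://example.com/{i}</link></item>"
        for i in range(n_entries)
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<rss version="2.0"><channel><title>Bench</title>'
        f"{items}</channel></rss>"
    ).encode("utf-8")


def time_parse(parse, data, iterations=10):
    """Return total wall-clock seconds for `iterations` calls of parse(data)."""
    start = time.perf_counter()
    for _ in range(iterations):
        parse(data)
    return time.perf_counter() - start


data = make_rss(1000)
# `len` is a placeholder; pass feedparser.parse or feedparser_rs.parse
# here to compare the two libraries on the same input.
elapsed = time_parse(len, data)
print(f"{elapsed:.6f}s for 10 iterations")
```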

Advanced Usage

Custom Resource Limits

Protect against DoS attacks from malicious feeds:

import feedparser_rs

limits = feedparser_rs.ParserLimits(
    max_feed_size_bytes=50_000_000,  # 50 MB
    max_entries=5_000,
    max_authors=20,              # Max authors per feed/entry
    max_links_per_entry=50,      # Max links per entry
)

d = feedparser_rs.parse_with_limits(feed_data, limits)

Format Detection

Quickly detect feed format without full parsing:

import feedparser_rs

version = feedparser_rs.detect_format(feed_data)
print(version)  # "rss20", "atom10", "json11", etc.
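
The detected version string can drive routing before any full parse. The sketch below assumes version strings start with the family name, as in the examples above ("rss20", "atom10", "json11"); feed_family is a hypothetical helper, not part of the library.

```python
def feed_family(version):
    """Map a version string such as "rss20" to "rss", "atom" or "json"."""
    for family in ("rss", "atom", "json"):
        if version and version.startswith(family):
            return family
    return "unknown"


for v in ("rss20", "atom10", "json11", ""):
    print(repr(v), "->", feed_family(v))
```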

Podcast Support

Access iTunes and Podcast 2.0 metadata:

import feedparser_rs

d = feedparser_rs.parse(podcast_feed)

# iTunes metadata
if d.feed.itunes:
    print(d.feed.itunes.author)
    print(d.feed.itunes.categories)
    print(d.feed.itunes.explicit)

# Episode metadata
for entry in d.entries:
    if entry.itunes:
        print(f"S{entry.itunes.season}E{entry.itunes.episode}")
        print(f"Duration: {entry.itunes.duration}s")

# Podcast 2.0
if d.feed.podcast:
    for person in d.feed.podcast.persons:
        print(f"{person.name} ({person.role})")
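
Season, episode, and duration are often missing from real podcast feeds, so display code benefits from small formatting helpers. This sketch assumes duration is a number of seconds, as the example above suggests, and that absent values are None. Both helper names are hypothetical.

```python
def format_duration(seconds):
    """Format a duration in seconds as H:MM:SS; None becomes "unknown"."""
    if seconds is None:
        return "unknown"
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def episode_label(season, episode):
    """Build "S2E5"-style labels, skipping parts that are missing."""
    parts = []
    if season is not None:
        parts.append(f"S{season}")
    if episode is not None:
        parts.append(f"E{episode}")
    return "".join(parts) or "unnumbered"


print(episode_label(2, 5), format_duration(3725))
```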

API Reference

Main Functions

  • parse(source) - Parse feed from bytes, str, or file
  • parse_with_limits(source, limits) - Parse with custom resource limits
  • detect_format(source) - Detect feed format

Classes

  • FeedParserDict - Parsed feed result

    • .feed - Feed metadata
    • .entries - List of entries
    • .bozo - True if parsing errors occurred
    • .bozo_exception - Error description
    • .version - Feed version string
    • .encoding - Character encoding
    • .namespaces - XML namespaces
  • ParserLimits - Resource limits configuration

Feed Metadata

  • title, subtitle, link - Basic metadata
  • updated_parsed - Update date as time.struct_time
  • authors, contributors - Person lists
  • image, icon, logo - Feed images
  • itunes - iTunes podcast metadata
  • podcast - Podcast 2.0 metadata

Entry Metadata

  • title, summary, content - Entry text
  • link, links - Entry URLs
  • published_parsed, updated_parsed - Dates as time.struct_time
  • authors, contributors - Person lists
  • enclosures - Media attachments
  • itunes - Episode metadata
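
The *_parsed fields are time.struct_time values, which are awkward to compare or format directly. The sketch below converts one to a timezone-aware datetime. It assumes the values are in UTC, as they are in feedparser; to_datetime is a hypothetical helper.

```python
import calendar
import time
from datetime import datetime, timezone


def to_datetime(parsed):
    """Convert a UTC time.struct_time to an aware datetime; None passes through."""
    if parsed is None:
        return None
    # calendar.timegm is the UTC inverse of time.gmtime.
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


print(to_datetime(time.gmtime(0)))  # 1970-01-01 00:00:00+00:00
```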

Compatibility

This library aims for 100% API compatibility with feedparser 6.x. Field names and data structures are intended to match feedparser, with the exceptions listed below.

Key differences:

  • URL fetching - Not implemented yet; fetch the feed yourself, e.g. requests.get(url).content (requests is a separate install)
  • Performance - 10-100x faster
  • Error handling - Unchanged: same tolerant parsing with the bozo flag
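
Until URL fetching is built in, the fetch step can also be done with the standard library alone. This is a minimal sketch, not part of feedparser-rs: fetch_feed_bytes and parse_url are hypothetical helpers, and the default size cap simply mirrors the 50 MB figure from the resource-limits example.

```python
import urllib.request


def fetch_feed_bytes(url, timeout=10.0, max_bytes=50_000_000):
    """Download `url` and return the body as bytes, refusing oversized bodies."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Read one byte past the cap so an oversized body can be detected.
        body = response.read(max_bytes + 1)
    if len(body) > max_bytes:
        raise ValueError(f"feed larger than {max_bytes} bytes")
    return body


def parse_url(url, parse, **fetch_kwargs):
    """Fetch `url`, then hand the bytes to `parse` (e.g. feedparser_rs.parse)."""
    return parse(fetch_feed_bytes(url, **fetch_kwargs))
```

With the extension installed, a call would look like parse_url(url, feedparser_rs.parse). Note that urlopen also accepts non-HTTP schemes such as file:, so URLs from untrusted input should be checked before they are passed in.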

Requirements

  • Python >= 3.9
  • No runtime dependencies (Rust extension module)

Development

Build from source:

git clone https://github.com/rabax/feedparser-rs
cd feedparser-rs/crates/feedparser-rs-py
pip install maturin
maturin develop

Run tests:

pip install pytest
pytest tests/

License

MIT OR Apache-2.0
