Feed Tracking and Retrieval Abstraction Interface Layer
feedtrail
feedtrail provides a resilient RSS/Atom parser focused on production-style feeds where XML can be noisy, partially malformed, or inconsistent across publishers.
What It Does
- Parses RSS and Atom feeds with namespace support.
- Normalizes and cleans feed/item text content.
- Converts heterogeneous date formats to UTC ISO strings.
- Extracts structured metadata: title, link, description, summary, author, categories, and primary image.
- Computes a deterministic request_hash for parsed payload integrity checks.
- Handles malformed XML defensively (entity sanitization, escaped CDATA recovery, trailing content trimming).
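To make the date-normalization bullet concrete, here is a minimal standard-library sketch of converting RSS-style (RFC 822) and Atom-style (ISO 8601) timestamps to UTC ISO strings. It is not feedtrail's implementation: the function name `to_utc_iso` is hypothetical, and treating timestamps without a time zone as UTC is an assumption made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def to_utc_iso(raw: str) -> Optional[str]:
    """Best-effort conversion of a feed date string to a UTC ISO 8601 string."""
    text = raw.strip()
    if not text:
        return None
    try:
        # RSS-style dates, e.g. "Wed, 20 Mar 2024 09:00:00 GMT"
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        try:
            # Atom-style dates; normalize a trailing "Z" for older Python versions
            parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
        except ValueError:
            return None  # unrecognized format
    if parsed.tzinfo is None:
        # Assumption for this sketch: naive timestamps are treated as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


print(to_utc_iso("Wed, 20 Mar 2024 09:00:00 GMT"))   # 2024-03-20T09:00:00+00:00
print(to_utc_iso("2024-03-20T10:00:00+01:00"))       # 2024-03-20T09:00:00+00:00
```

Returning `None` for unparseable input, rather than raising, mirrors the "when available" wording used for `pub_date` in the output contract.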
Installation
Runtime
Requirements:
- Python >= 3.10
Install from source:
pip install .
Development
pip install -r requirements_dev.txt
Quick Start
from feedtrail.feed_parser import FeedParser
xml_content = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Example Feed</title>
<link>https://example.com</link>
<description>Demo</description>
<item>
<title>Hello</title>
<link>/hello</link>
<pubDate>Wed, 20 Mar 2024 09:00:00 GMT</pubDate>
<description><![CDATA[Post body]]></description>
</item>
</channel>
</rss>"""
parser = FeedParser()
result = parser.parse(xml_content, base_url="https://example.com")
print(result["headers"]["title"])
print(result["items"][0]["link"])
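The Quick Start passes `base_url` alongside an item whose link is the relative path `/hello`. A plausible use of that parameter is resolving relative links against the feed's base. The sketch below shows that resolution with the standard library only. Whether feedtrail does exactly this is an assumption, and `resolve_link` is a hypothetical helper, not part of the package.

```python
from urllib.parse import urljoin


def resolve_link(link: str, base_url: str) -> str:
    """Resolve a possibly relative link against a base URL."""
    # urljoin leaves absolute URLs unchanged and resolves relative ones
    return urljoin(base_url, link.strip())


print(resolve_link("/hello", "https://example.com"))
# https://example.com/hello
print(resolve_link("https://other.example/post", "https://example.com"))
# https://other.example/post
```

If feedtrail resolves links this way, the second `print` in the Quick Start would show `https://example.com/hello`, though the source does not show the actual output.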
Output Contract
FeedParser.parse(...) returns a dictionary with:
- headers: feed-level metadata (title, link, description, updated, language, generator, parent_link, self_link).
- items: list of normalized entries, each including:
  - title
  - link
  - description
  - summary
  - pub_date (ISO-like UTC string, when available)
  - author
  - image
  - categories
- request_hash: SHA-256 hash of the normalized parsed payload.
- error: present when parsing fails (items will be empty in that case).
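The contract above suggests two things a caller might do: check for the `error` key before using `items`, and rely on a hash that is stable for equal payloads. The following is a rough sketch of both using plain dictionaries. `item_titles` and `payload_hash` are hypothetical helpers, and the canonical-JSON hashing scheme is an assumption for illustration. The source does not say how feedtrail serializes the payload before hashing, so this will not necessarily reproduce its `request_hash` values.

```python
import hashlib
import json


def item_titles(result: dict) -> list:
    """Return item titles, or raise if the result carries an error."""
    if "error" in result:
        # Per the contract, items is empty when error is present
        raise ValueError(f"feed could not be parsed: {result['error']}")
    return [item.get("title", "") for item in result.get("items", [])]


def payload_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


result = {
    "headers": {"title": "Example Feed"},
    "items": [{"title": "Hello", "link": "https://example.com/hello"}],
}
print(item_titles(result))   # ['Hello']
print(payload_hash(result) == payload_hash(dict(reversed(list(result.items())))))  # True
```

Sorting keys before hashing is what makes the digest independent of dictionary insertion order, which is one common way to get a deterministic hash from a parsed structure.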
Development Workflow
Run tests
make test
Lint and formatting
make lint
Coverage
make coverage
Build Docker test image
make build-test-image
Run tox environments in Docker
make test-all
Project Structure
feedtrail/
  feed_parser.py    # Core RSS/Atom parser
  utils/
    date_utils.py   # Date parsing and normalization
    xml_utils.py    # XML sanitation and extraction helpers
tests/
  test_feed_parser.py
  test_utils_date_utils.py
  test_utils_xml_utils.py
Author
- Juan Manuel Cristóbal Moreno (juanmcristobal@gmail.com)
See AUTHORS.md for contributors.
History
0.1.0 (2026-03-25)
- First release.