A tool to download and export RSS feed entries as PDFs

PDF RSS Downloader

This Python script downloads and exports RSS feed entries into PDF files. It checks if the entries have already been exported by referencing a downloaded.dat file to avoid duplicates. The script handles multiple date formats and ensures that the generated filenames are compatible with the filesystem.

Features

  • RSS Feed Parsing: Parses RSS feeds from a list of URLs.
  • PDF Export: Converts RSS entries (including title, link, publication date, and description) into PDF files.
  • Duplicate Check: Keeps track of downloaded entries using downloaded.dat to prevent duplicate downloads.
  • Filename Sanitization: Ensures filenames are safe and do not exceed OS limits by truncating and sanitizing them.
  • Multiple Date Formats: Supports parsing of multiple date formats commonly found in RSS feeds.
  • UTF-8 Character Handling: Handles special characters in RSS entries to avoid encoding issues.
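The README does not say how special characters are handled. As a minimal sketch, assuming the PDFs are written with one of fpdf's built-in core fonts (which only accept Latin-1 text), one common approach is to map typographic punctuation to ASCII and replace anything else that Latin-1 cannot represent. The helper name `to_latin1_safe` and the punctuation table are hypothetical and not taken from the script.

```python
# Hypothetical helper: not part of the original script.
# Maps common typographic punctuation to ASCII equivalents first,
# so that quotes and dashes survive instead of turning into "?".
_PUNCTUATION = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2026": "...", # ellipsis
}


def to_latin1_safe(text: str) -> str:
    """Return text that can be encoded as Latin-1.

    Characters outside Latin-1 that are not in the punctuation
    table are replaced with "?".
    """
    for source, target in _PUNCTUATION.items():
        text = text.replace(source, target)
    return text.encode("latin-1", errors="replace").decode("latin-1")
```

Accented Latin characters such as "é" pass through unchanged, while characters from other scripts are replaced. Registering a Unicode TrueType font with the PDF library would avoid the loss, at the cost of shipping a font file.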

Requirements

  • Python 3.7 or higher
  • fpdf library
  • feedparser library

You can install the required dependencies using pip:

pip install fpdf feedparser

Usage

  1. Clone the repository:
git clone https://github.com/yourusername/pdf-rss-downloader.git
cd pdf-rss-downloader
  2. Prepare the RSS Feeds:
    • Create an rss_feeds.txt file in the root directory of the project.
    • Add the RSS feed URLs, one per line. Example:
https://example.com/rss
https://anotherexample.com/feed
  3. Run the Script:
python app.py
  4. Output:
    • PDFs are saved in the downloaded_pdfs directory, organized by feed title and year of publication.
  5. Check Duplicates:
    • The script logs downloaded entries in downloaded.dat to avoid processing the same entry multiple times.
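The output layout and duplicate log described in the steps above could work roughly as follows. This is a sketch under stated assumptions, not the script's actual code: it assumes downloaded.dat holds one entry identifier (for example the entry link) per line, and that PDFs are nested as downloaded_pdfs/<feed title>/<year>/. The function names are hypothetical.

```python
from pathlib import Path
from typing import Set

# File and directory names come from the README; the file format is assumed.
LOG_FILE = Path("downloaded.dat")
OUTPUT_ROOT = Path("downloaded_pdfs")


def load_downloaded(log_file: Path = LOG_FILE) -> Set[str]:
    """Return the identifiers already recorded, or an empty set if no log exists."""
    if not log_file.exists():
        return set()
    with log_file.open(encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def mark_downloaded(entry_id: str, log_file: Path = LOG_FILE) -> None:
    """Append one identifier to the log so later runs skip this entry."""
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(entry_id + "\n")


def output_dir(feed_title: str, year: int, root: Path = OUTPUT_ROOT) -> Path:
    """Build the assumed target directory: <root>/<feed title>/<year>."""
    return root / feed_title / str(year)
```

A run would load the set once, skip any entry whose identifier is already in it, and call `mark_downloaded` only after the PDF has been written, so that an interrupted run does not mark unfinished entries as done.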

Customization

  • Filename Length: The sanitize_filename function truncates filenames to 100 characters by default. You can change this value in the script if needed.
  • Adding New Date Formats: You can add more date formats to the parse_published_date function if your RSS feed uses a format the script does not cover.
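To make the two customization points concrete, here is one way the functions named above might be structured. The function names and the 100-character default come from the README; the bodies, the `max_length` parameter, the `DATE_FORMATS` list, and the "untitled" fallback are assumptions for illustration, and the real script may differ.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed list of formats; add a strptime pattern here to support a new one.
DATE_FORMATS = [
    "%a, %d %b %Y %H:%M:%S %z",  # RFC 822 style with numeric offset
    "%a, %d %b %Y %H:%M:%S %Z",  # RFC 822 style with a zone name such as GMT
    "%Y-%m-%dT%H:%M:%S%z",       # ISO 8601 style with offset
    "%Y-%m-%d",                  # date only
]


def parse_published_date(value: str) -> Optional[datetime]:
    """Try each known format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def sanitize_filename(name: str, max_length: int = 100) -> str:
    """Replace characters that are unsafe in filenames and truncate the result."""
    # Characters reserved on common filesystems, plus control characters.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    # Collapse whitespace runs and drop leading/trailing spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Truncate, then make sure the cut did not leave a trailing space or dot.
    return cleaned[:max_length].rstrip(" .") or "untitled"
```

With this structure, changing the filename limit means changing the `max_length` default, and supporting another feed means appending one pattern to `DATE_FORMATS`. Note that `%a` and `%b` match day and month names in the current locale, so English feed dates assume an English or C locale.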

Contributing

Feel free to submit issues or pull requests. Contributions are welcome!
