A tool to download and export RSS feed entries as PDFs

Project description

PDF RSS Downloader

This Python script downloads RSS feed entries and exports each one as a PDF file. To avoid duplicates, it records exported entries in a downloaded.dat file and skips any entry already listed there. It also parses several common date formats and sanitizes generated filenames so that they are valid on the filesystem.
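The duplicate check can be pictured with the minimal sketch below. It assumes downloaded.dat stores one entry identifier per line (for example, each entry's link); that file format and the helper names `load_downloaded`, `mark_downloaded`, and `select_new_entries` are hypothetical, not taken from the script.

```python
from pathlib import Path
from typing import List, Set

# Hypothetical helpers, not the script's own code.
# Assumption: downloaded.dat holds one entry identifier (e.g. the entry link) per line.


def load_downloaded(path: Path) -> Set[str]:
    """Return the identifiers already recorded in the tracking file."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def mark_downloaded(path: Path, entry_id: str) -> None:
    """Append one identifier so that later runs skip this entry."""
    with path.open("a", encoding="utf-8") as f:
        f.write(entry_id + "\n")


def select_new_entries(entry_ids: List[str], path: Path) -> List[str]:
    """Return the identifiers not seen before, recording each one as it is accepted."""
    seen = load_downloaded(path)
    new = []
    for entry_id in entry_ids:
        if entry_id in seen:
            continue  # exported on an earlier run, or repeated within this batch
        # In a real run the PDF would be written first and the id recorded only on success.
        mark_downloaded(path, entry_id)
        seen.add(entry_id)
        new.append(entry_id)
    return new
```

Keeping the identifiers in a set makes each membership check cheap, and appending one line per entry, rather than rewriting the whole file, keeps earlier records intact if a run is interrupted.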

Features

  • RSS Feed Parsing: Parses RSS feeds from a list of URLs.
  • PDF Export: Converts RSS entries (including title, link, publication date, and description) into PDF files.
  • Duplicate Check: Keeps track of downloaded entries using downloaded.dat to prevent duplicate downloads.
  • Filename Sanitization: Ensures filenames are safe and do not exceed OS limits by truncating and sanitizing them.
  • Multiple Date Formats: Supports parsing of multiple date formats commonly found in RSS feeds.
  • UTF-8 Character Handling: Handles special characters in RSS entries to avoid encoding issues.
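The README does not say exactly how special characters are handled. One common approach is sketched below, assuming the PDFs are drawn with fpdf's built-in core fonts, which only cover Latin-1 (ISO-8859-1). The helper name `to_latin1` and its replacement table are hypothetical.

```python
import unicodedata

# Hypothetical helper, not taken from the script. Assumption: text is written
# with fpdf's core fonts, which can only encode Latin-1 characters.
_REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
}


def to_latin1(text: str) -> str:
    """Map common typographic characters to ASCII, then replace any remaining
    character that Latin-1 cannot encode with '?'."""
    # Compose decomposed sequences (e.g. "e" + combining accent) into one code point.
    text = unicodedata.normalize("NFC", text)
    for src, dst in _REPLACEMENTS.items():
        text = text.replace(src, dst)
    return text.encode("latin-1", errors="replace").decode("latin-1")
```

Each title, date, and description string would pass through a helper like this before reaching fpdf. Embedding a Unicode TrueType font is the alternative when characters outside Latin-1 must be preserved rather than replaced with `?`.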

Requirements

  • Python 3.7 or higher
  • fpdf library
  • feedparser library

You can install the required dependencies using pip:

pip install fpdf feedparser

Usage

  1. Clone the repository:
     git clone https://github.com/thisisawesome1994/pdf-rss-downloader.git
     cd pdf-rss-downloader
  2. Prepare the RSS Feeds:

    • Create an rss_feeds.txt file in the root directory of the project.
    • Add the RSS feed URLs, one per line. Example:
     https://example.com/rss
     https://anotherexample.com/feed
  3. Run the Script:
     python app.py
  4. Output:
    • PDFs are saved in the downloaded_pdfs directory, organized by feed title and year of publication.
  5. Check Duplicates:
    • The script logs downloaded entries in downloaded.dat so that the same entry is not processed twice.
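The input file and output layout from the steps above can be sketched as follows. The helper names `read_feed_urls` and `output_path` are hypothetical, and the folder structure `downloaded_pdfs/<feed title>/<year>/` is an assumption based on the description of the output.

```python
from pathlib import Path
from typing import List

# Hypothetical helpers. Assumption: PDFs go to
# downloaded_pdfs/<feed title>/<year>/<file name>, matching the layout described above.


def read_feed_urls(path: Path) -> List[str]:
    """Read rss_feeds.txt: one URL per line, ignoring blank lines and surrounding whitespace."""
    with path.open(encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def output_path(base: Path, feed_title: str, year: int, filename: str) -> Path:
    """Return the target PDF path, creating the feed and year folders if they are missing."""
    # feed_title and filename are expected to be sanitized already (see sanitize_filename).
    folder = base / feed_title / str(year)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename
```

Grouping by feed and then by year keeps any single folder from growing without bound for long-running feeds.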

Customization

  • Filename Length: The sanitize_filename function truncates filenames to 100 characters by default. You can change this limit in the script if needed.
  • Adding New Date Formats: If your RSS feed uses a date format the script does not cover, you can add it to the parse_published_date function.
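A rough sketch of these two customization points follows. The function names `sanitize_filename` and `parse_published_date` come from this README, but their bodies, the set of characters treated as unsafe, and the hypothetical `DATE_FORMATS` and `MAX_FILENAME_LENGTH` constants are assumptions rather than the script's actual code.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed default, matching the 100-character limit described above.
MAX_FILENAME_LENGTH = 100

# Assumed format list; append a strptime pattern here for any feed not covered.
DATE_FORMATS = [
    "%a, %d %b %Y %H:%M:%S %z",  # RFC 822 style, e.g. "Tue, 10 Jun 2003 04:00:00 +0000"
    "%a, %d %b %Y %H:%M:%S %Z",  # same, with a zone name such as "GMT"
    "%Y-%m-%dT%H:%M:%S%z",       # ISO 8601 / Atom style, e.g. "2024-03-05T12:30:00Z"
    "%Y-%m-%d",                  # date only
]


def sanitize_filename(name: str, max_length: int = MAX_FILENAME_LENGTH) -> str:
    """Replace characters invalid on common filesystems, then truncate."""
    # Characters rejected by Windows, plus ASCII control characters.
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name).strip()
    # Truncate, then trim trailing spaces and dots, which Windows also rejects.
    name = name[:max_length].rstrip(" .")
    return name or "untitled"


def parse_published_date(value: str) -> Optional[datetime]:
    """Try each known format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None
```

With this structure, adding a format means appending one pattern to the list. feedparser also exposes a pre-parsed `published_parsed` value on entries whose date it recognizes, which could serve as a fallback when none of the patterns match.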

Contributing

Feel free to submit issues or pull requests. Contributions are welcome!


Download files

Source Distribution

pdf-rss-dl-0.0.2.tar.gz (4.6 kB)

Built Distribution

pdf_rss_dl-0.0.2-py3-none-any.whl (4.6 kB)

File details

Details for the file pdf-rss-dl-0.0.2.tar.gz.

File metadata

  • Download URL: pdf-rss-dl-0.0.2.tar.gz
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.3

File hashes

Hashes for pdf-rss-dl-0.0.2.tar.gz
Algorithm     Hash digest
SHA256        d4d9e527f75923a46a07fb5b37ed105f9fe417312956c472fe09c90b7c7efefe
MD5           f18ae6ce6aa25206ff412f4b9db3e9ba
BLAKE2b-256   aa73a14e33c0463ac927a36bdfd474ee9b7aa490d846e5558921371048ff295c

File details

Details for the file pdf_rss_dl-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: pdf_rss_dl-0.0.2-py3-none-any.whl
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.3

File hashes

Hashes for pdf_rss_dl-0.0.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        99c6683f4ac6b666dd831ee1febd137346e648e8ba71a899c157e896db33537b
MD5           a005bcef0a45a4677802be471e8a0599
BLAKE2b-256   7214b53853a253a8551f93fc0ca1e34d7df601505e7f2ff3b8282a4033f7c6fa