A tool to download and export RSS feed entries as PDFs
Project description
PDF RSS Downloader
This Python script downloads and exports RSS feed entries into PDF files. It checks if the entries have already been exported by referencing a downloaded.dat
file to avoid duplicates. The script handles multiple date formats and ensures that the generated filenames are compatible with the filesystem.
Features
- RSS Feed Parsing: Parses RSS feeds from a list of URLs.
- PDF Export: Converts RSS entries (including title, link, publication date, and description) into PDF files.
- Duplicate Check: Keeps track of downloaded entries using downloaded.dat to prevent duplicate downloads.
- Filename Sanitization: Ensures filenames are safe and do not exceed OS limits by truncating and sanitizing them.
- Multiple Date Formats: Supports parsing of multiple date formats commonly found in RSS feeds.
- UTF-8 Character Handling: Handles special characters in RSS entries to avoid encoding issues.
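As an illustration of the character-handling feature, here is a minimal sketch of one common approach. It assumes the PDF is rendered with a font that only covers Latin-1; the actual script may use a Unicode font or a different strategy. The helper name to_latin1_safe and the replacement table are hypothetical and not taken from the project.

```python
# Hypothetical helper: not part of the project, shown only as a sketch.
# Maps common typographic characters to ASCII look-alikes, then replaces
# anything that still cannot be encoded as Latin-1 with "?".
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def to_latin1_safe(text):
    """Return text that can be encoded as Latin-1 without raising an error."""
    for source, target in REPLACEMENTS.items():
        text = text.replace(source, target)
    # errors="replace" substitutes "?" for every remaining unsupported character.
    return text.encode("latin-1", errors="replace").decode("latin-1")
```

Characters that Latin-1 already covers, such as accented Western European letters, pass through unchanged; anything else is degraded rather than causing the export to fail.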
Requirements
- Python 3.7 or higher
- fpdf and feedparser libraries
You can install the required dependencies using pip:
pip install fpdf feedparser
Usage
- Clone the repository:
git clone https://github.com/yourusername/pdf-rss-downloader.git
cd pdf-rss-downloader
- Prepare the RSS Feeds:
- Create a rss_feeds.txt file in the root directory of the project.
- Add the RSS feed URLs, one per line. Example:
https://example.com/rss
https://anotherexample.com/feed
- Run the Script:
python app.py
- Output:
- PDFs will be saved in the downloaded_pdfs directory, organized by feed title and year of publication.
- Check Duplicates:
- The script logs downloaded entries in downloaded.dat to avoid processing the same entry multiple times.
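The file handling described in the steps above can be sketched roughly as follows. This is not the project's code: the function names are hypothetical, and it assumes that downloaded.dat stores one entry identifier (for example the entry link) per line and that PDFs are laid out as downloaded_pdfs/feed title/year/filename.

```python
from pathlib import Path


def read_feed_urls(feeds_path="rss_feeds.txt"):
    """Return the feed URLs listed one per line, skipping blank lines."""
    lines = Path(feeds_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_downloaded(dat_path="downloaded.dat"):
    """Return the set of entry identifiers already recorded (assumed one per line)."""
    path = Path(dat_path)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_downloaded(entry_id, dat_path="downloaded.dat"):
    """Append an entry identifier so that later runs can skip it."""
    with open(dat_path, "a", encoding="utf-8") as handle:
        handle.write(entry_id + "\n")


def output_path(feed_title, year, filename, root="downloaded_pdfs"):
    """Build the target path, organized by feed title and publication year."""
    return Path(root) / feed_title / str(year) / filename
```

A run would then load the recorded identifiers once, skip any entry whose identifier is already in the set, and call mark_downloaded after each successful export.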
Customization
- Filename Length: The sanitize_filename function truncates filenames to 100 characters by default. You can modify this value in the script if needed.
- Adding New Date Formats: You can add more date formats to the parse_published_date function if your RSS feed uses a format not covered by the script.
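To make the two customization points concrete, here is a rough sketch of what functions with these names could look like. The bodies are assumptions written for illustration; the project's real sanitize_filename and parse_published_date may differ in signature, character rules, and supported formats. The DATE_FORMATS list and the "untitled" fallback are hypothetical.

```python
import re
from datetime import datetime

# Assumed format list; extend it if a feed uses a format that is not covered.
DATE_FORMATS = [
    "%a, %d %b %Y %H:%M:%S %z",  # RFC 822 style with numeric offset
    "%a, %d %b %Y %H:%M:%S %Z",  # RFC 822 style with a zone name such as GMT
    "%Y-%m-%dT%H:%M:%S%z",       # ISO 8601 with offset
    "%Y-%m-%d",                  # date only
]


def sanitize_filename(name, max_length=100):
    """Replace characters unsafe on common filesystems and truncate the result."""
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Truncate, then strip again in case the cut left a trailing space or dot.
    return cleaned[:max_length].rstrip(" .") or "untitled"


def parse_published_date(value):
    """Try each known format in turn; return None if none of them matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None
```

In a layout like this, changing the filename limit means editing the max_length default, and supporting a new date format means appending one more strptime pattern to the list.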
Contributing
Feel free to submit issues or pull requests. Contributions are welcome!
Download files
Source Distribution
pdf-rss-dl-0.0.1.tar.gz (4.6 kB)
Built Distribution
File details
Details for the file pdf-rss-dl-0.0.1.tar.gz.
File metadata
- Download URL: pdf-rss-dl-0.0.1.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 29fa08bb395bfc76eb03f3ba07cbe3b5f493bf36130f6c2aa17a5f446eebb8d3
MD5 | 7a0ee9b406e6f57d3b199bb9fd4a170f
BLAKE2b-256 | 2a31d1ac942ce3ea6ec342dab30e3b115546253f624dc37f669377db6ece9b1a
File details
Details for the file pdf_rss_dl-0.0.1-py3-none-any.whl.
File metadata
- Download URL: pdf_rss_dl-0.0.1-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | e9c5e7142fbb1e9c797d04e7ed0026cb47bb54871b189cbea72c41efe687214c
MD5 | 9acacb53c11c82f82ab47cc52b7b8d45
BLAKE2b-256 | 6bd762512276bd4d09049222c563da73e204e7a4dc43c99f29e7fe912449dc01