NewsCollector - Python script for automated collection of the most relevant news articles of the day.

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Software Development :: Build Tools

Project description

Python NewsCollector

As the internet has grown, so have the sources of information at our disposal. If you want to catch up on the most important news of the day, you have a vast variety of news sources to choose from. With that many sources available, instead of manually going through all their content...

Couldn't we let automation pick the top news stories from various newspapers for us, and nicely combine them into a newsletter?

This is what the Python NewsCollector can do for us!

For a detailed usage guide, please refer to the official NewsCollector Usage Documentation.

Read more about how the algorithm of NewsCollector works in my Medium article.

Description

The Python NewsCollector lets you define a variety of news sources, picks the most relevant articles from them, and bundles these into an HTML-based newsletter.

View the full sample newsletter in PDF format here.

The NewsCollector algorithm scrapes the source links provided and compares the articles it finds based on their similarity. If several articles from different sources cover similar topics, it treats them as relevant and includes them in the output newsletter.
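To make the idea of cross-source similarity concrete, here is a minimal sketch. It is not the NewsCollector implementation: the sample headlines, the function names, the use of `difflib.SequenceMatcher` on titles, and the 0.7 threshold are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical scraped headlines as (source, title) pairs.
articles = [
    ("Source A", "Central bank raises interest rates again"),
    ("Source B", "Central bank raises interest rates once more"),
    ("Source A", "Local team wins championship final"),
    ("Source C", "New species of frog discovered in rainforest"),
]


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two titles (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relevant_articles(items, threshold=0.7):
    """Keep articles whose title closely matches one from a different source."""
    relevant = set()
    for (i, (src_a, title_a)), (j, (src_b, title_b)) in combinations(enumerate(items), 2):
        # Only matches across different sources count as a relevance signal.
        if src_a != src_b and similarity(title_a, title_b) >= threshold:
            relevant.update((i, j))
    return [items[k] for k in sorted(relevant)]


for source, title in relevant_articles(articles):
    print(f"{source}: {title}")
```

In this toy data only the two interest-rate headlines come from different sources and resemble each other closely enough, so only they are kept. A real pipeline would likely compare article bodies or extracted keywords rather than raw title strings.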

Basic Usage

You can run the NewsCollector algorithm as follows:

from newscollector import NewsCollector

# Build a collector with the default settings and run the full pipeline
newsletter = NewsCollector()
output = newsletter.create()

This runs the full NewsCollector pipeline: it scrapes the sources listed in the sources.json file and outputs an HTML newsletter.

The returned value holds the path of the generated newsletter, so that you can easily retrieve it programmatically:

output
> 'C:\\Output\\Path\\newsletter.html'

CLI Usage

The NewsCollector can also be run directly via the CLI with the following parameters:

newscollector.py [-h] [-s [SOURCES]] [-n [NEWS_NAME]] [-d [NEWS_DATE]] 
                 [-t [TEMPLATE]] [-o [OUTPUT_FILENAME]] [-a [AUTO_OPEN]]
                 [-r [RETURN_DETAILS]]
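For readers unfamiliar with this usage notation: `[-s [SOURCES]]` means both the flag and its value are optional. The sketch below shows how a parser with the same shape can be declared with the standard-library `argparse` module. It is an illustration only, not the package's actual parser: `build_parser` is a hypothetical helper, the default values are borrowed from the Additional Parameters section, and handling every value as a plain string is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage line shown above."""
    parser = argparse.ArgumentParser(prog="newscollector.py")
    # nargs="?" makes each value optional, which renders as "[-s [SOURCES]]".
    parser.add_argument("-s", dest="sources", nargs="?", default="sources.json")
    parser.add_argument("-n", dest="news_name", nargs="?", default="Daily News Update")
    parser.add_argument("-d", dest="news_date", nargs="?", default=None)
    parser.add_argument("-t", dest="template", nargs="?", default="newsletter.html")
    parser.add_argument("-o", dest="output_filename", nargs="?", default="default")
    parser.add_argument("-a", dest="auto_open", nargs="?", default="False")
    parser.add_argument("-r", dest="return_details", nargs="?", default="False")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["-s", "my_sources.json", "-n", "Tech News"])
print(args.sources, args.news_name, args.template)
```

Flags that are not passed fall back to their defaults, so only the settings you want to change need to appear on the command line.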

Output

The NewsCollector will output an HTML newsletter with the most relevant articles it found while scraping the sources provided.

By default, the newsletter is written as an HTML file to the rendered folder inside the installation directory of the package, under the filename newsletter_YYYY-MM-DD.html, where YYYY-MM-DD is the date for which the NewsCollector scraped its articles.
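The default naming scheme described above can be sketched as follows. `default_output_path` is a hypothetical helper written for illustration, not a function of the package, and the base directory is passed in explicitly because the actual installation path depends on your environment.

```python
from datetime import date
from pathlib import Path


def default_output_path(base_dir: Path, news_date: date) -> Path:
    """Build <base_dir>/rendered/newsletter_YYYY-MM-DD.html for the given date."""
    return base_dir / "rendered" / f"newsletter_{news_date.isoformat()}.html"


path = default_output_path(Path("."), date(2023, 2, 10))
print(path.name)  # newsletter_2023-02-10.html
```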

To adjust the default settings, please refer to Additional Parameters.

Additional Parameters

You can customize the NewsCollector algorithm with the following optional parameters:

from datetime import date
from newscollector import NewsCollector

newsletter = NewsCollector(sources="sources.json", news_name="Daily News Update",
                           news_date=date.today(), template="newsletter.html",
                           output_filename="default", auto_open=False,
                           return_details=False)

For a detailed usage guide, please refer to the official NewsCollector Usage Documentation.

This version

1.0.0

Feb 10, 2023

Download files

Source Distribution

py-newscollector-1.0.0.tar.gz (111.9 kB)

Uploaded Feb 10, 2023 Source

File details

Details for the file py-newscollector-1.0.0.tar.gz.

File metadata

Download URL: py-newscollector-1.0.0.tar.gz
Upload date: Feb 10, 2023
Size: 111.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.7.4

File hashes

Hashes for py-newscollector-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`7db0891ba4457623dc5bb283d746379d0c83301681d242bef87ecf594fdb953a`
MD5	`51f116db1b845678bc37e001d45a290f`
BLAKE2b-256	`31390d38f7709ca2177799c729106fa922eecaa23c69ae777569d5404615042c`