Skip to main content

This is a News Library.

Project description

Pypi Publish GitHub release (latest by date) License

ejtraderNS

Programmatically collect normalized news from (almost) any website.

Filter by topic, country, or language.

Installation

pip install ejtraderNS --upgrade

Quick Start

from ejtraderNS import Client

Get the latest news from nytimes.com (we support thousands of news websites, try yourself!) main news feed

api = ejtraderNS(website = 'nytimes.com')
results = api.get_news()

# results.keys()
# 'url', 'topic', 'language', 'country', 'articles'

# Get the articles
articles = results['articles']

first_article_summary = articles[0]['summary']
first_article_title = articles[0]['title']

Get the latest news from nytimes.com politics feed

api = ejtraderNS(website = 'nytimes.com', topic = 'politics')

results = api.get_news()
articles = results['articles']

Some websites support multiple countries, such as investing.com or tradingeconomics.com

In this example, I will demonstrate a website that supports multiple countries, retrieve multiple topics, and convert the data into a pandas dataframe.

import pandas as pd
from ejtraderNS import Client
from datetime import datetime

url = 'investing.com' # or tradingeconomics.com
country = 'GB'
country_topic = ["finance","news","economics"]
dfs = []

for topic in country_topic:
    api = Client(website=url, topic=topic, country=country)
    getdata = api.get_news()
    print(f"topic: {topic}")

    if getdata is None:
        continue

    data = []

    for article in getdata['articles']:
        article_data = {
            'topic': getdata['topic'],
            'author': article['author'],
            'date': article['published_parsed'] if article['published_parsed'] else article['published'],
            'country': getdata['country'],
            'language': getdata['language'],
            'title': article['title'],
            'summary': article.get('summary', article['title'])
        }
        data.append(article_data)

    df = pd.DataFrame(data)

    df['date'] = pd.to_datetime(df['date'].apply(lambda x: datetime(*x[:6]) if isinstance(x, tuple) else x), utc=True, errors='coerce')
    df.set_index('date', inplace=True)
    dfs.append(df)

df = pd.concat(dfs)
df.sort_index(inplace=True)
print(df)

output example:

topic author country language title summary
finance Reuters GB en Italy pushes to limit executive pay in listed state-run firms Italy pushes to limit executive pay in listed state-run firms
economics Reuters GB en UK's Cleverly raises Xinjiang and Taiwan with Chinese vice president UK's Cleverly raises Xinjiang and Taiwan with Chinese vice president
news Reuters GB en Ukraine hails return of 45 Azov fighters, Russia says 3 pilots released Ukraine hails return of 45 Azov fighters, Russia says 3 pilots released

There is a limited set of topic that you might find: 'tech', 'news', 'business', 'science', 'finance', 'food', 'politics', 'economics', 'travel', 'entertainment', 'music', 'sport', 'world'

extras topics only for investing.com

'crypto', 'forex', 'stock', 'commodities', 'central_bank', 'forex_analysis', 'forex_technical', 'forex_fundamental', 'forex_opinion', 'forex_signal', 'bonds_analysis', 'bonds_technical', 'bonds_fundamental', 'bonds_opinion', 'bonds_strategy', 'bonds_government', 'bonds_corporate', 'stock_analysis', 'stock_technical', 'stock_fundamental', 'stock_opinion', 'stock_picks', 'indices_analysis', 'futures_analysis', 'options_analysis', 'commodities_analysis', 'commodities_technical', 'commodities_Fundamental', 'commodities_opinion', 'commodities_strategy', 'commodities_metals', 'commodities_energy', 'commodities_agriculture', 'overview_analysis', 'overview_technical', 'overview_fundamental', 'overview_opinion', 'overview_investing', 'crypto_opinion'

However, not all topics are supported by every newspaper.

How to check which topics are supported by which newspaper:

from ejtraderNS import describe_url

describe = describe_url('nytimes.com')

print(describe['topics'])

Get the list of all news feeds by topic/language/country

If you want to find the full list of supported news websites you can always do so using urls() function

from ejtraderNS import urls

# URLs by TOPIC
politic_urls = urls(topic = 'politics')

# URLs by COUNTRY
american_urls = urls(country = 'US')

# URLs by LANGUAGE
english_urls = urls(language = 'en')

# Combine any from topic, country, language
american_english_politics_urls = urls(country = 'US', topic = 'politics', language = 'en') 

# note some websites do not explicitly declare their language 
# as a result they will be excluded from queries based on language

Documentation

ejtraderNS Class

from ejtraderNS import Client

Client(website, topic = None)

Please take the base form url of a website (without www.,neither https://, nor / at the end of url).

For example: “nytimes”.com, “news.ycombinator.com” or “theverge.com”.


Client.get_news() - Get the latest news from the website of interest.

Allowed topics: tech, news, business, science, finance, food, politics, economics, travel, entertainment, music, sport, world

If no topic is provided, the main feed is returned.

Returns a dictionary of 5 elements:

  1. url - URL of the website
  2. topic - topic of the returned feed
  3. language - language of returned feed
  4. country - country of returned feed
  5. articles - articles of the feed. Feedparser object

Client.get_headlines() - Returns only the headlines


Client.print_headlines(n) - Print top n headlines




describe_url() & urls()

Those functions exist to help you navigate through this package


from ejtraderNS import describe_url

describe_url(website) - Get the main info on the website.

Returns a dictionary of 5 elements:

  1. url - URL of the website
  2. topics - list of all supported topics
  3. language - language of website
  4. country - country of returned feed
  5. main_topic - main topic of a website

from ejtraderNS import urls

urls(topic = None, language = None, country = None) - Get a list of all supported news websites given any combination of topic, language, country

Returns a list of websites that match your combination of topic, language, country

Supported topics: tech, news, business, science, finance, food, politics, economics, travel, entertainment, music, sport, world

Supported countries: US, GB, DE, FR, IN, RU, ES, BR, IT, CA, AU, NL, PL, NZ, PT, RO, UA, JP, AR, IR, IE, PH, IS, ZA, AT, CL, HR, BG, HU, KR, SZ, AE, EG, VE, CO, SE, CZ, ZH, MT, AZ, GR, BE, LU, IL, LT, NI, MY, TR, BM, NO, ME, SA, RS, BA

Supported languages: EL, IT, ZH, EN, RU, CS, RO, FR, JA, DE, PT, ES, AR, HE, UK, PL, NL, TR, VI, KO, TH, ID, HR, DA, BG, NO, SK, FA, ET, SV, BN, GU, MK, PA, HU, SL, FI, LT, MR, HI

Tech/framework used

The package itself is nothing more than a SQLite database with RSS feed endpoints for each website and some basic wrapper of feedparser.

Acknowledgements

I would like to express my gratitude to @kotartemiy for creating the initial project. Their work has been an invaluable starting point for my modifications and improvements.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ejtraderNS-0.0.3.tar.gz (166.4 kB view details)

Uploaded Source

Built Distribution

ejtraderNS-0.0.3-py2.py3-none-any.whl (163.2 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file ejtraderNS-0.0.3.tar.gz.

File metadata

  • Download URL: ejtraderNS-0.0.3.tar.gz
  • Upload date:
  • Size: 166.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for ejtraderNS-0.0.3.tar.gz
Algorithm Hash digest
SHA256 ce0469b5e7e9b7c44d439905223c2f633db16e07555b2b54b88c40ab02007ace
MD5 b7dc8bd8d28f491b7404f36461872e5b
BLAKE2b-256 f681d930b550b98af64484f260be91a486d450bc16fdf416b0a58402ce28bd4f

See more details on using hashes here.

File details

Details for the file ejtraderNS-0.0.3-py2.py3-none-any.whl.

File metadata

  • Download URL: ejtraderNS-0.0.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 163.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for ejtraderNS-0.0.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 cae62259ec81180d8bf57e89f477a7c71ae5b9aa0bb90f0a76ed95197e84f6f0
MD5 324720e08ef65b6fb514de43bb590659
BLAKE2b-256 566616fdea2cfc0f1c19cda25d3114074f938eb6cdfbefcbc6fbdb8a96f7b77d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page