CnbcNews

CnbcNews is an open-source, easy-to-use news crawler that extracts structured information from the CNBC news website for machine learning purposes. It can recursively follow internal hyperlinks and read RSS feeds to fetch the most recent articles in any given field. You only need to provide the desired field ('technology', 'politics', 'business', 'markets', 'investing') of the news website to crawl it completely.
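Reading an RSS feed to find recent articles can be sketched as follows. This is only an illustration using the Python standard library, not CnbcNews's actual implementation: the function name `parse_rss_items`, the sample feed, and the example.com URLs are all made up for the example.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (hypothetical data, not a real CNBC feed).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/a</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/b</link>
      <pubDate>Tue, 02 Jan 2024 11:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text, limit=None):
    """Return a list of dicts (title, link, published) for each <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append(
            {
                "title": item.findtext("title"),
                "link": item.findtext("link"),
                "published": item.findtext("pubDate"),
            }
        )
    # Keep only the first `limit` entries if a limit was given.
    return items if limit is None else items[:limit]


for entry in parse_rss_items(SAMPLE_FEED, limit=2):
    print(entry["title"], entry["link"])
```

A crawler would then download each `link` and extract the article attributes from the page.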

Extracted information

CnbcNews extracts the following attributes from CNBC news articles:

  • article headline
  • article body (main text)
  • article's author name
  • publication date
  • label
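One way to picture a single extracted article is as a record with one field per attribute listed above. The sketch below is an assumption for illustration only: the class name `ArticleRecord` and the field names are hypothetical, and the source does not say how CnbcNews represents articles internally.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ArticleRecord:
    """Hypothetical container for the five attributes listed above."""

    headline: Optional[str]
    body: Optional[str]
    author: Optional[str]
    publication_date: Optional[str]
    label: Optional[str]  # assumed to be the field, e.g. "investing"


example = ArticleRecord(
    headline="Example headline",
    body="Example main text.",
    author="Jane Doe",
    publication_date="2024-01-01",
    label="investing",
)

# Convert the record to a plain dict, e.g. before writing it as a CSV row.
print(asdict(example))
```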

Features

  • works out of the box: install with pip, add the desired field of your articles, run :-)
  • run CnbcNews conveniently using its CLI mode

Modes and use cases

CnbcNews supports two main use cases, which are explained in more detail below.

CLI mode

  • stores extracted results in CSV files on your own storage
  • simple but extensive configuration (if you want to tweak the results)
  • revisions: crawl articles multiple times and track changes

Library mode

  • crawl and extract information given a list of article URLs
  • use CnbcNews within your own Python code

Getting started

It's super easy.

Installation

$ pip3 install CnbcNews

Use within your own code (as a library)

You can access the core functionality of CnbcNews, i.e. extraction of semi-structured information from one or more news articles, in your own code by using CnbcNews in library mode.

from CnbcNews import getArticles

getArticles(field="investing", number=50, dropna=True)
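The source does not spell out what `dropna=True` does. A plausible reading is that articles with missing attributes are discarded. The sketch below shows that idea on plain dictionaries. The function name `drop_incomplete` and the key names are hypothetical, and this is not the library's own code.

```python
# Hypothetical attribute names, mirroring the "Extracted information" list.
REQUIRED_KEYS = ("headline", "body", "author", "publication_date", "label")


def drop_incomplete(records, required=REQUIRED_KEYS):
    """Keep only records in which every required key has a non-empty value."""
    return [
        record
        for record in records
        if all(record.get(key) not in (None, "") for key in required)
    ]


articles = [
    {
        "headline": "Complete article",
        "body": "Some text.",
        "author": "Jane Doe",
        "publication_date": "2024-01-01",
        "label": "investing",
    },
    {
        "headline": "Article without an author",
        "body": "Some text.",
        "author": None,
        "publication_date": "2024-01-02",
        "label": "investing",
    },
]

cleaned = drop_incomplete(articles)
print(len(cleaned))
```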

To crawl multiple fields at once, optionally with a timeout in seconds and a number of articles for each field, use from_fields:

CnbcNews.from_fields([field1, field2, ...], number=10, timeout=6)

Run the crawler (via the CLI)

$ CnbcNews-getArticles field [number] [dropna]

CnbcNews will then start crawling articles. By default, the results are stored in a CSV file.
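Once a crawl has produced a CSV file, it can be loaded with Python's standard `csv` module. The following is a minimal sketch: the column names and the sample row are assumptions, since the source does not document the exact layout of the output file.

```python
import csv
import io


def read_articles(file_obj):
    """Read CSV rows from an open text file into a list of dicts keyed by column name."""
    return list(csv.DictReader(file_obj))


# In-memory stand-in for a crawled CSV file (hypothetical columns and data).
sample = io.StringIO(
    "headline,author,label\n"
    "Example headline,Jane Doe,investing\n"
)

rows = read_articles(sample)
print(rows[0]["headline"])
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_articles`, where `path` is wherever the crawler wrote its output.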

License

Copyright 2023-2024 Ahmed Bendrioua
