CnbcNews
CnbcNews is an open-source, easy-to-use news crawler that extracts structured information from the CNBC news website for machine learning purposes. It can recursively follow internal hyperlinks and read RSS feeds to fetch the most recent articles in a given field. You only need to provide the desired field ('technology', 'politics', 'business', 'markets', or 'investing') to crawl it completely.
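The README does not describe how CnbcNews handles RSS internally. As a rough illustration of what reading a feed involves, the sketch below uses the standard library's `xml.etree.ElementTree` to pull the title, link, and publication date out of each `<item>` of an RSS 2.0 document. The feed content and the `parse_rss_items` helper are hypothetical; a real crawler would first download the feed over HTTP.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 snippet; a real crawler would download the feed first.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example technology feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/article-1</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/article-2</link>
      <pubDate>Tue, 02 Jan 2024 12:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str) -> list[dict[str, str]]:
    """Return the title, link and pubDate of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None for a missing child, so fall back to "".
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pub_date": (item.findtext("pubDate") or "").strip(),
        })
    return items


for entry in parse_rss_items(SAMPLE_RSS):
    print(entry["title"], "->", entry["link"])
```

The links collected this way are the kind of article URLs a crawler would then visit to extract headline, body, and author.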
Extracted information
CnbcNews extracts the following attributes from CNBC news articles:
- article headline
- article body (main text)
- author name
- publication date
- label
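The exact CSV layout CnbcNews writes is not documented here. The sketch below shows one plausible way to represent an extracted article and serialize it to CSV with the standard `dataclasses` and `csv` modules. The `Article` class, its field names, and the assumption that `label` holds the article's field (for example 'technology') are illustrative only.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Article:
    # Hypothetical field names mirroring the attribute list above.
    headline: str
    body: str
    author: str
    publication_date: str
    label: str  # assumed here to hold the field name, e.g. "technology"


def to_csv(articles: list[Article]) -> str:
    """Serialize articles to CSV text with one column per attribute."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Article)])
    writer.writeheader()
    for article in articles:
        writer.writerow(asdict(article))
    return buffer.getvalue()


sample = Article(
    headline="Example headline",
    body="Example body text.",
    author="Jane Doe",
    publication_date="2024-01-01",
    label="technology",
)
print(to_csv([sample]))
```

`csv.DictWriter` quotes values that contain commas or newlines, which matters for article bodies.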
Features
- works out of the box: install with pip, choose the field of the articles you want, and run :-)
- run CnbcNews conveniently using its CLI mode
Modes and use cases
CnbcNews supports two main use cases, which are explained in more detail below.
CLI mode
- stores extracted results in CSV files on your own storage
- simple but extensive configuration (if you want to tweak the results)
- revisions: crawl articles multiple times and track changes
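How CnbcNews tracks revisions is not described in this README. One simple approach, sketched below under that caveat, is to key each crawl by article URL, hash each article body, and compare two crawls to find added, removed, and changed articles. `fingerprint` and `diff_crawls` are hypothetical helpers, not part of the CnbcNews API.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Hash an article body so two crawls can be compared cheaply."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def diff_crawls(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two crawls, each mapping article URL -> article body."""
    old_fp = {url: fingerprint(body) for url, body in old.items()}
    new_fp = {url: fingerprint(body) for url, body in new.items()}
    common = old_fp.keys() & new_fp.keys()
    return {
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "removed": sorted(old_fp.keys() - new_fp.keys()),
        "changed": sorted(url for url in common if old_fp[url] != new_fp[url]),
    }


# Two hypothetical crawls of the same field, taken at different times.
first = {"https://example.com/a": "version 1", "https://example.com/b": "unchanged"}
second = {
    "https://example.com/a": "version 2",
    "https://example.com/b": "unchanged",
    "https://example.com/c": "new article",
}
print(diff_crawls(first, second))
```

Storing only fingerprints keeps the comparison cheap; keeping the full bodies as well would let you show what actually changed.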
Library mode
- crawl and extract information given a list of article URLs
- use CnbcNews within your own Python code
Getting started
It's super easy.
Installation
$ pip3 install CnbcNews
Use within your own code (as a library)
You can access the core functionality of CnbcNews, i.e. extraction of semi-structured information from one or more news articles, in your own code by using CnbcNews in library mode.
from CnbcNews import getArticles
# Crawl 50 articles from the "investing" field
getArticles(field="investing", number=50, dropna=True)
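The README does not spell out what `dropna` does. If it mirrors pandas' `DataFrame.dropna` (an assumption), passing `dropna=True` would discard articles that are missing any attribute. The pure-Python sketch below illustrates that behaviour on plain dictionaries; `drop_incomplete` is a hypothetical helper, not part of CnbcNews.

```python
from typing import Optional

# Hypothetical record type: attribute name -> value, with None marking a missing value.
Record = dict[str, Optional[str]]


def drop_incomplete(records: list[Record]) -> list[Record]:
    """Keep only records in which no attribute is None (roughly pandas' row-wise dropna)."""
    return [record for record in records if all(value is not None for value in record.values())]


articles = [
    {"headline": "Headline A", "author": "Jane Doe"},
    {"headline": "Headline B", "author": None},
]
print(drop_incomplete(articles))
```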
To crawl multiple fields at once, optionally with a timeout in seconds and a number of articles per field:
import CnbcNews
CnbcNews.from_fields(["technology", "markets"], number=10, timeout=6)
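The internals of the multi-field crawl are not documented. The sketch below shows one way a crawl over several fields with a timeout could work, using `concurrent.futures` from the standard library. `fetch_field` is a hypothetical stand-in that sleeps briefly instead of making network requests, and returning an empty list for a field that misses the deadline is an assumption, not documented CnbcNews behaviour.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fetch_field(field: str, number: int) -> list[str]:
    """Hypothetical stand-in for crawling one field; returns placeholder titles."""
    time.sleep(0.1)  # simulate network latency
    return [f"{field} article {i}" for i in range(number)]


def crawl_fields(fields: list[str], number: int = 10, timeout: float = 6.0) -> dict[str, list[str]]:
    """Crawl several fields concurrently; fields not finished within `timeout` seconds get []."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(fields)))
    futures = {pool.submit(fetch_field, field, number): field for field in fields}
    # Wait at most `timeout` seconds for the whole batch.
    done, _ = wait(futures, timeout=timeout)
    # Return without blocking on stragglers; queued work is cancelled (Python 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)
    results = {field: [] for field in fields}
    for future in done:
        results[futures[future]] = future.result()
    return results


print(crawl_fields(["technology", "markets"], number=2, timeout=6))
```

Because all fields run concurrently, a single overall deadline passed to `wait` behaves roughly like a per-field timeout.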
Run the crawler (via the CLI)
$ CnbcNews-getArticles field [number] [dropna]
CnbcNews will then start crawling the requested articles; by default, the results are stored in a CSV file.
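The command signature `field [number] [dropna]` maps naturally onto `argparse` positionals with `nargs="?"`. The parser below is a sketch of such an interface, not the actual CnbcNews entry point; the default values and the way `dropna` is parsed from text are assumptions.

```python
import argparse

FIELDS = ["technology", "politics", "business", "markets", "investing"]


def str_to_bool(value: str) -> bool:
    """Interpret 'true', '1' or 'yes' (any case) as True and anything else as False."""
    return value.strip().lower() in {"true", "1", "yes"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="CnbcNews-getArticles")
    parser.add_argument("field", choices=FIELDS)
    # Defaults below are hypothetical; the README does not state them.
    parser.add_argument("number", nargs="?", type=int, default=10)
    parser.add_argument("dropna", nargs="?", type=str_to_bool, default=True)
    return parser


print(build_parser().parse_args(["markets", "25", "false"]))
```

Restricting `field` with `choices` makes argparse reject unsupported fields with a usage message instead of starting a crawl.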
License
Copyright 2023-2024 Ahmed Bendrioua
File details
Details for the file CnbcNews-5.7.tar.gz
File metadata
- Download URL: CnbcNews-5.7.tar.gz
- Upload date:
- Size: 4.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7c0ebe154f0624fd65c18de761426c1fb958546f60a83024cbb5e115df097f93
MD5 | da3e3de0cf59435bd51c3faadebf52ef
BLAKE2b-256 | d4fb163c34568c15928b9a02d5a21d84b8ac8f3c1a2915813a881c964471bad5
File details
Details for the file CnbcNews-5.7-py3-none-any.whl
File metadata
- Download URL: CnbcNews-5.7-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5b0043bc57facbd226aad9817b8b62fb3ea426fee1d8f83c95af9e0ca50fec3d
MD5 | 9da352ffcbf73178561581b01b4920c1
BLAKE2b-256 | 6387779c8c77b55d13b2a0004ba29aa0d17180b82cb70e4011861c9fa6fa933e
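To check that a downloaded release file matches the SHA256 digests listed for CnbcNews 5.7, you can hash it locally with `hashlib`. The sketch below reads the file in chunks and compares its digest against the published values; `file_digest` and `verify` are illustrative helpers, not part of CnbcNews.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for the 5.7 release files.
EXPECTED_SHA256 = {
    "CnbcNews-5.7.tar.gz": "7c0ebe154f0624fd65c18de761426c1fb958546f60a83024cbb5e115df097f93",
    "CnbcNews-5.7-py3-none-any.whl": "5b0043bc57facbd226aad9817b8b62fb3ea426fee1d8f83c95af9e0ca50fec3d",
}


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is a known release file and its SHA256 matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and file_digest(path) == expected
```

For example, `verify(Path("CnbcNews-5.7-py3-none-any.whl"))` should return `True` for an intact download. pip can also enforce digests itself in hash-checking mode, e.g. with a requirements line `CnbcNews==5.7 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`.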