news-fetch is an open source, easy-to-use news extractor with basic NLP (cleaning_text, keywords, summary) that just works.

news-fetch

news-fetch is an open source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and older, archived articles. You only need to provide the root URL of the news website to crawl it completely. news-fetch combines the power of multiple state-of-the-art libraries and tools, such as news-please by Felix Hamborg and Newspaper3K by Lucas (欧阳象) Ou-Yang. This package combines features from both Felix's and Lucas's work.

I built this to reduce the number of NaN, '', [] and 'None' values that turn up when scraping some news websites. It is platform-independent and written in Python 3. Programmers and developers can easily use this package to give their programs access to news data.
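The goal of reducing empty values can be illustrated with a fallback merge: take each field from one extractor's result and fall back to the other extractor when the value is empty. The helper names `is_empty` and `merge_fields` and the sample dictionaries below are hypothetical and not part of news-fetch. This is a minimal sketch of the idea under that assumption, not the package's actual implementation.

```python
import math


def is_empty(value):
    """Return True for the placeholder values named above: None, NaN, '', [] and 'None'."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        return value.strip() in ("", "None")
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def merge_fields(primary, fallback):
    """Prefer values from `primary`; use `fallback` wherever `primary` is empty."""
    merged = {}
    for key in primary.keys() | fallback.keys():
        value = primary.get(key)
        merged[key] = fallback.get(key) if is_empty(value) else value
    return merged


# Hypothetical results from two different extractors for the same article.
first = {"headline": "", "authors": [], "language": "en"}
second = {"headline": "Example headline", "authors": ["Jane Doe"], "language": "None"}

merged = merge_fields(first, second)
print(merged)
```

Here `headline` and `authors` come from the second result because the first one left them empty, while `language` stays `"en"` because the first result already had a usable value.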

Source	Link
PyPI:	https://pypi.org/project/news-fetch/
Repository:	https://santhoshse7en.github.io/news-fetch/
Documentation:	https://santhoshse7en.github.io/news-fetch_doc/

Dependencies

news-please
newspaper3k
beautifulsoup4
fake_useragent
selenium
chromedriver-binary
pandas

Extracted information

news-fetch extracts the following attributes from news articles. Also, have a look at an example JSON file extracted by news-please.

headline
name(s) of author(s)
publication date
publication
category
source_domain
article
summary
keyword
url
language
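To make the shape of one extracted article concrete, the attributes listed above can be modelled as a small record type that serialises to JSON. The class name `ArticleRecord`, its field names, and its types are assumptions chosen for illustration; news-fetch may name or type these attributes differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical container mirroring the attribute list above (not a news-fetch class)."""
    headline: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    publication_date: Optional[str] = None
    publication: Optional[str] = None
    category: Optional[str] = None
    source_domain: Optional[str] = None
    article: Optional[str] = None
    summary: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    url: Optional[str] = None
    language: Optional[str] = None

    def to_json(self) -> str:
        """Serialise the record; fields that were not extracted stay null or empty."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


# Only the headline and URL are filled in here; everything else keeps its default.
record = ArticleRecord(
    headline="g20 summit: trump and xi agree to restart us china trade talks",
    url="https://www.bbc.co.uk/news/world-48810070",
)
record_json = record.to_json()
print(record_json)
```

Keeping every attribute optional makes missing values explicit in the output instead of raising an error while scraping.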

Dependencies Installation

Use the package manager pip to install the following dependencies:

pip install -r requirements.txt
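After installing, it can be useful to confirm which of the dependencies are actually present in the current environment. The sketch below uses the standard library's `importlib.metadata` (Python 3.8 or later). The distribution names are copied from the Dependencies list above and are assumed to match the names used on PyPI; `installed_versions` is a hypothetical helper, not part of news-fetch.

```python
from importlib import metadata

# Distribution names taken from the Dependencies list (assumed to match PyPI names).
DEPENDENCIES = [
    "news-please",
    "newspaper3k",
    "beautifulsoup4",
    "fake_useragent",
    "selenium",
    "chromedriver-binary",
    "pandas",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(DEPENDENCIES)
missing = [name for name, version in report.items() if version is None]
print("missing:", missing)
```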

Usage

Download it by clicking the green download button on GitHub. To extract URLs from a targeted website, call the google_search function. You only need to pass the keyword and the newspaper link as arguments.

>>> from newsfetch.news import google_search
>>> google = google_search('Alcoholics Anonymous', 'https://timesofindia.indiatimes.com/')

[Image: directory of Google search result URLs]

To scrape all the news details, call the newspaper function:

>>> from newsfetch.news import newspaper
>>> news = newspaper('https://www.bbc.co.uk/news/world-48810070')

[Image: directory of news]

>>> news.headline

'g20 summit: trump and xi agree to restart us china trade talks'
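Several attributes can be collected in one pass by reading them defensively. In the sketch below only `headline` is confirmed by the example above. The other attribute names are assumptions based on the attribute list, and `to_dict` is a hypothetical helper. A `SimpleNamespace` stands in for the object returned by `newspaper(...)` so the snippet runs without network access.

```python
from types import SimpleNamespace

# Attribute names to read; everything except `headline` is an assumption.
FIELDS = ("headline", "article", "summary", "url", "language")


def to_dict(news, fields=FIELDS):
    """Collect the requested attributes, using None where one is absent."""
    return {name: getattr(news, name, None) for name in fields}


# Stand-in for the result of newspaper('https://www.bbc.co.uk/news/world-48810070').
fake_news = SimpleNamespace(
    headline="g20 summit: trump and xi agree to restart us china trade talks",
    url="https://www.bbc.co.uk/news/world-48810070",
)

details = to_dict(fake_news)
print(details["headline"])
```

With the real package, `fake_news` would be replaced by the `news` object from the example above. Attributes that do not exist on it are returned as `None` rather than raising `AttributeError`.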

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Release history

Version	Release date
0.3.0	Nov 3, 2024
0.2.9	Nov 3, 2024
0.2.8	Jan 29, 2021
0.2.7	Jan 29, 2021
0.2.6	Jan 29, 2021
0.2.5 (this version)	Jul 24, 2020
0.2.4	Jul 24, 2020
0.2.3	May 29, 2020
0.2.2	Jul 23, 2019
0.2.1	Jul 23, 2019
0.2.0	Jul 23, 2019
0.1.9	Jul 23, 2019
0.1.8	Jul 14, 2019
0.1.7	Jul 3, 2019
0.1.6	Jul 3, 2019
0.1.5	Jun 30, 2019
0.1.4	Jun 30, 2019
0.1.3	Jun 30, 2019
0.1.2	Jun 30, 2019
0.1.1	Jun 30, 2019
0.1.0	Jun 29, 2019
0.0.9	Jun 29, 2019
0.0.8	Jun 29, 2019
0.0.7	Jun 29, 2019
0.0.6	Jun 29, 2019
0.0.5	Jun 29, 2019
0.0.4	Jun 29, 2019
0.0.3	Jun 29, 2019
0.0.2	Jun 29, 2019
0.0.1	Jun 29, 2019

Download files

Source Distribution

news-fetch-0.2.5.tar.gz (6.8 kB)

Uploaded Jul 24, 2020 Source

File details

Details for the file news-fetch-0.2.5.tar.gz.

File metadata

Download URL: news-fetch-0.2.5.tar.gz
Upload date: Jul 24, 2020
Size: 6.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for news-fetch-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`131493418a789214d29604a92cf77f312d876e51e5c4fff202ef2d1f069f99f7`
MD5	`f08a1a9ca0900a77664237ad1df8d0cc`
BLAKE2b-256	`cab1427498517a187ad2cc3c2cf3c15a98c85ecbe238f8625b4ce5ace9117a6c`
