news-fetch is an open-source, easy-to-use news extractor with basic NLP features (cleaning text, keywords, summary) that just works.

These details have not been verified by PyPI

Project links

Homepage

Project description

news-fetch

news-fetch is an open-source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and archived articles. You only need to provide the root URL of the news website to crawl it completely. News-fetch combines the power of multiple state-of-the-art libraries and tools, including news-please by Felix Hamborg and Newspaper3K by Lucas (æ¬§é˜³è±¡) Ou-Yang. This package leverages features from both of these works.

I built this tool to minimize NaN or empty values when scraping data from various news websites. It's platform-independent and written in Python 3, making it easy for programmers and developers to access news data for their applications.

Source	Link
PyPI:	https://pypi.org/project/news-fetch/
Repository:	https://santhoshse7en.github.io/news-fetch/
Documentation:	https://santhoshse7en.github.io/news-fetch_doc/ (Not Yet Created!)

Dependencies

Extracted Information

news-fetch extracts the following attributes from news articles. You can also check out an example JSON file generated by news-please.

Headline
Author(s)
Publication date
Publication
Category
Source domain
Article content
Summary
Keywords
URL
Language

Dependency Installation

Use the package manager pip to install the required dependencies:

pip install -r requirements.txt

Usage

You can download it by clicking the green download button on Github.

To scrape all the news details, use the newspaper function:

from newsfetch.news import Newspaper

news = Newspaper(url='https://www.thehindu.com/news/cities/Madurai/aa-plays-a-pivotal-role-in-helping-people-escape-from-the-grip-of-alcoholism/article67716206.ece')
print(news.headline)
# Output: 'AA plays a pivotal role in helping people escape from the grip of alcoholism'

To extract URLs from a targeted website, call the GoogleSearchNewsURLExtractor by providing the keyword and newspaper link as arguments:

from newsfetch.google import GoogleSearchNewsURLExtractor

google = GoogleSearchNewsURLExtractor(keyword='Alcoholics Anonymous', news_domain='https://timesofindia.indiatimes.com/')
print(google.urls)
"""
['https://timesofindia.indiatimes.com/city/pune/pune-takes-a-stand-against-alcoholism-experts-collaborate-with-alcoholics-anonymous/articleshow/114438466.cms', 
'https://timesofindia.indiatimes.com/city/mumbai/we-have-lost-jobs-homes-alcoholics-anonymous/articleshow/96824383.cms', 
'https://timesofindia.indiatimes.com/city/gurgaon/gurgaons-alcoholics-open-up-about-their-road-to-recovery/articleshow/45080744.cms', 
'https://timesofindia.indiatimes.com/city/goa/alcoholism-is-illness-not-issue-of-weak-willpower-say-experts/articleshow/105320008.cms', 
'https://timesofindia.indiatimes.com/city/bhopal/alcoholism-is-an-illness-bhopal-aa-silver-jubilee-celebration/articleshow/106849014.cms', 
'https://timesofindia.indiatimes.com/city/ahmedabad/alcoholics-anonymous-switches-to-online-sessions/articleshow/76144639.cms', 
'https://timesofindia.indiatimes.com/city/kochi/keralites-trying-to-kick-alcoholism-alcoholics-anonymous/articleshow/13977818.cms', 
'https://timesofindia.indiatimes.com/city/chandigarh/alcoholics-anonymous-turned-their-lives-around/articleshow/18239.cms', 
'https://timesofindia.indiatimes.com/city/mumbai/like-air-india-flyer-alcoholics-anonymous-members-reap-whirlwind-of-job-loss-broken-homes/articleshow/96820403.cms', 
'https://timesofindia.indiatimes.com/city/nagpur/alcoholics-anonymous-meet-promotes-one-day-at-a-time/articleshow/50538092.cms']
"""

Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

Make sure to update tests as appropriate.

License

This project is licensed under the MIT License.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.3.0

Nov 3, 2024

0.2.9

Nov 3, 2024

0.2.8

Jan 29, 2021

0.2.7

Jan 29, 2021

0.2.6

Jan 29, 2021

0.2.5

Jul 24, 2020

0.2.4

Jul 24, 2020

0.2.3

May 29, 2020

0.2.2

Jul 23, 2019

0.2.1

Jul 23, 2019

0.2.0

Jul 23, 2019

0.1.9

Jul 23, 2019

0.1.8

Jul 14, 2019

0.1.7

Jul 3, 2019

0.1.6

Jul 3, 2019

0.1.5

Jun 30, 2019

0.1.4

Jun 30, 2019

0.1.3

Jun 30, 2019

0.1.2

Jun 30, 2019

0.1.1

Jun 30, 2019

0.1.0

Jun 29, 2019

0.0.9

Jun 29, 2019

0.0.8

Jun 29, 2019

0.0.7

Jun 29, 2019

0.0.6

Jun 29, 2019

0.0.5

Jun 29, 2019

0.0.4

Jun 29, 2019

0.0.3

Jun 29, 2019

0.0.2

Jun 29, 2019

0.0.1

Jun 29, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

news_fetch-0.3.0.tar.gz (10.1 kB view details)

Uploaded Nov 3, 2024 Source

File details

Details for the file news_fetch-0.3.0.tar.gz.

File metadata

Download URL: news_fetch-0.3.0.tar.gz
Upload date: Nov 3, 2024
Size: 10.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for news_fetch-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`3fca9fbcb80c8bd7d5c4db4700a5055f6327dd2e29abee93843e7d89e34d4b26`
MD5	`caadf88242bbb2cfaa2712909827420d`
BLAKE2b-256	`93166ab3205649a70b6faa20ac65f3b16b667629c056dbea314daeb828118374`

See more details on using hashes here.

news-fetch 0.3.0

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

news-fetch

Dependencies

Extracted Information

Dependency Installation

Usage

Contributing

License

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes