
news-fetch is an open source, easy-to-use news extractor that just works, with handy basic NLP features (cleaning_text, keywords, summary).


news-fetch

news-fetch is built on top of news-please by Felix Hamborg and Newspaper3K by Lucas (欧阳象) Ou-Yang. Thank you to both; without them, extracting online newspapers would be very hard. This package combines features provided by Felix's work and Lucas' work.

news-fetch is an open source, easy-to-use news crawler that extracts structured information from almost any news website. I built it to reduce the number of NaN, '', [], or 'None' values returned when scraping some newspapers. It is platform-independent and written in Python 3. Programmers and developers can easily use this package to give their programs access to news data.
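The idea of reducing empty values by drawing on two extractors can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's actual implementation: the helper names `is_empty`, `first_non_empty`, and `merge_articles` are hypothetical, and the two dictionaries stand in for whatever two extractors might return for the same article.

```python
def is_empty(value):
    """Return True for None, NaN, '', 'None', or an empty container."""
    if value is None:
        return True
    # NaN is the only float that is not equal to itself.
    if isinstance(value, float) and value != value:
        return True
    if isinstance(value, str):
        return value.strip() in ("", "None", "NaN", "nan")
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def first_non_empty(*candidates, default=None):
    """Return the first candidate that is not considered empty."""
    for value in candidates:
        if not is_empty(value):
            return value
    return default


def merge_articles(primary, fallback):
    """Merge two result dicts field by field, preferring the primary one."""
    merged = {}
    for key in primary.keys() | fallback.keys():
        merged[key] = first_non_empty(primary.get(key), fallback.get(key))
    return merged


# Hypothetical results from two extractors for the same article.
primary = {"headline": "", "authors": [], "date": "2019-06-29"}
fallback = {"headline": "Example headline", "authors": ["A. Writer"], "date": None}
merged = merge_articles(primary, fallback)
```

Here each field of `merged` comes from whichever source has a usable value, so a field stays empty only when both sources are empty.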

Source Link
PyPI: https://pypi.org/project/news-fetch/
Repository: https://santhoshse7en.github.io/news-fetch/
Documentation: https://santhoshse7en.github.io/news-fetch_doc/

Dependencies

  • news-please
  • newspaper3k
  • beautifulsoup4
  • fake_useragent
  • selenium
  • chromedriver-binary
  • spacy
  • pandas

Dependencies Installation

Use the package manager pip to install the dependencies:

pip install -r requirements.txt

Usage

Download it by clicking the green download button here on GitHub. To extract URLs from a targeted website, call the google_search function. You only need to pass two arguments: a keyword and the newspaper link.

>>> from newsfetch.news import google_search
>>> google = google_search('Alcoholics Anonymous', 'https://timesofindia.indiatimes.com/')

Directory of Google search result URLs
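To make the keyword-plus-newspaper-link idea concrete, here is a minimal sketch of how a site-restricted search query could be built with the standard library. It is an assumption for illustration only: the helper names `build_site_query` and `build_search_url` are hypothetical, and news-fetch may construct its queries differently. The sketch relies only on Google's `site:` search operator and on `urllib.parse`.

```python
from urllib.parse import urlencode, urlparse


def build_site_query(keyword, newspaper_url):
    """Combine a keyword with a site: restriction for the newspaper's host."""
    host = urlparse(newspaper_url).netloc
    return f"{keyword} site:{host}"


def build_search_url(keyword, newspaper_url):
    """Return a Google search URL for the site-restricted query."""
    query = build_site_query(keyword, newspaper_url)
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_site_query("Alcoholics Anonymous", "https://timesofindia.indiatimes.com/")
search_url = build_search_url("Alcoholics Anonymous", "https://timesofindia.indiatimes.com/")
```

The sketch stops at building the URL. Fetching and parsing the result page is a separate step and is left out.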

To scrape all the news details, call the newspaper function.

>>> from newsfetch.news import newspaper
>>> news = newspaper('https://www.bbc.co.uk/news/world-48810070')

Directory of news

>>> news.headline

'g20 summit: trump and xi agree to restart us china trade talks'
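When exploring a scraped result, it can help to list its data attributes and see which ones came back empty. The following sketch uses a hypothetical stand-in class, `ExampleArticle`, rather than the real object returned by `newspaper`. Only `headline` appears in the example above, so the `keywords` and `summary` attribute names are assumptions made for illustration.

```python
class ExampleArticle:
    """Hypothetical stand-in for the object returned by newspaper()."""

    def __init__(self):
        self.headline = "g20 summit: trump and xi agree to restart us china trade talks"
        self.keywords = []   # hypothetical attribute name, left empty on purpose
        self.summary = None  # hypothetical attribute name, left empty on purpose


def public_attributes(obj):
    """Map each public, non-callable attribute name to its value."""
    return {
        name: getattr(obj, name)
        for name in dir(obj)
        if not name.startswith("_") and not callable(getattr(obj, name))
    }


def missing_fields(obj):
    """Return the sorted names of public attributes with empty values."""
    empty_markers = (None, "", [], "None")
    return sorted(
        name for name, value in public_attributes(obj).items()
        if value in empty_markers
    )


article = ExampleArticle()
fields = sorted(public_attributes(article))
missing = missing_fields(article)
```

A check like this can be run over many scraped articles to spot which fields a given newspaper tends to leave unfilled.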

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT
