
A Python package that helps scrape news details


news_fetch

news_fetch scrapes all news-related attributes with the help of Google Search and Newspaper3K, which reduces the number of NaN, '', [] or None values returned while scraping.
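To illustrate the idea of reducing empty values by consulting more than one source, here is a minimal sketch. It is not taken from the news_fetch source; the helper names `is_missing` and `first_present` are hypothetical, and the only assumption is that each source yields a candidate value for the same attribute.

```python
import math


def is_missing(value):
    """Return True for the placeholder values named above: NaN, '', [] or None."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, (str, list)) and len(value) == 0:
        return True
    return False


def first_present(*candidates):
    """Return the first candidate that is not missing, or None if all are missing."""
    for value in candidates:
        if not is_missing(value):
            return value
    return None


# Hypothetical example: the first two sources returned nothing useful.
merged = first_present("", None, "Headline from a second source")
```

Trying the candidates in order means a later source is only used when the earlier ones came back empty.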

Source Link
PyPI: https://pypi.org/project/news_fetch/
Repository: https://santhoshse7en.github.io/news_fetch/
Documentation: https://santhoshse7en.github.io/news_fetch_doc/

Dependencies

  • beautifulsoup4
  • selenium
  • chromedriver-binary
  • fake_useragent
  • pandas
  • nltk
  • pattern

Dependencies Installation

Use the package manager pip to install the dependencies listed above:

pip install -r requirements.txt

Usage

Download it by clicking the green download button here on GitHub. To extract URLs from a targeted website, call the google_search function; you only need to pass two arguments: the keyword and the newspaper website.

>>> from news_fetch.news import google_search
>>> google = google_search('Alcoholics Anonymous', 'https://timesofindia.indiatimes.com/')

Directory of Google search result URLs

google
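As a rough illustration of how a keyword and a newspaper website can be combined into one search, the sketch below builds a site-restricted Google query URL using only the standard library. This is an assumption about the general technique, not the package's actual implementation; `build_site_query_url` is a hypothetical helper, and the use of the `site:` operator is assumed.

```python
from urllib.parse import urlencode, urlparse


def build_site_query_url(keyword, site_url):
    """Build a Google search URL restricted to the host of site_url (hypothetical helper)."""
    # Fall back to the raw string when no scheme was given, e.g. "example.com".
    host = urlparse(site_url).netloc or site_url
    query = f"{keyword} site:{host}"
    return "https://www.google.com/search?" + urlencode({"q": query})


url = build_site_query_url("Alcoholics Anonymous", "https://timesofindia.indiatimes.com/")
```

The resulting URL could then be opened by a browser driver such as selenium, which is listed among the dependencies.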

To scrape all the news details, call the newspaper function

>>> from news_fetch.news import newspaper
>>> news = newspaper('https://www.bbc.co.uk/news/world-48810070')

Directory of news_crawler

news

>>> news.headline

'g20 summit: trump and xi agree to restart us china trade talks'
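When collecting several attributes at once, it may help to copy them into a plain dictionary and tolerate missing ones. The sketch below uses a stand-in object rather than a real news_fetch result, so it runs without network access; `to_record` is a hypothetical helper, and `summary` is a hypothetical attribute name (only `headline` appears in the example above).

```python
from types import SimpleNamespace


def to_record(article, fields):
    """Copy the named attributes of article into a dict, using None when one is absent."""
    return {name: getattr(article, name, None) for name in fields}


# Stand-in for a scraped result; only `headline` is taken from the example above.
article = SimpleNamespace(
    headline="g20 summit: trump and xi agree to restart us china trade talks"
)

# `summary` is not set on the stand-in, so it falls back to None.
record = to_record(article, ["headline", "summary"])
```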

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT


Source Distribution

news_fetch-0.0.3.tar.gz (4.6 kB view details)

Uploaded Source

File details

Details for the file news_fetch-0.0.3.tar.gz.

File metadata

  • Download URL: news_fetch-0.0.3.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for news_fetch-0.0.3.tar.gz
Algorithm Hash digest
SHA256 0df638b23a149becb87a237c6704aa6b61f2fa9a247a94f64cc6fb8df863eb81
MD5 bd1445331ff39f6909e4c09a86d91aeb
BLAKE2b-256 3c194488239b7258a3f924b0f8bceaeac80d6e01405b8075c0029e23773b2b97
