A Python package that helps scrape news details
news_fetch
news_fetch scrapes news-related attributes with the help of Google Search and Newspaper3K, which reduces the number of NaN, '', [] or None values returned while scraping.
Source | Link |
---|---|
PyPI: | https://pypi.org/project/news_fetch/ |
Repository: | https://santhoshse7en.github.io/news_fetch/ |
Documentation: | https://santhoshse7en.github.io/news_fetch_doc/ |
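The description says that combining sources reduces empty values. A minimal sketch of that fallback idea follows; it is not taken from the news_fetch source, and the helper names `is_empty` and `first_non_empty` as well as the two sample values are hypothetical.

```python
import math


def is_empty(value):
    """Return True for the placeholders the description lists: NaN, '', [] and None."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, (str, list)) and len(value) == 0:
        return True
    return False


def first_non_empty(*candidates):
    """Return the first candidate that is not an empty placeholder, else None."""
    for value in candidates:
        if not is_empty(value):
            return value
    return None


# Hypothetical values for the same field obtained through two extraction routes.
headline_from_parser = ""
headline_from_search = "g20 summit: trump and xi agree to restart us china trade talks"

# The empty string is skipped and the second route supplies the value.
headline = first_non_empty(headline_from_parser, headline_from_search)
print(headline)
```

Checking candidates in a fixed order keeps the preferred source first and only falls back when it yields nothing usable.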
Dependencies
- beautifulsoup4
- selenium
- chromedriver-binary
- fake_useragent
- pandas
- nltk
- pattern
Dependencies Installation
Use the package manager pip to install the following:
pip install -r requirements.txt
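After installing, it can be useful to confirm that each dependency is importable. The sketch below is one way to do that with the standard library; the mapping from distribution names to import names reflects the usual module names for these packages, and the helper `missing_dependencies` is hypothetical rather than part of news_fetch.

```python
from importlib.util import find_spec

# Distribution name on PyPI -> module name used in `import` statements.
IMPORT_NAMES = {
    "beautifulsoup4": "bs4",
    "selenium": "selenium",
    "chromedriver-binary": "chromedriver_binary",
    "fake_useragent": "fake_useragent",
    "pandas": "pandas",
    "nltk": "nltk",
    "pattern": "pattern",
}


def missing_dependencies(import_names=IMPORT_NAMES):
    """Return the distribution names whose module cannot be located."""
    return [
        dist
        for dist, module in import_names.items()
        if find_spec(module) is None
    ]


missing = missing_dependencies()
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```

`find_spec` only locates a module without importing it, so the check stays cheap and avoids side effects such as starting a browser driver.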
Usage
Download it by clicking the green download button here on GitHub. To extract URLs from a targeted website, call google_search; you only need to pass the keyword and the newspaper website as arguments.
>>> from news_fetch.news import google_search
>>> google = google_search('Alcoholics Anonymous', 'https://timesofindia.indiatimes.com/')
Directory of Google search results URLs
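How google_search combines the keyword with the newspaper website is not documented here. One plausible approach, shown purely as an assumption, is a Google query restricted to the site's domain with the `site:` operator. The functions `build_site_query` and `build_search_url` below are hypothetical and do not come from news_fetch.

```python
from urllib.parse import urlencode, urlparse


def build_site_query(keyword, newspaper_url):
    """Combine a keyword with a `site:` restriction for the newspaper's domain."""
    domain = urlparse(newspaper_url).netloc
    return f"{keyword} site:{domain}"


def build_search_url(keyword, newspaper_url):
    """Build a Google search URL for the site-restricted query (assumed approach)."""
    query = build_site_query(keyword, newspaper_url)
    return "https://www.google.com/search?" + urlencode({"q": query})


search_url = build_search_url(
    "Alcoholics Anonymous", "https://timesofindia.indiatimes.com/"
)
print(search_url)
```

The sketch only constructs the URL; fetching and parsing the result pages is left to the package itself.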
To scrape all the news details, call newspaper:
>>> from news_fetch.news import newspaper
>>> news = newspaper('https://www.bbc.co.uk/news/world-48810070')
Directory of news_crawler
>>> news.headline
'g20 summit: trump and xi agree to restart us china trade talks'
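Since scraped attributes may be missing, reading them defensively can help when building a record. The sketch below uses a stand-in object so it runs without a browser or network access; only `headline` appears in this README, while `summary` and the helper `collect_fields` are hypothetical.

```python
from types import SimpleNamespace


def collect_fields(article, field_names):
    """Read each named attribute, using None when the object does not define it."""
    return {name: getattr(article, name, None) for name in field_names}


# Stand-in for the object returned by newspaper(...); a real one would be scraped.
article = SimpleNamespace(
    headline="g20 summit: trump and xi agree to restart us china trade talks"
)

# `summary` is a hypothetical attribute name used to show the missing-value case.
record = collect_fields(article, ["headline", "summary"])
print(record)
```

With a real news_fetch object, the same helper could be passed `news` in place of the stand-in.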
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
License
Project details
Download files
Source Distribution
File details
Details for the file news_fetch-0.0.3.tar.gz.
File metadata
- Download URL: news_fetch-0.0.3.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 0df638b23a149becb87a237c6704aa6b61f2fa9a247a94f64cc6fb8df863eb81 |
MD5 | bd1445331ff39f6909e4c09a86d91aeb |
BLAKE2b-256 | 3c194488239b7258a3f924b0f8bceaeac80d6e01405b8075c0029e23773b2b97 |