Explore Web Pages - Scrapers and Crawlers
WebXplore (v1.0.3)
WebXplore offers a multitude of tools for web scraping, crawling, and analyzing the scraped text to determine its sentiment or the tone of its author.
This package helps in retrieving information from these sources:
- Google Search: Get links from any Google search query.
- Website Text: Use an intelligent parser to strip the HTML markup from webpage contents.
- Twitter: Given a word or phrase, get related tweets.
- Reddit: Get the hottest posts for a given subreddit and key phrase.
- NewsAPI: Retrieve news articles for a given topic or phrase.
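To illustrate what stripping HTML markup from a page involves, here is a minimal sketch built on the standard library's html.parser. It is not WebXplore's parser, whose internals are not described here; the names TextExtractor and strip_html are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # greater than zero while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def strip_html(html):
    """Return the text content of an HTML string (illustrative helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(strip_html("<p>Hello</p><script>var x = 1;</script><p>World</p>"))
```

A real scraper usually also has to deal with navigation menus, ads, and malformed markup, which this sketch ignores.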
Installation
$ pip install webxplore
Alternatively, clone the repository:
$ git clone https://github.com/arnavn101/WebXplore.git
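To confirm that the package is importable after installation, a quick check along these lines may help. It is a small sketch using only the standard library; the helper name is_installed is made up for this example, and the import name webxplore is taken from the import statements in this README.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("webxplore"):
        print("webxplore is available")
    else:
        print("webxplore is not installed; run: pip install webxplore")
```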
Getting Started
Here are steps for using webxplore.
1. Get Links from Google Search
from webxplore import WebSearcher
searchQuery = WebSearcher.SearchWeb('Artificial Intelligence', 5)
print(searchQuery.returnListLinks())
2. Scrape a Website
from webxplore import WebScraper
webScraper = WebScraper.ScrapeWebsite('https://en.wikipedia.org/wiki/Artificial_intelligence')
print(webScraper.return_article())
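Steps 1 and 2 can be combined to scrape every link returned by a search. The sketch below uses only the calls shown above and assumes that returnListLinks() yields URL strings and that return_article() returns text; neither is confirmed here. The helper names scrape_all and search_and_scrape are made up for this example.

```python
def scrape_all(links, scrape):
    """Map each link to its scraped text, recording failures as None.

    `scrape` is any callable that takes a URL and returns text.
    """
    results = {}
    for link in links:
        try:
            results[link] = scrape(link)
        except Exception as error:  # one bad page should not stop the rest
            print(f"skipping {link}: {error}")
            results[link] = None
    return results


def search_and_scrape(query, count):
    """Wire scrape_all to webxplore using the calls from steps 1 and 2."""
    # Imported lazily: this needs the package installed and network access.
    from webxplore import WebSearcher, WebScraper

    # Assumption: returnListLinks() gives an iterable of URL strings.
    links = WebSearcher.SearchWeb(query, count).returnListLinks()
    return scrape_all(
        links,
        lambda url: WebScraper.ScrapeWebsite(url).return_article(),
    )
```

Passing the scraper in as a callable keeps the loop easy to try with a stand-in function before making real requests.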
3. Get Sentiments from Text
from webxplore.utils import SentimentAnalyzer
sentimentAnalyzer = SentimentAnalyzer.RetrieveSentiments('This is a good situation.')
print(sentimentAnalyzer.returnFinalSentiment())
4. Get Summary of the Text
from webxplore.utils import TextSummarizer
textSummarizer = TextSummarizer.SummarizeText('He feels very scared. He wants to protect himself.', 1)
print(textSummarizer.returnFinalSummary())
5. Get Tone of the Text (for each sentence)
from webxplore.utils import ToneAnalyzer
textTone = ToneAnalyzer.ToneAnalysis('Laugh and the world laughs with you. ' +
'Weep and you weep alone.', 'watsonApiKey')
print(textTone.returnTone())
6. Get the Latest Articles from NewsAPI
from webxplore.searchBeyond import SearchNews
newsArticles = SearchNews.RetrieveNewsArticle('Politics', 5, 'newsApiKey')
print(newsArticles.return_articleSentences())
7. Get Posts from a SubReddit
from webxplore.searchBeyond import SearchReddit
redditPosts = SearchReddit.CrawlSubReddit('stocks', 'amazon', 10, 'RedditClientId',
'RedditClientSecret', 'RedditUserAgent')
print(redditPosts.return_listSentences())
8. Get Tweets Containing a Keyword
from webxplore.searchBeyond import SearchTwitter
retrieveTweets = SearchTwitter.CrawlTwitter('tesla', 10, 'TwitterConsumerKey', 'TwitterConsumerSecret',
'TwitterAccountKey', 'TwitterAccountSecret')
print(retrieveTweets.return_tweets())
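Steps 5 through 8 pass API credentials as literal strings. One common alternative is to read them from environment variables so that keys stay out of source control. The sketch below shows that pattern for the NewsAPI call from step 6; the variable name NEWS_API_KEY and the helpers require_env and fetch_news are made up for this example.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or raise if unset or empty."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


def fetch_news(topic, count):
    """Call the NewsAPI helper from step 6 with a key taken from the environment."""
    # Imported lazily: this needs the package installed and network access.
    from webxplore.searchBeyond import SearchNews

    api_key = require_env("NEWS_API_KEY")  # hypothetical variable name
    articles = SearchNews.RetrieveNewsArticle(topic, count, api_key)
    return articles.return_articleSentences()
```

The same helper can supply the Watson, Reddit, and Twitter credentials, each under a variable name of your choosing.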
Contributions
Contributions to this repository are welcome. Please create a pull request and ensure that it passes all the CI tests.
License
MIT License Copyright (c) 2020, Arnav Nidumolu
Hashes for WebXplore-1.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0a4aa32118babd916a396d60fc7505742f6b54c0372bf3be421f24026eff490a
MD5 | 9602d0c1b8988b495f5575333b8e6cf5
BLAKE2b-256 | 4a81ef4e0bf51b2b01da3f2165003aa569c9843fc9a20a71a92733c7a8471293