CrawlerFriend
A lightweight web crawler for Python 2.7 that returns search results in HTML or dictionary form, given a list of URLs and keywords. If you regularly visit a few websites to look for the same keywords (for example, following certain teams or players on a sports site, or tracking a handful of topics on a blog), this Python package (a friend to you, and a crawler by nature) automates the task and opens the results as an HTML file in your web browser.
Installation
pip install CrawlerFriend
Start the Crawler
Import Module
import CrawlerFriend
Start the Crawler
How do you start the Crawler? It takes 4 arguments:
- urls (list of URLs)
- keywords (list of keywords)
- max_link_limit [optional] (maximum number of links that will be visited; 50 by default)
- tags [optional] (the HTML tags the crawler searches; ['title', 'h1', 'h2', 'h3'] by default)
crawler = CrawlerFriend.Crawler(["https://Website1.com/", "http://Website2.com/"], ["Keyword1", "Keyword2"])
crawler.crawl()
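Under the hood, a crawler like this scans the configured tags of each fetched page for the keywords. Here is a minimal, self-contained sketch of that matching step using only the standard library; the `TagTextParser` and `find_keywords` helpers and the sample HTML are illustrative, not part of the CrawlerFriend API:

```python
from html.parser import HTMLParser

class TagTextParser(HTMLParser):
    """Collect the text inside a given set of tags (e.g. title, h1-h3)."""
    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self._current = None
        self.texts = []  # text content of every matching tag

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in self.tags and self._current is not None:
            self.texts.append("".join(self._current).strip())
            self._current = None

def find_keywords(html, keywords, tags=("title", "h1", "h2", "h3")):
    """Return {keyword: [matching tag texts]} for one page (case-insensitive)."""
    parser = TagTextParser(tags)
    parser.feed(html)
    hits = {}
    for kw in keywords:
        matched = [t for t in parser.texts if kw.lower() in t.lower()]
        if matched:
            hits[kw] = matched
    return hits

page = ("<html><head><title>Transfer news</title></head>"
        "<body><h2>Messi scores again</h2><p>ignored</p></body></html>")
print(find_keywords(page, ["Messi", "Ronaldo"]))
# → {'Messi': ['Messi scores again']}
```

Note that the `<p>` text is skipped because `p` is not among the default tags, which is exactly why the `tags` argument matters.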
Get Search Results
There are several ways to get the search results (for all keywords, or for a single keyword) in HTML or dictionary form.
All Result in HTML
crawler.get_result_in_html()
All Result in Dictionary
result_dict = crawler.get_result()
Result of a Keyword in HTML
crawler.get_result_of_keyword_in_html('keyword1')
Result of a Keyword in Dictionary
result_dict = crawler.get_result_of_keyword('keyword1')
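The exact shape of the returned dictionary depends on the package version; assuming it maps each keyword to (url, matched text) pairs, you could render a report yourself. The `dict_to_html` helper and the sample dictionary below are illustrative assumptions, not CrawlerFriend API:

```python
from html import escape

def dict_to_html(result, title="CrawlerFriend results"):
    """Render an assumed {keyword: [(url, text), ...]} result dict as HTML."""
    parts = []
    for keyword, matches in result.items():
        parts.append("<h2>%s</h2><ul>" % escape(keyword))
        for url, text in matches:
            # escape() keeps keywords/headings from breaking the markup
            parts.append('<li><a href="%s">%s</a></li>' % (escape(url), escape(text)))
        parts.append("</ul>")
    return "<html><head><title>%s</title></head><body>%s</body></html>" % (
        escape(title), "".join(parts))

sample = {"Keyword1": [("https://Website1.com/page", "Keyword1 in a headline")]}
report = dict_to_html(sample)
```

Writing `report` to a file and opening it in a browser gives roughly what `get_result_in_html()` automates for you.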
Specify Max Link Limit
CrawlerFriend uses 50 as the max_link_limit by default, but you can set your own limit like this:
crawler = CrawlerFriend.Crawler(["https://Website1.com/", "http://Website2.com/"], ["Keyword1", "Keyword2"], max_link_limit=200)
crawler.crawl()
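CrawlerFriend's actual traversal strategy isn't documented here; a plausible sketch of how a cap like max_link_limit can work is a breadth-first walk that stops once the limit is reached. The `LINKS` graph and `crawl_bfs` function below are illustrative stand-ins, not the package's implementation:

```python
from collections import deque

# A toy link graph standing in for real fetched pages (illustrative only).
LINKS = {
    "https://Website1.com/": ["https://Website1.com/a", "https://Website1.com/b"],
    "https://Website1.com/a": ["https://Website1.com/c"],
    "https://Website1.com/b": [],
    "https://Website1.com/c": [],
}

def crawl_bfs(start_urls, max_link_limit=50):
    """Visit pages breadth-first, stopping once max_link_limit pages are visited."""
    visited = []
    queue = deque(start_urls)
    seen = set(start_urls)
    while queue and len(visited) < max_link_limit:
        url = queue.popleft()
        visited.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited

print(crawl_bfs(["https://Website1.com/"], max_link_limit=2))
# → ['https://Website1.com/', 'https://Website1.com/a']
```

A larger limit simply lets the walk continue until the frontier is exhausted, so raising max_link_limit trades crawl time for coverage.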
Specify tags
CrawlerFriend searches four HTML tags by default: 'title', 'h1', 'h2', and 'h3'. You can supply your own tags like this:
crawler = CrawlerFriend.Crawler(["https://Website1.com/", "http://Website2.com/"], ["Keyword1","Keyword2"], tags=['p','h4'])
crawler.crawl()
Hashes for CrawlerFriend-1.0.8-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8e5fbe7678fe0b79c4491f78af533535b9e8ccb9097321f3c34a59b1ceb4179d
MD5 | 653a0a757785844b9bdc9ed6d23d0356
BLAKE2b-256 | 00425318d23ccf517312c3ed6d7e44fa285e18235e1f383ca1e337bdd4ee9109