## CrawlerFriend
A lightweight **Web Crawler** that supports **Python 2.7** and returns search results in HTML or
dictionary form, given URLs and keywords. If you regularly visit a few websites and look for a few keywords,
this Python package will automate the task for you and
return the results as an HTML file in your web browser.
### Installation
```
pip install CrawlerFriend
```
### How to use?
#### All Results in HTML
```
import CrawlerFriend
urls = ["http://www.goal.com/","http://www.skysports.com/football","https://www.bbc.com/sport/football"]
keywords = ["Ronaldo","Liverpool","Salah","Real Madrid","Arsenal","Chelsea","Man United","Man City"]
crawler = CrawlerFriend.Crawler(urls, keywords)
crawler.crawl()
crawler.get_result_in_html()
```
#### All Results in a Dictionary
```
result_dict = crawler.get_result()
```
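The structure of the returned dictionary is not documented here, so as a purely hypothetical illustration (the keys, nesting, and sample values below are invented assumptions, not the library's documented format), post-processing might look like:

```python
# Hypothetical shape: keyword -> list of matching headlines/links.
# This sample data is invented for illustration; inspect get_result()
# on a real crawl to see the actual structure.
result_dict = {
    "Salah": ["Salah double sinks Arsenal - example.com"],
    "Chelsea": [],
}
for keyword, matches in result_dict.items():
    print(keyword, len(matches))
```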
#### Changing Default Arguments
CrawlerFriend uses four HTML tags 'title', 'h1', 'h2', 'h3' and max_link_limit = 50 by default for searching.
But it can be changed by passing arguments to the constructor:
```
crawler = CrawlerFriend.Crawler(urls, keywords, max_link_limit=200, tags=['p','h4'])
crawler.crawl()
```
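To make the `tags` option concrete: searching a page for keywords amounts to fetching its HTML and scanning the text inside the chosen tags. The following is a standalone sketch of that idea using only Python 3's standard library; it illustrates the general technique, not CrawlerFriend's actual implementation (the class and variable names are invented):

```python
from html.parser import HTMLParser

class TagTextCollector(HTMLParser):
    """Collect text that appears inside a chosen set of tags.
    A rough sketch of keyword-in-tags matching, not CrawlerFriend's code."""
    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.depth = 0      # >0 while inside one of the chosen tags
        self.texts = []
    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1
    def handle_endtag(self, tag):
        if tag in self.tags:
            self.depth = max(0, self.depth - 1)
    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())

# In a real crawl this HTML would be fetched from one of the URLs.
html = ("<html><head><title>Salah double sinks Arsenal</title></head>"
        "<body><h1>Match report</h1><p>body text is ignored</p></body></html>")
parser = TagTextCollector(["title", "h1", "h2", "h3"])
parser.feed(html)
# Keep only the keywords that occur in the collected tag text.
hits = {kw for kw in ["Salah", "Arsenal", "Chelsea"]
        for text in parser.texts if kw in text}
print(sorted(hits))  # ['Arsenal', 'Salah']
```

Note that the `<p>` text is skipped with the default-style tag set; passing `tags=['p', 'h4']` as in the constructor call above would widen the search accordingly.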