A lightweight crawler that returns search results as HTML or as a dictionary, given URLs and keywords.

Project description

CrawlerFriend

A lightweight web crawler for Python 2.7 that returns search results as HTML or as a dictionary, given URLs and keywords. If you regularly visit a few websites to look for a few keywords, this Python package automates the task and opens the results as an HTML file in your web browser.

Installation

pip install CrawlerFriend

How to use?

All Results in HTML

import CrawlerFriend

urls = ["http://www.goal.com/","http://www.skysports.com/football","https://www.bbc.com/sport/football"]
keywords = ["Ronaldo","Liverpool","Salah","Real Madrid","Arsenal","Chelsea","Man United","Man City"]

crawler = CrawlerFriend.Crawler(urls, keywords)
crawler.crawl()
crawler.get_result_in_html()

The code above opens an HTML document containing the results in your web browser.

All Results in Dictionary

result_dict = crawler.get_result()
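
The layout of the dictionary returned by get_result() is not described here, so the following is only a sketch of how such a result might be summarized. It assumes, purely for illustration, that the dictionary maps each key to a collection of matches; the helper name summarize and the sample data are hypothetical and not part of CrawlerFriend.

```python
from __future__ import print_function


def summarize(result_dict):
    """Return {key: number_of_items} for a dictionary of collections."""
    summary = {}
    for key, value in result_dict.items():
        # Count the entries of common containers; anything else is one item.
        if isinstance(value, (list, tuple, set, dict)):
            summary[key] = len(value)
        else:
            summary[key] = 1
    return summary


# Hypothetical stand-in for crawler.get_result(); the real layout may differ.
sample_result = {
    "Salah": ["Salah scores twice", "Salah signs new deal"],
    "Arsenal": ["Arsenal draw at home"],
    "Chelsea": [],
}

for keyword, count in sorted(summarize(sample_result).items()):
    print(keyword, count)
```

Inspecting the real return value first (for example with pprint.pprint(result_dict)) is the safer way to learn its actual structure before writing code against it.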

Changing Default Arguments

By default, CrawlerFriend searches four HTML tags ('title', 'h1', 'h2', 'h3') and uses max_link_limit = 50. Both defaults can be changed by passing arguments to the constructor:

crawler = CrawlerFriend.Crawler(urls, keywords, max_link_limit=200, tags=['p','h4'])
crawler.crawl()
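
To make the effect of the tags argument more concrete, here is a minimal sketch of tag-restricted keyword matching built on the standard library's HTML parser. It is not CrawlerFriend's implementation, which is not shown in this description; the class TagKeywordFinder and the function find_matches are hypothetical names, and case-insensitive substring matching is an assumption.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class TagKeywordFinder(HTMLParser):
    """Collect text inside selected tags that mentions any keyword."""

    def __init__(self, keywords, tags=("title", "h1", "h2", "h3")):
        HTMLParser.__init__(self)
        self.keywords = [k.lower() for k in keywords]
        self.tags = set(tags)
        self.depth = 0      # number of watched tags currently open
        self.buffer = []    # text pieces seen inside the watched tag
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self.buffer).split())
                self.buffer = []
                if any(k in text.lower() for k in self.keywords):
                    self.matches.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def find_matches(html, keywords, tags=("title", "h1", "h2", "h3")):
    """Return the text of each watched tag that contains a keyword."""
    finder = TagKeywordFinder(keywords, tags)
    finder.feed(html)
    finder.close()
    return finder.matches


page = (
    "<html><head><title>Salah scores twice</title></head>"
    "<body><h1>Liverpool win</h1><p>Ronaldo rested</p>"
    "<h2>Weather</h2></body></html>"
)
print(find_matches(page, ["Salah", "Liverpool", "Ronaldo"]))
print(find_matches(page, ["Ronaldo"], tags=("p",)))
```

With the default tags, the paragraph mentioning Ronaldo is ignored; passing tags=("p",) makes the sketch look only at paragraph text, which mirrors the idea of overriding the defaults in the constructor call above.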

Download files

Download the file for your platform.

Source Distribution

CrawlerFriend-1.0.11.tar.gz (4.6 kB)

Uploaded Source

Built Distribution

CrawlerFriend-1.0.11-py2-none-any.whl (5.0 kB)

Uploaded Python 2

File details

Details for the file CrawlerFriend-1.0.11.tar.gz.

File metadata

  • Download URL: CrawlerFriend-1.0.11.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/18.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.4.3

File hashes

Hashes for CrawlerFriend-1.0.11.tar.gz
Algorithm    Hash digest
SHA256       23982159c4c6f2e6678d0f519ad2c984968a1d503d20ef5585296b7cd1054e91
MD5          7f8918aab8edc656dafd154be11cedbd
BLAKE2b-256  218cb16ab03a45152625455419e121a3bd052e38aed75de8b6937c0c14aa78d8
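
A downloaded file can be checked against the SHA256 value listed above with Python's hashlib module. The sketch below is one way to do that; the helper names sha256_of and matches_published_hash are illustrative, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED_SDIST_SHA256 = (
    "23982159c4c6f2e6678d0f519ad2c984968a1d503d20ef5585296b7cd1054e91"
)

# Usage, assuming the archive sits in the current directory:
# matches_published_hash("CrawlerFriend-1.0.11.tar.gz", EXPECTED_SDIST_SHA256)
```

The same function works for the wheel by passing its file name and its own SHA256 digest.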

File details

Details for the file CrawlerFriend-1.0.11-py2-none-any.whl.

File metadata

  • Download URL: CrawlerFriend-1.0.11-py2-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/18.1 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.4.3

File hashes

Hashes for CrawlerFriend-1.0.11-py2-none-any.whl
Algorithm    Hash digest
SHA256       609aefc0c1004cf31441cd5442a4f3bd3e450821c650fd7affb2179d12f53c19
MD5          0d774bf665e2f509d03077ce9e574435
BLAKE2b-256  bd5ff92072259846abe7c5253f9d1388778fb9aed6a675374925a5c814ba6f0c