
Crawler and search tools used by Sirji.


sirji-tools

sirji-tools is a PyPI package that provides tools for:

  • Crawling (downloading web pages to markdown files)
  • Searching (using the Google Search API for search results)
  • Custom Logging

Installation

pip install sirji-tools

Usage

Crawl URLs

The Crawl URLs tool crawls the given web pages, converts them to markdown, and stores the extracted content in the specified directory for further processing by the researcher.

from sirji_tools import crawl_urls

urls = ['https://www.google.com', 'https://www.yahoo.com']

crawl_urls(urls, 'workspace/researcher')
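The crawled pages are saved as markdown files under the given directory. A minimal sketch of reading them back for inspection, assuming the crawler writes .md files directly under that directory (the exact filenames are determined by the crawler):

import glob

# List every markdown file produced by the crawl and report its size.
for path in glob.glob('workspace/researcher/*.md'):
    with open(path) as f:
        content = f.read()
    print(f"{path}: {len(content)} characters")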

Search

The Search tool queries the Google Search API with the given search term and returns a list of URLs related to that term.

from sirji_tools import search_for

search_term = 'python programming'

urls = search_for(search_term)
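The two tools compose naturally: search results can be fed straight into the crawler. A minimal sketch using only the search_for and crawl_urls calls shown above:

from sirji_tools import search_for, crawl_urls

# Search for pages on a topic, then download each result as markdown.
urls = search_for('python programming')
crawl_urls(urls, 'workspace/researcher')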

Logger

The Logger tool writes log messages to a log file, making it easy to track the progress of an execution.

from sirji_tools.logger import p_logger

p_logger.info("Log line here")
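For example, the logger can wrap the search and crawl steps to record progress. A minimal sketch based only on the calls shown above (the log file location and format are handled by the package's logging configuration):

from sirji_tools import crawl_urls, search_for
from sirji_tools.logger import p_logger

search_term = 'python programming'
p_logger.info(f"Searching for: {search_term}")

# Fetch related URLs, then download them while logging progress.
urls = search_for(search_term)
p_logger.info(f"Found {len(urls)} URLs, starting crawl")

crawl_urls(urls, 'workspace/researcher')
p_logger.info("Crawl complete")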

License

Distributed under the MIT License. See LICENSE for more information.

