
Crawler and search tools used by Sirji.


Sirji is an agentic AI framework for software development.

Built with ❤️ by True Sparrow


Sirji Tools

sirji-tools is a PyPI package that provides tools for:

  • Crawling (downloading web pages to markdown files)
  • Searching on Google
  • Custom Logging

Installation

Setup Virtual Environment

We recommend setting up a virtual environment to isolate Python dependencies, so that project-specific packages do not conflict with system-wide installations.

python3 -m venv venv
source venv/bin/activate

Install Package

Install the package from PyPI:

pip install sirji-tools

Run the following command to install the browser binaries used by Playwright:

playwright install

Usage

Environment Variables

Ensure that the following environment variables are set:

export SIRJI_PROJECT="Absolute folder path for Sirji to use as its project folder."
export SIRJI_RUN_PATH='Folder having run specific logs, etc.'
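A missing variable is easier to diagnose before a run starts than in the middle of one. The following is a minimal sketch, not part of sirji-tools, that reports which of the two variables are unset or empty. The helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("SIRJI_PROJECT", "SIRJI_RUN_PATH")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the real process environment; a plain dict can be
    passed instead, which makes the check easy to try out.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping (hypothetical path) instead of os.environ.
print(missing_env_vars({"SIRJI_PROJECT": "/tmp/my-project"}))
# → ['SIRJI_RUN_PATH']
```

Calling `missing_env_vars()` with no argument checks the current shell environment, so it can be run once at startup and the result turned into an error message.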

Crawl URLs

The crawl URLs tool downloads the given web pages and stores their content as markdown files for further processing by the researcher.

from sirji_tools import crawl_urls

urls = ['https://www.google.com', 'https://www.yahoo.com']

crawl_urls(urls, 'project/researcher')
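Since the tool takes a plain list of URLs, it can help to clean that list first. The sketch below uses only the standard library. `clean_urls` is a hypothetical helper, not part of sirji-tools, and whether `crawl_urls` validates or de-duplicates its input on its own is not stated here.

```python
from urllib.parse import urlparse


def clean_urls(urls):
    """Keep absolute http(s) URLs, dropping duplicates while preserving order."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip relative paths, mailto: links, empty strings, etc.
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


urls = clean_urls([
    "https://www.google.com",
    "https://www.google.com",  # duplicate
    "not-a-url",               # no scheme or host
    "https://www.yahoo.com",
])
print(urls)
# → ['https://www.google.com', 'https://www.yahoo.com']
```

The cleaned list can then be passed as the first argument to `crawl_urls`, exactly as in the example above.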

Search

The search tool runs a Google search for the given search term and returns a list of related URLs.

from sirji_tools import search_for

search_term = 'python programming'

urls = search_for(search_term)
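Because the search tool returns a list of URLs and the crawl tool accepts one, the two can be chained. The following is a rough sketch of that flow under two assumptions: that `search_for` returns URL strings, and that `crawl_urls` takes a URL list and a folder as in the examples above. The function `search_and_crawl` and the stand-ins `fake_search` and `fake_crawl` are hypothetical; the stand-ins let the flow run without sirji-tools or network access.

```python
def search_and_crawl(term, search, crawl, output_dir, limit=5):
    """Search for `term`, then crawl at most `limit` of the returned URLs.

    `search` and `crawl` are passed in as callables so the flow can be
    exercised with stand-ins before wiring in the real tools.
    """
    urls = list(search(term))[:limit]
    if urls:
        crawl(urls, output_dir)
    return urls


# Stand-ins for illustration only; they do not touch the network.
def fake_search(term):
    slug = term.replace(" ", "-")
    return [f"https://example.com/{slug}/{i}" for i in range(10)]


crawled = []  # records what the stand-in crawler was asked to fetch


def fake_crawl(urls, output_dir):
    crawled.append((list(urls), output_dir))


picked = search_and_crawl(
    "python programming", fake_search, fake_crawl, "project/researcher", limit=2
)
print(picked)
# → ['https://example.com/python-programming/0', 'https://example.com/python-programming/1']
```

With the package installed, the same call would presumably become `search_and_crawl(term, search_for, crawl_urls, 'project/researcher')`.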

Logger

The logger tool writes messages to a log file to show the progress of the execution.

from sirji_tools.logger import p_logger

p_logger.info("Log line here")

For Contributors

  1. Fork and clone the repository.
  2. Create and activate the virtual environment as described above.
  3. Set the environment variables as described above.
  4. Install the package in editable mode by running the following command from the repository root:
pip install -e .
  5. Run the following command to install Playwright:
playwright install

Running Tests and Coverage Analysis

Follow the steps in "For Contributors" above before running the tests.

# Install testing dependencies
pip install pytest coverage

# Execute tests
pytest

# Measure coverage, excluding test files
coverage run --omit="tests/*" -m pytest
coverage report

License

Distributed under the MIT License. See LICENSE for more information.
