

Project description

Nichirin: A custom search framework with retrieval-augmented generation (RAG) capabilities.

Overview

Nichirin is a layer on top of Apache Solr that simplifies data indexing.

  1. What is Nichirin?

    • Nichirin wraps Apache Solr so that data can be indexed with a handful of commands.
    • It hides the details of Solr indexing, so users only need to supply their data.
  2. Key Features:

    • Multi-level Crawling: Crawls the web across multiple levels using depth-first search, with text indexing and retrieval handled by Apache Solr.
    • Efficient Indexing: Integrates Apache Spark to process URLs in parallel, improving the scalability and efficiency of both web crawling and text indexing.
    • Python package: Available as a Python package on PyPI for easy installation and integration.
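
To illustrate the multi-level, depth-first crawling idea, here is a minimal sketch over an in-memory link graph. It is not Nichirin's implementation: the function name crawl_depth_first, the max_depth parameter, and the toy link table are assumptions made for illustration, and a real crawler would fetch pages over HTTP and send their text to Solr.

```python
from typing import Dict, List, Optional, Set

# Hypothetical link graph: each page maps to the pages it links to.
TOY_LINKS: Dict[str, List[str]] = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/b": ["https://example.com/"],  # link back to the seed
    "https://example.com/a/1": [],
}


def crawl_depth_first(
    start: str,
    links: Dict[str, List[str]],
    max_depth: int,
    visited: Optional[Set[str]] = None,
) -> List[str]:
    """Return pages in depth-first visiting order, at most max_depth levels below start."""
    if visited is None:
        visited = set()
    # Stop when the depth budget is used up or the page was already seen.
    if max_depth < 0 or start in visited:
        return []
    visited.add(start)
    order = [start]
    # Follow each outgoing link fully before moving on to the next one.
    for child in links.get(start, []):
        order.extend(crawl_depth_first(child, links, max_depth - 1, visited))
    return order


if __name__ == "__main__":
    for page in crawl_depth_first("https://example.com/", TOY_LINKS, max_depth=2):
        print(page)
```

The visited set keeps the crawl from looping when pages link back to each other, and max_depth bounds how many levels below the seed URL are followed.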

Commands

  • install-solr to install Solr
  • create-core --core <core name> to create a Solr core
  • partition-data --path <path to the dataset> to partition the data
  • pipeline --path <path to the dataset> to generate embeddings for the partitioned data
  • index-solr --data-path <path to dataset> --core <core to which the data needs to be sent> to index the data
  • query-solr --input_sen <input sentence> --core_name <core name to query from> to query the data from Solr
  • seed-urls --core <core name> --urls <URLs separated by commas> to add the seed URLs
  • start-crawler to start the web crawler
  • start-serve to start the web server
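
Because seed-urls takes its URLs as a single comma-separated value, it can help to build that argument programmatically. The sketch below only assembles the argument list; the helper name seed_urls_argv is hypothetical, and it assumes the command is available on the PATH under the name listed above once the package is installed.

```python
from typing import List, Sequence


def seed_urls_argv(core: str, urls: Sequence[str]) -> List[str]:
    """Build the argument list for the seed-urls command (hypothetical helper)."""
    # Drop empty entries and surrounding whitespace.
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned:
        raise ValueError("at least one seed URL is required")
    # A comma inside a URL would be ambiguous in a comma-separated list.
    if any("," in u for u in cleaned):
        raise ValueError("seed URLs must not contain commas")
    return ["seed-urls", "--core", core, "--urls", ",".join(cleaned)]


if __name__ == "__main__":
    argv = seed_urls_argv("docs", ["https://example.com", " https://example.org "])
    print(" ".join(argv))
```

The resulting list can be passed to subprocess.run, which avoids shell quoting issues with URLs that contain characters such as & or ?.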

Quickstart

  1. Run the install-solr command to install Solr.
  2. Create the cores you need with the create-core command.
  3. Once Solr is installed and the cores exist, add seed URLs with the seed-urls command.
  4. Start crawling with the start-crawler command. This step may take some time.
  5. Finally, to view the results, launch the Flask web app with the start-serve command.
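
The quickstart order can also be written down as a small script. This is a sketch under assumptions: the function names quickstart_commands and run_quickstart are hypothetical, the commands are assumed to be on the PATH under the names from the Commands section, and the default is a dry run that only prints each command.

```python
import shlex
import subprocess
from typing import List, Sequence


def quickstart_commands(core: str, urls: Sequence[str]) -> List[List[str]]:
    """Return the quickstart commands, in order, as argument lists."""
    return [
        ["install-solr"],
        ["create-core", "--core", core],
        ["seed-urls", "--core", core, "--urls", ",".join(urls)],
        ["start-crawler"],  # may take a while, as noted in the quickstart
        ["start-serve"],    # assumed to keep running while the web app is up
    ]


def run_quickstart(core: str, urls: Sequence[str], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for argv in quickstart_commands(core, urls):
        line = shlex.join(argv)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True stops the sequence as soon as one step fails.
            subprocess.run(argv, check=True)
    return printed


if __name__ == "__main__":
    run_quickstart("docs", ["https://example.com"])
```

Running the steps with check=True means a failure in an early step, such as core creation, prevents the crawler from starting against a core that does not exist.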

Contributing and Feedback: We welcome contributions! If you’d like to enhance Nichirin or report issues, feel free to submit a pull request. For feedback or questions, open an issue on our GitHub repository.

Download files

Source Distribution

nichirin-0.0.1.tar.gz (15.5 kB)

Built Distribution

nichirin-0.0.1-py3-none-any.whl (18.0 kB, Python 3)