Project description

Nichirin: A custom search framework with retrieval-augmented generation (RAG) capabilities.

Overview

Nichirin is a layer on top of Apache Solr that simplifies data indexing.

  1. What is Nichirin?

    • Nichirin wraps Apache Solr behind a small set of commands for installing Solr, creating cores, and indexing data.
    • It abstracts away the complexity of Solr indexing, so users can focus on supplying their data rather than on indexing details.
  2. Key Features:

    • Multi-level crawling: Crawls the web to multiple levels using depth-first search, with text indexing and retrieval handled by Apache Solr.
    • Efficient indexing: Integrates Apache Spark to process URLs in parallel, improving the scalability and efficiency of both web crawling and text indexing.
    • Python package: Available on PyPI as nichirin (pip install nichirin) for easy installation and integration.
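
The multi-level, depth-first crawling described above can be illustrated with a small sketch. This is not Nichirin's own crawler code: the function name crawl_depth_first, the max_depth parameter, and the in-memory LINKS graph are all hypothetical, and a real crawler would fetch pages over HTTP instead of reading a dictionary.

```python
from typing import Callable, Dict, Iterable, List, Set


def crawl_depth_first(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int,
) -> List[str]:
    """Visit pages depth-first from `seed`, going at most `max_depth` links deep.

    `get_links` is a hypothetical callback that returns the outgoing links
    of a page; it stands in for fetching and parsing the page.
    """
    visited: Set[str] = set()
    order: List[str] = []

    def visit(url: str, depth: int) -> None:
        # Skip pages already seen and pages beyond the depth limit.
        if url in visited or depth > max_depth:
            return
        visited.add(url)
        order.append(url)
        # Follow each link fully before moving on to the next sibling.
        for link in get_links(url):
            visit(link, depth + 1)

    visit(seed, 0)
    return order


# Hypothetical link graph used in place of real HTTP requests.
LINKS: Dict[str, List[str]] = {
    "https://example.org/": ["https://example.org/a", "https://example.org/b"],
    "https://example.org/a": ["https://example.org/a/1"],
    "https://example.org/b": ["https://example.org/"],
    "https://example.org/a/1": [],
}

pages = crawl_depth_first(
    "https://example.org/", lambda url: LINKS.get(url, []), max_depth=2
)
for page in pages:
    print(page)
```

The visited set keeps the crawl from looping when pages link back to each other, and the depth limit is what makes the crawl "multi-level" rather than unbounded.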

Commands

  • install-solr: install Solr.
  • create-core --core <core name>: create a Solr core.
  • partition-data --path <path to the dataset>: partition the data.
  • pipeline --path <path to the dataset>: generate embeddings for the partitioned data.
  • index-solr --data-path <path to dataset> --core <core to index into>: index the data into Solr.
  • query-solr --input_sen <input sentence> --core_name <core name to query from>: query the data from Solr.
  • seed-urls --core <core name> --urls <URLs separated by commas>: add the seed URLs.
  • start-crawler: start the web crawler.
  • start-serve: start the web server.
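
For readers who want to inspect an indexed core directly, the sketch below builds a URL for Solr's standard select handler. It is not a description of how query-solr works internally. The helper name build_select_url and the core name demo_core are hypothetical, and the base URL assumes Solr is listening on its default port, 8983, on the local machine.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, text: str, rows: int = 10) -> str:
    """Build a URL for Solr's /select handler on the given core.

    `q` is the query string, `rows` limits the number of results, and
    `wt=json` asks Solr for a JSON response.
    """
    params = urlencode({"q": text, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


# Assumed values: a local Solr instance and a hypothetical core name.
url = build_select_url("http://localhost:8983", "demo_core", "sword smith")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns the matching documents once the core contains data.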

Quickstart

  1. Begin by executing the install-solr command to install the Solr application.
  2. Next, create the cores using the create-core command.
  3. After setting up Solr and creating the cores, add seed URLs by running the seed-urls command.
  4. Once the seed URLs are added, initiate the crawling process with the start-crawler command. Be patient, as this step may take some time.
  5. Finally, to view the results, launch the Flask web app using the start-serve command.
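
The steps above can be written down as a command sequence. The sketch below only builds and prints the command lines; it does not run them. It assumes the commands are invoked exactly as named in the Commands section, which may differ in your installation, and the core name and seed URLs are placeholders.

```python
import shlex
from typing import List


def quickstart_commands(core: str, seed_urls: List[str]) -> List[List[str]]:
    """Return the Quickstart steps as argument lists, in order.

    Assumption: each command is available on PATH under the name shown
    in the Commands section.
    """
    return [
        ["install-solr"],
        ["create-core", "--core", core],
        # seed-urls takes the URLs as one comma-separated value.
        ["seed-urls", "--core", core, "--urls", ",".join(seed_urls)],
        ["start-crawler"],
        ["start-serve"],
    ]


# Placeholder core name and seed URLs.
steps = quickstart_commands(
    "demo_core", ["https://example.org", "https://example.com"]
)
for argv in steps:
    print(shlex.join(argv))
```

To execute the steps instead of printing them, each argument list could be passed to subprocess.run(argv, check=True), keeping in mind that start-crawler may run for a long time and start-serve presumably keeps running until stopped.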

Contributing and Feedback: We welcome contributions! If you’d like to enhance Nichirin or report issues, feel free to submit a pull request. For feedback or questions, open an issue on our GitHub repository.


