Project description
Nichirin: A custom search framework with retrieval-augmented generation (RAG) capabilities.
Overview
Nichirin serves as an advanced layer atop Apache Solr, facilitating seamless data indexing operations.
What is Nichirin?
- Nichirin is a layer on top of Apache Solr that simplifies data indexing.
- It abstracts away the complexities of Solr indexing, allowing users to focus on providing their data without worrying about the nitty-gritty details.
Key Features:
- Multi-level Crawling: Crawls the web across multiple levels of links using depth-first search, with text indexing and retrieval handled by Apache Solr.
- Efficient Indexing: Integrates Apache Spark to process URLs in parallel, improving the scalability and efficiency of both web crawling and text indexing.
- Python Package: Available as a Python package on PyPI for easy installation and integration.
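The depth-first, multi-level crawling described above can be illustrated with a minimal sketch. This is not Nichirin's actual implementation: the function `crawl_depth_first`, the `max_depth` parameter, and the in-memory `LINKS` graph are assumptions introduced for illustration, and no pages are fetched or sent to Solr.

```python
from typing import Callable, Dict, Iterable, List, Set


def crawl_depth_first(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int,
) -> List[str]:
    """Return URLs in the order a depth-limited DFS visits them."""
    visited: Set[str] = set()
    order: List[str] = []

    def visit(url: str, depth: int) -> None:
        # Skip pages already seen and pages beyond the depth limit.
        if url in visited or depth > max_depth:
            return
        visited.add(url)
        order.append(url)  # a real crawler would fetch and index the page here
        for link in get_links(url):
            visit(link, depth + 1)  # follow each link fully before its siblings

    for seed in seeds:
        visit(seed, 0)
    return order


# Hypothetical link graph standing in for real web pages.
LINKS: Dict[str, List[str]] = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": ["https://example.com/d"],
}


def links_of(url: str) -> List[str]:
    """Look up outgoing links in the hypothetical graph."""
    return LINKS.get(url, [])


pages = crawl_depth_first(["https://example.com/"], links_of, max_depth=2)
print(pages)
```

With `max_depth=2`, the page `https://example.com/d` sits three links from the seed and is therefore skipped, while the cycle from `/b` back to the seed is cut off by the `visited` set.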
Commands
- install-solr: installs Solr.
- create-core --core <core name>: creates a Solr core.
- partition-data --path <path to the dataset>: partitions the data.
- pipeline --path <path to the dataset>: generates embeddings of the partitioned data.
- index-solr --data-path <path to dataset> --core <core to which the data needs to be sent>: indexes the data.
- query-solr --input_sen <input sen> --core_name <core name to query from>: queries the data from Solr.
- seed-urls --core <core name> --urls <urls separated with commas>: adds the seed URLs.
- start-crawler: starts the web crawler.
- start-serve: starts the web server.
Quickstart
- Begin by executing the install-solr command to install the Solr application.
- Next, create the cores using the create-core command.
- After setting up Solr and creating the cores, add seed URLs by running the seed-urls command.
- Once the seed URLs are added, initiate the crawling process with the start-crawler command. Be patient, as this step may take some time.
- Finally, to view the results, launch the Flask web app using the start-serve command.
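The Quickstart sequence can be sketched as a small Python helper that assembles the command lines in order without running them. The helper name `quickstart_commands`, the core name `demo_core`, and the example URLs are assumptions for illustration; only the command names and flags come from the Commands list.

```python
import shlex
from typing import List, Sequence


def quickstart_commands(core: str, urls: Sequence[str]) -> List[List[str]]:
    """Build the Quickstart command lines in order, without running them."""
    return [
        ["install-solr"],
        ["create-core", "--core", core],
        # seed-urls expects the URLs as a single comma-separated value.
        ["seed-urls", "--core", core, "--urls", ",".join(urls)],
        ["start-crawler"],
        ["start-serve"],
    ]


# Hypothetical core name and seed URLs.
commands = quickstart_commands(
    "demo_core", ["https://example.com", "https://example.org"]
)
for argv in commands:
    print(shlex.join(argv))  # shell-safe rendering of each command line
```

To actually execute the steps, each list could be passed to `subprocess.run(argv, check=True)`, assuming the package is installed and its commands are available on the PATH.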
Contributing and Feedback: We welcome contributions! If you’d like to enhance Nichirin or report issues, feel free to submit a pull request. For feedback or questions, open an issue on our GitHub repository.