Project description
Nichirin: a custom search framework with retrieval-augmented generation (RAG) capabilities.
Overview
Nichirin is a layer on top of Apache Solr that makes data indexing straightforward.
What is Nichirin?
- Nichirin sits on top of Apache Solr and abstracts away the complexities of Solr indexing, so users can focus on supplying their data instead of the low-level details.
Key Features:
- Multi-level Crawling: crawls the web to multiple levels of depth using depth-first search; the crawled text is indexed in and retrieved from Apache Solr.
- Efficient Indexing: uses Apache Spark to process URLs in parallel, improving the scalability and efficiency of both web crawling and text indexing.
- Python package: available on PyPI for easy installation and integration.
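The multi-level crawl described above can be sketched as a depth-limited depth-first search. This is a minimal illustration under stated assumptions, not Nichirin's crawler: `crawl_dfs`, `get_links`, and `max_depth` are hypothetical names, pages come from an in-memory link graph instead of HTTP requests, and the Solr indexing and Spark parallelism steps are left out.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def crawl_dfs(
    seeds: Iterable[str],
    get_links: Callable[[str], List[str]],
    max_depth: int,
) -> List[str]:
    """Return URLs in the order a depth-limited depth-first crawl visits them.

    get_links(url) is a hypothetical stand-in for "fetch the page and extract
    its outgoing links". With max_depth=0 only the seed URLs are visited.
    """
    visited: Set[str] = set()
    order: List[str] = []
    for seed in seeds:
        # An explicit stack of (url, depth) pairs avoids Python's recursion limit.
        stack: List[Tuple[str, int]] = [(seed, 0)]
        while stack:
            url, depth = stack.pop()
            if url in visited:
                continue
            visited.add(url)
            order.append(url)  # a real crawler would fetch and index the page here
            if depth < max_depth:
                # Push links in reverse so the first link on the page is explored first.
                for link in reversed(get_links(url)):
                    if link not in visited:
                        stack.append((link, depth + 1))
    return order


# Toy link graph (hypothetical): each key is a page, each value its outgoing links.
PAGES: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["a", "e"],
    "d": ["f"],
    "e": [],
    "f": [],
}

print(crawl_dfs(["a"], lambda url: PAGES.get(url, []), max_depth=2))
# ['a', 'b', 'd', 'c', 'e']  ("f" is three links deep, so it is not reached)
```

Raising `max_depth` to 3 would also reach "f". Because the visited set is shared, a page first reached along a deep path is not re-expanded if it is later found along a shorter one, a common trade-off in simple crawlers.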
Commands
- install-solr: install Solr.
- create-core --core <core name>: create a Solr core.
- partition-data --path <path to the dataset>: partition the dataset.
- pipeline --path <path to the dataset>: generate embeddings for the partitioned data.
- index-solr --data-path <path to the dataset> --core <core to which the data needs to be sent>: index the data.
- query-solr --input_sen <input sentence> --core_name <core name to query from>: query the data from Solr.
- seed-urls --core <core name> --urls <URLs separated with commas>: add the seed URLs.
- start-crawler: start the web crawler.
- start-serve: start the web server.
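For reference, the kind of HTTP calls that index-solr and query-solr plausibly wrap can be sketched with Solr's standard /update and /select request handlers, using only the Python standard library. This is an assumption about the mechanism, not Nichirin's code: the base URL, the core name my_core, the content field, and the helpers `index_request` and `query_request` are hypothetical. The requests are only built, not sent, and the sketch covers plain keyword search; it does not show how the embeddings produced by pipeline are used.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Solr instance on Solr's default port.
SOLR_BASE = "http://localhost:8983/solr"


def index_request(core: str, docs: List[Dict[str, str]]) -> Request:
    """Build a POST that adds `docs` to `core` through Solr's /update handler."""
    url = f"{SOLR_BASE}/{core}/update?commit=true"  # commit so the docs become searchable
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def query_request(core: str, sentence: str, field: str = "content", rows: int = 5) -> Request:
    """Build a GET against the core's /select handler; `field` is a hypothetical field name."""
    params = urlencode({"q": sentence, "df": field, "rows": rows, "wt": "json"})
    return Request(f"{SOLR_BASE}/{core}/select?{params}")


docs = [{"id": "1", "content": "Apache Solr is an open-source search platform."}]
req = index_request("my_core", docs)
print(req.get_method(), req.full_url)
# POST http://localhost:8983/solr/my_core/update?commit=true
print(query_request("my_core", "what is solr").full_url)
# http://localhost:8983/solr/my_core/select?q=what+is+solr&df=content&rows=5&wt=json
```

To send one of these requests to a running Solr instance, pass it to `urllib.request.urlopen` and parse the response body with `json.load`.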
Quickstart
- Begin by executing the install-solr command to install the Solr application.
- Next, create the cores using the create-core command.
- After setting up Solr and creating the cores, add seed URLs by running the seed-urls command.
- Once the seed URLs are added, initiate the crawling process with the start-crawler command. Be patient, as this step may take some time.
- Finally, to view the results, launch the Flask web app using the start-serve command.
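If you prefer to script the Quickstart, the same sequence can be driven from Python. This is a minimal sketch that assumes the Nichirin commands are on your PATH after installing the package; the core name, the seed URLs, and the `run_quickstart` helper are hypothetical. By default it performs a dry run and only returns the command lines (`shlex.join` needs Python 3.8 or later).

```python
import shlex
import subprocess
from typing import List, Sequence

# Hypothetical values: replace with your own core name and seed URLs.
CORE = "my_core"
SEED_URLS = ["https://example.com", "https://example.org"]

# The Quickstart steps in order, as argument lists.
QUICKSTART_STEPS: List[List[str]] = [
    ["install-solr"],
    ["create-core", "--core", CORE],
    ["seed-urls", "--core", CORE, "--urls", ",".join(SEED_URLS)],
    ["start-crawler"],  # may take a long time to finish
    ["start-serve"],    # launches the Flask web app and keeps running until stopped
]


def run_quickstart(steps: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Run each step in order and return the command lines.

    With dry_run=True nothing is executed. Otherwise each command runs to
    completion before the next starts, and a non-zero exit status raises
    subprocess.CalledProcessError, stopping the sequence.
    """
    lines: List[str] = []
    for argv in steps:
        lines.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(list(argv), check=True)
    return lines


for line in run_quickstart(QUICKSTART_STEPS):
    print(line)
```

Call `run_quickstart(QUICKSTART_STEPS, dry_run=False)` to actually execute the steps; since start-serve keeps running, it is placed last.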
Contributing and Feedback: We welcome contributions! If you'd like to improve Nichirin, feel free to submit a pull request. To report issues, give feedback, or ask questions, open an issue on our GitHub repository.
Download files
Source Distribution
- nichirin-0.0.1.tar.gz (15.5 kB)
Built Distribution
- nichirin-0.0.1-py3-none-any.whl (18.0 kB)
File details
Details for the file nichirin-0.0.1.tar.gz
File metadata
- Download URL: nichirin-0.0.1.tar.gz
- Upload date:
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | dad328896bb7ed37bf076c90e2485defe615d522d817032d0ac134a4bd867923
MD5 | 416f9e4dd529709070e16705faca0ae2
BLAKE2b-256 | 7bba9e0884ed1446ce8762863ca18c9ed4f4055bfa404162b58d45054a0264bd
File details
Details for the file nichirin-0.0.1-py3-none-any.whl
File metadata
- Download URL: nichirin-0.0.1-py3-none-any.whl
- Upload date:
- Size: 18.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 608e901ec1dc90237ea616f513c1c7f6d077beacd275a5727aa641fde1f07634
MD5 | f5d694f996c57aaaf174d7f200221dc4
BLAKE2b-256 | 3e3f3caebdcb48b57b3f293b0b23c31564011e1c2fdbf1abc18105f713b69a8e
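To check a downloaded distribution against the SHA256 digests published above, a short standard-library script is enough. This is a convenience sketch, not part of the Nichirin package; `sha256_of` and `verify_download` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digests copied from the "File hashes" tables for release 0.0.1.
EXPECTED_SHA256 = {
    "nichirin-0.0.1.tar.gz": "dad328896bb7ed37bf076c90e2485defe615d522d817032d0ac134a4bd867923",
    "nichirin-0.0.1-py3-none-any.whl": "608e901ec1dc90237ea616f513c1c7f6d077beacd275a5727aa641fde1f07634",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path]) -> bool:
    """Return True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published SHA256 digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

After downloading, `verify_download("nichirin-0.0.1.tar.gz")` returns True only for an unmodified archive. pip can enforce the same check at install time in hash-checking mode: put `nichirin==0.0.1 --hash=sha256:<digest>` in a requirements file and install it with `pip install --require-hashes -r requirements.txt`.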