Project description
Nichirin: A custom search framework with retrieval-augmented generation (RAG) capabilities.
Overview
Nichirin serves as an advanced layer atop Apache Solr, facilitating seamless data indexing operations.
What is Nichirin?
- Nichirin is a layer on top of Apache Solr that simplifies data indexing.
- It abstracts away the complexities of Solr indexing, so users can focus on providing their data rather than on configuration details.
Key Features:
- Multi-level Crawling: Crawls the web across multiple levels using depth-first search, with text indexing and retrieval handled by Apache Solr.
- Efficient Indexing: Integrates Apache Spark to process URLs in parallel, improving the scalability and efficiency of both web crawling and text indexing.
- Python package: Available as a Python package on PyPI for easy installation and integration.
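To illustrate the multi-level crawling feature, here is a minimal sketch of a depth-limited, depth-first traversal. It is not Nichirin's actual crawler: the function name `crawl_dfs`, the `get_links` callback, and the `max_depth` parameter are hypothetical, and a small in-memory link graph stands in for real HTTP fetching and Solr indexing.

```python
from typing import Callable, Dict, Iterable, List

def crawl_dfs(seed: str,
              get_links: Callable[[str], Iterable[str]],
              max_depth: int) -> List[str]:
    """Visit pages depth-first from `seed`, following links up to `max_depth` levels.

    `get_links` is a hypothetical callback returning the outgoing links of a page.
    Returns the pages in the order they were visited.
    """
    visited: List[str] = []
    seen = set()
    stack = [(seed, 0)]  # explicit stack of (url, depth) pairs
    while stack:
        url, depth = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)  # a real crawler would fetch and index the page here
        if depth < max_depth:
            # Push links in reverse so the first link on the page is explored first.
            for link in reversed(list(get_links(url))):
                if link not in seen:
                    stack.append((link, depth + 1))
    return visited

# Toy link graph used in place of real web pages (assumption for illustration).
LINKS: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
}

order = crawl_dfs("a", lambda url: LINKS.get(url, []), max_depth=2)
print(order)  # ['a', 'b', 'd', 'c']
```

With `max_depth=2`, page `e` is never reached because it sits three links away from the seed. In a setup like the one described above, each visited URL could be handed to a parallel worker for fetching and indexing.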
Commands
- `install-solr`: install Solr
- `create-core --core <core name>`: create a Solr core
- `partition-data --path <path to the dataset>`: partition the data
- `pipeline --path <path to the dataset>`: generate embeddings for the partitioned data
- `index-solr --data-path <path to dataset> --core <core to which the data needs to be sent>`: index the data
- `query-solr --input_sen <input sen> --core_name <core name to query from>`: query the data from Solr
- `seed-urls --core <core name> --urls <urls separated with commas>`: add the seed URLs
- `start-crawler`: start the web crawler
- `start-serve`: start the web server
Quickstart
- Begin by executing the `install-solr` command to install the Solr application.
- Next, create the cores using the `create-core` command.
- After setting up Solr and creating the cores, add seed URLs by running the `seed-urls` command.
- Once the seed URLs are added, initiate the crawling process with the `start-crawler` command. Be patient, as this step may take some time.
- Finally, to view the results, launch the Flask web app using the `start-serve` command.
Contributing and Feedback: We welcome contributions! If you’d like to enhance Nichirin or report issues, feel free to submit a pull request. For feedback or questions, open an issue on our GitHub repository.
Download files
File details
Details for the file nichirin-0.0.1.tar.gz.
File metadata
- Download URL: nichirin-0.0.1.tar.gz
- Upload date:
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dad328896bb7ed37bf076c90e2485defe615d522d817032d0ac134a4bd867923 |
| MD5 | 416f9e4dd529709070e16705faca0ae2 |
| BLAKE2b-256 | 7bba9e0884ed1446ce8762863ca18c9ed4f4055bfa404162b58d45054a0264bd |
File details
Details for the file nichirin-0.0.1-py3-none-any.whl.
File metadata
- Download URL: nichirin-0.0.1-py3-none-any.whl
- Upload date:
- Size: 18.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 608e901ec1dc90237ea616f513c1c7f6d077beacd275a5727aa641fde1f07634 |
| MD5 | f5d694f996c57aaaf174d7f200221dc4 |
| BLAKE2b-256 | 3e3f3caebdcb48b57b3f293b0b23c31564011e1c2fdbf1abc18105f713b69a8e |