A powerful document indexing and querying tool built on top of LlamaIndex

NexuSync

NexuSync is a lightweight yet powerful library for building Retrieval-Augmented Generation (RAG) systems on top of LlamaIndex. It gives developers a simple, user-friendly interface for configuring and deploying RAG systems efficiently. You can also run a local LLM for fully offline use, which keeps your documents private.
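At its core, a RAG system retrieves the document chunks most relevant to a question and adds them to the prompt sent to the language model. The sketch below is a toy, dependency-free illustration of that retrieve-then-augment step, not NexuSync's implementation: a bag-of-words counter stands in for a real embedding model, a plain list stands in for the Chroma vector store, and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

# Same placeholders as the text_qa_template in the Quick Start.
TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 3) -> List[str]:
    """Return the top_k chunks most similar to the query (cf. similarity_top_k)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def build_prompt(query: str, context: List[str]) -> str:
    """Augment the query with the retrieved chunks before it goes to the LLM."""
    return TEMPLATE.format(context_str="\n".join(context), query_str=query)


docs = [
    "Install NexuSync with pip install nexusync.",
    "Ollama lets you run language models locally.",
    "Chroma is a vector database.",
]
question = "How do I install NexuSync?"
print(build_prompt(question, retrieve(question, docs, top_k=1)))
```

In a real system the prompt built this way is what the language model answers from, which is why retrieval quality (embedding model, chunking, `top_k`) matters as much as the LLM itself.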

Features

Lightweight Design: NexuSync is built with simplicity in mind, making it easy for developers to integrate and configure RAG systems without unnecessary complexity.
User-Friendly Interface: With intuitive APIs and clear documentation, setting up your RAG system has never been easier.
Flexible Document Indexing: Automatically index documents from specified directories, keeping your knowledge base up-to-date.
Efficient Querying: Use natural language to query your document collection and get relevant answers quickly.
Conversational Interface: Engage in chat-like interactions with your document collection for more intuitive information retrieval.
Customizable Embedding Options: Use OpenAI embeddings or any HuggingFace embedding model to suit your needs and constraints.
Incremental Updates: Easily update or insert new documents into the index without rebuilding from scratch.
Automatic Deletion Handling: Documents removed from the filesystem are automatically removed from the index.
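Incremental updates and deletion handling both come down to detecting which files changed since the index was last built. How NexuSync's `refresh_index()` does this internally is not documented here; the following is a minimal sketch of one common approach, assuming a content hash is a good enough change signal. The `snapshot`, `diff`, and `demo` helpers are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def snapshot(input_dirs: List[str], recursive: bool = True) -> Dict[str, str]:
    """Map each file path under input_dirs to a SHA-256 digest of its contents."""
    result = {}
    pattern = "**/*" if recursive else "*"
    for d in input_dirs:
        for path in Path(d).glob(pattern):
            if path.is_file():
                result[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def diff(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, modified, deleted) paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    deleted = sorted(p for p in old if p not in new)
    return added, modified, deleted


def demo() -> Tuple[List[str], ...]:
    """Simulate one edit, one new file, and one deletion in a temporary folder."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_text("v1")
        Path(tmp, "old.txt").write_text("bye")
        before = snapshot([tmp])
        Path(tmp, "a.txt").write_text("v2")   # modified
        Path(tmp, "b.txt").write_text("new")  # added
        Path(tmp, "old.txt").unlink()         # deleted
        after = snapshot([tmp])
        # Report file names only, since the temp folder path varies per run.
        return tuple([Path(p).name for p in ps] for ps in diff(before, after))


print(demo())  # (['b.txt'], ['a.txt'], ['old.txt'])
```

Only the added and modified files would then need re-embedding, and the deleted ones removed from the vector store, which is what makes a refresh cheaper than a full rebuild.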

Installation

To install NexuSync, run the following command:

pip install nexusync

Prerequisites

Python 3.7 or higher
Either Ollama for local models (https://ollama.com/download) or an OpenAI API key
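Before running the Quick Start, it can help to confirm these prerequisites programmatically. This is a minimal sketch using only the standard library; the `preflight` helper is hypothetical. It checks the process environment only (it does not read a `.env` file), and finding an `ollama` executable on PATH does not guarantee that the Ollama server is running or that the model has been downloaded.

```python
import importlib.util
import os
import shutil
import sys
from typing import List


def preflight(use_openai: bool) -> List[str]:
    """Return human-readable problems with the local setup (empty list if none found)."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or higher is required")
    if importlib.util.find_spec("nexusync") is None:
        problems.append("nexusync is not installed (run: pip install nexusync)")
    if use_openai:
        # Only the process environment is checked; a .env file is not parsed here.
        if not os.environ.get("OPENAI_API_KEY"):
            problems.append("OPENAI_API_KEY is not set")
    elif shutil.which("ollama") is None:
        # Finding the CLI does not prove the server is running or the model is present.
        problems.append("the ollama executable was not found on PATH")
    return problems


for problem in preflight(use_openai=True):
    print("-", problem)
```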

Quick Start

Try it yourself:

from nexusync import NexuSync

# Choose a backend. For OpenAI, create a .env file in the src folder containing OPENAI_API_KEY='sk-xxx'
OPENAI_MODEL_YN = True  # set to False to use a local Ollama model instead

if OPENAI_MODEL_YN:
    EMBEDDING_MODEL = "text-embedding-3-large"
    LANGUAGE_MODEL = "gpt-4o-mini"
else:
    EMBEDDING_MODEL = "BAAI/bge-base-en-v1.5"  # suggested; you can use any HuggingFace embedding model
    LANGUAGE_MODEL = "llama3.2"  # download the model with Ollama first, see https://ollama.com/download

# Parameters shared by both backends
TEMPERATURE = 0.4  # range 0 to 1; higher means more creative output
CHROMA_DB_DIR = "chroma_db"  # path to the Chroma DB
INDEX_PERSIST_DIR = "index_storage"  # path to the index storage
CHROMA_COLLECTION_NAME = "my_collection"
INPUT_DIRS = ["../sample_docs"]  # you can list multiple document directories
CHUNK_SIZE = 1024  # size of the text chunks used to create embeddings
CHUNK_OVERLAP = 20  # overlap between consecutive chunks to preserve context
RECURSIVE = True  # whether to include files in subfolders


# Initialize vector DB
ns = NexuSync(
    input_dirs=INPUT_DIRS,
    openai_model_yn=OPENAI_MODEL_YN,
    embedding_model=EMBEDDING_MODEL,
    language_model=LANGUAGE_MODEL,
    temperature=TEMPERATURE,
    chroma_db_dir=CHROMA_DB_DIR,
    index_persist_dir=INDEX_PERSIST_DIR,
    chroma_collection_name=CHROMA_COLLECTION_NAME,
    chunk_overlap=CHUNK_OVERLAP,
    chunk_size=CHUNK_SIZE,
    recursive=RECURSIVE,
)

# Prompt Engineering
text_qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step to answer the query in a crisp manner. "
    "In case you don't know the answer, say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Initialize the chat engine
ns.initialize_stream_chat(
    text_qa_template=text_qa_template,
    chat_mode="context",
    similarity_top_k=3
)

# Start the stream chat:
query = "how to install NexuSync?"

for item in ns.start_chat_stream(query):
    if isinstance(item, str):
        # This is a token, print or process as needed
        print(item, end='', flush=True)
    else:
        # This is the final response with metadata
        print("\n\nFull response:", item['response'])
        print("Metadata:", item['metadata'])
        break

# Get chat history
chat_history = ns.chat_engine.get_chat_history()
print("Chat History:")
for entry in chat_history:
    print(f"Human: {entry['query']}")
    print(f"AI: {entry['response']}\n")

# If files were modified, added, or deleted, refresh the index instead of rebuilding it from scratch
ns.refresh_index()

# Rebuild your index if you changed the embedding/language model
from nexusync import rebuild_index

rebuild_index(
    input_dirs=INPUT_DIRS,
    openai_model_yn=OPENAI_MODEL_YN,
    embedding_model=EMBEDDING_MODEL,
    language_model=LANGUAGE_MODEL,
    temperature=TEMPERATURE,
    chroma_db_dir=CHROMA_DB_DIR,
    index_persist_dir=INDEX_PERSIST_DIR,
    chroma_collection_name=CHROMA_COLLECTION_NAME,
    chunk_overlap=CHUNK_OVERLAP,
    chunk_size=CHUNK_SIZE,
    recursive=RECURSIVE,
)

# Reinitialize after rebuilding
ns = NexuSync(
    input_dirs=INPUT_DIRS,
    openai_model_yn=OPENAI_MODEL_YN,
    embedding_model=EMBEDDING_MODEL,
    language_model=LANGUAGE_MODEL,
    temperature=TEMPERATURE,
    chroma_db_dir=CHROMA_DB_DIR,
    index_persist_dir=INDEX_PERSIST_DIR,
    chroma_collection_name=CHROMA_COLLECTION_NAME,
    chunk_overlap=CHUNK_OVERLAP,
    chunk_size=CHUNK_SIZE,
    recursive=RECURSIVE,
)
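`CHUNK_SIZE` and `CHUNK_OVERLAP` in the Quick Start control how documents are split before embedding. The sketch below illustrates the sliding-window idea with a hypothetical `chunk_words` helper. It counts whitespace-separated words to stay dependency-free, whereas LlamaIndex splitters such as `SentenceSplitter` measure `chunk_size` in tokens and prefer sentence boundaries, so real chunks will differ.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size words; each window repeats the
    last chunk_overlap words of the previous one so context is not cut off."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g h", chunk_size=4, chunk_overlap=1))
# ['a b c d', 'd e f g', 'g h']
```

A larger overlap reduces the chance that an answer is split across two chunks, at the cost of storing and embedding more duplicated text; the Quick Start's 20 out of 1024 keeps that duplication small.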

Using the Interface

Clone or download this project:

git clone https://github.com/Zakk-Yang/nexusync.git

From the project folder, open a terminal and run:

python back_end_api.py

For more detailed usage examples, check out the demo notebooks.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history

0.3.6 (Oct 21, 2024)
0.3.5 (Oct 21, 2024)
0.3.4 (Oct 21, 2024)
0.3.3 (Oct 21, 2024)
0.3.2 (Oct 20, 2024)
0.3.1 (Oct 20, 2024)
0.3.0 (Oct 20, 2024)
0.2.7 (Oct 13, 2024)
0.2.6 (Oct 13, 2024)
0.2.5 (Oct 13, 2024), this version
0.2.4 (Oct 12, 2024)
0.2.2 (Oct 9, 2024)
0.2.0 (Oct 9, 2024)
0.1.24 (Oct 7, 2024)
0.1.23 (Oct 7, 2024)
0.1.22 (Oct 7, 2024)
0.1.21 (Oct 7, 2024)
0.1.3 (Oct 8, 2024)
0.1.2 (Oct 7, 2024)
0.1.1 (Oct 7, 2024)

Download files

Source Distribution

nexusync-0.2.5.tar.gz (16.1 kB), uploaded Oct 13, 2024

Built Distribution

nexusync-0.2.5-py3-none-any.whl (18.4 kB, Python 3), uploaded Oct 13, 2024

Hashes for nexusync-0.2.5.tar.gz

Algorithm	Hash digest
SHA256	`4b7cbdd3ab10b65aa6e05588be478cc82f4b8181541cbec0516ded9c04333674`
MD5	`67aa031122508ca7d9048eba208c3463`
BLAKE2b-256	`b0d492c0df99e1a38fc502953fd96db7a8f3a12d5eb2ffeca0a0dc404ac76edc`

Hashes for nexusync-0.2.5-py3-none-any.whl

Algorithm	Hash digest
SHA256	`138173e9ef361e56c800d321e319b6326f851267dcc6cbe98823b6ab9854b7a3`
MD5	`a06db631e1fc933874bdfb8cfca44f2c`
BLAKE2b-256	`3b1b3cb533ddde703b37f5b2f85eedf514aa37bf6f265444525d363e824e77d3`
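To check a downloaded file against the SHA256 digests listed above, you can hash it locally with Python's standard `hashlib` module. The `sha256_of` and `verify` helpers are illustrative names, not part of NexuSync or pip.

```python
import hashlib


def sha256_of(path: str, block_size: int = 65536) -> str:
    """Hash a file in fixed-size blocks so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage, assuming the wheel was downloaded to the current directory:
# verify("nexusync-0.2.5-py3-none-any.whl",
#        "138173e9ef361e56c800d321e319b6326f851267dcc6cbe98823b6ab9854b7a3")
```

For automated installs, pip can enforce the same check itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).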