A RAG (Retrieval-Augmented Generation) system using LlamaIndex and ChromaDB


LlamaIndex Query Engine + Ollama Model to Create Your Own Knowledge Pool

This project is a modular application that builds a query engine with LlamaIndex, ChromaDB, and custom embeddings, and answers questions with a local Ollama model. It lets you index documents from multiple directories and query them in natural language.

Example input question: Where can I find the address of Jason Black?

Example output: The address is 'xxx, xxx, xxx'
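The question-and-answer flow above follows the retrieve-then-generate pattern: find the stored text most relevant to the question, then hand that text to the language model together with the question. The toy sketch below illustrates only the pattern, not this project's code. The real application uses LlamaIndex with embedding vectors stored in ChromaDB; this sketch swaps in plain word-count cosine similarity so it runs with the standard library alone, and `build_prompt` is a hypothetical stand-in for whatever prompt the project actually sends to Ollama.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lower-case word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k chunks most similar to the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, tokenize(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: List[str]) -> str:
    """Hypothetical prompt layout: retrieved context first, then the question."""
    joined = "\n".join(context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    chunks = [
        "Jason Black's address is listed in the contacts file.",
        "The quarterly report covers sales in Europe.",
    ]
    question = "Where can I find the address of Jason Black?"
    print(build_prompt(question, retrieve(question, chunks)))
```

In the real system the resulting prompt would be answered by the Ollama model rather than printed.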

Features
Project Structure
Prerequisites
Installation
Configuration
Usage
Contributing
License

Features

Modular Design: The project is organized into separate modules for easy maintenance and scalability.
Efficient Indexing: Uses ChromaDB to store embeddings, allowing efficient indexing and querying.
Incremental Updates: Only new or updated documents are indexed, improving performance.
Multiple Directories Support: Indexes documents from multiple directories across different locations.
Custom Embeddings: Utilizes custom embedding models for better performance.
Error Handling: Gracefully handles missing directories or files and recreates the index as needed.
Logging: Provides detailed logs for monitoring and debugging.
Advanced Text-Based File Support: Supports a variety of text-based file formats, including:
- Text Files: Plain text (.txt), Markdown (.md), HTML (.html, .htm), XML (.xml), CSV (.csv).
- Document Files: PDF (.pdf), Microsoft Word (.doc, .docx), Rich Text Format (.rtf).
- Jupyter Notebooks: Jupyter Notebook (.ipynb).
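A minimal sketch of how documents could be gathered from several directories while honouring the extension list above, logging and skipping any directory that does not exist. The function name `collect_files` is hypothetical; in the project, loading is handled by data_loader.py, whose code is not shown here and may differ.

```python
import logging
from pathlib import Path
from typing import Iterable, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Extensions taken from the feature list above.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".html", ".htm", ".xml", ".csv",
    ".pdf", ".doc", ".docx", ".rtf", ".ipynb",
}


def collect_files(input_dirs: Iterable[str]) -> List[Path]:
    """Recursively gather supported files from every directory that exists."""
    found: List[Path] = []
    for raw_dir in input_dirs:
        directory = Path(raw_dir).expanduser()
        if not directory.is_dir():
            # Missing directories are logged and skipped instead of raising.
            logger.warning("Skipping missing directory: %s", directory)
            continue
        for path in sorted(directory.rglob("*")):
            # Compare extensions case-insensitively so "report.PDF" is included.
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
                found.append(path)
    return found
```

Skipping rather than failing means one mistyped path does not prevent the remaining directories from being indexed.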

Project Structure

```
my_llama_project/
├── main.py
├── models.py
├── data_loader.py
├── indexer.py
├── query_engine.py
├── prompts.py
├── configs.py            # Configuration, including the INPUT_DIRS list
├── document_tracker.py   # Tracks which files have already been indexed
├── requirements.txt
├── storage/              # Persisted index data (created automatically if missing)
├── chroma_db/            # ChromaDB data (created automatically if missing)
├── indexed_files.json    # Metadata about indexed files (created automatically if missing)
└── documents/            # Documents to be indexed
```
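Incremental updates depend on document_tracker.py and indexed_files.json. The sketch below shows one plausible way such a tracker could work, assuming it stores a SHA-256 content hash per file path and treats a file as new or updated whenever the hash differs. The real module may use modification times or a different JSON layout, and all function names here are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable, List

TRACKER_FILE = Path("indexed_files.json")  # same file name as in the project tree


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in blocks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def load_tracker(tracker_file: Path = TRACKER_FILE) -> Dict[str, str]:
    """Return {path: hash}; an absent or corrupt file means nothing is indexed yet."""
    try:
        return json.loads(tracker_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def files_to_index(paths: Iterable[Path], tracked: Dict[str, str]) -> List[Path]:
    """Keep only files that are new or whose contents changed since the last run."""
    return [p for p in paths if tracked.get(str(p)) != file_digest(p)]


def save_tracker(paths: Iterable[Path], tracked: Dict[str, str],
                 tracker_file: Path = TRACKER_FILE) -> None:
    """Record the current hash of each indexed file and write the JSON back."""
    for p in paths:
        tracked[str(p)] = file_digest(p)
    tracker_file.write_text(json.dumps(tracked, indent=2))
```

Hashing contents instead of comparing timestamps avoids re-indexing files that were merely touched or copied, at the cost of reading every file on each run.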

Prerequisites

Python 3.7 or higher: Ensure you have Python installed.
Git: For cloning the repository.
Pip: Python package installer.

Installation

Clone the Repository

```
git clone https://github.com/Zakk-Yang/ollama-rag.git
cd ollama-rag
```

Create a Virtual Environment (Recommended)

```
conda create -n env python=3.10
conda activate env
```

Install Dependencies
```
pip install -r requirements.txt
```
Install an Ollama Model

Install Ollama by following the instructions at https://ollama.com/download, then pull the model you want to use, for example:

```
ollama pull llama3.2
```

Configuration

Configure Input Directories

Open configs.py and update the INPUT_DIRS list with the paths to your document directories.

```
INPUT_DIRS = [
    '/your/path/to/your/document1',
    '/your/path/to/your/document2',
    # Add more directories as needed
]
```

Model Name and Other Path Locations
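As an illustration, a configs.py holding the model name and path locations might look like the following sketch. Apart from INPUT_DIRS, every variable name here is hypothetical; the values simply mirror names used elsewhere in this README (llama3.2, storage/, chroma_db/, indexed_files.json), so check the project's own configs.py for the real settings.

```python
from pathlib import Path

# Directories to index.
INPUT_DIRS = [
    "/your/path/to/your/document1",
    "/your/path/to/your/document2",
]

# Hypothetical names below: adjust them to match the real configs.py.
LLM_MODEL = "llama3.2"                     # any model pulled with `ollama pull`
STORAGE_DIR = Path("storage")              # persisted index data
CHROMA_DB_DIR = Path("chroma_db")          # ChromaDB files
TRACKER_FILE = Path("indexed_files.json")  # metadata about indexed files
```

The relative paths in this sketch resolve against the working directory that main.py is started from.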

Usage

Run the Application

```
python main.py
```

Running with a Custom Query

```
python main.py --query "Your custom query here"
```
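A sketch of how a `--query` option like the one above can be parsed with Python's argparse module. The fallback question used when `--query` is omitted (`DEFAULT_QUERY`) is a hypothetical placeholder, as is the `parse_args` helper name.

```python
import argparse
from typing import List, Optional

DEFAULT_QUERY = "What topics do my documents cover?"  # hypothetical default


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line options; argv=None means argparse reads sys.argv."""
    parser = argparse.ArgumentParser(description="Query your indexed documents.")
    parser.add_argument(
        "--query",
        default=DEFAULT_QUERY,
        help="natural-language question to ask the query engine",
    )
    return parser.parse_args(argv)
```

For example, `parse_args(["--query", "Who is Jason Black?"]).query` returns `"Who is Jason Black?"`, which main.py would then pass to the query engine. Accepting an explicit `argv` list keeps the parser easy to test without touching `sys.argv`.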

Contributing

Contributions are welcome! Please follow these steps:

Fork the Repository
Create a Branch

```
git checkout -b feature/your-feature-name
```

Commit Your Changes

```
git commit -am 'Add new feature'
```

Push to the Branch

```
git push origin feature/your-feature-name
```

License

The source code for this project is licensed under the MIT License; see the MIT-LICENSE.txt file for details.


Release history

- 0.4.1 (Oct 2, 2024)
- 0.4.0 (Oct 2, 2024)
- 0.3.11 (Sep 30, 2024)
- 0.3.1 (Sep 30, 2024)
- 0.3.0 (Sep 29, 2024)
- 0.2.9 (Sep 29, 2024)
- 0.2.4 (Sep 29, 2024)
- 0.2.3 (Sep 29, 2024)
- 0.2.1 (Sep 29, 2024)
- 0.2.0 (Sep 29, 2024), this version

Download files


Source distribution: ollama_rag-0.2.0.tar.gz (15.9 kB), uploaded Sep 29, 2024

Built distribution: ollama_rag-0.2.0-py3-none-any.whl (11.9 kB), uploaded Sep 29, 2024, for Python 3

Hashes for ollama_rag-0.2.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ec083b0091fff123d4e1846ddaacedc05efeb1ccea296698e49e07ca4e6502cc` |
| MD5 | `8402ccdaad4f683047468dfa80160800` |
| BLAKE2b-256 | `cb1fa0766028630ff8bd16a07fddf7ec5fed28e63b13e62228f7d770313b81fa` |

Hashes for ollama_rag-0.2.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | `5fb8e44394e774f2e9d8bafadb87941397caad23315b9efbf20657c071f22734` |
| MD5 | `19fea00d22bcf326f5aeb2101e2519ae` |
| BLAKE2b-256 | `3b3d5e5509f102f87fcadf885f0a86c34a574fd3374a96255a44c673525f50b8` |
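To check that a downloaded archive matches its published SHA256 digest, a small standard-library helper such as the hypothetical `matches_sha256` below is enough. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Hash the file in blocks and compare with the published SHA256 digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    # hexdigest() is lower-case, so normalise the expected value too.
    return digest.hexdigest() == expected_hex.strip().lower()


# Published SHA256 of the 0.2.0 source distribution.
SDIST_SHA256 = "ec083b0091fff123d4e1846ddaacedc05efeb1ccea296698e49e07ca4e6502cc"
```

Usage: `matches_sha256(Path("ollama_rag-0.2.0.tar.gz"), SDIST_SHA256)` returns `True` only if the local file is byte-for-byte the published archive.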