A RAG pipeline using ColBERT via RAGatouille
ColRAG
ColRAG is a RAG (Retrieval-Augmented Generation) pipeline using ColBERT via RAGatouille. It provides easy-to-use functions for indexing documents from various file formats and retrieving relevant information using the ColBERT model.
Installation
To install ColRAG, you'll need to use Poetry. If you don't have Poetry installed, you can install it by following the instructions on the official Poetry website.
Once you have Poetry installed, follow these steps:
- Clone the ColRAG repository:

      git clone https://github.com/your-username/colrag.git
      cd colrag

- Install the dependencies using Poetry:

      poetry install
This will install all the necessary dependencies, including the latest version of RAGatouille from GitHub.
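To confirm that the environment is set up, a small check like the one below can be run inside the Poetry environment (for example with `poetry run python check_install.py`). This is a minimal sketch: the helper `is_installed` and the script name are hypothetical and not part of ColRAG, and it assumes the packages are importable as `colrag` and `ragatouille`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed import names for ColRAG and RAGatouille.
    for name in ("colrag", "ragatouille"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either package is reported as missing, re-running `poetry install` from the repository root is a reasonable first step.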
Usage
Indexing Documents
from colrag import index_documents
input_directory = "path/to/your/documents"
index_name = "my_colrag_index"
index_path = index_documents(input_directory, index_name)
print(f"Index created at: {index_path}")
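Before indexing a large directory, it can help to see what it actually contains. The sketch below counts files by extension using only the standard library. The helper `summarize_directory` is hypothetical and not part of ColRAG, and which extensions `index_documents` actually accepts is not stated here, so treat the output as a sanity check rather than a guarantee.

```python
from collections import Counter
from pathlib import Path


def summarize_directory(input_directory: str) -> Counter:
    """Count files under input_directory (recursively) by lowercase extension."""
    root = Path(input_directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {input_directory}")
    # Files without a suffix are grouped under a placeholder label.
    return Counter(
        p.suffix.lower() or "<no extension>"
        for p in root.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # "path/to/your/documents" is the same placeholder used above.
    try:
        for extension, count in summarize_directory("path/to/your/documents").most_common():
            print(f"{extension}: {count}")
    except NotADirectoryError as error:
        print(error)
```

An empty or unexpected summary is a hint to fix the path before spending time building an index.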
Retrieving Documents
from colrag import load_model_from_index, retrieve_documents, retrieve_multiple_documents
# Load the model from the index
index_path = "path/to/your/index"
model = load_model_from_index(index_path)
# Single query
query = "What is the main topic of this document?"
results = retrieve_documents(model, query)
for result in results:
    print(f"Rank: {result['rank']}, Score: {result['score']}, Content: {result['content'][:100]}...")
# Multiple queries
queries = [
    "What is the main topic of this document?",
    "Who are the key people mentioned?",
    "What are the main conclusions?"
]
multi_results = retrieve_multiple_documents(model, queries)
for i, query_results in enumerate(multi_results):
    print(f"Query {i + 1}:")
    for result in query_results[:3]:  # Print top 3 results for each query
        print(f"Rank: {result['rank']}, Score: {result['score']}, Content: {result['content'][:100]}...")
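In a RAG pipeline, the retrieved passages are usually combined into a context string that is passed to a language model. The sketch below shows one way to do that with results shaped like the ones above (dictionaries with `rank`, `score`, and `content` keys). The helper `build_context` and its `max_chars` and `min_score` parameters are hypothetical and not part of ColRAG, and the sample data is made up for illustration.

```python
def build_context(results, max_chars=2000, min_score=None):
    """Join retrieved passages into a single context string, best rank first."""
    parts = []
    used = 0
    for result in sorted(results, key=lambda r: r["rank"]):
        # Optionally drop passages below a score threshold.
        if min_score is not None and result["score"] < min_score:
            continue
        text = result["content"].strip()
        if not text:
            continue
        # Stop once adding the next passage would exceed the character budget.
        if used + len(text) > max_chars:
            break
        parts.append(f"[{result['rank']}] {text}")
        used += len(text)
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Made-up results in the same shape as the retrieval output above.
    sample_results = [
        {"rank": 2, "score": 18.4, "content": "Second most relevant passage."},
        {"rank": 1, "score": 21.7, "content": "Most relevant passage."},
    ]
    print(build_context(sample_results, max_chars=500))
```

The character budget is a rough stand-in for a token limit. A suitable score threshold depends on the model and the corpus, so `min_score` is left unset by default.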
License
This project is licensed under the MIT License.