
A RAG pipeline using ColBERT via RAGatouille


ColRAG

ColRAG is a RAG (Retrieval-Augmented Generation) pipeline that uses ColBERT via RAGatouille. It provides simple functions for indexing a directory of documents in various file formats and for retrieving the content most relevant to a query with the ColBERT model.
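ColBERT ranks passages with "late interaction": every query token embedding is compared with every passage token embedding, the best match for each query token is kept, and those maxima are summed (the MaxSim operator). The following is a minimal sketch of that scoring rule in plain Python, using tiny hand-written 2-dimensional vectors as stand-ins for token embeddings. It illustrates the technique only and is not ColRAG's or RAGatouille's code; in practice the embeddings come from a BERT-based encoder with many more dimensions, and search is accelerated by an index rather than computed by brute force.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: for each query token, take its best-matching
    document token (highest cosine similarity), then sum those maxima."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Hypothetical toy "token embeddings", for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # matches only the first one

print(round(maxsim_score(query, doc_a), 3))  # 2.0
print(round(maxsim_score(query, doc_b), 3))  # 1.1
```

Document A contains a close match for both query tokens, so it outscores document B, which only matches the first one.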

Installation

To install ColRAG, you'll need to use Poetry. If you don't have Poetry installed, you can install it by following the instructions on the official Poetry website.

Once you have Poetry installed, follow these steps:

  1. Clone the ColRAG repository:

    git clone https://github.com/your-username/colrag.git
    cd colrag
    
  2. Install the dependencies using Poetry:

    poetry install
    

This will install all the necessary dependencies, including the latest version of RAGatouille from GitHub.
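To confirm that `poetry install` produced a working environment, a quick check with the standard library's `importlib.util.find_spec` can verify that both packages are importable. This is a minimal sketch: the helper `missing_modules` is hypothetical, and it assumes the import names are `colrag` (as in the usage examples) and `ragatouille`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for ColRAG and its RAGatouille dependency.
    missing = missing_modules(["colrag", "ragatouille"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("ColRAG and RAGatouille are installed.")
```

Run it inside the Poetry environment, for example with `poetry run python check_install.py` (the file name is only an example).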

Usage

Indexing Documents

from colrag import index_documents

input_directory = "path/to/your/documents"
index_name = "my_colrag_index"
index_path = index_documents(input_directory, index_name)
print(f"Index created at: {index_path}")
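`index_documents` takes a directory rather than a list of strings, so it presumably walks that directory and extracts text from each supported file before handing the collection to RAGatouille. This README does not say which formats are supported or how files are read, so the sketch below only illustrates the general idea for plain-text files; the helper `collect_texts` and its default extension set are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable


def collect_texts(directory: str, extensions: Iterable[str] = (".txt", ".md")) -> Dict[str, str]:
    """Recursively read every file under `directory` whose suffix is in
    `extensions` (case-insensitive), returning {relative path: file text}."""
    root = Path(directory)
    wanted = {ext.lower() for ext in extensions}
    texts: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            # errors="replace" keeps a single badly encoded file from aborting the scan.
            texts[str(path.relative_to(root))] = path.read_text(encoding="utf-8", errors="replace")
    return texts
```

The values of such a mapping could then be passed as the document collection (and the keys as document IDs) to an indexer, for example the `collection` and `document_ids` arguments of RAGatouille's `RAGPretrainedModel.index`.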

Retrieving Documents

from colrag import load_model_from_index, retrieve_documents, retrieve_multiple_documents

# Load the model from an existing index (e.g. the path returned by index_documents)
index_path = "path/to/your/index"
model = load_model_from_index(index_path)

# Single query
query = "What is the main topic of this document?"
results = retrieve_documents(model, query)
for result in results:
    print(f"Rank: {result['rank']}, Score: {result['score']}, Content: {result['content'][:100]}...")

# Multiple queries
queries = [
    "What is the main topic of this document?",
    "Who are the key people mentioned?",
    "What are the main conclusions?"
]
multi_results = retrieve_multiple_documents(model, queries)
for i, query_results in enumerate(multi_results):
    print(f"Query {i + 1}:")
    for result in query_results[:3]:  # Print top 3 results for each query
        print(f"Rank: {result['rank']}, Score: {result['score']}, Content: {result['content'][:100]}...")
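`retrieve_multiple_documents` returns one ranked list per query. To merge those lists into a single ranking, raw ColBERT scores from different queries are not necessarily on a comparable scale, so one common alternative is reciprocal rank fusion (RRF), which uses only each result's position. The sketch below is not part of ColRAG: the helper `fuse_results` is hypothetical, it assumes each result dict carries the `rank` and `content` keys used above, that ranks start at 1, and that identical `content` strings denote the same passage. `k=60` is a commonly used default for RRF.

```python
from collections import defaultdict
from typing import Dict, List


def fuse_results(multi_results: List[List[dict]], k: int = 60, top_n: int = 5) -> List[dict]:
    """Combine several ranked result lists with reciprocal rank fusion.

    Each passage earns 1 / (k + rank) from every list it appears in; the
    summed contributions decide the fused order.
    """
    fused: Dict[str, float] = defaultdict(float)
    for query_results in multi_results:
        for result in query_results:
            fused[result["content"]] += 1.0 / (k + result["rank"])
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return [
        {"rank": i + 1, "rrf_score": score, "content": content}
        for i, (content, score) in enumerate(ranked[:top_n])
    ]
```

With the `multi_results` from the example above, `for result in fuse_results(multi_results): print(result["rank"], result["content"][:100])` prints one combined top-5 list in which passages retrieved by several queries rise to the top.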

License

This project is licensed under the MIT License.



Download files


Source Distribution

colrag-0.1.0.tar.gz (7.0 kB)

Built Distribution

colrag-0.1.0-py3-none-any.whl (8.7 kB)

File details

Details for the file colrag-0.1.0.tar.gz.

File metadata

  • Download URL: colrag-0.1.0.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.11.7 Darwin/23.5.0

File hashes

Hashes for colrag-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       08069077169ce30c7a3ef26f4b8b099a0ffdf9d7273cbeabcbdbc49abc9dc716
MD5          81a18f7cce867b2e6bc8a4d25d7ed329
BLAKE2b-256  501bfcb2f8a125008f7a367a4dbc28adbb1a16ae774886dd5b0e27d76183d7a8


File details

Details for the file colrag-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: colrag-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.11.7 Darwin/23.5.0

File hashes

Hashes for colrag-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       edceb304343cb96378ae32d31f303d3e40c5493c7781709d8cd6cae9b311a465
MD5          fd3789ad333d110e960e580454f6b4f1
BLAKE2b-256  6e26c82a07605cdc374f980aa5a7e7f643c82d4a7c948ec0f84c4325e755aa3c

