RAG using LangChain and RAGatouille
ColBERT RAG
Index GitHub repositories with ColBERT models and serve the resulting indexes over gRPC or FastAPI.
Features
- ColBERT-based retrieval using contextualized token-level embeddings
- Support for indexing Git repositories with language-specific chunking
- gRPC and FastAPI server implementations
- Flexible document processing and indexing
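To make the first feature concrete, here is a minimal sketch of ColBERT-style late-interaction (MaxSim) scoring: each query token embedding is matched to its most similar document token embedding, and the per-token maxima are summed. The vectors are toy 2-D values and the function names are made up for illustration; in practice the embeddings come from the ColBERT model and the scoring is done by RAGatouille, not by code like this.

```python
from math import sqrt

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """Sum, over query tokens, of the best similarity to any document token."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy token embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [2.0, 0.0]]              # covers only the first one

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a > score_b)
```

Because every query token is scored separately, a document that covers all query tokens ranks above one that matches only some of them.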
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
The Apache License 2.0 is a permissive license that also provides an express grant of patent rights from contributors to users. Key points:
- It allows you to freely use, modify, distribute, and sell the software.
- If you modify the code, you may distribute your modified version, but you must include a notice stating that you changed the files.
- Any modifications or larger works may be distributed under different terms and without source code, but they must include a copy of the Apache 2.0 license.
- You must retain all copyright, patent, trademark, and attribution notices from the original code.
Setup
- Install dependencies using Poetry:
poetry install
- Generate protobuf files:
poetry run generate-protos
Usage
Using Scripts
The project includes several utility scripts in the /scripts directory:
- Generate Protobuf Files:
  poetry run generate-protos
  This script generates the necessary Python files from the protobuf definition.
- Create Index:
  poetry run create-index --name <index_name> --repo_name username/repo-name --chunk_size 512
  This script creates an index from a specified Git repository.
- Run Server:
  poetry run server --type grpc|fastapi --index <index_name>
  This script starts a gRPC or FastAPI server for the specified index.
- Run Type Checking:
  poetry run type-check
  This script runs MyPy for type checking across the project.
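If you want to drive these scripts from Python (for example in a build step), the commands can be assembled as argument lists. This is only a sketch: the helper names are hypothetical, the flags are copied from the commands listed above, and `grpc|fastapi` is read as a choice between the two values.

```python
import subprocess  # used only in the commented-out usage line at the bottom


def create_index_command(index_name: str, repo_name: str, chunk_size: int = 512) -> list[str]:
    """Argument list for the create-index script, using the flags shown above."""
    return [
        "poetry", "run", "create-index",
        "--name", index_name,
        "--repo_name", repo_name,
        "--chunk_size", str(chunk_size),
    ]


def server_command(server_type: str, index_name: str) -> list[str]:
    """Argument list for the server script; server_type is 'grpc' or 'fastapi'."""
    if server_type not in ("grpc", "fastapi"):
        raise ValueError(f"unsupported server type: {server_type!r}")
    return ["poetry", "run", "server", "--type", server_type, "--index", index_name]


index_cmd = create_index_command("my-repo-index", "username/repo-name")
serve_cmd = server_command("fastapi", "my-repo-index")
print(" ".join(index_cmd))
print(" ".join(serve_cmd))

# To actually run a command from the project root:
# subprocess.run(index_cmd, check=True)
```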
Indexing a Git Repository
from colbert_rag.indexer.git_repo import index_git_repo

index_path = index_git_repo(
    model_name="colbert-ir/colbertv2.0",
    index_name="my-repo-index",
    repo_name="username/repo-name",
    max_document_length=512,
)
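The chunker that `index_git_repo` uses is not shown here, so the following is only a rough illustration of what language-specific chunking can mean: split a file at declaration boundaries for its language, then pack the pieces into chunks up to a size limit. The `BOUNDARIES` table, the `chunk_source` function, and the choice to measure size in characters are all assumptions made for this sketch.

```python
import re
from pathlib import PurePosixPath

# Hypothetical map from file extension to a zero-width regex that matches
# at the start of a top-level declaration in that language.
BOUNDARIES = {
    ".py": r"(?m)^(?=class |def |async def )",
    ".go": r"(?m)^(?=func |type )",
    ".js": r"(?m)^(?=function |class )",
}


def chunk_source(path: str, text: str, chunk_size: int = 512) -> list[str]:
    """Split text at language boundaries, then pack pieces up to chunk_size characters."""
    pattern = BOUNDARIES.get(PurePosixPath(path).suffix)
    # Unknown languages fall back to line-by-line pieces.
    pieces = re.split(pattern, text) if pattern else text.splitlines(keepends=True)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if not piece:
            continue
        # Start a new chunk when the next piece would not fit.
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            current = ""
        # A single piece larger than the limit is cut at the limit.
        while len(piece) > chunk_size:
            chunks.append(piece[:chunk_size])
            piece = piece[chunk_size:]
        current += piece
    if current:
        chunks.append(current)
    return chunks


source = "import os\n\ndef a():\n    return 1\n\ndef b():\n    return 2\n"
chunks = chunk_source("pkg/mod.py", source, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Splitting at declaration boundaries keeps each function or class together where the size limit allows, which tends to give retrieval results that are readable on their own.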
Running the Server
GRPC Server
from ragatouille import RAGPretrainedModel
from colbert_rag import GRPCServer
from colbert_rag.config import COLBERTRAG_GRPC_PORT, COLBERTRAG_HOST

# Load a previously created index, then serve it over gRPC.
model = RAGPretrainedModel.from_index("path/to/your/index")
server = GRPCServer(model)
server.serve(COLBERTRAG_HOST, COLBERTRAG_GRPC_PORT)
FastAPI Server
from ragatouille import RAGPretrainedModel
from colbert_rag import FastAPIServer
from colbert_rag.config import COLBERTRAG_FASTAPI_PORT, COLBERTRAG_HOST

# Load a previously created index, then serve it over HTTP with FastAPI.
model = RAGPretrainedModel.from_index("path/to/your/index")
server = FastAPIServer(model)
server.serve(COLBERTRAG_HOST, COLBERTRAG_FASTAPI_PORT)
Client Examples
GRPC Client
import grpc
from colbert_rag.proto import colbertrag_pb2, colbertrag_pb2_grpc


def run():
    with grpc.insecure_channel('localhost:50051') as channel:
        stub = colbertrag_pb2_grpc.ColbertRAGStub(channel)
        response = stub.Retrieve(colbertrag_pb2.Request(query="Your query here", k=2))
        print("ColbertRAG client received:")
        for doc in response.documents:
            print(f"Page content: {doc.page_content}")
            print(f"Metadata: {doc.metadata}")
            print("---")


if __name__ == '__main__':
    run()
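If you would rather get the results back as plain Python data than print them, the call can be wrapped in a small helper. This is a sketch, not part of the package: `retrieve_documents` and `FakeStub` are hypothetical names, and the fake stub exists only so the example runs without a server. It relies on the `Retrieve` method, the `query` and `k` request fields, and the `page_content` and `metadata` document fields used in the client above.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def retrieve_documents(
    stub: Any,
    make_request: Callable[..., Any],
    query: str,
    k: int = 2,
    timeout: float = 5.0,
) -> list[dict[str, Any]]:
    """Call Retrieve on the stub and return the documents as dictionaries."""
    request = make_request(query=query, k=k)
    # gRPC stub methods accept a per-call deadline in seconds via `timeout`.
    response = stub.Retrieve(request, timeout=timeout)
    return [
        {"page_content": doc.page_content, "metadata": doc.metadata}
        for doc in response.documents
    ]


class FakeStub:
    """Stand-in for the generated stub so the sketch runs without a server."""

    def Retrieve(self, request: Any, timeout: Optional[float] = None) -> Any:
        doc = SimpleNamespace(page_content=f"match for {request.query}", metadata={"rank": 1})
        return SimpleNamespace(documents=[doc][: request.k])


results = retrieve_documents(FakeStub(), SimpleNamespace, "Your query here", k=2)
print(results)
```

With a real channel, pass `colbertrag_pb2_grpc.ColbertRAGStub(channel)` as `stub` and `colbertrag_pb2.Request` as `make_request`.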
FastAPI Client
Using Python with the requests library:
import requests
import json

url = "http://localhost:8000/retrieve"
payload = {
    "query": "Your query here",
    "k": 2
}
headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, data=json.dumps(payload), headers=headers)

if response.status_code == 200:
    data = response.json()
    print("ColbertRAG client received:")
    for doc in data["documents"]:
        print(f"Page content: {doc['page_content']}")
        print(f"Metadata: {doc['metadata']}")
        print("---")
else:
    print(f"Error: {response.status_code}")
    print(response.text)
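The same request can also be sent without third-party packages. The sketch below uses only the standard library and separates building the request from parsing the response, so each part can be checked without a running server. The function names are hypothetical; the `/retrieve` path, the `query` and `k` fields, and the `documents` response key are taken from the example above.

```python
import json
import urllib.request
from typing import Any


def build_retrieve_request(base_url: str, query: str, k: int = 2) -> urllib.request.Request:
    """Build the POST request for the /retrieve endpoint."""
    body = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/retrieve",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_documents(raw: bytes) -> list[dict[str, Any]]:
    """Extract the documents list from a JSON response body."""
    data = json.loads(raw.decode("utf-8"))
    return [
        {"page_content": doc["page_content"], "metadata": doc.get("metadata", {})}
        for doc in data.get("documents", [])
    ]


def retrieve(base_url: str, query: str, k: int = 2, timeout: float = 10.0) -> list[dict[str, Any]]:
    """Send the query and return the parsed documents; HTTP errors raise URLError/HTTPError."""
    request = build_retrieve_request(base_url, query, k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_documents(response.read())


# Offline check of the two pure helpers (no server needed).
request = build_retrieve_request("http://localhost:8000", "Your query here", k=2)
sample = b'{"documents": [{"page_content": "def main(): ...", "metadata": {"file": "app.py"}}]}'
print(request.full_url, parse_documents(sample))
```

Once a server is running, `retrieve("http://localhost:8000", "Your query here")` performs the actual call.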
Using curl:
curl -X POST "http://localhost:8000/retrieve" \
  -H "Content-Type: application/json" \
  -d '{"query": "Your query here", "k": 2}'
Development
Type Checking
Run MyPy for type checking:
poetry run type-check
Running Tests
poetry run pytest