A library for building Retrieval-Augmented Generation pipelines.

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

RAG Toolkit

The RAG Toolkit is a Python library for building Retrieval-Augmented Generation (RAG) pipelines. It provides utilities for document processing, vector-based retrieval, query routing, and integration with large language models (LLMs), so that developers can concentrate on the application rather than on the plumbing between these stages.

Features

- Document Processing: Load and preprocess documents from PDFs and other sources.
- Vector Store Retriever: Create retrievers using embeddings for efficient information retrieval.
- Query Routing: Smart routing based on user-defined logic or embeddings.
- RAG Pipelines: Easily build and customize RAG pipelines for various use cases.
- Customizable Templates: Use or define your own templates for specific tasks or domains.
- Integrations: Works with popular LLMs and embedding models.

Installation

To install the RAG Toolkit, clone the repository and install it locally:

# Clone the repository
git clone https://github.com/youssef-yasser-ali/rag-toolkit.git

# Navigate to the directory
cd rag-toolkit

# Install the library
pip install .

Alternatively, install the published release from PyPI:

pip install rag-toolkit
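
After either installation route, a quick check that the package is importable can save confusion later. The snippet below is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper written for this illustration, not part of the toolkit.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str, module_name: str) -> Optional[str]:
    """Return the installed version string, or None if the module cannot be imported.

    dist_name is the name used with pip (e.g. "rag-toolkit");
    module_name is the top-level import name (e.g. "rag_toolkit").
    """
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under that name.
        return "unknown"


if __name__ == "__main__":
    print(installed_version("rag-toolkit", "rag_toolkit"))
```

A result of `None` means the install did not land in the interpreter you are running.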

Quickstart Guide

1. Import the Library

from rag_toolkit.data_loader import load_pdf_pages
from rag_toolkit.vector_store import create_vector_store_retriever
from rag_toolkit.pipeline import RagPipeline
from rag_toolkit.routing import QueryRouter
from rag_toolkit.google_models import initialize_llm

# Optional: Load configurations
from config.config import get_generator_api_key, GENRATIVE_MODEL
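
The `config.config` import above, and the similar one in step 3, appear to refer to a project-local module rather than to `rag_toolkit` itself. If you need to write your own, the following is one possible sketch. It assumes keys and model names come from environment variables; the variable names (`GENERATOR_API_KEY`, `EMBEDDING_API_KEY`, `RAG_GENERATIVE_MODEL`, `RAG_EMBEDDING_MODEL`) and the placeholder model strings are invented for this example.

```python
# config/config.py (hypothetical layout, an assumption for this sketch)
import os

# Placeholder model identifiers; substitute the names your provider documents.
# The constant name GENRATIVE_MODEL follows the spelling used in the quickstart imports.
GENRATIVE_MODEL = os.environ.get("RAG_GENERATIVE_MODEL", "your-generative-model")
EMBEDDING_MODEL = os.environ.get("RAG_EMBEDDING_MODEL", "your-embedding-model")


def _require_env(name: str) -> str:
    """Read a required environment variable, failing loudly if it is missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def get_generator_api_key() -> str:
    """API key for the generative model (assumed env var name)."""
    return _require_env("GENERATOR_API_KEY")


def get_embedding_api_key() -> str:
    """API key for the embedding model (assumed env var name)."""
    return _require_env("EMBEDDING_API_KEY")
```

Reading keys at call time, rather than at import time, keeps the module importable in environments where the keys are not configured.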

2. Load Documents

Use the load_pdf_pages function to load and preprocess documents:

# Load documents from a PDF
pdf_path = "./data/raw/sample.pdf"
docs = load_pdf_pages(pdf_path, start_page=1, end_page=10)
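
The README does not state whether `start_page` and `end_page` are 1-based or whether `end_page` is inclusive. The sketch below shows one plausible reading (1-based, inclusive, clamped to the document length) on a plain list of page texts. `select_page_range` is a hypothetical helper for illustration only; check the library source before relying on this behaviour.

```python
from typing import List, Optional, Sequence


def select_page_range(
    pages: Sequence[str],
    start_page: int = 1,
    end_page: Optional[int] = None,
) -> List[str]:
    """Return pages start_page..end_page, treating both bounds as 1-based and inclusive."""
    total = len(pages)
    if end_page is None:
        end_page = total
    if start_page < 1 or end_page < start_page:
        raise ValueError(f"Invalid page range: {start_page}..{end_page}")
    # Convert the 1-based start to a 0-based index; clamp the end to the last page.
    return list(pages[start_page - 1 : min(end_page, total)])


pages = [f"page {i}" for i in range(1, 13)]
subset = select_page_range(pages, start_page=1, end_page=10)
print(len(subset), subset[0], subset[-1])
```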

3. Create a Retriever

Generate a vector-based retriever using an embedding model:

from rag_toolkit.google_models import initialize_embedding
from config.config import get_embedding_api_key, EMBEDDING_MODEL

# Initialize embedding model
embedding_model = initialize_embedding(model_name=EMBEDDING_MODEL, api_key=get_embedding_api_key())

# Create retriever
retriever = create_vector_store_retriever(docs, embedding_model)
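
To make the idea of a vector-based retriever concrete, here is a self-contained sketch that ranks documents by cosine similarity. It is not how `create_vector_store_retriever` is implemented: the embedding is a toy bag-of-words count standing in for a real embedding model, and `ToyRetriever` is a name invented for this example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: lowercase word counts. A real pipeline would call an embedding model."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyRetriever:
    """Embeds every document once, then ranks them against each query."""

    def __init__(self, docs: List[str]) -> None:
        self._index = [(doc, embed(doc)) for doc in docs]

    def retrieve(self, query: str, k: int = 2) -> List[Tuple[str, float]]:
        query_vec = embed(query)
        scored = [(doc, cosine(query_vec, vec)) for doc, vec in self._index]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


docs = [
    "Transformers use self-attention over token sequences.",
    "FAISS builds indexes for nearest-neighbour search.",
    "Python lists are mutable sequences.",
]
retriever = ToyRetriever(docs)
print(retriever.retrieve("How does self-attention work in transformers?", k=1))
```

A production retriever replaces the linear scan with an approximate nearest-neighbour index, which is the role a library such as FAISS typically plays.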

4. Build a Pipeline

Combine the retriever and generator into a RAG pipeline:

# Initialize LLM
retrieval_llm = initialize_llm(model_name=GENRATIVE_MODEL, api_key=get_generator_api_key())

# Build pipeline
pipeline = RagPipeline(retrieval=retriever, generator=retrieval_llm)

# Query the pipeline
query = "Explain transformers in machine learning."
response = pipeline.process(query)
print(response)
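
Conceptually, a RAG pipeline retrieves passages for the query, places them in a prompt, and hands that prompt to the generator. The sketch below illustrates that flow with plain callables. `SketchPipeline` and `PROMPT_TEMPLATE` are hypothetical names, and nothing here is taken from the `RagPipeline` source; the real class may build its prompt differently.

```python
from typing import Callable, List

PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


class SketchPipeline:
    """Retrieve passages for a query, then pass a context-augmented prompt to a generator."""

    def __init__(
        self,
        retrieve: Callable[[str], List[str]],
        generate: Callable[[str], str],
    ) -> None:
        self._retrieve = retrieve
        self._generate = generate

    def build_prompt(self, query: str) -> str:
        passages = self._retrieve(query)
        context = "\n".join(f"- {passage}" for passage in passages)
        return PROMPT_TEMPLATE.format(context=context, question=query)

    def process(self, query: str) -> str:
        return self._generate(self.build_prompt(query))


# Stub components so the sketch runs without any model or API key.
def stub_retrieve(query: str) -> List[str]:
    return ["Transformers rely on self-attention.", "Attention weighs token pairs."]


def stub_generate(prompt: str) -> str:
    return f"(stub answer based on a {len(prompt)}-character prompt)"


pipeline = SketchPipeline(retrieve=stub_retrieve, generate=stub_generate)
print(pipeline.process("Explain transformers in machine learning."))
```

Keeping retrieval and generation behind simple callables makes each stage easy to swap or to replace with a stub in tests.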

5. Use Query Routing

Route queries to specific data sources or templates:

datasources = ["python_docs", "js_docs", "golang_docs"]
router = QueryRouter(datasources=datasources, model=retrieval_llm, routing_logic="Choose the best match.")

question = "Why doesn't the following JavaScript code work?"
selected_datasource = router.route(question)
print(f"Selected Datasource: {selected_datasource}")
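
When an LLM chooses the datasource, its reply is free text, so it helps to map that reply back onto the list of known names and to decide what happens when nothing matches. The following sketch shows one way to do that; `build_routing_prompt`, `route`, and the `ask_model` callable are assumptions made for this example and do not describe `QueryRouter` internals.

```python
from typing import Callable, Optional, Sequence


def build_routing_prompt(question: str, datasources: Sequence[str], routing_logic: str) -> str:
    """Assemble a routing prompt that lists the allowed datasource names."""
    options = ", ".join(datasources)
    return (
        f"{routing_logic}\n"
        f"Datasources: {options}\n"
        f"Question: {question}\n"
        "Reply with exactly one datasource name."
    )


def route(
    question: str,
    datasources: Sequence[str],
    ask_model: Callable[[str], str],
    routing_logic: str = "Choose the best match.",
    default: Optional[str] = None,
) -> str:
    """Ask the model for a datasource and normalise its reply to a known name."""
    reply = ask_model(build_routing_prompt(question, datasources, routing_logic)).lower()
    # Check longer names first so a short name cannot shadow a longer one containing it.
    for name in sorted(datasources, key=len, reverse=True):
        if name.lower() in reply:
            return name
    if default is not None:
        return default
    raise ValueError(f"Model reply did not name a known datasource: {reply!r}")


# A stub model so the sketch runs offline.
def fake_model(prompt: str) -> str:
    return "I would pick js_docs."


sources = ["python_docs", "js_docs", "golang_docs"]
print(route("Why doesn't the following JavaScript code work?", sources, fake_model))
```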

Examples

See the examples/ directory for real-world usage scenarios:

Example 1: Build a RAG pipeline for document QA.
Example 2: Route queries to different datasources.
Example 3: Customize retriever and generator templates.

Run the examples:

python examples/example_pipeline.py

Dependencies

The RAG Toolkit requires the following Python libraries:

openai
numpy
pandas
scikit-learn
PyPDF2
faiss
tqdm

Install dependencies using:

pip install -r requirements.txt

Testing

Run the unit tests to verify the library:

pytest tests/

Contributing

We welcome contributions! If you want to contribute:

1. Fork the repository.
2. Create a new branch for your feature.
3. Commit your changes.
4. Submit a pull request.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Contact

For questions or support, please contact:

Email: your.email@example.com
GitHub: YourUsername

Happy Coding!

Release history

0.1.1 (Dec 16, 2024)
0.1.0 (Dec 16, 2024, this version)

Download files

Source Distribution

rag-toolkit-0.1.0.tar.gz (13.9 kB)

Uploaded Dec 16, 2024 Source

Built Distribution

rag_toolkit-0.1.0-py3-none-any.whl (14.7 kB)

Uploaded Dec 16, 2024 Python 3

File details

Details for the file rag-toolkit-0.1.0.tar.gz.

File metadata

Download URL: rag-toolkit-0.1.0.tar.gz
Upload date: Dec 16, 2024
Size: 13.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.9.16

File hashes

Hashes for rag-toolkit-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`34546f9ca499fbb74d0a9046ee4874c157d220513d03ae2bb64083b83fa55bbd`
MD5	`ae369d5813072f829930e91a505f072f`
BLAKE2b-256	`eaf30be4384f56bc160574fc29fb160efb93e8db83c12239243aa26cde018cc8`
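
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; the helper names are invented for the example, and the local path in the commented usage is an assumption about where you saved the archive.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Usage, assuming the archive was saved in the current directory:
# ok = matches_expected(
#     "rag-toolkit-0.1.0.tar.gz",
#     "34546f9ca499fbb74d0a9046ee4874c157d220513d03ae2bb64083b83fa55bbd",
# )
```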

File details

Details for the file rag_toolkit-0.1.0-py3-none-any.whl.

File metadata

Download URL: rag_toolkit-0.1.0-py3-none-any.whl
Upload date: Dec 16, 2024
Size: 14.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.9.16

File hashes

Hashes for rag_toolkit-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`005d6f36b140f8e1c444d1b98c784cb4534d26e04484959d725b62724833255c`
MD5	`d27b1a9422c72f363635e0a1532cad92`
BLAKE2b-256	`f6cdb6aa7b226ba94d3c09a1e7b4464553ae9c987b4e3cca61fa54e8660ee759`
