
ai4rag

RAG Templates Optimization Engine

Automatic and optimized RAG Pattern generator: initializes RAG Templates with optimal parameter values.
🎯 What is ai4RAG?

ai4RAG is an optimization engine for RAG Templates that is agnostic to the LLM and vector database provider. It accepts a variety of RAG Templates and a search space definition, then returns a RAG Template initialized with optimal parameter values (called a RAG Pattern).

[!IMPORTANT] ai4rag is designed to be provider-agnostic: users may provide their own implementations of the foundation model, embedding model, or vector store and use them in an experiment. Out of the box, ai4rag is designed to work with Llama Stack. To use its full capabilities, you'll need access to a Llama Stack server configured with at least one foundation model, one embedding model, and a vector database.
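The provider-agnostic contract can be pictured as a pair of small interfaces. The Protocol sketch below is hypothetical (ai4rag's actual base classes may differ); it only illustrates the idea that any conforming model implementation can be plugged in:

```python
from typing import Protocol


class FoundationModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


class EmbeddingModel(Protocol):
    """Hypothetical interface: anything that turns texts into vectors."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class EchoModel:
    """Toy implementation used to show that any conforming class plugs in."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def answer(model: FoundationModel, question: str) -> str:
    # Engine-style code depends only on the interface, never on the provider.
    return model.generate(question)


print(answer(EchoModel(), "hello"))  # echo: hello
```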

Llama Stack

ai4RAG can run experiments using a Llama Stack server for embeddings, vector storage, and text generation. Use the official client and API docs to connect and extend it.

Features used by ai4rag

When using the Llama Stack backend, ai4rag relies on:

  • Embeddings — Text embeddings via the client (e.g. for indexing and query encoding). See Embeddings API in the docs.
  • Vector stores — Create, retrieve, and delete vector store instances (e.g. Milvus) with a chosen embedding model and dimension. See Vector stores in the API docs.
  • Vector IO — Insert document chunks (with embeddings) into a store and run similarity search (query) for retrieval. See Vector IO and insert/query endpoints.
  • Chat / responses — Foundation model integration for answer generation (e.g. chat completions or responses API) when evaluating RAG patterns.
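The insert/query cycle that Vector IO provides can be illustrated with a minimal in-memory stand-in. This is pure Python with a toy bag-of-letters "embedding" and cosine similarity; the real calls go through the llama-stack-client and a proper embedding model:

```python
import math


def embed(text: str) -> list[float]:
    # Toy 26-dim bag-of-letters "embedding"; stands in for a real embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector store's insert/query endpoints."""

    def __init__(self) -> None:
        self.chunks: list[tuple[str, list[float]]] = []

    def insert(self, texts: list[str]) -> None:
        # Embed each chunk on insert, like a Vector IO insert endpoint.
        self.chunks.extend((t, embed(t)) for t in texts)

    def query(self, question: str, k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the encoded query.
        q = embed(question)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [t for t, _ in ranked[:k]]


store = ToyVectorStore()
store.insert(["ai4rag optimizes RAG templates", "bananas are yellow"])
print(store.query("optimize a RAG template"))  # ['ai4rag optimizes RAG templates']
```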

Quick start

  1. Provide an instance of llama-stack-client to integrate with Llama Stack.
  2. Prepare your knowledge base documents for the experiment.
  3. Prepare benchmark_data.json with evaluation questions and answers.
  4. Define and constrain your search space.
  5. Configure the optimizer.
  6. Create and run the experiment.

Prepare llama-stack-client

To enable full integration with Llama Stack, instantiate a LlamaStackClient. This allows ai4rag to use the models and vector stores available on your Llama Stack server.

[!TIP] Store your credentials securely in a .env file.

import os
from dotenv import load_dotenv, find_dotenv
from llama_stack_client import LlamaStackClient

# Load BASE_URL and API_KEY from a local .env file
load_dotenv(find_dotenv())

client = LlamaStackClient(base_url=os.getenv("BASE_URL"), api_key=os.getenv("API_KEY"))

Prepare knowledge base documents

Prepare a set of documents to serve as the knowledge base for retrieval. These documents will be used to ground the LLM's responses and should be stored in a local directory.

[!NOTE] If you are using the project locally, you can load documents with the FileStore class from the dev_utils module. Supported document formats are listed in the FileStore implementation.

from pathlib import Path
from dev_utils.file_store import FileStore

documents_path = Path("<path to the documents folder>")
documents = FileStore(documents_path).load_as_documents()

Prepare benchmark_data.json

Create a benchmark_data.json file following this schema:

[
	{
		"question": "<question_1>",
		"correct_answers": [
			"<answer 1 for question 1>",
			"<answer 2 for question 1>"
		],
		"correct_answer_document_ids": ["<list of documents ids based on which correct answers were generated>"]
	},
	{
		"question": "<question_2>",
		"correct_answers": [
			"<answer 1 for question 2>",
			"<answer 2 for question 2>"
		],
		"correct_answer_document_ids": ["<list of documents ids based on which correct answers were generated>"]
	}
]

All benchmark questions and answers must be derived from your knowledge base documents.

from pathlib import Path
from dev_utils.utils import read_benchmark_from_json

benchmark_data_path = Path("<path to benchmark_data.json>")
benchmark_data = read_benchmark_from_json(benchmark_data_path)
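If dev_utils is not available, the same file can be read and sanity-checked with the stdlib alone. This sketch validates records against the schema shown above (field names follow that schema; the validation rules are an assumption about what a minimal check should cover):

```python
def validate_benchmark(records: list[dict]) -> list[dict]:
    """Check each record against the benchmark_data.json schema."""
    for i, rec in enumerate(records):
        if not isinstance(rec.get("question"), str):
            raise ValueError(f"record {i}: 'question' must be a string")
        if not rec.get("correct_answers"):
            raise ValueError(f"record {i}: 'correct_answers' must be non-empty")
        if not isinstance(rec.get("correct_answer_document_ids"), list):
            raise ValueError(f"record {i}: 'correct_answer_document_ids' must be a list")
    return records


sample = [
    {
        "question": "What does ai4rag optimize?",
        "correct_answers": ["RAG Templates"],
        "correct_answer_document_ids": ["doc-1"],
    }
]
print(len(validate_benchmark(sample)))  # 1
```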

Define and constrain search space

The search space defines all possible parameter combinations, where each combination creates a unique RAG Pattern. During the experiment, the engine will optimize the RAG Pattern for the selected metric over the given search space, using an objective function to evaluate each configuration.

from ai4rag.search_space.src.parameter import Parameter
from ai4rag.search_space.src.search_space import AI4RAGSearchSpace
from ai4rag.rag.foundation_models.llama_stack import LSFoundationModel
from ai4rag.rag.embedding.llama_stack import LSEmbeddingModel


search_space = AI4RAGSearchSpace(
    params=[
        Parameter(
            name="foundation_model",
            param_type="C",
            values=[LSFoundationModel(model_id="ollama/llama3.2:3b", client=client)],
        ),
        Parameter(
            name="embedding_model",
            param_type="C",
            values=[
                LSEmbeddingModel(
                    model_id="ollama/nomic-embed-text:latest",
                    client=client,
                    params={"embedding_dimension": 768, "context_length": 8192},
                )
            ]
        )
    ]
)
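For intuition, a categorical search space is a Cartesian product of its parameter values, and each combination yields one candidate RAG Pattern. The sketch below counts combinations with toy stand-ins for the model objects (chunk_size and top_k are hypothetical parameter names, not necessarily part of ai4rag's search space):

```python
from itertools import product

# Toy categorical search space: strings stand in for model/parameter objects.
search_space = {
    "foundation_model": ["llama3.2:3b"],
    "embedding_model": ["nomic-embed-text"],
    "chunk_size": [256, 512, 1024],
    "top_k": [3, 5],
}

names = list(search_space)
patterns = [dict(zip(names, combo)) for combo in product(*search_space.values())]
print(len(patterns))  # 1 * 1 * 3 * 2 = 6 candidate RAG Patterns
```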

[!TIP] To run automatic model discovery with Llama Stack, you can use prepare_search_space_with_llama_stack() from ai4rag.search_space.prepare_search_space.

Configure optimizer

You have full control over the optimization algorithm. Configure the GAMOptimizer by adjusting GAMOptSettings.

from ai4rag.core.hpo.gam_opt import GAMOptSettings

optimizer_settings = GAMOptSettings(
    max_evals=10, n_random_nodes=4
)
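These two settings bound the optimizer's budget, assuming n_random_nodes seeds the initial random exploration and max_evals caps the total number of objective-function calls. The toy random-then-refine search below makes that budget accounting concrete; it is an illustration only, not the GAMOptimizer algorithm:

```python
import random

random.seed(0)

evals = 0  # global counter of objective-function calls


def objective(x: float) -> float:
    """Stand-in objective: pretend this scores one RAG Pattern (higher is better)."""
    global evals
    evals += 1
    return -(x - 0.3) ** 2


def optimize(max_evals: int, n_random_nodes: int) -> float:
    # Phase 1: random exploration (analogous to n_random_nodes).
    candidates = [random.random() for _ in range(n_random_nodes)]
    best_score, best = max((objective(c), c) for c in candidates)
    # Phase 2: local refinement until the evaluation budget is spent.
    while evals < max_evals:
        c = min(1.0, max(0.0, best + random.uniform(-0.1, 0.1)))
        s = objective(c)
        if s > best_score:
            best_score, best = s, c
    return best


best = optimize(max_evals=10, n_random_nodes=4)
print(evals)  # exactly 10 objective evaluations
```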

Run the experiment

Using the information from the previous steps, create an experiment and run the ai4rag optimization engine.

[!NOTE] For Llama Stack vector stores, use the "ls_<provider_id>" format where <provider_id> matches your Llama Stack provider configuration (e.g., "ls_milvus", "ls_qdrant"). To use ChromaDB in-memory, specify "chroma".

from ai4rag.core.experiment.experiment import AI4RAGExperiment
from ai4rag.utils.event_handler import LocalEventHandler

experiment = AI4RAGExperiment(
    client=client,
    documents=documents,
    benchmark_data=benchmark_data,
    search_space=search_space,
    vector_store_type="ls_milvus",
    optimizer_settings=optimizer_settings,
    event_handler=LocalEventHandler(output_path="<local-path-to-store-your-output-files>"),
)

experiment.search()
best_eval = experiment.results.get_best_evaluations(k=1)[0]
print(best_eval)

print(best_eval.rag_pattern.generate("What can ai4rag be used for?"))

[!TIP] For production use, implement your own custom EventHandler to handle status changes and artifacts produced during the experiment. See the BaseEventHandler implementation for reference.
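A custom handler might look like the sketch below. The method name is hypothetical (check BaseEventHandler for the real interface); the point is reacting to status changes instead of writing files locally:

```python
class LoggingEventHandler:
    """Hypothetical event handler that records experiment status changes."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def on_status_change(self, status: str) -> None:
        # In production this could push to a message queue or metrics backend.
        self.events.append(status)


handler = LoggingEventHandler()
for status in ("started", "evaluating", "finished"):
    handler.on_status_change(status)
print(handler.events)  # ['started', 'evaluating', 'finished']
```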

Contribution

Pull requests are very welcome! Make sure your patches are well tested. Ideally create a topic branch for every separate change you make. For example:

  1. Fork the repo
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

See more details in the contributing section.
