
ai4rag

RAG Templates Optimization Engine

Automatic and optimized RAG Pattern generator: initializes RAG Templates with optimal parameter values.
🎯 What is ai4RAG?

ai4RAG is an optimization engine for RAG Templates that is agnostic to the LLM and vector database provider. It accepts a variety of RAG Templates and a search space definition, then returns a RAG Template initialized with optimal parameter values (called a RAG Pattern).

[!IMPORTANT] ai4rag is designed to be provider-agnostic: users may provide their own implementations of the foundation model, embedding model, or vector store and use them in an experiment. Out of the box, ai4rag is designed to work with Llama Stack. To use its full capabilities, you'll need access to a Llama Stack server configured with at least one foundation model, one embedding model, and a vector database.
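The provider-agnostic contract can be pictured as a pair of small interfaces. The Protocol sketch below is hypothetical (ai4rag's actual base classes may differ); it only illustrates the idea that any conforming model implementation can be plugged in:

```python
from typing import Protocol


class FoundationModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


class EmbeddingModel(Protocol):
    """Hypothetical interface: anything that turns texts into vectors."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class EchoModel:
    """Toy implementation used to show that any conforming class plugs in."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def answer(model: FoundationModel, question: str) -> str:
    # Engine-style code depends only on the interface, never on the provider.
    return model.generate(question)


print(answer(EchoModel(), "hello"))  # echo: hello
```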

Llama Stack

ai4RAG can run experiments using a Llama Stack server for embeddings, vector storage, and text generation. Use the official client and API docs to connect and extend it.

Features used by ai4rag

When using the Llama Stack backend, ai4rag relies on:

  • Embeddings — Text embeddings via the client (e.g. for indexing and query encoding). See Embeddings API in the docs.
  • Vector stores — Create, retrieve, and delete vector store instances (e.g. Milvus) with a chosen embedding model and dimension. See Vector stores in the API docs.
  • Vector IO — Insert document chunks (with embeddings) into a store and run similarity search (query) for retrieval. See Vector IO and insert/query endpoints.
  • Chat / responses — Foundation model integration for answer generation (e.g. chat completions or responses API) when evaluating RAG patterns.
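The insert/query cycle that Vector IO provides can be illustrated with a minimal in-memory stand-in. This is pure Python with a toy bag-of-letters "embedding" and cosine similarity; the real calls go through the llama-stack-client and a proper embedding model:

```python
import math


def embed(text: str) -> list[float]:
    # Toy 26-dim bag-of-letters "embedding"; stands in for a real embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector store's insert/query endpoints."""

    def __init__(self) -> None:
        self.chunks: list[tuple[str, list[float]]] = []

    def insert(self, texts: list[str]) -> None:
        # Embed each chunk on insert, like a Vector IO insert endpoint.
        self.chunks.extend((t, embed(t)) for t in texts)

    def query(self, question: str, k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the encoded query.
        q = embed(question)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [t for t, _ in ranked[:k]]


store = ToyVectorStore()
store.insert(["ai4rag optimizes RAG templates", "bananas are yellow"])
print(store.query("optimize a RAG template"))  # ['ai4rag optimizes RAG templates']
```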

Quick start

  1. Provide an instance of llama-stack-client to integrate with Llama Stack.
  2. Prepare your knowledge base documents for the experiment.
  3. Prepare benchmark_data.json with evaluation questions and answers.
  4. Define and constrain your search space.
  5. Configure the optimizer.
  6. Create and run the experiment.

Prepare llama-stack-client

To enable full integration with Llama Stack, instantiate a LlamaStackClient. This allows ai4rag to use the models and vector stores available on your Llama Stack server.

[!TIP] Store your credentials securely in a .env file.

import os
from dotenv import load_dotenv, find_dotenv
from llama_stack_client import LlamaStackClient

# Load BASE_URL and API_KEY from a local .env file
load_dotenv(find_dotenv())

client = LlamaStackClient(base_url=os.getenv("BASE_URL"), api_key=os.getenv("API_KEY"))

Prepare knowledge base documents

Prepare a set of documents to serve as the knowledge base for retrieval. These documents will be used to ground the LLM's responses and should be stored in a local directory.

[!NOTE] If you are using the project locally, you can load documents with the FileStore class from the dev_utils module. Supported document formats are listed in the FileStore implementation.

from pathlib import Path
from dev_utils.file_store import FileStore

documents_path = Path("<path to the documents folder>")
documents = FileStore(documents_path).load_as_documents()

Prepare benchmark_data.json

Create a benchmark_data.json file following this schema:

[
	{
		"question": "<question_1>",
		"correct_answers": [
			"<answer 1 for question 1>",
			"<answer 2 for question 1>"
		],
		"correct_answer_document_ids": ["<list of documents ids based on which correct answers were generated>"]
	},
	{
		"question": "<question_2>",
		"correct_answers": [
			"<answer 1 for question 2>",
			"<answer 2 for question 2>"
		],
		"correct_answer_document_ids": ["<list of documents ids based on which correct answers were generated>"]
	}
]

All benchmark questions and answers must be derived from your knowledge base documents.

from pathlib import Path
from dev_utils.utils import read_benchmark_from_json

benchmark_data_path = Path("<path to benchmark_data.json>")
benchmark_data = read_benchmark_from_json(benchmark_data_path)
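If dev_utils is not available, the same file can be read and sanity-checked with the stdlib alone. This sketch validates records against the schema shown above (field names follow that schema; the validation rules are an assumption about what a minimal check should cover):

```python
def validate_benchmark(records: list[dict]) -> list[dict]:
    """Check each record against the benchmark_data.json schema."""
    for i, rec in enumerate(records):
        if not isinstance(rec.get("question"), str):
            raise ValueError(f"record {i}: 'question' must be a string")
        if not rec.get("correct_answers"):
            raise ValueError(f"record {i}: 'correct_answers' must be non-empty")
        if not isinstance(rec.get("correct_answer_document_ids"), list):
            raise ValueError(f"record {i}: 'correct_answer_document_ids' must be a list")
    return records


sample = [
    {
        "question": "What does ai4rag optimize?",
        "correct_answers": ["RAG Templates"],
        "correct_answer_document_ids": ["doc-1"],
    }
]
print(len(validate_benchmark(sample)))  # 1
```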

Define and constrain search space

The search space defines all possible parameter combinations, where each combination creates a unique RAG Pattern. During the experiment, the engine will optimize the RAG Pattern for the selected metric over the given search space, using an objective function to evaluate each configuration.

from ai4rag.search_space.src.parameter import Parameter
from ai4rag.search_space.src.search_space import AI4RAGSearchSpace
from ai4rag.rag.foundation_models.llama_stack import LSFoundationModel
from ai4rag.rag.embedding.llama_stack import LSEmbeddingModel


search_space = AI4RAGSearchSpace(
    params=[
        Parameter(
            name="foundation_model",
            param_type="C",
            values=[LSFoundationModel(model_id="ollama/llama3.2:3b", client=client)],
        ),
        Parameter(
            name="embedding_model",
            param_type="C",
            values=[
                LSEmbeddingModel(
                    model_id="ollama/nomic-embed-text:latest",
                    client=client,
                    params={"embedding_dimension": 768, "context_length": 8192},
                )
            ]
        )
    ]
)
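For intuition, a categorical search space is a Cartesian product of its parameter values, and each combination yields one candidate RAG Pattern. The sketch below counts combinations with toy stand-ins for the model objects (chunk_size and top_k are hypothetical parameter names, not necessarily part of ai4rag's search space):

```python
from itertools import product

# Toy categorical search space: strings stand in for model/parameter objects.
search_space = {
    "foundation_model": ["llama3.2:3b"],
    "embedding_model": ["nomic-embed-text"],
    "chunk_size": [256, 512, 1024],
    "top_k": [3, 5],
}

names = list(search_space)
patterns = [dict(zip(names, combo)) for combo in product(*search_space.values())]
print(len(patterns))  # 1 * 1 * 3 * 2 = 6 candidate RAG Patterns
```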

[!TIP] To run automatic model discovery with Llama Stack, you can use prepare_search_space_with_llama_stack() from ai4rag.search_space.prepare_search_space.

Configure optimizer

You have full control over the optimization algorithm. Configure the GAMOptimizer by adjusting GAMOptSettings.

from ai4rag.core.hpo.gam_opt import GAMOptSettings

optimizer_settings = GAMOptSettings(
    max_evals=10, n_random_nodes=4
)
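These two settings bound the optimizer's budget, assuming n_random_nodes seeds the initial random exploration and max_evals caps the total number of objective-function calls. The toy random-then-refine search below makes that budget accounting concrete; it is an illustration only, not the GAMOptimizer algorithm:

```python
import random

random.seed(0)

evals = 0  # global counter of objective-function calls


def objective(x: float) -> float:
    """Stand-in objective: pretend this scores one RAG Pattern (higher is better)."""
    global evals
    evals += 1
    return -(x - 0.3) ** 2


def optimize(max_evals: int, n_random_nodes: int) -> float:
    # Phase 1: random exploration (analogous to n_random_nodes).
    candidates = [random.random() for _ in range(n_random_nodes)]
    best_score, best = max((objective(c), c) for c in candidates)
    # Phase 2: local refinement until the evaluation budget is spent.
    while evals < max_evals:
        c = min(1.0, max(0.0, best + random.uniform(-0.1, 0.1)))
        s = objective(c)
        if s > best_score:
            best_score, best = s, c
    return best


best = optimize(max_evals=10, n_random_nodes=4)
print(evals)  # exactly 10 objective evaluations
```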

Run the experiment

Using the information from the previous steps, create an experiment and run the ai4rag optimization engine.

[!NOTE] For Llama Stack vector stores, use the "ls_<provider_id>" format where <provider_id> matches your Llama Stack provider configuration (e.g., "ls_milvus", "ls_qdrant"). To use ChromaDB in-memory, specify "chroma".

from ai4rag.core.experiment.experiment import AI4RAGExperiment
from ai4rag.utils.event_handler import LocalEventHandler

experiment = AI4RAGExperiment(
    client=client,
    documents=documents,
    benchmark_data=benchmark_data,
    search_space=search_space,
    vector_store_type="ls_milvus",
    optimizer_settings=optimizer_settings,
    event_handler=LocalEventHandler(output_path="<local-path-to-store-your-output-files>"),
)

experiment.search()
best_eval = experiment.results.get_best_evaluations(k=1)[0]
print(best_eval)

print(best_eval.rag_pattern.generate("What can ai4rag be used for?"))

[!TIP] For production use, implement your own custom EventHandler to handle status changes and artifacts produced during the experiment. See the BaseEventHandler implementation for reference.
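A custom handler might look like the sketch below. The method name is hypothetical (check BaseEventHandler for the real interface); the point is reacting to status changes instead of writing files locally:

```python
class LoggingEventHandler:
    """Hypothetical event handler that records experiment status changes."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def on_status_change(self, status: str) -> None:
        # In production this could push to a message queue or metrics backend.
        self.events.append(status)


handler = LoggingEventHandler()
for status in ("started", "evaluating", "finished"):
    handler.on_status_change(status)
print(handler.events)  # ['started', 'evaluating', 'finished']
```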

Contribution

Pull requests are very welcome! Make sure your patches are well tested. Ideally create a topic branch for every separate change you make. For example:

  1. Fork the repo
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

See more details in the contributing section.
