
🔻 Vecta

A lightweight SDK for benchmarking RAG agents.

Vecta helps you improve (and ultimately trust) your RAG (Retrieval-Augmented Generation) agents. Easily evaluate your system against human-made or synthetic benchmarks grounded in your knowledge base.

Benchmarks are built around the idea of full test coverage: synthetic benchmarks generated by Vecta include multi-hop retrievals, edge cases, and adversarial queries.

Evaluations are run at the chunk, page, and document levels, and can target each part of the pipeline individually: retrieval only, generation only, or the full retrieval-augmented generation (RAG) pipeline.

What types of evaluations can I measure?

Evaluations can be run at different semantic levels and for different components of your agentic system.

| Semantic level | Retrieval | Generation | Retrieval + Generation |
| --- | --- | --- | --- |
| Chunk-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| Page-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| Document-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |

Making a benchmark

A benchmark in Vecta is a list of vecta.core.schemas.BenchmarkEntry records containing:

  • a synthetic question
  • a canonical answer
  • the set of chunk_ids that can answer it
  • the page_nums and doc_names where those chunks live
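For orientation, the data carried by one entry looks roughly like this, shown as a plain dict with illustrative values; the field names mirror the list above, so consult vecta.core.schemas.BenchmarkEntry for the authoritative definitions.

# Illustrative only -- the values are made up and the keys follow the list above,
# not the actual BenchmarkEntry definition.
example_entry = {
    "question": "What notice period does the supplier agreement require for termination?",
    "answer": "Either party may terminate with 60 days' written notice.",
    "chunk_ids": ["chunk_0142", "chunk_0987"],
    "page_nums": [12, 13],
    "doc_names": ["supplier_agreement.pdf"],
}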

Vecta builds this automatically from your knowledge base by:

  1. Sampling real chunks
  2. Asking an LLM to generate a question that the sampled chunk can answer
  3. Discovering other chunks that could also answer it (via semantic search + an LLM-as-a-judge check).
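Sketched in pseudocode, that loop looks roughly like the following. This is an illustration of the approach described above, not Vecta's actual implementation; ask_llm_for_question and llm_judges_answerable are hypothetical helpers.

import random

def build_benchmark_sketch(connector, n_questions: int, similarity_top_k: int = 5):
    """Rough sketch of the benchmark-generation loop described above."""
    chunks = connector.get_all_chunks_and_metadata()
    entries = []
    for chunk in random.sample(chunks, n_questions):
        # Steps 1-2: sample a real chunk and have an LLM write a question it answers
        question, answer = ask_llm_for_question(chunk.content)  # hypothetical helper
        # Step 3: discover other chunks that could also answer the question
        candidates = connector.semantic_search(question, k=similarity_top_k)
        also_answering = [
            c for c in candidates
            if llm_judges_answerable(question, c.content)  # hypothetical LLM-as-a-judge helper
        ]
        entries.append({
            "question": question,
            "answer": answer,
            "chunk_ids": sorted({chunk.id} | {c.id for c in also_answering}),
        })
    return entries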

1) Connect to your vector DB and load the KB

from chromadb import Client
from vecta.connectors.chroma_connector import ChromaConnector
from vecta.core.benchmark import VectaClient

chroma = Client()
collection_name = "my_docs"

# Connect Chroma to Vecta
connector = ChromaConnector(client=chroma, collection_name=collection_name)

# Initialize VectaClient
vecta = VectaClient(
    vector_db_connector=connector, openai_api_key="<YOUR_OPENROUTER_API_KEY>"
)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()

Metadata requirements: each chunk in your vector database must carry a metadata dictionary with the keys page_nums: List[int] and doc_name: str.
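
For example, a chunk's metadata might look like this (the values are illustrative):

metadata = {
    "page_nums": [3, 4],              # one-indexed pages the chunk spans
    "doc_name": "annual_report.pdf",  # unique file name within your corpus
}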

2) Generate the benchmark

# Create N synthetic Q&A pairs and align them to correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    similarity_threshold=0.7,
    similarity_top_k=5,
    random_seed=42,
)
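
A quick sanity check before saving; printing whole entries avoids guessing attribute names:

print(f"Generated {len(entries)} benchmark entries")
print(entries[0])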

3) Save / Load the benchmark (CSV)

# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")

Running an evaluation

Vecta lets you evaluate three things against an existing benchmark:

  • Retrieval → you provide a function: query: str -> chunk_ids: List[str]
  • Generation → you provide: query: str -> generated_text: str
  • Retrieval + Generation → you provide: query: str -> Tuple[chunk_ids: List[str], generated_text: str]

Retrieval-only evaluation

Provide a function that returns the IDs of your retrieved chunks for a given query.

from typing import List


def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query_str=query, k=10)

    # return chunk ids
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")

Generation-only evaluation

from openai import OpenAI

client = OpenAI()  # or any OpenAI-compatible client
MODEL = "your-model"
SYSTEM_PROMPT = "Answer the question concisely."

def my_llm_call(query: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="my llm call")

Retrieval-augmented Generation (RAG) evaluation

Provide a function that returns both retrieved chunk IDs and your generated answer.

from typing import List, Tuple

def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # retrieve via the connector defined earlier
    retrieved = connector.semantic_search(query_str=query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # generate, grounding the answer in the retrieved chunk text
    # (reuses the OpenAI-compatible client from the previous example)
    context = "\n\n".join(c.content for c in retrieved)
    completion = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "user", "content": f"{context}\n\n{query}"}
        ],
    )
    llm_response = completion.choices[0].message.content

    # must return (chunk_ids, generated_text)
    return chunk_ids, llm_response

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")

Connecting to custom retrievers

Don't see a connector for your vector DB? No problem! Inherit from vecta.connectors.base.BaseVectorDBConnector and implement these three methods to connect Vecta to your data retrieval pipeline:

from typing import List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData

class CustomConnector(BaseVectorDBConnector):
    def get_all_chunks_and_metadata(self) -> List[ChunkData]:
        # Return every chunk in the knowledge base, with metadata attached.
        return [...]

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        # Return the k chunks most similar to the query.
        return [...]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        # Return the single chunk with the given id.
        return ...

To implement these methods, first familiarize yourself with the vecta.core.schemas.ChunkData class. Every chunk returned to Vecta must include:

  • id: str, a unique identifier for the chunk
  • content: str, the text of the chunk
  • metadata: Dict[str, Any], which must include:
    • page_nums: List[int], the one-indexed page numbers spanned by the chunk
    • doc_name: str, a file name that is unique within your corpus
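
As an illustration, a minimal in-memory connector might look like the sketch below. It is a toy for small corpora: it assumes the ChunkData objects are built elsewhere with the fields above, and it ranks chunks by naive keyword overlap rather than real embeddings.

from typing import Dict, List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData

class InMemoryConnector(BaseVectorDBConnector):
    def __init__(self, chunks: List[ChunkData]):
        # Index the provided chunks by id for fast lookup.
        self._chunks: Dict[str, ChunkData] = {c.id: c for c in chunks}

    def get_all_chunks_and_metadata(self) -> List[ChunkData]:
        return list(self._chunks.values())

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        # Toy relevance score: number of query words appearing in the chunk.
        # A real connector would delegate to its database's vector search.
        words = query.lower().split()
        scored = [
            (sum(w in c.content.lower() for w in words), c)
            for c in self._chunks.values()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:k]]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        return self._chunks[chunk_id]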
