
🔻 Vecta

A lightweight SDK for benchmarking RAG agents.

Vecta helps you improve (and ultimately trust) your RAG (Retrieval-Augmented Generation) agents. Easily evaluate your system against human-made or synthetic benchmarks grounded in your knowledge base.

Benchmarks are built around the idea of full test coverage: synthetic benchmarks generated by Vecta include multi-hop retrievals, edge cases, and adversarial queries.

Evaluations are run at the chunk, page, and document levels, and can target each part of the pipeline individually: retrieval only, generation only, or the full retrieval-augmented generation (RAG) pipeline.

What types of evaluations can I measure?

Evaluations can be run at different semantic levels and for different components of your agentic system.

| Semantic level | Retrieval | Generation | Retrieval + Generation |
| --- | --- | --- | --- |
| Chunk-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| Page-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| Document-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |

Making a benchmark

A benchmark in Vecta is a list of vecta.core.schemas.BenchmarkEntry records containing:

  • a synthetic question
  • a canonical answer
  • the set of chunk_ids that can answer it
  • the page_nums and doc_names where those chunks live
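For orientation, the data carried by one entry looks roughly like this, shown as a plain dict with illustrative values; the field names mirror the list above, so consult vecta.core.schemas.BenchmarkEntry for the authoritative definitions.

# Illustrative only -- the values are made up and the keys follow the list above,
# not the actual BenchmarkEntry definition.
example_entry = {
    "question": "What notice period does the supplier agreement require for termination?",
    "answer": "Either party may terminate with 60 days' written notice.",
    "chunk_ids": ["chunk_0142", "chunk_0987"],
    "page_nums": [12, 13],
    "doc_names": ["supplier_agreement.pdf"],
}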

Vecta builds this automatically from your knowledge base by:

  1. Sampling real chunks
  2. Asking an LLM to generate a question that the sampled chunk can answer
  3. Discovering other chunks that could also answer it (via semantic search + an LLM-as-a-judge check).
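Sketched in pseudocode, that loop looks roughly like the following. This is an illustration of the approach described above, not Vecta's actual implementation; ask_llm_for_question and llm_judges_answerable are hypothetical helpers.

import random

def build_benchmark_sketch(connector, n_questions: int, similarity_top_k: int = 5):
    """Rough sketch of the benchmark-generation loop described above."""
    chunks = connector.get_all_chunks_and_metadata()
    entries = []
    for chunk in random.sample(chunks, n_questions):
        # Steps 1-2: sample a real chunk and have an LLM write a question it answers
        question, answer = ask_llm_for_question(chunk.content)  # hypothetical helper
        # Step 3: discover other chunks that could also answer the question
        candidates = connector.semantic_search(question, k=similarity_top_k)
        also_answering = [
            c for c in candidates
            if llm_judges_answerable(question, c.content)  # hypothetical LLM-as-a-judge helper
        ]
        entries.append({
            "question": question,
            "answer": answer,
            "chunk_ids": sorted({chunk.id} | {c.id for c in also_answering}),
        })
    return entries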

1) Connect to your vector DB and load the KB

from chromadb import Client
from vecta.connectors.chroma_connector import ChromaConnector
from vecta.core.benchmark import VectaClient

chroma = Client()
collection_name = "my_docs"

# Connect Chroma to Vecta
connector = ChromaConnector(client=chroma, collection_name=collection_name)

# Initialize VectaClient
vecta = VectaClient(
    vector_db_connector=connector, openai_api_key="<YOUR_OPENROUTER_API_KEY>"
)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()

Metadata requirements: each chunk in your vector database must carry a metadata dictionary with the keys page_nums: List[int] and doc_name: str.
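
For example, a chunk's metadata might look like this (the values are illustrative):

metadata = {
    "page_nums": [3, 4],              # one-indexed pages the chunk spans
    "doc_name": "annual_report.pdf",  # unique file name within your corpus
}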

2) Generate the benchmark

# Create N synthetic Q&A pairs and align them to correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    similarity_threshold=0.7,
    similarity_top_k=5,
    random_seed=42,
)
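
A quick sanity check before saving; printing whole entries avoids guessing attribute names:

print(f"Generated {len(entries)} benchmark entries")
print(entries[0])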

3) Save / Load the benchmark (CSV)

# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")

Running an evaluation

Vecta lets you evaluate three things against an existing benchmark:

  • Retrieval → you provide a function: query: str -> chunk_ids: List[str]
  • Generation → you provide: query: str -> generated_text: str
  • Retrieval + Generation → you provide: query: str -> Tuple[chunk_ids: List[str], generated_text: str]

Retrieval-only evaluation

Provide a function that returns the IDs of your retrieved chunks for a given query.

from typing import List


def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query_str=query, k=10)

    # return chunk ids
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")

Generation-only evaluation

from openai import OpenAI

client = OpenAI()  # or any OpenAI-compatible client
MODEL = "your-model"
SYSTEM_PROMPT = "Answer the question concisely."

def my_llm_call(query: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="my llm call")

Retrieval-augmented Generation (RAG) evaluation

Provide a function that returns both retrieved chunk IDs and your generated answer.

from typing import List, Tuple

def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # retrieve via the connector defined earlier
    retrieved = connector.semantic_search(query_str=query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # generate, grounding the answer in the retrieved chunk text
    # (reuses the OpenAI-compatible client from the previous example)
    context = "\n\n".join(c.content for c in retrieved)
    completion = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "user", "content": f"{context}\n\n{query}"}
        ],
    )
    llm_response = completion.choices[0].message.content

    # must return (chunk_ids, generated_text)
    return chunk_ids, llm_response

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")

Connecting to custom retrievers

Don't see a connector for your vector DB? No problem! Inherit from vecta.connectors.base.BaseVectorDBConnector and implement these three methods to connect Vecta to your data retrieval pipeline:

from typing import List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData

class CustomConnector(BaseVectorDBConnector):
    def get_all_chunks_and_metadata(self) -> List[ChunkData]:
        # Return every chunk in the knowledge base, with metadata attached.
        return [...]

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        # Return the k chunks most similar to the query.
        return [...]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        # Return the single chunk with the given id.
        return ...

To implement these methods, first familiarize yourself with the vecta.core.schemas.ChunkData class. Every chunk returned to Vecta must include:

  • id: str, a unique identifier for the chunk
  • content: str, the text of the chunk
  • metadata: Dict[str, Any], which must include:
    • page_nums: List[int], the one-indexed page numbers spanned by the chunk
    • doc_name: str, a file name that is unique within your corpus
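
As an illustration, a minimal in-memory connector might look like the sketch below. It is a toy for small corpora: it assumes the ChunkData objects are built elsewhere with the fields above, and it ranks chunks by naive keyword overlap rather than real embeddings.

from typing import Dict, List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData

class InMemoryConnector(BaseVectorDBConnector):
    def __init__(self, chunks: List[ChunkData]):
        # Index the provided chunks by id for fast lookup.
        self._chunks: Dict[str, ChunkData] = {c.id: c for c in chunks}

    def get_all_chunks_and_metadata(self) -> List[ChunkData]:
        return list(self._chunks.values())

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        # Toy relevance score: number of query words appearing in the chunk.
        # A real connector would delegate to its database's vector search.
        words = query.lower().split()
        scored = [
            (sum(w in c.content.lower() for w in words), c)
            for c in self._chunks.values()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:k]]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        return self._chunks[chunk_id]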
