# 🔻 Vecta

A lightweight SDK for benchmarking RAG agents.
Vecta helps you improve (and ultimately trust) your RAG (Retrieval-Augmented Generation) agents. Easily evaluate your system against human-made or synthetic benchmarks grounded in your knowledge base.

Benchmarks are built around the idea of full test coverage: synthetic benchmarks generated by Vecta include multi-hop retrievals, edge cases, and adversarial queries.
Evaluations are done across the chunk, page, and document levels, and can be run on each individual part of the pipeline: retrieval-only, generation-only, or retrieval augmented generation (full RAG pipeline).
## What types of evaluations can I measure?
Evaluations can be run at different semantic levels and for different components of your agentic system.
| Semantic level | Retrieval only | Generation only | Retrieval + Generation |
|---|---|---|---|
| Chunk-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| Page-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| Document-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
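The retrieval metrics in the table are the standard set-based definitions. As an illustration only (this is not Vecta's implementation), here is how recall and precision work out at the chunk level; the same logic applies at page and document level once chunk IDs are mapped to their `page_nums` and `doc_name`:

```python
from typing import List, Set, Tuple

def recall_and_precision(retrieved: List[str], expected: Set[str]) -> Tuple[float, float]:
    """Standard retrieval metrics over chunk IDs."""
    hits = expected.intersection(retrieved)
    recall = len(hits) / len(expected) if expected else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# Example: 2 of the 3 expected chunks retrieved, out of 4 chunks returned
r, p = recall_and_precision(["c1", "c2", "c9", "c7"], {"c1", "c2", "c3"})
# r == 2/3, p == 2/4
```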
## Making a benchmark
A benchmark in Vecta is a list of `vecta.core.schemas.BenchmarkEntry` records containing:

- a synthetic question
- a canonical answer
- the set of `chunk_ids` that can answer it
- the `page_nums` and `doc_names` where those chunks live
Vecta builds this automatically from your knowledge base by:

- sampling real chunks
- asking an LLM to generate a question that the sampled chunk can answer
- discovering other chunks that could also answer it (via semantic search plus an LLM-as-a-judge check)
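The three steps above can be sketched conceptually. This is not Vecta's implementation: `gen_question` and `judge` are hypothetical stand-ins for the LLM calls, and the entry dicts only approximate the `BenchmarkEntry` schema:

```python
import random
from typing import Callable, Dict, List

def build_benchmark(
    chunks: List[Dict],                  # each: {"id", "content", "metadata"}
    gen_question: Callable[[str], str],  # stand-in LLM: chunk text -> question
    judge: Callable[[str, str], bool],   # stand-in LLM-as-a-judge: (question, chunk text) -> bool
    n_questions: int,
    seed: int = 42,
) -> List[Dict]:
    rng = random.Random(seed)
    entries = []
    for chunk in rng.sample(chunks, n_questions):      # step 1: sample real chunks
        question = gen_question(chunk["content"])      # step 2: generate a question
        # step 3: find every chunk that can also answer it
        answering = [c["id"] for c in chunks if judge(question, c["content"])]
        entries.append({"question": question, "chunk_ids": answering})
    return entries
```

A real pipeline would narrow step 3 with semantic search before invoking the judge, rather than scanning every chunk.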
### 1) Connect to your vector DB and load the KB
```python
from chromadb import Client

from vecta.connectors.chroma_connector import ChromaConnector
from vecta.core.benchmark import VectaClient

chroma = Client()
collection_name = "my_docs"

# Connect Chroma to Vecta
connector = ChromaConnector(client=chroma, collection_name=collection_name)

# Initialize VectaClient
vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="<YOUR_OPENROUTER_API_KEY>",
)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()
```
✅ **Metadata requirements:** each chunk in your vector database must carry a metadata dictionary containing the keys `page_nums: List[int]` and `doc_name: str`.
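Since benchmark entries are aligned to pages and documents, it can pay to validate this shape before loading the knowledge base. A hypothetical helper (not part of Vecta):

```python
from typing import Any, Dict

def validate_chunk_metadata(metadata: Dict[str, Any]) -> None:
    """Raise ValueError if a chunk's metadata is missing the keys Vecta requires."""
    page_nums = metadata.get("page_nums")
    if not isinstance(page_nums, list) or not all(isinstance(p, int) for p in page_nums):
        raise ValueError("metadata['page_nums'] must be a List[int]")
    if not isinstance(metadata.get("doc_name"), str):
        raise ValueError("metadata['doc_name'] must be a str")

validate_chunk_metadata({"page_nums": [3, 4], "doc_name": "report.pdf"})  # ok
```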
### 2) Generate the benchmark
```python
# Create N synthetic Q&A pairs and align them to correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    similarity_threshold=0.7,
    similarity_top_k=5,
    random_seed=42,
)
```
### 3) Save / load the benchmark (CSV)
```python
# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")
```
## Running an evaluation
Vecta lets you evaluate three things against an existing benchmark:

- **Retrieval** → you provide a function: `query: str -> chunk_ids: List[str]`
- **Generation** → you provide: `query: str -> generated_text: str`
- **Retrieval + Generation** → you provide: `query: str -> Tuple[chunk_ids: List[str], generated_text: str]`
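Written out as type aliases (the alias names are illustrative, not part of Vecta's API), the three callables you supply are:

```python
from typing import Callable, List, Tuple

Retriever = Callable[[str], List[str]]                # query -> chunk_ids
Generator = Callable[[str], str]                      # query -> generated_text
RagPipeline = Callable[[str], Tuple[List[str], str]]  # query -> (chunk_ids, generated_text)
```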
### Retrieval-only evaluation
Provide a function that returns the IDs of your retrieved chunks for a given query.
```python
from typing import List

def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query_str=query, k=10)
    # Return chunk ids
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")
```
### Generation-only evaluation
```python
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible client works here
system_prompt = "Answer the question concisely."

def my_llm_call(query: str) -> str:
    resp = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="my llm call")
```
### Retrieval-augmented generation (RAG) evaluation

Provide a function that returns both retrieved chunk IDs and your generated answer.
```python
from typing import List, Tuple

def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # Retrieve
    retrieved = connector.semantic_search(query_str=query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # Generate (`client` is your OpenAI-compatible LLM client)
    context = "\n".join(c.content for c in retrieved)
    completion = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "user", "content": f"{context}\n{query}"}
        ],
    )
    llm_response = completion.choices[0].message.content

    # Must return a (chunk_ids, generated_text) tuple
    return chunk_ids, llm_response

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")
```
## Connecting to custom retrievers

Don't see a connector for your vector DB? No problem!

Inherit from `vecta.connectors.base.BaseVectorDBConnector` and implement these three methods to connect Vecta to your data retrieval pipeline:
```python
from typing import List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData

class CustomConnector(BaseVectorDBConnector):
    def get_all_chunks_and_metadata(self) -> List[ChunkData]:
        return [...]

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        return [...]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        return ...
```
To implement these functions, you should first familiarize yourself with the `vecta.core.schemas.ChunkData` class. Every chunk returned to Vecta must include:

- `id: str`, a unique identifier for the chunk
- `content: str`, the text of the chunk
- `metadata: Dict[str, Any]`, which must include:
  - `page_nums`: one-indexed page numbers spanned by this chunk
  - `doc_name`: a unique file name within your corpus
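To make the shape concrete, here is a minimal in-memory sketch. It uses a dataclass stand-in for `ChunkData` and naive keyword-overlap scoring so it runs without a vector store; a real connector would subclass `BaseVectorDBConnector`, import the real `ChunkData`, and back `semantic_search` with embeddings:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class ChunkData:  # stand-in mirroring the fields described above
    id: str
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)

class InMemoryConnector:  # a real connector would subclass BaseVectorDBConnector
    def __init__(self, chunks: List[ChunkData]):
        self._chunks = {c.id: c for c in chunks}

    def get_all_chunks_and_metadata(self) -> List[ChunkData]:
        return list(self._chunks.values())

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        # Naive relevance: count of shared words (a real store would use embeddings)
        def score(c: ChunkData) -> int:
            return len(set(query.lower().split()) & set(c.content.lower().split()))
        return sorted(self._chunks.values(), key=score, reverse=True)[:k]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        return self._chunks[chunk_id]
```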