🔻 Vecta
A lightweight SDK for benchmarking RAG agents.
Vecta helps you improve, and ultimately trust, your Retrieval-Augmented Generation (RAG) agents. Easily evaluate your system against human-written or synthetic benchmarks grounded in your knowledge base, or bootstrap evaluations from well-known public datasets without writing custom import scripts.
The benchmarks are built around the idea of full test coverage: synthetic benchmarks generated by Vecta include multi-hop retrievals, edge cases, and adversarial queries.
Evaluations are done across the chunk, page, and document levels, and can be run on each individual part of the pipeline: retrieval-only, generation-only, or retrieval augmented generation (full RAG pipeline).
What types of evaluations can I measure?
Evaluations can be run at different semantic levels and for different components of your agentic system.
| Semantic Level | Retrieval | Generation | Retrieval, Generation |
|---|---|---|---|
| Chunk-level | Recall, Precision | Accuracy, Groundedness | Recall, Precision, Accuracy, Groundedness |
| Page-level | Recall, Precision | Accuracy, Groundedness | Recall, Precision, Accuracy, Groundedness |
| Document-level | Recall, Precision | Accuracy, Groundedness | Recall, Precision, Accuracy, Groundedness |
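For intuition, chunk-level recall and precision reduce to set overlap between the chunk IDs your system retrieved and the chunk IDs the benchmark marks as correct (page- and document-level metrics apply the same idea after mapping chunks to their pages or documents). A minimal sketch, not Vecta's internal code:

```python
from typing import List, Tuple

def retrieval_metrics(retrieved: List[str], expected: List[str]) -> Tuple[float, float]:
    """Set-overlap recall and precision over chunk IDs (illustrative only)."""
    retrieved_set, expected_set = set(retrieved), set(expected)
    hits = len(retrieved_set & expected_set)
    recall = hits / len(expected_set) if expected_set else 0.0
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    return recall, precision

# Two of the three expected chunks retrieved, plus one spurious result:
# recall = precision = 2/3
metrics = retrieval_metrics(["c1", "c2", "c9"], ["c1", "c2", "c3"])
```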
Making a benchmark
A benchmark in Vecta is a list of vecta.core.schema.BenchmarkEntry records containing:
- a synthetic question
- a canonical answer
- the set of chunk_ids that can answer it
- the page_nums and doc_names where those chunks live
Vecta builds this automatically from your knowledge base by:
- Sampling real chunks
- Asking an LLM to generate a question that the chunk can answer
- Discovering other chunks that could also answer it (via semantic search + an LLM-as-a-judge check).
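Conceptually, each generated record looks like the following. This is a hypothetical stand-in for vecta.core.schema.BenchmarkEntry with the fields listed above; the actual class may differ in field names and types:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BenchmarkEntry:  # illustrative stand-in, not the SDK class
    question: str       # synthetic question
    answer: str         # canonical answer
    chunk_ids: List[str]  # chunks that can answer it
    page_nums: List[int]  # pages where those chunks live
    doc_names: List[str]  # documents where those chunks live

# Example record (contents invented for illustration):
entry = BenchmarkEntry(
    question="What retention period does the policy mandate?",
    answer="Seven years for financial records.",
    chunk_ids=["chunk_041", "chunk_112"],
    page_nums=[3, 9],
    doc_names=["retention_policy.pdf"],
)
```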
1) Connect to your vector DB and load the KB
Every connector in the SDK expects a schema that tells Vecta how to pull fields such as id, content, source_path, and page_nums from the raw results returned by your vector database.
Choosing a schema template
- Use the helpers in `vecta.core.schema_helpers.SchemaTemplates` for popular databases (Chroma, Weaviate, Pinecone, LanceDB, etc.).
- Pass the schema instance to your connector. Each helper documents the required metadata fields so you can match them with the way your data is stored.
```python
from vecta.core.schema_helpers import SchemaTemplates

# Example: for Chroma collections with the default metadata structure
schema = SchemaTemplates.chroma_default()
```
Creating a custom schema
Define a VectorDBSchema manually when your database returns non-standard field names or nested metadata:
```python
from vecta.core.schemas import VectorDBSchema

custom_schema = VectorDBSchema(
    id_accessor="chunk_id",
    content_accessor="payload.document_text",
    source_path_accessor="metadata.source",
    page_nums_accessor="json(metadata.provenance).pages",
)
```
Accessor strings support dotted paths, array indexes (e.g., "chunks[0].id"), and json() traversal for nested JSON structures.
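To make the accessor grammar concrete, here is a toy resolver covering dotted paths, `[idx]` array access, and a single `json(...)` wrapper. It is illustrative only; the SDK's actual parser may differ and also handles nested `json()` forms:

```python
import json
import re

def resolve(accessor: str, record):
    """Toy resolver for 'a.b', 'a[0].b', and 'json(a.b).c' accessor strings."""
    # json(inner).rest -> parse the JSON string at `inner`, continue with `rest`
    m = re.fullmatch(r"json\((.+?)\)(?:\.(.+))?", accessor)
    if m:
        parsed = json.loads(resolve(m.group(1), record))
        return resolve(m.group(2), parsed) if m.group(2) else parsed
    value = record
    for part in accessor.split("."):
        name, *indexes = re.findall(r"[^\[\]]+", part)  # "chunks[0]" -> "chunks", "0"
        value = value[name]
        for idx in indexes:
            value = value[int(idx)]
    return value

raw = {
    "chunks": [{"id": "c1"}],
    "metadata": {"provenance": '{"pages": [3, 9]}'},
}
resolve("chunks[0].id", raw)                    # "c1"
resolve("json(metadata.provenance).pages", raw)  # [3, 9]
```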
💡 Tip: When building a schema, log or inspect one record from your vector DB client so you can map each field directly to a schema accessor.
```python
from chromadb import Client

from vecta.connectors.chroma_local_connector import ChromaLocalConnector
from vecta.core.benchmark import VectaClient
from vecta.core.schema_helpers import SchemaTemplates

chroma = Client()
collection_name = "my_docs"

# Define schema for your data structure
schema = SchemaTemplates.chroma_default()

# Connect Chroma to Vecta
connector = ChromaLocalConnector(
    client=chroma,
    collection_name=collection_name,
    schema=schema,
)

# Initialize VectaClient
vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="<YOUR_OPENROUTER_API_KEY>",
)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()
```
✅ Schema requirements: Each connector requires a schema that defines how to extract `id`, `content`, `source_path`, and `page_nums` from your data. Use our schema helpers or create custom ones with syntax like `"metadata.source_path"` or `"json(metadata.provenance).doc_name"`.
2) Generate the benchmark
```python
# Create N synthetic Q&A pairs and align them to the correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    similarity_threshold=0.7,
    similarity_top_k=5,
    random_seed=42,
)
```
3) Save / Load the benchmark (CSV)
```python
# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")
```
Running an evaluation
Vecta lets you evaluate three things against an existing benchmark:
- Retrieval → you provide a function: `query: str -> chunk_ids: List[str]`
- Generation → you provide: `query: str -> generated_text: str`
- Retrieval + Generation → you provide: `query: str -> Tuple[chunk_ids: List[str], generated_text: str]`
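The three contracts can be written down as plain type aliases. This is purely illustrative standard typing; the SDK does not require these names:

```python
from typing import Callable, List, Tuple

Retriever = Callable[[str], List[str]]                 # query -> chunk IDs
Generator = Callable[[str], str]                       # query -> generated text
RagPipeline = Callable[[str], Tuple[List[str], str]]   # query -> (chunk IDs, text)

# A trivial stub satisfying the RAG contract:
def echo_rag(query: str) -> Tuple[List[str], str]:
    return ["chunk_0"], f"stub answer to: {query}"

pipeline: RagPipeline = echo_rag
```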
Retrieval-only evaluation
Provide a function that returns the IDs of your retrieved chunks for a given query.
```python
from typing import List

def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query_str=query, k=10)
    # Return the IDs of the retrieved chunks
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")
```
Generation-only evaluation
Provide a function that returns your generated answer for a given query.

```python
def my_llm_call(query: str) -> str:
    # `client`, `model`, and `system_prompt` are your own LLM client,
    # model name, and system prompt, defined elsewhere in your application.
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="my llm call")
```
Retrieval-augmented Generation (RAG) evaluation
Provide a function that returns both retrieved chunk IDs and your generated answer.
```python
from typing import List, Tuple

def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # Retrieve (using the connector from step 1)
    retrieved = connector.semantic_search(query_str=query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # Generate, passing the retrieved chunk contents as context
    context = "\n".join(c.content for c in retrieved)
    completion = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "user", "content": f"{context}\n{query}"},
        ],
    )
    llm_response = completion.choices[0].message.content

    # Must return a (chunk_ids, generated_text) tuple
    return chunk_ids, llm_response

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")
```
Connecting to custom databases
Don't see a connector for your vector DB? No problem!
Inherit from `vecta.connectors.base.BaseVectorDBConnector` and implement these three methods, passing a schema to the base class:
```python
from typing import List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData, VectorDBSchema

# Define how to extract data from your database results
custom_schema = VectorDBSchema(
    id_accessor="id",                                         # Direct field access
    content_accessor="document",                              # Field containing text
    source_path_accessor="metadata.source_path",              # Nested field access
    page_nums_accessor="json(metadata.provenance).page_nums", # JSON parsing
)

class CustomConnector(BaseVectorDBConnector):
    def __init__(self, your_db_client, schema: VectorDBSchema):
        super().__init__(schema)
        self.db = your_db_client

    def get_all_chunks(self) -> List[ChunkData]:
        results = self.db.get_all()
        return [self._create_chunk_data_from_raw(r) for r in results]

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        results = self.db.search(query, limit=k)
        return [self._create_chunk_data_from_raw(r) for r in results]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        result = self.db.get_by_id(chunk_id)
        return self._create_chunk_data_from_raw(result)
```
Schema accessor syntax: use `"field"`, `"metadata.nested_field"`, `"[0]"` for arrays, `"json(field).subfield"` for JSON parsing, or `"json(json(field).sub).final"` for nested JSON.
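To see the three-method contract in isolation, here is a self-contained toy with no Vecta import: the in-memory "database", its word-overlap "search", and the `Chunk` class are all invented for illustration and only mirror the shape of the connector above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:  # stand-in for vecta.core.schemas.ChunkData
    id: str
    content: str

class InMemoryDB:
    """Toy store whose 'search' ranks rows by words shared with the query."""
    def __init__(self, rows):
        self.rows = {r["id"]: r for r in rows}
    def get_all(self):
        return list(self.rows.values())
    def get_by_id(self, chunk_id):
        return self.rows[chunk_id]
    def search(self, query, limit):
        q = set(query.lower().split())
        ranked = sorted(self.rows.values(),
                        key=lambda r: -len(q & set(r["content"].lower().split())))
        return ranked[:limit]

class ToyConnector:
    """Mirrors the BaseVectorDBConnector methods shown above."""
    def __init__(self, db):
        self.db = db
    def _to_chunk(self, raw) -> Chunk:
        return Chunk(id=raw["id"], content=raw["content"])
    def get_all_chunks(self) -> List[Chunk]:
        return [self._to_chunk(r) for r in self.db.get_all()]
    def semantic_search(self, query: str, k: int) -> List[Chunk]:
        return [self._to_chunk(r) for r in self.db.search(query, k)]
    def get_chunk_by_id(self, chunk_id: str) -> Chunk:
        return self._to_chunk(self.db.get_by_id(chunk_id))

db = InMemoryDB([
    {"id": "c1", "content": "refund policy lasts 30 days"},
    {"id": "c2", "content": "shipping takes five days"},
])
conn = ToyConnector(db)
top = conn.semantic_search("what is the refund policy", k=2)  # c1 ranks first
```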
Importing existing datasets
Vecta ships with dataset importers so you can start from curated retrieval or generation benchmarks instead of generating your own from scratch. Import popular evaluation datasets like GPQA Diamond or MS MARCO:
```python
from vecta.core.dataset_importer import BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()

# Import GPQA Diamond for generation-only evaluation
chunks, benchmark_entries = importer.import_gpqa_diamond(split="train", max_items=50)

# Import MS MARCO for retrieval + generation evaluation
chunks, benchmark_entries = importer.import_msmarco(split="test", max_items=100)

# Use with VectaClient
vecta = VectaClient()
vecta.benchmark_entries = benchmark_entries
```
The importer handles dataset schema mapping automatically, converting various field structures into Vecta's standardized format. You can combine imported datasets with your own knowledge-base-derived benchmarks to compare performance across synthetic and real-world tasks.
File details
Details for the file vecta-0.1.2.tar.gz.
File metadata
- Download URL: vecta-0.1.2.tar.gz
- Upload date:
- Size: 50.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `74518d3bdf48315e3c6b4a57d4b2b3bf6ec8d3ebad5b4d1a0506acdd5d316cfa` |
| MD5 | `cf953fd0bb7dc73dd517b2d21f3465f1` |
| BLAKE2b-256 | `700d9eeaa8020f5bb02be46725273814ba4900f1fb3893ee785d3f723a11cbd8` |
File details
Details for the file vecta-0.1.2-py3-none-any.whl.
File metadata
- Download URL: vecta-0.1.2-py3-none-any.whl
- Upload date:
- Size: 59.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3f346a9a5d0eee7baac98ccf9dcce524c884673cfdf929bbd8a9f79e89206007` |
| MD5 | `c1660575525f91091ba8eefc72f3294a` |
| BLAKE2b-256 | `9b3a6e8f424f2d62eb27199916f50d37b35b79fbe25452c62604cb9b3ad2ed0c` |