
🔻 Vecta

A lightweight SDK for benchmarking RAG agents.

Vecta helps you improve (and ultimately trust) your RAG (Retrieval-Augmented Generation) agents. Easily evaluate your system against human-made or synthetic benchmarks grounded in your knowledge base.

Benchmarks are built around the idea of full test coverage: synthetic benchmarks generated by Vecta include multi-hop retrievals, edge cases, and adversarial queries.

Evaluations are done across the chunk, page, and document levels, and can be run on each individual part of the pipeline: retrieval-only, generation-only, or retrieval augmented generation (full RAG pipeline).

What types of evaluations can I measure?

Evaluations can be run at different semantic levels and for different components of your agentic system.

| Semantic level | Retrieval | Generation | Retrieval + Generation |
|---|---|---|---|
| Chunk-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| Page-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| Document-level | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
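To make the retrieval metrics concrete, here is an illustrative sketch of chunk-level recall and precision for a single benchmark question (this is the standard definition, not Vecta's internal implementation): recall is the fraction of expected chunks that were retrieved, and precision is the fraction of retrieved chunks that were expected.

```python
from typing import List, Tuple

def chunk_recall_precision(expected_ids: List[str], retrieved_ids: List[str]) -> Tuple[float, float]:
    """Chunk-level recall/precision for one question (illustrative)."""
    expected, retrieved = set(expected_ids), set(retrieved_ids)
    hits = expected & retrieved
    recall = len(hits) / len(expected) if expected else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# 2 of 3 expected chunks retrieved, among 4 retrieved total
recall, precision = chunk_recall_precision(["a", "b", "c"], ["a", "b", "x", "y"])
```

Page- and document-level scores follow the same pattern, with chunk IDs mapped up to their pages or documents first.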

Making a benchmark

A benchmark in Vecta is a list of vecta.core.schema.BenchmarkEntry records containing:

  • a synthetic question
  • a canonical answer
  • the set of chunk_ids that can answer it
  • the page_nums and doc_names where those chunks live
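The exact fields of vecta.core.schema.BenchmarkEntry are defined by the SDK; as an illustration only, a record with the shape described above could be sketched as a dataclass (the class name and defaults here are mine):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BenchmarkEntrySketch:
    """Illustrative stand-in for vecta.core.schema.BenchmarkEntry."""
    question: str                                         # synthetic question
    answer: str                                           # canonical answer
    chunk_ids: List[str] = field(default_factory=list)    # chunks that can answer it
    page_nums: List[int] = field(default_factory=list)    # pages those chunks live on
    doc_names: List[str] = field(default_factory=list)    # documents those chunks live in

entry = BenchmarkEntrySketch(
    question="What year was the policy enacted?",
    answer="2019",
    chunk_ids=["chunk-42"],
    page_nums=[7],
    doc_names=["policy.pdf"],
)
```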

Vecta builds this automatically from your knowledge base by:

  1. Sampling real chunks
  2. Asking an LLM to generate a question that the chunk can answer
  3. Discovering other chunks that could also answer it (via semantic search plus an LLM-as-a-judge check)
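The three steps above can be sketched as a loop. Everything here is hypothetical scaffolding, not Vecta's real internals: generate_question, search, and judge stand in for the LLM call, the semantic search, and the LLM-as-a-judge check.

```python
import random
from typing import Callable, Dict, List

def build_benchmark_sketch(
    chunks: List[Dict],
    generate_question: Callable[[Dict], str],
    search: Callable[[str], List[Dict]],
    judge: Callable[[str, Dict], bool],
    n_questions: int,
    seed: int = 42,
) -> List[Dict]:
    """Illustrative sample -> question -> alignment loop."""
    rng = random.Random(seed)
    entries = []
    for chunk in rng.sample(chunks, n_questions):          # step 1: sample real chunks
        question = generate_question(chunk)                # step 2: LLM writes a question
        candidates = search(question)                      # step 3a: semantic neighbours
        supporting = [c for c in candidates if judge(question, c)]  # step 3b: judge check
        ids = sorted({chunk["id"], *(c["id"] for c in supporting)})
        entries.append({"question": question, "chunk_ids": ids})
    return entries
```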

1) Connect to your vector DB and load the KB

from chromadb import Client
from vecta.connectors.chroma_local_connector import ChromaLocalConnector
from vecta.core.benchmark import VectaClient
from vecta.core.schema_helpers import SchemaTemplates

chroma = Client()
collection_name = "my_docs"

# Define schema for your data structure
schema = SchemaTemplates.chroma_default()

# Connect Chroma to Vecta
connector = ChromaLocalConnector(
    client=chroma,
    collection_name=collection_name,
    schema=schema
)

# Initialize VectaClient
vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="<YOUR_OPENROUTER_API_KEY>"
)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()

Schema requirements: Each connector requires a schema that defines how to extract id, content, source_path, and page_nums from your data. Use our schema helpers or create custom accessors with syntax like "metadata.source_path" or "json(metadata.provenance).doc_name".

2) Generate the benchmark

# Create N synthetic Q&A pairs and align them to correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    similarity_threshold=0.7,
    similarity_top_k=5,
    random_seed=42,
)

3) Save / Load the benchmark (CSV)

# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")
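The CSV layout itself is up to the SDK; as a sketch of what a lossless round-trip involves (illustrative, not Vecta's actual file format), list-valued fields such as chunk_ids can be stored as JSON strings inside the CSV cells:

```python
import csv
import io
import json
from typing import Dict, List

def entries_to_csv(entries: List[Dict]) -> str:
    """Serialize benchmark entries; list fields become JSON strings (illustrative)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["question", "answer", "chunk_ids"])
    writer.writeheader()
    for e in entries:
        writer.writerow({**e, "chunk_ids": json.dumps(e["chunk_ids"])})
    return buf.getvalue()

def entries_from_csv(text: str) -> List[Dict]:
    """Inverse of entries_to_csv: parse the JSON-encoded list fields back out."""
    return [{**row, "chunk_ids": json.loads(row["chunk_ids"])}
            for row in csv.DictReader(io.StringIO(text))]

entries = [{"question": "What year?", "answer": "2019", "chunk_ids": ["c1", "c2"]}]
round_tripped = entries_from_csv(entries_to_csv(entries))
```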

Running an evaluation

Vecta lets you evaluate three things against an existing benchmark:

  • Retrieval → you provide a function: query: str -> chunk_ids: List[str]
  • Generation → you provide: query: str -> generated_text: str
  • Retrieval + Generation → you provide: query: str -> Tuple[chunk_ids: List[str], generated_text: str]
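In typing terms, the three callables listed above have these shapes (the alias names are mine, not part of the SDK):

```python
from typing import Callable, List, Tuple

RetrievalFn = Callable[[str], List[str]]        # query -> retrieved chunk ids
GenerationFn = Callable[[str], str]             # query -> generated answer
RagFn = Callable[[str], Tuple[List[str], str]]  # query -> (chunk ids, answer)

# A trivial function satisfying RagFn:
def dummy_rag(query: str) -> Tuple[List[str], str]:
    return ["chunk-1"], f"echo: {query}"
```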

Retrieval-only evaluation

Provide a function that returns the IDs of your retrieved chunks for a given query.

from typing import List

def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query_str=query, k=10)
    # return chunk ids
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")

Generation-only evaluation

from openai import OpenAI

client = OpenAI(api_key="<YOUR_API_KEY>")

def my_llm_call(query: str) -> str:
    resp = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "system", "content": "Answer the user's question."},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="my llm call")

Retrieval-augmented Generation (RAG) evaluation

Provide a function that returns both retrieved chunk IDs and your generated answer.

from typing import List, Tuple

def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # retrieve via the connector set up earlier
    retrieved = connector.semantic_search(query_str=query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # generate, grounding the prompt in the retrieved chunk contents
    context = "\n\n".join(c.content for c in retrieved)
    completion = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "user", "content": f"{context}\n\n{query}"}
        ]
    )
    llm_response = completion.choices[0].message.content

    # must return (chunk_ids, generated_text)
    return chunk_ids, llm_response

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")

Connecting to custom databases

Don't see a connector for your vector db? No problem! Inherit from vecta.connectors.base.BaseVectorDBConnector, pass a schema to the base class, and implement these three methods:

from typing import List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData, VectorDBSchema

# Define how to extract data from your database results
custom_schema = VectorDBSchema(
    id_accessor="id",  # Direct field access
    content_accessor="document",  # Field containing text
    source_path_accessor="metadata.source_path",  # Nested field access
    page_nums_accessor="json(metadata.provenance).page_nums",  # JSON parsing
)

class CustomConnector(BaseVectorDBConnector):
    def __init__(self, your_db_client, schema: VectorDBSchema):
        super().__init__(schema)
        self.db = your_db_client

    def get_all_chunks(self) -> List[ChunkData]:
        results = self.db.get_all()
        return [self._create_chunk_data_from_raw(r) for r in results]

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        results = self.db.search(query, limit=k)
        return [self._create_chunk_data_from_raw(r) for r in results]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        result = self.db.get_by_id(chunk_id)
        return self._create_chunk_data_from_raw(result)

Schema accessor syntax: Use "field", "metadata.nested_field", "[0]" for arrays, "json(field).subfield" for JSON parsing, or "json(json(field).sub).final" for nested JSON.
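To make the accessor grammar concrete, here is a small resolver sketch (illustrative, not Vecta's parser). It handles direct fields, dotted paths, "[0]"-style indexing as a path segment, and nested json(...) forms; exactly where "[0]" may appear in real Vecta accessors is an assumption here.

```python
import json
import re

def resolve_accessor(record, accessor: str):
    """Resolve accessors like "metadata.source_path" or "json(metadata.provenance).doc_name"."""
    m = re.match(r"json\((.+)\)\.(.+)", accessor)
    if m:
        # parse the inner field as JSON, then resolve the remainder against it;
        # greedy matching makes nested json(json(field).sub).final work too
        inner = resolve_accessor(record, m.group(1))
        return resolve_accessor(json.loads(inner), m.group(2))
    value = record
    for part in accessor.split("."):
        idx = re.match(r"\[(\d+)\]$", part)
        value = value[int(idx.group(1))] if idx else value[part]
    return value
```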

Importing existing datasets

Import popular evaluation datasets like GPQA Diamond or MS MARCO:

from vecta.core.dataset_importer import BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()

# Import GPQA Diamond for generation-only evaluation
chunks, benchmark_entries = importer.import_gpqa_diamond(split="train", max_items=50)

# Import MS MARCO for retrieval + generation evaluation
chunks, benchmark_entries = importer.import_msmarco(split="test", max_items=100)

# Use with VectaClient
vecta = VectaClient()
vecta.benchmark_entries = benchmark_entries

The importer handles dataset schema mapping automatically, converting various field structures into Vecta's standardized format.
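As an illustration of what that mapping involves (the field names mimic MS MARCO's shape; the real importer's internals may differ), each raw record is split into chunk records plus a benchmark entry pointing at them:

```python
from typing import Dict, List, Tuple

def map_record(raw: Dict, doc_name: str = "msmarco") -> Tuple[List[Dict], Dict]:
    """Map one MS MARCO-style record into chunks + a benchmark entry (illustrative)."""
    chunks = [
        {"id": f"{raw['query_id']}-{i}", "content": passage, "source_path": doc_name}
        for i, passage in enumerate(raw["passages"])
    ]
    entry = {
        "question": raw["query"],
        "answer": raw["answers"][0] if raw["answers"] else "",
        "chunk_ids": [c["id"] for c in chunks],
    }
    return chunks, entry
```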
