Small Language Models Evaluation Suite for RAG Systems

smallevals - Small Language Models Evaluation Suite for RAG Systems

A lightweight evaluation framework powered by tiny (really tiny) 0.6B models. It runs 100% locally on CPU, GPU, or MPS, and is fast and cheap.

Evaluation tools that rely on LLM-as-a-judge are costly and do not scale easily. smallevals evaluates in seconds on a GPU and in minutes on any CPU.

Evaluate Retrieval

Evaluating a RAG system covers both the retrieval stage and the generation stage. Attacks for testing retrieval and RAG answers are planned for the near future.

Models

| Model Name | Task | Status | Link |
|---|---|---|---|
| QAG-0.6B | Generate golden Q/A from chunks (synthetic evaluation data) | Available | 🤗 |
| CRC-0.6B | Context relevance classifier (question ↔ retrieved chunk) | Incoming | - |
| GJ-0.6B | Groundedness / faithfulness judge (answer ↔ context) | Incoming | - |
| ASM-0.6B | Answer correctness / semantic similarity | Incoming | - |

Current Focus: Retrieval evaluation (QAG-0.6B). Generation evaluation models (CRC-0.6B, GJ-0.6B, ASM-0.6B) are future work.

Installation

pip install smallevals

Quick Start

Evaluate Retrieval Quality (Python)

Connect to your favourite vector DB (Milvus, Elastic, PGVector, Chroma, Pinecone, FAISS, Weaviate), attach your favourite embeddings, generate questions, and visualise the results!

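Under the hood, smallevals generates one question per chunk, queries the vector DB with that question, treats the source chunk as the single relevant document, and calculates retrieval scores from whether and where that chunk appears in the results.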

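from smallevals import evaluate_retrievals, SmallEvalsVDBConnection

# chroma_client and embedding must be created beforehand:
# your own vector DB client and the embedding model used to index it.
vdb = SmallEvalsVDBConnection(
    connection=chroma_client,
    collection="my_collection",
    embedding=embedding
)

# Run evaluation: generate a question for each of 200 chunks,
# then try to retrieve each source chunk within the top 10 results.
result = evaluate_retrievals(connection=vdb, top_k=10, n_chunks=200)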

Then inspect the results!
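The scoring step can be pictured with a small example. This is a minimal sketch, not smallevals' actual implementation: the README does not say which metrics are reported, so hit rate@k and mean reciprocal rank (MRR) are assumed here as typical choices when each question has exactly one relevant chunk. The helper name `score_retrieval` and its input shapes are hypothetical.

```python
from typing import Dict, List


def score_retrieval(
    ranked_ids: List[List[str]],
    source_ids: List[str],
    top_k: int = 10,
) -> Dict[str, float]:
    """Compute hit rate@k and MRR when each question has one relevant chunk.

    ranked_ids[i] is the ordered list of chunk ids retrieved for question i;
    source_ids[i] is the id of the chunk that question i was generated from.
    (Hypothetical helper; not part of the smallevals API.)
    """
    if len(ranked_ids) != len(source_ids):
        raise ValueError("ranked_ids and source_ids must have the same length")
    if not source_ids:
        return {"hit_rate": 0.0, "mrr": 0.0}

    hits = 0
    reciprocal_ranks = 0.0
    for retrieved, source in zip(ranked_ids, source_ids):
        top = retrieved[:top_k]
        if source in top:
            hits += 1
            # Ranks are 1-based: a source chunk in first place scores 1.0.
            reciprocal_ranks += 1.0 / (top.index(source) + 1)

    n = len(source_ids)
    return {"hit_rate": hits / n, "mrr": reciprocal_ranks / n}


# Three questions: the first source chunk is ranked 1st, the second 2nd,
# and the third is not retrieved at all.
scores = score_retrieval(
    ranked_ids=[["c1", "c7"], ["c9", "c2"], ["c4", "c5"]],
    source_ids=["c1", "c2", "c3"],
    top_k=2,
)
print(scores)
```

Hit rate only says whether the source chunk made it into the top k, while MRR also rewards ranking it higher, so the two together separate "not found" from "found but buried".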

Generate QA from Documents (CLI)

smallevals --docs-dir ./documents --num-questions 100

### QAG-0.6B

The model was trained on TriviaQA, SQuAD 2.0, and hand-curated synthetic data generated using Qwen-70B. It generates a question/answer pair from a chunk or document. An example prompt and response:

Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.

PASSAGE:
"Eiffel tower is built at 1889"

Return ONLY a JSON object.

{
  "question": "When was the Eiffel Tower completed?",
  "answer": "1889"
}
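A small model can occasionally wrap the JSON in extra text or omit a field, so it may be worth validating the reply before using it as golden data. The following is a minimal sketch that assumes the raw reply is available as a string; `parse_qa_reply` is a hypothetical helper and not part of smallevals.

```python
import json
from typing import Dict, Optional


def parse_qa_reply(reply: str) -> Optional[Dict[str, str]]:
    """Extract a {"question", "answer"} pair from a model reply, or None.

    Hypothetical helper: takes the text between the first "{" and the
    last "}" and checks that both fields are non-empty strings.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    question, answer = data.get("question"), data.get("answer")
    if not isinstance(question, str) or not isinstance(answer, str):
        return None
    if not question.strip() or not answer.strip():
        return None
    return {"question": question.strip(), "answer": answer.strip()}


reply = 'Here you go: {"question": "When was the Eiffel Tower completed?", "answer": "1889"}'
pair = parse_qa_reply(reply)
print(pair)
```

Returning `None` instead of raising lets a caller skip a bad chunk and move on to the next one.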

Known issues:

The model is trained on text/wiki data, so it is biased towards well-structured text.
The dataset contains some overly generic questions; it will be more carefully crafted in v3.

### Other Models:

The other models will be trained to eliminate the need for external LLMs.

CRC-0.6B: Context relevance classifier (question ↔ retrieved chunk)
GJ-0.6B: Groundedness / faithfulness judge (answer ↔ context)
ASM-0.6B: Answer correctness / semantic similarity

Release history

0.1.8: Dec 4, 2025
0.1.7: Dec 4, 2025
0.1.5: Dec 4, 2025
0.1.4: Dec 4, 2025
0.1.3: Dec 4, 2025
0.1.2: Dec 4, 2025
0.1.1: Dec 4, 2025
0.1.0 (this version): Dec 4, 2025

Download files

Source Distribution

smallevals-0.1.0.tar.gz (99.8 kB)

Uploaded Dec 4, 2025 Source

Built Distribution

smallevals-0.1.0-py3-none-any.whl (104.7 kB)

Uploaded Dec 4, 2025 Python 3

File details

Details for the file smallevals-0.1.0.tar.gz.

File metadata

Download URL: smallevals-0.1.0.tar.gz
Upload date: Dec 4, 2025
Size: 99.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for smallevals-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`114e08b26f3cb6634249b3227d118381031492fcf6ca0641d6a95a72dc56388f`
MD5	`25e50ee33f017afc491a8d14038a6379`
BLAKE2b-256	`942a7c2326387c2080ac1ce3325bc0340ede2ec23322b259a986c38586bb4851`

File details

Details for the file smallevals-0.1.0-py3-none-any.whl.

File metadata

Download URL: smallevals-0.1.0-py3-none-any.whl
Upload date: Dec 4, 2025
Size: 104.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for smallevals-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aa6c436355ae1938c7a9a2b54822f13db4b22f9df164f9473edf83f485725e01`
MD5	`448fa6de32dc31b8ce1035cf80cb3889`
BLAKE2b-256	`b2c0ea3fe4982d902c0489f7010522fe24ae3c9dec2b433eb8c31d7951ba2899`
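
The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's standard `hashlib`; the local file path is an assumption, and the helper names `sha256_of` and `matches_published_hash` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 listed above for smallevals-0.1.0-py3-none-any.whl
EXPECTED_WHEEL_SHA256 = "aa6c436355ae1938c7a9a2b54822f13db4b22f9df164f9473edf83f485725e01"

# Assumed location of the downloaded wheel; adjust to where you saved it.
wheel = Path("smallevals-0.1.0-py3-none-any.whl")
if wheel.exists():
    ok = matches_published_hash(wheel, EXPECTED_WHEEL_SHA256)
    print("hash ok" if ok else "hash mismatch")
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 100 kB wheel but makes the helper reusable for larger artifacts.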
