
Small Language Models Evaluation Suite for RAG Systems



A lightweight evaluation framework powered by tiny (really tiny) 0.6B models. It runs 100% locally on CPU/GPU/MPS and is fast and cheap.

Evaluation tools that rely on LLM-as-a-judge are costly and don't scale easily. smallevals evaluates in seconds on a GPU and in minutes on any CPU!

Evaluate Retrieval

Evaluating a RAG system covers both the retrieval stage and the generation (RAG answer) stage. smallevals aims to test retrieval now and RAG answers in the near future!

Models

| Model Name | Task | Status | Link |
|---|---|---|---|
| QAG-0.6B | Generate golden Q/A from chunks (synthetic evaluation data) | Available | 🤗 |
| CRC-0.6B | Context relevance classifier (question ↔ retrieved chunk) | Incoming | |
| GJ-0.6B | Groundedness / faithfulness judge (answer ↔ context) | Incoming | |
| ASM-0.6B | Answer correctness / semantic similarity | Incoming | |

Current Focus: Retrieval evaluation (QAG-0.6B). Generation evaluation models (CRC-0.6B, GJ-0.6B, ASM-0.6B) are future work.

Installation

pip install smallevals

Quick Start

Evaluate Retrieval Quality (Python)

Connect to your favourite Vector DB (Milvus, Elastic, PGVector, Chroma, Pinecone, FAISS, Weaviate), attach your favourite embeddings, generate questions, and visualise results!

Under the hood, smallevals generates one question per chunk, tries to retrieve that chunk as the single relevant document for the question, and calculates scores from the results.

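from smallevals import evaluate_retrievals, SmallEvalsVDBConnection

# `chroma_client` and `embedding` are assumed to already exist in your code:
# your vector DB client and the embedding model used to index the collection.
vdb = SmallEvalsVDBConnection(
    connection=chroma_client,
    collection="my_collection",
    embedding=embedding,
)

# Run evaluation: generate questions for 200 chunks, then try to retrieve
# each source chunk within the top 10 results.
result = evaluate_retrievals(connection=vdb, top_k=10, n_chunks=200)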

And evaluate results!
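To make the "one question per chunk, then retrieve it" idea concrete, here is a minimal sketch of how such scores could be computed. It is not the smallevals implementation: `rank_of`, `score_retrieval`, the `retrieve` callable, and the choice of hit rate and MRR as metrics are all assumptions made for illustration, and the README does not say which scores the library reports.

```python
from typing import Callable, Dict, List, Optional, Sequence


def rank_of(chunk_id: str, retrieved_ids: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of chunk_id in retrieved_ids, or None if absent."""
    for position, candidate in enumerate(retrieved_ids, start=1):
        if candidate == chunk_id:
            return position
    return None


def score_retrieval(
    questions: Dict[str, str],
    retrieve: Callable[[str, int], List[str]],
    top_k: int = 10,
) -> Dict[str, float]:
    """Score a retriever against generated questions.

    `questions` maps a source chunk id to the question generated from it.
    `retrieve` is a hypothetical callable: (question, k) -> ranked chunk ids.
    """
    if not questions:
        raise ValueError("at least one question is required")

    ranks = [
        rank_of(chunk_id, retrieve(question, top_k)[:top_k])
        for chunk_id, question in questions.items()
    ]
    found = [rank for rank in ranks if rank is not None]
    total = len(ranks)
    return {
        # Fraction of questions whose source chunk appears in the top k.
        "hit_rate@k": len(found) / total,
        # Mean reciprocal rank; questions that miss contribute 0.
        "mrr@k": sum(1.0 / rank for rank in found) / total,
    }


# Toy data standing in for a real vector DB (hypothetical ids and results).
toy_questions = {"c1": "q1", "c2": "q2", "c3": "q3"}
toy_index = {"q1": ["c1", "c9"], "q2": ["c7", "c2"], "q3": ["c8", "c9"]}
scores = score_retrieval(toy_questions, lambda q, k: toy_index[q][:k], top_k=2)
print(scores)
```

Because each generated question has exactly one relevant chunk, the rank of that chunk is enough to derive both metrics; no manual relevance labels are needed.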

Generate QA from Documents (CLI)

smallevals --docs-dir ./documents --num-questions 100

### QAG-0.6B

The model was trained on TriviaQA, SQuAD 2.0, and hand-curated synthetic data generated using Qwen-70B. Its task is to generate a question/answer pair from a chunk or document.

Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.

PASSAGE:
"Eiffel tower is built at 1889"

Return ONLY a JSON object.
{
  "question": "When was the Eiffel Tower completed?",
  "answer": "1889"
}
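If you call the model yourself, the prompt and JSON reply shown above can be handled with a couple of small helpers. This is a sketch under assumptions: `build_prompt` and `parse_qa` are hypothetical names that are not part of smallevals, the template simply copies the prompt wording above, and whether the model also expects a chat template is not stated here.

```python
import json
from typing import Dict

# Prompt wording copied from the example above; {passage} is filled in per chunk.
PROMPT_TEMPLATE = (
    "Given the passage below, extract ONE question/answer pair grounded strictly "
    "in a single atomic fact.\n\n"
    'PASSAGE:\n"{passage}"\n\n'
    "Return ONLY a JSON object."
)


def build_prompt(passage: str) -> str:
    """Fill the QAG prompt template with a single passage."""
    return PROMPT_TEMPLATE.format(passage=passage.strip())


def parse_qa(raw_output: str) -> Dict[str, str]:
    """Extract the first JSON object from model output and validate its keys.

    Small models sometimes wrap the JSON in extra text, so we start at the
    first '{' and let raw_decode ignore anything after the object ends.
    """
    start = raw_output.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    obj, _ = json.JSONDecoder().raw_decode(raw_output[start:])
    missing = {"question", "answer"} - obj.keys()
    if missing:
        raise ValueError(f"missing keys in model output: {sorted(missing)}")
    return {
        "question": str(obj["question"]).strip(),
        "answer": str(obj["answer"]).strip(),
    }


prompt = build_prompt("The Eiffel Tower was completed in 1889.")
reply = 'Sure!\n{"question": "When was the Eiffel Tower completed?", "answer": "1889"}'
qa = parse_qa(reply)
```

Rejecting replies without both keys keeps malformed generations out of the evaluation set instead of silently scoring them.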

Known issues:

  • The model is trained on text/wiki data, so it is biased towards well-structured text.
  • The dataset contains some generic questions; it will be crafted more carefully in v3.

### Other Models:

Other models will be trained to eliminate the need for external LLMs:

  • CRC-0.6B: Context relevance classifier (question ↔ retrieved chunk)
  • GJ-0.6B: Groundedness / faithfulness judge (answer ↔ context)
  • ASM-0.6B: Answer correctness / semantic similarity
