RAGeval
Drop-in LLMOps observability. Self-hosted. SQLite-default. Persona-aware. Multi-judge consensus.
The 60-Second Pitch
```python
from rageval import track

@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list[str]) -> str:
    ...
```
That's it. With the server running (see Quick Start), open the dashboard at http://localhost:8003.
What It Measures
| Metric | Definition |
|---|---|
| Retrieval relevance | Cosine similarity between the query and the retrieved chunks (BGE-large embeddings by default) |
| Groundedness consensus | Multi-judge LLM scoring (Claude Haiku + Groq Llama + GPT-5-mini), flags disagreement |
| Faithfulness | Per-sentence max-similarity to any chunk (NLI proxy) |
| Cost | USD per interaction, tracked by model |
| Latency | End-to-end wall-clock |
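To make the three scoring metrics concrete, here is a minimal sketch in plain Python. It is not RAGeval's implementation: the function names, the use of a mean to aggregate per-chunk and per-sentence scores, and the `max_spread` disagreement threshold are all assumptions chosen for illustration. Vectors are passed in directly, so no embedding model or judge API is needed to run it.

```python
import math
from statistics import mean


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieval_relevance(query_vec, chunk_vecs) -> float:
    """Mean cosine similarity between the query and each retrieved chunk.

    Averaging is an assumption; other aggregations (max, top-k) are possible.
    """
    return mean(cosine(query_vec, c) for c in chunk_vecs) if chunk_vecs else 0.0


def faithfulness(sentence_vecs, chunk_vecs) -> float:
    """Average, over answer sentences, of the best similarity to any chunk."""
    if not sentence_vecs or not chunk_vecs:
        return 0.0
    return mean(max(cosine(s, c) for c in chunk_vecs) for s in sentence_vecs)


def groundedness_consensus(judge_scores: dict[str, float], max_spread: float = 0.3) -> dict:
    """Average the judges' scores and flag the item when they disagree.

    `max_spread` is a hypothetical threshold on (highest - lowest) score.
    """
    scores = list(judge_scores.values())
    spread = max(scores) - min(scores)
    return {
        "score": mean(scores),
        "spread": spread,
        "needs_review": spread > max_spread,
    }


# Toy 2-D vectors standing in for real embeddings.
relevance = retrieval_relevance([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
faithful = faithfulness([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
verdict = groundedness_consensus({"judge_a": 0.9, "judge_b": 0.8, "judge_c": 0.3})
```

In this toy run the third judge is far from the other two, so `verdict["needs_review"]` is set, which is the kind of disagreement the consensus metric is meant to surface.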
Comparison vs Alternatives
| Feature | RAGeval | Phoenix | Langfuse | TruLens |
|---|---|---|---|---|
| Self-hosted | ✅ | ✅ | ✅ | ✅ |
| SQLite default | ✅ | ❌ | ❌ | ❌ |
| Drop-in decorator | ✅ | partial | ❌ | partial |
| Persona-aware | ✅ | ❌ | ❌ | ❌ |
| Multi-judge consensus | ✅ | ❌ | ❌ | ❌ |
| Cost tracking | ✅ | ✅ | ✅ | partial |
| Setup time | 60 sec | 10 min | 15 min | 10 min |
Quick Start
```bash
pip install rageval
rageval init    # creates ~/.rageval/rageval.db
rageval serve --port 8003
```
Integration
FastAPI
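```python
from fastapi import FastAPI
from rageval import track

app = FastAPI()

@app.post("/ask")
@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def ask(query: str):
    # `retriever` and `llm` stand for your application's own clients.
    chunks = await retriever.search(query)
    return await llm.generate(query, chunks=chunks)
```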
LangChain
```python
from rageval import track

@track(model="groq/llama-3.3-70b-versatile")
def chain_invoke(query: str, context_chunks: list[str]):
    # `chain` is your existing LangChain runnable.
    return chain.invoke({"query": query, "context": context_chunks})
```
Endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness |
| POST | /eval/log | Score + store |
| POST | /eval/score | Score only (no storage) |
| GET | /eval/metrics?days=7 | Aggregate dashboard data |
| GET | /eval/queries | Query log (filter by needs_review) |
| GET | /eval/cost-report?days=30 | Cost breakdown by day + model |
| GET | /eval/alerts | Recent flagged queries |
| POST | /eval/retrieval-bench | A/B compare retrieval strategies |
| POST | /eval/embedding-comparison | Compare embedding models |
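The endpoints can be called from any HTTP client. The sketch below builds two requests with Python's standard library, assuming the server from the Quick Start is listening on port 8003. The request body schema is not documented here, so the JSON field names (`query`, `context_chunks`, `answer`) are assumptions borrowed from the decorator examples, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8003"  # port taken from the Quick Start


def get_request(path: str, **params) -> Request:
    """Build a GET request, encoding keyword arguments as query parameters."""
    query = f"?{urlencode(params)}" if params else ""
    return Request(f"{BASE_URL}{path}{query}", method="GET")


def post_json(path: str, payload: dict) -> Request:
    """Build a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{BASE_URL}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Aggregate dashboard data for the last 7 days.
metrics_req = get_request("/eval/metrics", days=7)

# Score one interaction without storing it.
# Field names are assumptions, not a documented schema.
score_req = post_json("/eval/score", {
    "query": "What was Q3 revenue?",
    "context_chunks": ["Q3 revenue was reported in the quarterly filing."],
    "answer": "See the quarterly filing for Q3 revenue.",
})

# To send a request against a running server:
#   from urllib.request import urlopen
#   with urlopen(metrics_req) as resp:
#       print(json.load(resp))
```

Building the `Request` objects separately from sending them keeps the example runnable without a live server; swap in `/eval/log` for `/eval/score` to store the result as well.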
License
MIT