RAGeval
Drop-in LLMOps observability. Self-hosted. SQLite-default. Persona-aware. Multi-judge consensus.
The 60-Second Pitch
from rageval import track

@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list[str]) -> str:
    ...
That's it. Open the dashboard at localhost:8003.
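To make the decorator pattern concrete, here is a minimal sketch of what a tracking decorator of this shape might do. It is not RAGeval's implementation: `track_sketch` and the in-memory `RECORDS` list are hypothetical names used only for illustration, and the real package stores records in SQLite rather than in a list.

```python
import asyncio
import functools
import time
from typing import Optional

# Hypothetical in-memory sink, used here only so the sketch is self-contained.
RECORDS: list = []

def track_sketch(model: str, persona: Optional[str] = None):
    """Illustrative stand-in for a tracking decorator (not RAGeval's code)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            answer = await fn(*args, **kwargs)
            RECORDS.append({
                "model": model,
                "persona": persona,
                "function": fn.__name__,
                # End-to-end wall-clock latency of the wrapped call.
                "latency_s": time.perf_counter() - start,
                "answer": answer,
            })
            return answer
        return wrapper
    return decorator

@track_sketch(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list) -> str:
    # Placeholder body standing in for a real LLM call.
    return f"{query} -> {len(context_chunks)} chunks"

result = asyncio.run(answer_question("What was Q3 revenue?", ["chunk a", "chunk b"]))
```

The wrapped function keeps its name and signature through `functools.wraps`, which is what lets a decorator like this sit underneath a web framework's route decorator.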
What It Measures
| Metric | Definition |
|---|---|
| Retrieval relevance | Cosine similarity between the query and each retrieved chunk (BGE-large embeddings by default) |
| Groundedness consensus | Scored independently by multiple LLM judges (Claude Haiku + Groq Llama + GPT-5-mini); disagreement between judges is flagged |
| Faithfulness | For each answer sentence, the maximum similarity to any retrieved chunk (a proxy for NLI-based entailment) |
| Cost | USD per interaction, tracked by model |
| Latency | End-to-end wall-clock |
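The similarity-based metrics in the table can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: `embed` is a toy bag-of-words stand-in for a real embedding model such as BGE-large, averaging the per-sentence maxima into one faithfulness score is an assumption, and the `max_spread` threshold of 0.3 and the link to a `needs_review` flag are hypothetical.

```python
import math
import re

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def embed(text, vocab):
    """Toy bag-of-words embedding (assumption: stands in for a real model)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [float(words.count(term)) for term in vocab]

def faithfulness(answer, chunks, vocab):
    """Mean over answer sentences of the max similarity to any chunk."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences or not chunks:
        return 0.0
    chunk_vecs = [embed(c, vocab) for c in chunks]
    per_sentence = [
        max(cosine(embed(s, vocab), cv) for cv in chunk_vecs) for s in sentences
    ]
    return sum(per_sentence) / len(per_sentence)

def consensus(scores, max_spread=0.3):
    """Average judge scores; flag for review when judges disagree widely."""
    values = list(scores.values())
    spread = max(values) - min(values)
    return {
        "score": sum(values) / len(values),
        "spread": spread,
        "needs_review": spread > max_spread,
    }

vocab = ["revenue", "grew", "costs", "fell"]
chunks = ["Revenue grew in Q3.", "Costs fell in Q3."]
faith = faithfulness("Revenue grew. Costs fell.", chunks, vocab)
verdict = consensus({"judge_a": 0.9, "judge_b": 0.85, "judge_c": 0.4})
```

Here each answer sentence matches one chunk closely, so `faith` is close to 1.0, while the third judge's low score pushes the spread past the threshold and sets `needs_review`.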
Comparison vs Alternatives
| Feature | RAGeval | Phoenix | Langfuse | TruLens |
|---|---|---|---|---|
| Self-hosted | ✅ | ✅ | ✅ | ✅ |
| SQLite default | ✅ | ❌ | ❌ | ❌ |
| Drop-in decorator | ✅ | partial | ❌ | partial |
| Persona-aware | ✅ | ❌ | ❌ | ❌ |
| Multi-judge consensus | ✅ | ❌ | ❌ | ❌ |
| Cost tracking | ✅ | ✅ | ✅ | partial |
| Setup time | 60 sec | 10 min | 15 min | 10 min |
Quick Start
pip install omnismart-rageval # PyPI distribution name; imported as rageval
rageval init # creates ~/.rageval/rageval.db
rageval serve --port 8003
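Because the store is a plain SQLite file, it can be inspected with the standard library. The sketch below only lists table names, since the schema is not documented here; `list_tables` is a hypothetical helper, and the database is opened read-only so the check cannot create or modify the file.

```python
import sqlite3
from pathlib import Path

def list_tables(db_path: Path) -> list:
    """Return the table names in a SQLite file, opened read-only."""
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]

# Default location created by `rageval init`.
default_db = Path.home() / ".rageval" / "rageval.db"
if default_db.exists():
    print(list_tables(default_db))
```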
Integration
FastAPI
from fastapi import FastAPI
from rageval import track

app = FastAPI()

@app.post("/ask")
@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def ask(query: str):
    # retriever and llm are your application's own objects
    chunks = await retriever.search(query)
    return await llm.generate(query, chunks=chunks)
LangChain
from rageval import track

@track(model="groq/llama-3.3-70b-versatile")
def chain_invoke(query: str, context_chunks: list[str]):
    # chain is an existing LangChain runnable defined elsewhere
    return chain.invoke({"query": query, "context": context_chunks})
Endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness |
| POST | /eval/log | Score + store |
| POST | /eval/score | Score only (no storage) |
| GET | /eval/metrics?days=7 | Aggregate dashboard data |
| GET | /eval/queries | Query log (filter by needs_review) |
| GET | /eval/cost-report?days=30 | Cost breakdown by day + model |
| GET | /eval/alerts | Recent flagged queries |
| POST | /eval/retrieval-bench | A/B compare retrieval strategies |
| POST | /eval/embedding-comparison | Compare embedding models |
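The GET endpoints above can be exercised from a short script. This sketch assumes the API is served on the same port as the dashboard and that responses are JSON; neither is confirmed by the table. `endpoint_url` and `get_json` are hypothetical helpers, and the POST endpoints are left out because their request bodies are not documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API shares the port passed to `rageval serve --port 8003`.
BASE_URL = "http://localhost:8003"

def endpoint_url(path: str, **params) -> str:
    """Build a full URL for an endpoint path plus optional query parameters."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"

def get_json(path: str, **params):
    """GET an endpoint and decode the body, assuming it returns JSON."""
    with urlopen(endpoint_url(path, **params), timeout=5) as resp:
        return json.load(resp)

# With the server running, calls would look like:
#   get_json("/health")
#   get_json("/eval/metrics", days=7)
#   get_json("/eval/cost-report", days=30)
```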
License
MIT