RAGeval
Drop-in LLMOps observability. Self-hosted. SQLite-default. Persona-aware. Multi-judge consensus.
The 60-Second Pitch
from rageval import track

@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list[str]) -> str:
    ...
That's it. With the server running (rageval serve --port 8003), open the dashboard at localhost:8003.
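To make the decorator pattern concrete, here is a minimal sketch of how a tracking decorator of this shape could capture inputs, output, and latency for both sync and async functions. It is not RAGeval's implementation: `track_sketch` and the in-memory `RECORDS` list are hypothetical names used only for illustration, and the real library persists to SQLite and runs scoring.

```python
import asyncio
import functools
import inspect
import time
from typing import Optional

# Hypothetical in-memory sink, used here instead of a SQLite database.
RECORDS: list = []


def track_sketch(model: str, persona: Optional[str] = None):
    """Illustrative stand-in for a tracking decorator (an assumption, not rageval's code)."""

    def decorator(fn):
        signature = inspect.signature(fn)

        def record(args, kwargs, result, started):
            # Map positional and keyword arguments back to parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            RECORDS.append({
                "function": fn.__name__,
                "model": model,
                "persona": persona,
                "inputs": dict(bound.arguments),
                "output": result,
                "latency_s": time.perf_counter() - started,
            })

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                started = time.perf_counter()
                result = await fn(*args, **kwargs)
                record(args, kwargs, result, started)
                return result
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            started = time.perf_counter()
            result = fn(*args, **kwargs)
            record(args, kwargs, result, started)
            return result
        return sync_wrapper

    return decorator


@track_sketch(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list) -> str:
    # Placeholder body standing in for a real LLM call.
    return f"Answer to {query!r} using {len(context_chunks)} chunks"


result = asyncio.run(answer_question("What was Q3 revenue?", ["chunk a", "chunk b"]))
```

Using `functools.wraps` keeps the wrapped function's name and signature visible, which matters when a web framework inspects the handler's parameters.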
What It Measures
| Metric | Definition |
|---|---|
| Retrieval relevance | Cosine similarity between the query embedding and each retrieved chunk embedding (BGE-large by default) |
| Groundedness consensus | Multi-judge LLM scoring (Claude Haiku + Groq Llama + GPT-5-mini), flags disagreement |
| Faithfulness | For each answer sentence, the maximum similarity to any retrieved chunk (a proxy for NLI) |
| Cost | USD per interaction, tracked by model |
| Latency | End-to-end wall-clock |
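The definitions in the table can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not RAGeval's scoring code: a toy bag-of-words embedding stands in for BGE-large, averaging cosine scores across chunks is an assumed aggregation, and the judge names and the `max_spread` disagreement threshold are hypothetical.

```python
import math
import re
from statistics import mean


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_relevance(query_vec, chunk_vecs):
    """Mean cosine similarity between the query and each chunk (assumed aggregation)."""
    return mean(cosine(query_vec, c) for c in chunk_vecs) if chunk_vecs else 0.0


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed_toy(text, vocab):
    """Toy bag-of-words vector; a real setup would call an embedding model."""
    words = tokenize(text)
    return [float(words.count(w)) for w in vocab]


def faithfulness(answer, chunks):
    """Per-sentence maximum similarity to any chunk, one score per answer sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    vocab = sorted({w for t in sentences + chunks for w in tokenize(t)})
    chunk_vecs = [embed_toy(c, vocab) for c in chunks]
    return [
        max((cosine(embed_toy(s, vocab), cv) for cv in chunk_vecs), default=0.0)
        for s in sentences
    ]


def consensus(judge_scores, max_spread=0.3):
    """Average the judges and flag disagreement when their scores spread too far."""
    values = list(judge_scores.values())
    spread = max(values) - min(values)
    return {"score": mean(values), "spread": spread, "disagreement": spread > max_spread}


relevance = retrieval_relevance([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
sentence_scores = faithfulness(
    "Revenue grew ten percent. The moon is cheese.",
    ["revenue grew ten percent", "costs fell"],
)
verdict = consensus({"judge_a": 0.9, "judge_b": 0.8, "judge_c": 0.2})
```

In this toy run the first answer sentence matches a chunk word for word and the second shares no words with any chunk, so the per-sentence scores separate the supported claim from the unsupported one.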
Comparison vs Alternatives
| Feature | RAGeval | Phoenix | Langfuse | TruLens |
|---|---|---|---|---|
| Self-hosted | ✅ | ✅ | ✅ | ✅ |
| SQLite default | ✅ | ❌ | ❌ | ❌ |
| Drop-in decorator | ✅ | partial | ❌ | partial |
| Persona-aware | ✅ | ❌ | ❌ | ❌ |
| Multi-judge consensus | ✅ | ❌ | ❌ | ❌ |
| Cost tracking | ✅ | ✅ | ✅ | partial |
| Setup time | 60 sec | 10 min | 15 min | 10 min |
Quick Start
pip install omnismart-rageval  # published distribution name; the import name is still rageval
rageval init # creates ~/.rageval/rageval.db
rageval serve --port 8003
Integration
FastAPI
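from fastapi import FastAPI
from rageval import track

app = FastAPI()

# retriever and llm are your application's own retrieval and generation clients.
@app.post("/ask")
@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def ask(query: str):
    chunks = await retriever.search(query)
    return await llm.generate(query, chunks=chunks)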
LangChain
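from rageval import track

# chain is an existing LangChain runnable defined elsewhere in your code.
@track(model="groq/llama-3.3-70b-versatile")
def chain_invoke(query: str, context_chunks: list[str]):
    return chain.invoke({"query": query, "context": context_chunks})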
Endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness |
| POST | /eval/log | Score + store |
| POST | /eval/score | Score only (no storage) |
| GET | /eval/metrics?days=7 | Aggregate dashboard data |
| GET | /eval/queries | Query log (filter by needs_review) |
| GET | /eval/cost-report?days=30 | Cost breakdown by day + model |
| GET | /eval/alerts | Recent flagged queries |
| POST | /eval/retrieval-bench | A/B compare retrieval strategies |
| POST | /eval/embedding-comparison | Compare embedding models |
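The endpoints can also be called directly over HTTP. The sketch below only builds the requests with the standard library and does not send them. The paths and port come from the tables above; the `http` scheme and the JSON field names (`query`, `context_chunks`, `answer`) are assumptions, since the request schema is not documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8003"  # port from Quick Start; scheme is assumed


def metrics_url(days=7, base_url=BASE_URL):
    """URL for GET /eval/metrics with the days query parameter."""
    return f"{base_url}/eval/metrics?{urlencode({'days': days})}"


def build_score_request(query, context_chunks, answer, base_url=BASE_URL):
    """Build a POST /eval/score request (score only, no storage)."""
    # Field names below are hypothetical; check the server's schema before use.
    body = json.dumps({
        "query": query,
        "context_chunks": context_chunks,
        "answer": answer,
    }).encode("utf-8")
    return Request(
        f"{base_url}/eval/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_score_request(
    "What was Q3 revenue?",
    ["Q3 revenue was 4.2M USD."],
    "Q3 revenue was 4.2M USD.",
)
```

With the server running, the request could be sent with `urllib.request.urlopen(req)` and the response body decoded as JSON.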
License
MIT