
Drop-in LLMOps observability. Self-hosted. SQLite-default. Persona-aware. Multi-judge consensus.


RAGeval



The 60-Second Pitch

from rageval import track

@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list[str]) -> str:
    ...  # your existing retrieval + generation logic, unchanged

That's it. Start the server (see Quick Start), then open the dashboard at http://localhost:8003.
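What `@track` records internally is not documented on this page. As a rough illustration of how a drop-in decorator can work, here is a hypothetical `track_sketch` (not rageval's code) that wraps sync or async functions, measures end-to-end wall-clock latency, and appends a record tagged with `model` and `persona`. The in-memory `RECORDS` list is an assumption standing in for rageval's SQLite store.

```python
import asyncio
import functools
import inspect
import time
from typing import Any, Callable, Optional

# Hypothetical in-memory sink; rageval itself persists to SQLite.
RECORDS: list[dict[str, Any]] = []


def track_sketch(model: str, persona: Optional[str] = None) -> Callable:
    """Hypothetical tracking decorator for illustration (not rageval's implementation)."""

    def decorator(fn: Callable) -> Callable:
        def record(start: float, result: Any) -> None:
            # Latency covers the whole wrapped call, measured with a monotonic clock.
            RECORDS.append({
                "function": fn.__name__,
                "model": model,
                "persona": persona,
                "latency_s": time.perf_counter() - start,
                "output": result,
            })

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)  # keeps name and signature visible to frameworks
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                result = await fn(*args, **kwargs)
                record(start, result)
                return result
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            record(start, result)
            return result
        return sync_wrapper

    return decorator


@track_sketch(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list[str]) -> str:
    await asyncio.sleep(0.01)  # simulate an LLM call
    return f"answer to {query!r} from {len(context_chunks)} chunks"
```

Running `asyncio.run(answer_question("Q3 margin?", ["chunk a"]))` appends one record whose `latency_s` spans the awaited call. `functools.wraps` matters here: frameworks such as FastAPI inspect the wrapped function's signature to find its parameters.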

What It Measures

| Metric | Definition |
|---|---|
| Retrieval relevance | Cosine similarity between the query and the retrieved chunks (BGE-large embeddings by default) |
| Groundedness consensus | Multi-judge LLM scoring (Claude Haiku + Groq Llama + GPT-5-mini); flags disagreement between judges |
| Faithfulness | Maximum similarity of each answer sentence to any retrieved chunk (a proxy for NLI) |
| Cost | USD per interaction, tracked per model |
| Latency | End-to-end wall-clock time |
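The relevance and faithfulness rows reduce to simple vector arithmetic. The sketch below assumes the query, chunks, and answer sentences have already been embedded (rageval uses BGE-large by default; plain Python lists stand in for embeddings here). Averaging relevance over chunks, averaging faithfulness over sentences, and the `judge_disagreement` helper with its 0.25 spread threshold are illustrative assumptions, not rageval's documented behavior.

```python
import math
from statistics import mean


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieval_relevance(query_vec: list[float], chunk_vecs: list[list[float]]) -> float:
    """Mean query-to-chunk cosine similarity (the averaging is an assumption)."""
    return mean(cosine(query_vec, c) for c in chunk_vecs)


def faithfulness(sentence_vecs: list[list[float]], chunk_vecs: list[list[float]]) -> float:
    """For each answer sentence take its best-matching chunk, then average those maxima."""
    return mean(max(cosine(s, c) for c in chunk_vecs) for s in sentence_vecs)


def judge_disagreement(scores: dict[str, float], max_spread: float = 0.25) -> bool:
    """Hypothetical consensus check: flag when judge scores (0..1) spread too far apart."""
    return max(scores.values()) - min(scores.values()) > max_spread
```

As a heuristic, a faithfulness score near 1.0 means every answer sentence closely matches some chunk, while a low score points at sentences the retrieved context may not support.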

Comparison vs Alternatives

Feature RAGeval Phoenix Langfuse TruLens
Self-hosted
SQLite default
Drop-in decorator partial partial
Persona-aware
Multi-judge consensus
Cost tracking partial
Setup time 60 sec 10 min 15 min 10 min

Quick Start

pip install omnismart-rageval   # PyPI name; the import name is rageval
rageval init                 # creates ~/.rageval/rageval.db
rageval serve --port 8003
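Because the default store is a plain SQLite file, Python's standard-library `sqlite3` module can open it directly. The table names and schema are not documented here, so this sketch discovers them from `sqlite_master` instead of assuming any; `list_tables` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Default location created by `rageval init`.
DEFAULT_DB = Path.home() / ".rageval" / "rageval.db"


def list_tables(db_path: Path = DEFAULT_DB) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    # Read-only URI: inspection cannot modify the store, and a missing file
    # raises sqlite3.OperationalError instead of silently creating a new one.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
```

Calling `list_tables()` after `rageval init` shows which tables exist and how many rows each holds, without depending on any particular schema.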

Integration

FastAPI

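from fastapi import FastAPI
from rageval import track

app = FastAPI()

# @track wraps the handler first; @app.post then registers the wrapped function
@app.post("/ask")
@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def ask(query: str):
    # `retriever` and `llm` stand in for your own retrieval and generation clients
    chunks = await retriever.search(query)
    return await llm.generate(query, chunks=chunks)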

LangChain

from rageval import track

# `chain` is your existing LangChain runnable
@track(model="groq/llama-3.3-70b-versatile")
def chain_invoke(query: str, context_chunks: list[str]):
    return chain.invoke({"query": query, "context": context_chunks})

Endpoints

| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness check |
| POST | /eval/log | Score and store |
| POST | /eval/score | Score only (no storage) |
| GET | /eval/metrics?days=7 | Aggregate dashboard data |
| GET | /eval/queries | Query log (filterable by needs_review) |
| GET | /eval/cost-report?days=30 | Cost breakdown by day and model |
| GET | /eval/alerts | Recently flagged queries |
| POST | /eval/retrieval-bench | A/B comparison of retrieval strategies |
| POST | /eval/embedding-comparison | Comparison of embedding models |
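The GET endpoints can be called with any HTTP client. This standard-library sketch assumes the server from Quick Start is running at `http://localhost:8003` and that each endpoint returns JSON; the response shapes are not documented here, so the parsed body is returned untouched. `endpoint_url` and `get_json` are hypothetical helper names, and only the `days` parameter shown in the table is used.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8003"  # port from `rageval serve --port 8003`


def endpoint_url(path: str, base_url: str = BASE_URL, **params: Any) -> str:
    """Build a full URL, URL-encoding any query parameters."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0, **params: Any) -> Any:
    """GET an endpoint and decode its body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(endpoint_url(path, base_url, **params), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json("/eval/metrics", days=7)` fetches the seven-day aggregate and `get_json("/eval/cost-report", days=30)` the 30-day cost breakdown. POST endpoints are left out because their request bodies are not documented here.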

License

MIT



Download files


Source Distribution

omnismart_rageval-0.1.2.tar.gz (12.8 kB)

Uploaded Source

Built Distribution


omnismart_rageval-0.1.2-py3-none-any.whl (13.1 kB)

Uploaded Python 3

File details

Details for the file omnismart_rageval-0.1.2.tar.gz.

File metadata

  • Download URL: omnismart_rageval-0.1.2.tar.gz
  • Upload date:
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for omnismart_rageval-0.1.2.tar.gz
Algorithm Hash digest
SHA256 bca19e07b5fdf025b2d704a5b2cac3909a2bb07f8040deeac1ee7ca950fbfd1c
MD5 e4f7ca71e6f4f241610b58d71e401072
BLAKE2b-256 d181daa70270bf4f20408f4e5d0f6e4639be8d3438858d27b5fa28a02d2c10b2


File details

Details for the file omnismart_rageval-0.1.2-py3-none-any.whl.


File hashes

Hashes for omnismart_rageval-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 37a25184b6ad9df1c07c6b85c6f306198fa1f5550e2394c2c676a604ca7c21ca
MD5 42856165b7feeb39fdb356e32cc90c4f
BLAKE2b-256 741e26f3bb0ae144ba393df02ee98cbf5c84f8bd5d209a65bfb4ed5f9b404e20

