Drop-in LLMOps observability. Self-hosted. SQLite-default. Persona-aware. Multi-judge consensus.

RAGeval

The 60-Second Pitch

from rageval import track

@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list[str]) -> str:
    ...

That's it. With the server running (see Quick Start), open the dashboard at http://localhost:8003.
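To make the decorator pattern concrete, here is a minimal sketch of how a tracking decorator of this shape could record latency and metadata around an async function. It is not RAGeval's implementation: `track_sketch` and the in-memory `RECORDS` list are hypothetical names used only for illustration, and the real `track` scores and stores each call in SQLite.

```python
import asyncio
import functools
import time
from typing import Optional

# Hypothetical in-memory log, used here instead of a database.
RECORDS: list = []


def track_sketch(model: str, persona: Optional[str] = None):
    """Illustrative stand-in for a tracking decorator (not RAGeval's code)."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = await fn(*args, **kwargs)
            # Record end-to-end wall-clock latency plus the decorator arguments.
            RECORDS.append(
                {
                    "function": fn.__name__,
                    "model": model,
                    "persona": persona,
                    "latency_s": time.perf_counter() - start,
                    "answer": result,
                }
            )
            return result

        return wrapper

    return decorator


@track_sketch(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def answer_question(query: str, context_chunks: list) -> str:
    # Placeholder body; a real function would call a model here.
    return f"{len(context_chunks)} chunk(s) consulted for: {query}"


answer = asyncio.run(answer_question("What was Q3 revenue?", ["Q3 revenue was ..."]))
```

The wrapped function keeps its original signature and return value, which is what makes this kind of instrumentation "drop-in".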

What It Measures

Metric | Definition
--- | ---
Retrieval relevance | Cosine similarity between the query and retrieved chunks (BGE-large by default)
Groundedness consensus | Multi-judge LLM scoring (Claude Haiku + Groq Llama + GPT-5-mini); flags disagreement
Faithfulness | Per-sentence max-similarity to any chunk (NLI proxy)
Cost | USD per interaction, tracked by model
Latency | End-to-end wall-clock
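The similarity-based metrics above can be sketched in plain Python. This is an illustration under stated assumptions, not RAGeval's scoring code: embeddings are passed in as plain vectors rather than produced by BGE-large, relevance is aggregated as a mean (the aggregation is not specified above), and the `max_spread` threshold for flagging judge disagreement is a hypothetical value.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieval_relevance(query_vec: list, chunk_vecs: list) -> float:
    """Mean query-to-chunk cosine similarity (mean is an assumption)."""
    return sum(cosine(query_vec, c) for c in chunk_vecs) / len(chunk_vecs)


def faithfulness(sentence_vecs: list, chunk_vecs: list) -> list:
    """For each answer sentence, its best similarity to any retrieved chunk."""
    return [max(cosine(s, c) for c in chunk_vecs) for s in sentence_vecs]


def judge_consensus(scores: dict, max_spread: float = 0.3) -> dict:
    """Average the judges and flag disagreement when their scores diverge.

    max_spread is a hypothetical threshold chosen for this sketch.
    """
    values = list(scores.values())
    return {
        "score": sum(values) / len(values),
        "disagreement": (max(values) - min(values)) > max_spread,
    }


# Toy 2-D "embeddings" standing in for real model output.
chunks = [[1.0, 0.0], [0.0, 1.0]]
relevance = retrieval_relevance([1.0, 0.0], chunks)
per_sentence = faithfulness([[1.0, 0.0], [1.0, 1.0]], chunks)
consensus = judge_consensus({"judge_a": 0.9, "judge_b": 0.8, "judge_c": 0.3})
```

A low per-sentence score marks a sentence that no retrieved chunk supports well, which is why the metric is described as an NLI proxy rather than a true entailment check.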

Comparison vs Alternatives

Feature RAGeval Phoenix Langfuse TruLens
Self-hosted
SQLite default
Drop-in decorator partial partial
Persona-aware
Multi-judge consensus
Cost tracking partial
Setup time 60 sec 10 min 15 min 10 min

Quick Start

pip install omnismart-rageval  # distribution name on PyPI; the import name is rageval
rageval init                 # creates ~/.rageval/rageval.db
rageval serve --port 8003
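Because the default store is a plain SQLite file, it can be inspected with Python's standard library. The sketch below lists the tables in a database opened read-only. The path comes from the `rageval init` comment above; the schema is not documented here, so the helper (`list_tables`, a name introduced for this example) only reads table names and assumes nothing about columns.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path) -> list:
    """Return the table names in a SQLite file, opened read-only."""
    # mode=ro prevents accidental writes and fails if the file does not exist.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]


# Example, after running `rageval init`:
# print(list_tables("~/.rageval/rageval.db"))
```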

Integration

FastAPI

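from fastapi import FastAPI
from rageval import track

app = FastAPI()

@app.post("/ask")
@track(model="anthropic/claude-sonnet-4-6", persona="cfo")
async def ask(query: str):
    # retriever and llm are your application's own retrieval and generation clients
    chunks = await retriever.search(query)
    return await llm.generate(query, chunks=chunks)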

LangChain

from rageval import track

# chain is your existing LangChain runnable
@track(model="groq/llama-3.3-70b-versatile")
def chain_invoke(query: str, context_chunks: list[str]):
    return chain.invoke({"query": query, "context": context_chunks})

Endpoints

Method | Path | Purpose
--- | --- | ---
GET | /health | Liveness
POST | /eval/log | Score + store
POST | /eval/score | Score only (no storage)
GET | /eval/metrics?days=7 | Aggregate dashboard data
GET | /eval/queries | Query log (filter by needs_review)
GET | /eval/cost-report?days=30 | Cost breakdown by day + model
GET | /eval/alerts | Recent flagged queries
POST | /eval/retrieval-bench | A/B compare retrieval strategies
POST | /eval/embedding-comparison | Compare embedding models

License

MIT
