Skip to main content

OpenTelemetry-native RAG observability SDK with semantic quality scores

Project description

RAGWatch

Quality scores in your RAG traces — computed, not just recorded.

RAGWatch is an OpenTelemetry-native Python SDK that adds semantic quality scores to your RAG traces. Unlike generic tracing tools, RAGWatch computes chunk_relevance_score inline via cosine similarity — zero LLM calls, ~1-5 ms overhead.

Installation

Using uv:

uv add ragwatch                    # Core SDK
uv add ragwatch --extra langgraph  # + LangGraph adapter
uv add ragwatch --extra crewai     # + CrewAI adapter

Quickstart

import ragwatch
from ragwatch import RAGWatchConfig, SpanKind, trace
from ragwatch.instrumentation.evaluators import chunk_relevance_score

# Configure with your OTel exporter
from opentelemetry.sdk.trace.export import ConsoleSpanExporter

ragwatch.configure(RAGWatchConfig(
    service_name="my-rag-app",
    exporter=ConsoleSpanExporter(),
))

@trace("ragwatch.embedding.generate", span_kind=SpanKind.EMBEDDING)
def embed_query(text: str) -> list[float]:
    # Your embedding API call here
    return [0.5, 0.3, 0.2]

@trace("ragwatch.retrieval.search", span_kind=SpanKind.RETRIEVER)
def retrieve_chunks(query: str) -> list[dict]:
    chunk_embeddings = [[0.5, 0.3, 0.2], [0.1, 0.9, 0.0]]
    scores = chunk_relevance_score(chunk_embeddings)
    return [{"text": "chunk", "score": s} for s in scores]

@trace("ragwatch.response.emit", span_kind=SpanKind.CHAIN)
def generate_response(chunks: list[dict]) -> str:
    return "Generated response"

# Run your pipeline
embedding = embed_query("What is RAG?")
chunks = retrieve_chunks("What is RAG?")
response = generate_response(chunks)

Development

# Install dependencies
uv sync

# Run tests
uv run pytest -v

# Run specific test
uv run pytest tests/test_tracer.py -v

How It Works

  1. Embedding stage: @trace with SpanKind.EMBEDDING stores the query embedding in OTel context
  2. Retrieval stage: chunk_relevance_score() reads the stored embedding and computes cosine similarity against each chunk
  3. Scores appear on spans: chunk.relevance_score (average) and chunk.relevance_scores (per-chunk) are set as span attributes

Framework Adapters

LangGraph

from ragwatch.adapters.langgraph import node, workflow

@node("retrieve-node")
def retrieve_node(state):
    return {**state, "docs": ["doc1"]}

@workflow("rag-pipeline")
def run_pipeline(input_data):
    return retrieve_node(input_data)

CrewAI

from ragwatch.adapters.crewai import node, endpoint

@node("researcher")
def researcher(task):
    return {"findings": "data"}

@endpoint("research-crew")
def run_crew(topic):
    return researcher(topic)

User Feedback

from ragwatch import record_feedback

record_feedback(trace_id="abc123", score=0.85)

Auto I/O Tracking

All decorators automatically capture function arguments as input.value and return values as output.value (4KB truncation). Disable per-decorator:

@trace("my-span", auto_track_io=False)
def my_func():
    ...

Use with OpenLLMetry

RAGWatch complements OpenLLMetry — use both together:

# OpenLLMetry: auto-trace LLM calls
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
OpenAIInstrumentor().instrument()

# RAGWatch: add quality scores to RAG stages
import ragwatch
ragwatch.configure(RAGWatchConfig(service_name="my-app"))

API Reference

Export Description
configure(config) Initialize RAGWatch with a RAGWatchConfig
trace(span_name, span_kind, auto_track_io) Decorator for tracing functions
record_feedback(trace_id, score) Record user feedback score
chunk_relevance_score(chunk_embeddings) Compute relevance scores
RAGWatchConfig Configuration dataclass
SpanKind OpenInference span kind enum

Requirements

  • Python 3.11+
  • opentelemetry-sdk 1.24.0
  • opentelemetry-api 1.24.0

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ragwatch-0.1.4.tar.gz (25.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ragwatch-0.1.4-py3-none-any.whl (22.5 kB view details)

Uploaded Python 3

File details

Details for the file ragwatch-0.1.4.tar.gz.

File metadata

  • Download URL: ragwatch-0.1.4.tar.gz
  • Upload date:
  • Size: 25.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for ragwatch-0.1.4.tar.gz
Algorithm Hash digest
SHA256 96d2c90d18fcb7d4ecc2294a748e18d2df2e994662ec148c5230a53c229915a6
MD5 4b4fa0f70f53aa06bb2fda37b047ba6d
BLAKE2b-256 1add5052c8f029a3d7b436ecac783bb86483f811f8471ac5065306c962f6226f

See more details on using hashes here.

File details

Details for the file ragwatch-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: ragwatch-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 22.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for ragwatch-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 6bd659dc8ea2c386c11001ddc29e488d0932b833111004be75108c59e22eab7c
MD5 2aa1860fc9bfd619371aa9277271c2ea
BLAKE2b-256 fa973ce75d781d391f60746a74ce064586306c5eda64b6f3c56be2aa11e4102c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page