OpenTelemetry-native RAG observability SDK with semantic quality scores

RAGWatch

Quality scores in your RAG traces — computed, not just recorded.

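RAGWatch is an OpenTelemetry-native Python SDK that adds semantic quality scores to your RAG traces. Generic tracing tools only record what happened. RAGWatch also computes chunk_relevance_score inline, using cosine similarity between the query embedding and each retrieved chunk's embedding. This takes zero LLM calls and adds roughly 1-5 ms of overhead.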

Installation

Using uv:

uv add ragwatch                    # Core SDK
uv add ragwatch --extra langgraph  # + LangGraph adapter
uv add ragwatch --extra crewai     # + CrewAI adapter

Quickstart

import ragwatch
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
from ragwatch import RAGWatchConfig, SpanKind, trace
from ragwatch.instrumentation.evaluators import chunk_relevance_score

# Configure RAGWatch with any OTel span exporter (console output here)
ragwatch.configure(RAGWatchConfig(
    service_name="my-rag-app",
    exporter=ConsoleSpanExporter(),
))

# Embedding stage: the query embedding is stored for later relevance scoring
@trace("ragwatch.embedding.generate", span_kind=SpanKind.EMBEDDING)
def embed_query(text: str) -> list[float]:
    # Replace with your embedding API call
    return [0.5, 0.3, 0.2]

# Retrieval stage: score each chunk against the stored query embedding
@trace("ragwatch.retrieval.search", span_kind=SpanKind.RETRIEVER)
def retrieve_chunks(query: str) -> list[dict]:
    # Replace with your vector-store lookup; these are the chunks' embeddings
    chunk_embeddings = [[0.5, 0.3, 0.2], [0.1, 0.9, 0.0]]
    scores = chunk_relevance_score(chunk_embeddings)
    return [{"text": "chunk", "score": s} for s in scores]

# Generation stage
@trace("ragwatch.response.emit", span_kind=SpanKind.CHAIN)
def generate_response(chunks: list[dict]) -> str:
    return "Generated response"

# Run the pipeline: embed first so retrieval can score against the query
embedding = embed_query("What is RAG?")
chunks = retrieve_chunks("What is RAG?")
response = generate_response(chunks)

Development

# Install dependencies
uv sync

# Run tests
uv run pytest -v

# Run a single test file
uv run pytest tests/test_tracer.py -v

How It Works

  1. Embedding stage: @trace with SpanKind.EMBEDDING stores the query embedding in OTel context
  2. Retrieval stage: chunk_relevance_score() reads the stored embedding and computes cosine similarity against each chunk
  3. Scores appear on spans: chunk.relevance_score (average) and chunk.relevance_scores (per-chunk) are set as span attributes

Framework Adapters

LangGraph

from ragwatch.adapters.langgraph import node, workflow

@node("retrieve-node")
def retrieve_node(state):
    return {**state, "docs": ["doc1"]}

@workflow("rag-pipeline")
def run_pipeline(input_data):
    return retrieve_node(input_data)

CrewAI

from ragwatch.adapters.crewai import node, endpoint

@node("researcher")
def researcher(task):
    return {"findings": "data"}

@endpoint("research-crew")
def run_crew(topic):
    return researcher(topic)

User Feedback

from ragwatch import record_feedback

record_feedback(trace_id="abc123", score=0.85)
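record_feedback needs the ID of the trace being rated, but this README does not say which format it expects. The sketch below assumes the W3C Trace Context form that OTel exporters print: the 128-bit trace ID written as 32 lowercase hex characters. Under that assumption, a short placeholder such as abc123 would not match a real trace. The sketch also assumes scores lie in [0.0, 1.0]. The helpers format_trace_id and validate_feedback are hypothetical. With the OTel API, the integer ID inside a traced function would come from trace.get_current_span().get_span_context().trace_id.

```python
import re

# W3C Trace Context: a trace ID is 16 bytes, written as 32 lowercase hex chars.
_TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")
_INVALID_TRACE_ID = "0" * 32  # the all-zero ID is reserved as invalid


def format_trace_id(trace_id: int) -> str:
    """Render a 128-bit integer trace ID the way OTel exporters print it."""
    if not 0 < trace_id < 2**128:
        raise ValueError(f"trace_id out of range: {trace_id}")
    return format(trace_id, "032x")


def validate_feedback(trace_id: str, score: float) -> tuple[str, float]:
    """Hypothetical pre-check before calling record_feedback().

    Assumes hex trace IDs and scores in [0.0, 1.0]; RAGWatch does not confirm either.
    """
    if not _TRACE_ID_RE.fullmatch(trace_id) or trace_id == _INVALID_TRACE_ID:
        raise ValueError(f"not a valid W3C trace ID: {trace_id!r}")
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    return trace_id, score


tid = format_trace_id(0x4BF92F3577B34DA6A3CE929D0E0E4736)
print(validate_feedback(tid, 0.85))
# In a real app, pass the checked values on: record_feedback(trace_id=tid, score=0.85)
```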

Auto I/O Tracking

All decorators automatically capture function arguments as input.value and return values as output.value, truncated at 4KB. To turn this off for a single decorator, pass auto_track_io=False:

@trace("my-span", auto_track_io=False)
def my_func():
    ...

Use with OpenLLMetry

RAGWatch complements OpenLLMetry. OpenLLMetry auto-traces the LLM calls, and RAGWatch adds quality scores to the RAG stages. Use both together:

# OpenLLMetry: auto-trace LLM calls
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument()

# RAGWatch: add quality scores to RAG stages
import ragwatch
from ragwatch import RAGWatchConfig

ragwatch.configure(RAGWatchConfig(service_name="my-app"))

API Reference

  • configure(config): Initialize RAGWatch with a RAGWatchConfig
  • trace(span_name, span_kind, auto_track_io): Decorator for tracing functions
  • record_feedback(trace_id, score): Record a user feedback score for a trace
  • chunk_relevance_score(chunk_embeddings): Compute per-chunk relevance scores against the stored query embedding
  • RAGWatchConfig: Configuration dataclass
  • SpanKind: OpenInference span kind enum

Requirements

  • Python 3.11+
  • opentelemetry-sdk 1.24.0
  • opentelemetry-api 1.24.0

License

MIT

Download files

Source Distribution

ragwatch-0.1.2.tar.gz (20.4 kB)

Built Distribution

ragwatch-0.1.2-py3-none-any.whl (16.3 kB)

File details

Details for the file ragwatch-0.1.2.tar.gz.

File metadata

  • Download URL: ragwatch-0.1.2.tar.gz
  • Size: 20.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for ragwatch-0.1.2.tar.gz:

  • SHA256: fb21db635a979f218202bc1e64cff0de44885ef6b0a128cba75a28fdfd567a0e
  • MD5: 2fada4754162f8652f395b0502e1844b
  • BLAKE2b-256: 7aa78d3790feaa5291873616e4ffff92b3ee04a72d4c2dbd962d0c0372ff65a0

File details

Details for the file ragwatch-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: ragwatch-0.1.2-py3-none-any.whl
  • Size: 16.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for ragwatch-0.1.2-py3-none-any.whl:

  • SHA256: b5aac5287e7ced1a12aa7c5e81d649f1fbee8835437c0b02e341a8145c36b4bc
  • MD5: d59b7c258bddcc1d768073f86dda9e08
  • BLAKE2b-256: 7664700fde9c3e3330d7df7faacd017ec81fa6d346d4be7aa693a2b3764648ef