
Observability and tracing for Retrieval-Augmented Generation pipelines


RAGTrace 📊

Observability for RAG pipelines - Trace, inspect, optimize, and regression-test Retrieval-Augmented Generation systems with ease.

Python 3.11+ | License: MIT | Code style: black

✨ What is RAGTrace?

RAGTrace is a lightweight observability layer for RAG (Retrieval-Augmented Generation) systems that captures and visualizes every step of your pipeline:

  • 🔍 Event Capture - Automatically intercepts retrieval, prompt, and generation events
  • 💰 Cost Tracking - Accurate token counting and cost estimation per query (GPT-4o, Claude, Gemini, o1/o3 and more)
  • 📊 Interactive Web UI - Modern timeline view with charts, filters, and event inspection
  • 🔧 CLI Tool - Developer-friendly command-line interface
  • 🌐 REST API - Query and analyze sessions programmatically
  • 🧪 Regression Testing - Snapshot and compare RAG outputs with scoring
  • 📝 Prompt Versioning - Track and diff prompt template changes over time
  • 🦙 LlamaIndex Support - First-class LlamaIndex callback integration

Think of it as OpenTelemetry, but specifically for RAG pipelines.

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/yourusername/ragtrace.git
cd ragtrace

# Install dependencies
pip install -e .

# Initialize database
ragtrace init

Basic Usage (LangChain)

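# Recent LangChain releases ship these classes in the split-out packages
# (pip install langchain-community langchain-openai faiss-cpu)
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA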
from ragtrace import RagTracer

# Your existing RAG setup
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["Your documents here..."], embeddings)
llm = ChatOpenAI(model="gpt-4o-mini")

# Add RAGTrace - just one line!
tracer = RagTracer(auto_save=True)

chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    callbacks=[tracer]  # ← Automatic capture!
)

# Chain.run() is deprecated in current LangChain; invoke() returns a dict
result = chain.invoke({"query": "What is RAG?"})["result"]

Basic Usage (LlamaIndex)

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from ragtrace.llamaindex import SimpleRagTracerLlamaIndex

# Build your index
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)

# Use as a context manager: the session is saved automatically on exit
with SimpleRagTracerLlamaIndex() as tracer:
    query_engine = index.as_query_engine(
        callbacks=[tracer]
    )
    response = query_engine.query("What is RAG?")
    session_id = tracer.session_id

print(f"Session saved: {session_id}")

View Results

# View latest session in CLI
ragtrace show last

# List all sessions
ragtrace list

# Export to JSON
ragtrace export <session-id> > session.json

# Start API + Web UI
ragtrace run                  # API on :8000
python ui/serve.py            # UI on :3000

🌐 Web UI

RAGTrace includes a modern web interface for visualizing and analyzing your RAG pipelines:

# Terminal 1: Start API server
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Start UI server
python ui/serve.py

Then open http://localhost:3000 in your browser.

Web UI Features

  • 📋 Sessions View - Browse all captured RAG sessions with search and filtering
  • 📊 Timeline View - Interactive timeline showing retrieval → prompt → generation flow
  • 📈 Performance Charts - Waterfall chart for event durations, cost breakdown by component
  • 🔍 Event Inspector - Click any event to see full details including tokens, costs, and data
  • 📸 Regression Tab - Create snapshots and run side-by-side regression comparisons
  • 📝 Prompts Tab - Register prompt templates, browse versions, and view inline diffs
  • 📤 Export Tools - Export session data as JSON or CSV, copy to clipboard

📊 Example Output

╭─ Session: d4f3a8b2-... ──────────────────────────────────╮
│ Query: What is RAG?                                      │
│ Model: gpt-4o-mini                                       │
│ Cost: $0.00012                                           │
│ Duration: 1,250ms                                        │
╰──────────────────────────────────────────────────────────╯

┏━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Event       ┃ Duration   ┃ Cost       ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Retrieval   │ 150ms      │ $0.00001   │
│ Prompt      │ 0ms        │ $0.00000   │
│ Generation  │ 1,100ms    │ $0.00011   │
└─────────────┴────────────┴────────────┘

🎯 Features

✅ Core Features

  • Automatic Event Capture - Works with LangChain and LlamaIndex callbacks
  • Cost Tracking - tiktoken-based accurate token counting (2025/2026 pricing)
  • Timeline Visualization - See your RAG pipeline in action
  • Session Management - Store and retrieve debugging sessions
  • CLI Tool - Rich formatted terminal output
  • REST API - FastAPI server with OpenAPI docs
  • Web UI - Interactive timeline with charts and event inspection
  • JSON/CSV Export - Export sessions for analysis
  • Regression Testing - Save snapshots and score retrieval/answer regressions
  • Prompt Versioning - Version control for prompt templates with diff view
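The cost-tracking feature above comes down to multiplying token counts by a per-model price. The following is a minimal sketch of that arithmetic, not RAGTrace's actual implementation: the model name, the price table, and the `estimate_cost` helper are all hypothetical, and the token counts are taken as inputs rather than computed with tiktoken.

```python
# Hypothetical prices in USD per 1 million tokens.
# These are placeholders for illustration, NOT RAGTrace's real cost table.
PRICES_PER_MILLION = {
    "example-model": {"input": 0.15, "output": 0.60},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  prices: dict = PRICES_PER_MILLION) -> float:
    """Return the estimated cost in USD for one model call."""
    price = prices[model]  # raises KeyError for models missing from the table
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def session_cost(events: list[dict]) -> float:
    """Sum the cost of several events, each with model and token counts."""
    return sum(
        estimate_cost(e["model"], e["input_tokens"], e["output_tokens"])
        for e in events
    )
```

Summing per-event costs this way is what lets a per-session total be broken down by pipeline step, as in the example output table.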

🎨 CLI Commands

ragtrace init                              # Initialize database
ragtrace list                              # List recent sessions
ragtrace show [id]                         # View session details
ragtrace show last                         # View latest session
ragtrace export <id>                       # Export session to JSON
ragtrace clear                             # Clear all data
ragtrace run                               # Start API server

# Snapshot & regression
ragtrace snapshot save <name>              # Create a named snapshot
ragtrace snapshot list                     # List all snapshots
ragtrace snapshot compare <id1> <id2>      # Compare snapshots (rich report)
ragtrace snapshot compare <id1> <id2> --json  # Machine-readable output

# Prompt versioning
ragtrace prompt save <name> <template.txt> # Save a prompt version
ragtrace prompt list                       # List all prompt names
ragtrace prompt list <name>                # List versions for a prompt
ragtrace prompt show <name>                # Show active template
ragtrace prompt show <name> -v 2           # Show specific version
ragtrace prompt diff <name> 1 2            # Colored unified diff

🌐 API Endpoints

# Sessions
POST   /api/sessions                          Create session
GET    /api/sessions                          List sessions
GET    /api/sessions/{id}                     Get session
PATCH  /api/sessions/{id}                     Update session
DELETE /api/sessions/{id}                     Delete session
POST   /api/sessions/{id}/events              Add event
GET    /api/sessions/{id}/cost                Get session cost

# Snapshots & regression
POST   /api/snapshots                         Create snapshot
GET    /api/snapshots                         List snapshots
GET    /api/snapshots/{id}                    Get snapshot
DELETE /api/snapshots/{id}                    Delete snapshot
GET    /api/snapshots/{id1}/compare/{id2}     Full comparison result
GET    /api/snapshots/{id1}/score/{id2}       Regression score (PASS/WARN/FAIL)

# Prompt versioning
GET    /api/prompts                           List prompt names
POST   /api/prompts                           Save new prompt version
GET    /api/prompts/{name}                    List versions for a prompt
GET    /api/prompts/{name}/active             Get active version
GET    /api/prompts/{name}/versions/{v}       Get specific version
GET    /api/prompts/{name}/diff/{va}/{vb}     Diff two versions
DELETE /api/prompts/{name}/versions/{v}       Delete a version

# Stats & costs
GET    /api/stats                             Aggregate stats
GET    /api/cost/breakdown                    Cost breakdown

Visit http://localhost:8000/docs after running ragtrace run for interactive API documentation.
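The endpoints above can be called from any HTTP client. Below is a small standard-library sketch for building endpoint URLs and fetching JSON. The helper names `api_url` and `fetch_json` are hypothetical, the base URL assumes the default port shown above, and the response shapes are not specified here, so check the interactive docs before relying on any field.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumes the default port of `ragtrace run`


def api_url(*segments: str, base_url: str = BASE_URL) -> str:
    """Join path segments under /api, percent-encoding each segment."""
    path = "/".join(quote(str(s), safe="") for s in segments)
    return f"{base_url}/api/{path}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (requires a running API server)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(api_url("sessions"))` would list sessions, and `api_url("snapshots", id1, "score", id2)` builds the regression-score URL for two snapshot IDs.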

🏗️ Architecture

ragtrace/
├── core/              # Core business logic
│   ├── models.py      # Pydantic v2 data models (Session, Snapshot, PromptVersion…)
│   ├── storage.py     # SQLite database layer (sessions, snapshots, prompt_versions)
│   ├── cost.py        # Token counting & 2025/2026 cost table
│   ├── capture.py     # Event aggregation & unused-chunk detection
│   └── regression.py  # Snapshot diff engine (Jaccard + SequenceMatcher scoring)
├── langchain/         # LangChain integration
│   └── middleware.py  # BaseCallbackHandler implementation
├── llamaindex/        # LlamaIndex integration
│   └── middleware.py  # RagTracerLlamaIndex + SimpleRagTracerLlamaIndex
├── api/               # REST API (FastAPI)
│   ├── main.py        # FastAPI application
│   └── routes.py      # All API endpoints (30+)
├── ui/                # Web UI (HTML/CSS/Vanilla JS)
│   ├── index.html     # Main UI page
│   ├── app.js         # Frontend application (sessions, timeline, regression, prompts)
│   ├── styles.css     # UI styling
│   └── serve.py       # Development server
├── cli/               # Command-line interface
│   └── main.py        # Click CLI (snapshot compare, prompt group)
├── examples/          # Usage examples
│   └── simple_rag.py  # Complete working example
└── tests/             # Test suite (150+ tests)
    ├── test_cost.py
    ├── test_storage.py
    ├── test_capture.py
    ├── test_regression.py
    ├── test_prompt_versioning.py
    └── test_llamaindex.py

🧪 Regression Testing

RAGTrace makes it easy to catch regressions between RAG pipeline versions:

# 1. Save a baseline snapshot after a good run
ragtrace snapshot save "v1-baseline"

# 2. Make pipeline changes, run again, then snapshot
ragtrace snapshot save "v2-candidate"

# 3. Compare to get a PASS/WARN/FAIL verdict
ragtrace snapshot compare <baseline-id> <candidate-id>

Scoring uses a weighted composite:

  • Retrieval similarity (40%): Jaccard set overlap of retrieved chunk texts
  • Answer similarity (60%): difflib SequenceMatcher ratio

Verdicts: PASS (score ≥ 0.8), WARN (0.6 ≤ score < 0.8), FAIL (score < 0.6).

Or via the API:

GET /api/snapshots/{id1}/score/{id2}
→ {"verdict": "PASS", "score": 0.92, "retrieval_similarity": 0.95, "answer_similarity": 0.90}
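The weighted composite described above can be sketched in a few lines of standard-library Python. This is an illustration of the stated formula (40% Jaccard, 60% SequenceMatcher, thresholds at 0.8 and 0.6), not the code in `core/regression.py`. The function names and the choice to treat two empty chunk sets as identical are assumptions.

```python
from difflib import SequenceMatcher


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b|.

    Assumption: two empty sets count as identical (1.0).
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def regression_score(base_chunks: list[str], cand_chunks: list[str],
                     base_answer: str, cand_answer: str) -> dict:
    """Combine retrieval and answer similarity into a verdict."""
    retrieval = jaccard(set(base_chunks), set(cand_chunks))
    answer = SequenceMatcher(None, base_answer, cand_answer).ratio()
    score = 0.4 * retrieval + 0.6 * answer  # weights from the list above

    if score >= 0.8:
        verdict = "PASS"
    elif score >= 0.6:
        verdict = "WARN"
    else:
        verdict = "FAIL"

    return {
        "verdict": verdict,
        "score": score,
        "retrieval_similarity": retrieval,
        "answer_similarity": answer,
    }
```

Plugging in the values from the API example, 0.4 × 0.95 + 0.6 × 0.90 = 0.92, which clears the 0.8 threshold and yields PASS.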

📝 Prompt Versioning

Track every change to your prompt templates:

# Save v1
# printf is used because plain echo does not expand \n in every shell
printf 'Answer using: {context}\nQuestion: {question}\n' > prompt.txt
ragtrace prompt save qa_prompt prompt.txt -d "Initial version"

# Save v2 after tweaks
printf 'You are a helpful assistant.\nContext: {context}\nQ: {question}\nA:\n' > prompt.txt
ragtrace prompt save qa_prompt prompt.txt -d "Added system message"

# See what changed
ragtrace prompt diff qa_prompt 1 2

Or via the API:

import requests

# Save a new version
requests.post("http://localhost:8000/api/prompts", json={
    "name": "qa_prompt",
    "template": "Context: {context}\nQuestion: {question}",
    "description": "v2 - cleaner format"
})

# Compare versions
diff = requests.get("http://localhost:8000/api/prompts/qa_prompt/diff/1/2").json()
print(diff["similarity_score"])  # e.g. 0.72
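To make the diff output more concrete, here is a minimal sketch of how a unified diff and a similarity score between two template versions could be produced with `difflib`. It is an assumption that the `similarity_score` field is a SequenceMatcher ratio. The `diff_prompts` helper and its return shape are hypothetical and only mirror the field name used above.

```python
import difflib


def diff_prompts(old: str, new: str, name: str = "prompt",
                 old_version: int = 1, new_version: int = 2) -> dict:
    """Return a unified diff and a 0..1 similarity ratio for two templates."""
    diff_lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{name}@v{old_version}",
        tofile=f"{name}@v{new_version}",
        lineterm="",  # lines were split without newlines, so add none back
    )
    similarity = difflib.SequenceMatcher(None, old, new).ratio()
    return {"diff": "\n".join(diff_lines), "similarity_score": similarity}


v1 = "Answer using: {context}\nQuestion: {question}"
v2 = "You are a helpful assistant.\nContext: {context}\nQ: {question}\nA:"
result = diff_prompts(v1, v2, name="qa_prompt")
print(result["diff"])
```

Identical templates produce an empty diff and a similarity of 1.0. Any edit lowers the ratio and adds `-`/`+` lines to the diff.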

📋 Requirements

  • Python: 3.11+
  • Core deps: FastAPI, uvicorn, tiktoken, Rich, Click, pydantic v2
  • Optional: langchain (LangChain integration), llama-index (LlamaIndex integration)

🛠️ Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# With coverage
pytest --cov=core --cov=langchain --cov=llamaindex --cov=api --cov=cli

# Format + lint
black .
ruff check .

🔬 Use Cases

1. Debug Failed Queries

ragtrace show last

2. Track Costs

ragtrace list --sort-by cost

3. Regression Testing After Model Upgrade

ragtrace snapshot save "gpt-4o-mini-baseline"
# upgrade model, re-run queries
ragtrace snapshot save "gpt-4o-baseline"
ragtrace snapshot compare <old-id> <new-id>
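In CI, the comparison verdict can be turned into a process exit code so that a regression fails the build. The sketch below assumes the machine-readable report contains a `verdict` field like the API response shown earlier. The exact shape of the `--json` output is not documented here, and the function names are hypothetical.

```python
import json


def exit_code_for(report: dict, fail_on_warn: bool = False) -> int:
    """Map a regression report to a shell exit code (0 = ok, 1 = failure)."""
    verdict = report.get("verdict")
    if verdict == "PASS":
        return 0
    if verdict == "WARN":
        return 1 if fail_on_warn else 0
    # FAIL, or a missing/unknown verdict, is treated as a failure
    return 1


def exit_code_from_json(text: str, fail_on_warn: bool = False) -> int:
    """Parse a JSON report string and return the matching exit code."""
    return exit_code_for(json.loads(text), fail_on_warn=fail_on_warn)
```

A wrapper script could end with `sys.exit(exit_code_from_json(sys.stdin.read()))` and read the output of `ragtrace snapshot compare <old-id> <new-id> --json` from a pipe. Treating an unknown verdict as a failure is a deliberate choice so that a changed report format does not silently pass.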

4. Prompt A/B Testing

ragtrace prompt diff qa_prompt 1 2

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

🗺️ Roadmap

v0.2.0 (Complete ✅)

  • Web UI for timeline visualization
  • Interactive event inspection
  • Performance charts (waterfall, cost breakdown)
  • Export tools (JSON, CSV)
  • Snapshot regression testing (PASS/WARN/FAIL scoring)
  • LlamaIndex integration
  • Prompt versioning (create, diff, CLI, API, Web UI)
  • Updated 2025/2026 pricing (GPT-4o, o1, o3-mini baseline)

v0.3.0 (Planned)

  • Agent tracing support
  • Cost optimization suggestions
  • Quality scoring with LLM-as-judge
  • Team collaboration features

v1.0.0

  • Cloud mode
  • Advanced analytics
  • Alert system
  • Multi-framework support

Built with ❤️ for RAG developers
