Local-first evaluation framework for RAG pipelines and AI agents

Evaluate, benchmark, and monitor your RAG pipelines — 100% locally, no API keys required.

PyPI · CI · Python 3.10+ · License: AGPL-3.0 · Ruff · Type Checked: mypy · Open In Colab

Problem · Solution · Quick Start · Installation · Roadmap


The Problem

Building RAG systems is easy. Knowing if they actually work is hard.

Current evaluation tools each fall short:

| Tool | Issue |
| --- | --- |
| Giskard | Heavy, slow (45-60 min scans), loses progress on crash, enterprise-focused |
| RAGAS | Requires OpenAI API keys, no local-first option |
| Manual testing | Doesn't scale, not reproducible |

You need a tool that:

  • Runs 100% locally (Ollama, local models)
  • Evaluates fast with checkpointing (no lost progress)
  • Integrates with your existing stack (LangChain, LangGraph)
  • Fits in CI/CD pipelines
  • Doesn't require a PhD to use

The Solution

ragnarok-ai is a lightweight, local-first framework to evaluate RAG pipelines.

RAGnarok-AI Overview

from ragnarok_ai import evaluate, generate_testset

# Generate test questions from your knowledge base
testset = await generate_testset(
    knowledge_base="./docs/",
    num_questions=50,
    types=["simple", "multi_hop", "adversarial"],
    llm="ollama/mistral",
    checkpoint=True,  # Resume if interrupted
)

# Evaluate your RAG pipeline
results = await evaluate(
    rag_pipeline=my_rag,
    testset=testset,
    metrics=["retrieval", "faithfulness", "relevance"],
    llm="ollama/mistral",
)

# Get actionable insights
results.summary()
# ┌─────────────────┬───────┬────────┐
# │ Metric          │ Score │ Status │
# ├─────────────────┼───────┼────────┤
# │ Retrieval P@10  │ 0.82  │ PASS   │
# │ Faithfulness    │ 0.74  │ WARN   │
# │ Relevance       │ 0.89  │ PASS   │
# │ Hallucination   │ 0.12  │ PASS   │
# └─────────────────┴───────┴────────┘

results.export("report.html")

v1.7.0 is now available! Alerting system with Webhook and Slack adapters. Install with pip install ragnarok-ai


Key Features

| Feature | Description |
| --- | --- |
| 100% Local | Runs entirely on your machine with Ollama. No OpenAI, no API keys, no data leaving your network. |
| Production Monitoring | Collect traces, export Prometheus metrics, track latency and success rates in production. |
| LLM-as-Judge | Multi-criteria evaluation with Prometheus 2: faithfulness, relevance, hallucination, completeness. |
| Cost Tracking | Track token usage and costs. Local models = $0.00; see exactly what cloud APIs cost. |
| Jupyter Integration | Rich HTML display in notebooks with metrics visualization. |
| Fast & Resilient | Built-in checkpointing — crash mid-evaluation? Resume exactly where you left off. |
| Framework Agnostic | Works with LangChain, LangGraph, LlamaIndex, or your custom RAG. |
| Comprehensive Metrics | Retrieval quality, faithfulness, relevance, hallucination detection, latency tracking. |
| Test Generation | Auto-generate diverse test sets from your knowledge base. |
| CI/CD Ready | CLI-first design, JSON output, exit codes for pipeline integration. |
| Enterprise Ready | Kubernetes Helm charts, air-gapped deployment, data sovereignty. |
| Lightweight | Minimal dependencies. No torch/transformers in core. |

Comparison

| Feature | ragnarok-ai | Giskard | RAGAS |
| --- | --- | --- | --- |
| 100% Local | Yes | Partial | No |
| Checkpointing | Yes | No | No |
| Fast evaluation | Yes | No (45-60 min) | Yes |
| CLI support | Yes | No | No |
| LangChain integration | Yes | Yes | Yes |
| Minimal deps | Yes | No | Partial |
| Free & OSS | AGPL-3.0 | Open-core | Apache-2.0 |

Performance

Benchmarked on Apple M2 16GB, Python 3.10:

Retrieval Metrics: ~24,000 queries/sec

| Queries | Time | Peak RAM |
| --- | --- | --- |
| 50 | 0.002s | 0.02 MB |
| 500 | 0.021s | 0.03 MB |
| 5000 | 0.217s | 0.17 MB |

LLM-as-Judge (Prometheus 2):

| Criterion | Avg Time |
| --- | --- |
| Faithfulness | ~25s |
| Relevance | ~22s |
| Hallucination | ~28s |

Retrieval is pure computation — instant. LLM-as-Judge is the bottleneck (~25s/eval), but runs 100% local.

Full benchmarks →


Quick Start

Try it now: Open in Google Colab

Prerequisites

  • Python 3.10+
  • Ollama running locally
  • uv (recommended) or pip

Install

pip install ragnarok-ai

With optional dependencies:

pip install ragnarok-ai[ollama,qdrant]

Run your first evaluation

# CLI demo
ragnarok evaluate --demo

# With options
ragnarok evaluate --demo --output results.json --fail-under 0.7

# Or in Python
python examples/basic_evaluation.py

Installation

Using pip

pip install ragnarok-ai

Optional dependencies

# LLM providers
pip install ragnarok-ai[ollama]      # Ollama support
pip install ragnarok-ai[openai]      # OpenAI support
pip install ragnarok-ai[anthropic]   # Anthropic support

# Vector stores
pip install ragnarok-ai[qdrant]      # Qdrant support
pip install ragnarok-ai[chroma]      # ChromaDB support
pip install ragnarok-ai[faiss]       # FAISS support

# RAG frameworks
pip install ragnarok-ai[langchain]   # LangChain/LangGraph support
pip install ragnarok-ai[llamaindex]  # LlamaIndex support
pip install ragnarok-ai[dspy]        # DSPy support

# Observability
pip install ragnarok-ai[telemetry]   # OpenTelemetry tracing

# Everything
pip install ragnarok-ai[all]

Development

git clone https://github.com/2501Pr0ject/RAGnarok-AI.git
cd RAGnarok-AI
pip install -e ".[dev]"
pre-commit install

Use Cases

Continuous RAG Testing in CI/CD

# .github/workflows/rag-tests.yml
- uses: 2501Pr0ject/ragnarok-evaluate-action@v1
  with:
    config: ragnarok.yaml
    threshold: 0.8
    # fail-on-threshold: false (default - advisory only)
    # comment-on-pr: true (default - posts PR comment)

The action posts a PR comment distinguishing deterministic retrieval metrics from advisory LLM-as-Judge scores.

Compare Embedding Models

configs = [
    {"embedder": "nomic-embed-text", "chunk_size": 512},
    {"embedder": "mxbai-embed-large", "chunk_size": 256},
]

results = await benchmark(
    rag_factory=create_rag,
    configs=configs,
    testset=testset,
)
results.compare()  # Side-by-side comparison

Monitor Production Quality

# Track quality drift over time
metrics = await evaluate(rag, production_queries)
metrics.log_to("./metrics/")  # Time-series storage

Metrics

Retrieval Metrics

  • Precision@K — Relevant docs in top K results
  • Recall@K — Coverage of relevant docs
  • MRR — Mean Reciprocal Rank
  • NDCG — Normalized Discounted Cumulative Gain
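These metrics can be sketched in a few lines of plain Python — an illustration of the underlying math under binary relevance judgments, not ragnarok-ai's actual implementation (the framework computes them for you):

```python
import math

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved doc IDs that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant doc, 0.0 if none was retrieved."""
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1 / rank
    return 0.0

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of this ranking over the ideal DCG."""
    dcg = sum(1 / math.log2(i + 1)
              for i, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal = sum(1 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, 4))  # 0.5
print(reciprocal_rank(retrieved, relevant))    # 0.5 (first hit at rank 2)
```

MRR is simply the mean of `reciprocal_rank` over all queries in the test set.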

Generation Metrics

  • Faithfulness — Is the answer grounded in retrieved context?
  • Relevance — Does the answer address the question?
  • Hallucination — Does the answer contain fabricated info?
  • Completeness — Are all aspects of the question covered?

LLM-as-Judge (v1.2+)

Use Prometheus 2 for comprehensive, local evaluation:

from ragnarok_ai import LLMJudge

# Initialize judge (uses Prometheus 2 by default)
judge = LLMJudge()

# Evaluate a single response
result = await judge.evaluate_all(
    context="Python was created by Guido van Rossum in 1991.",
    question="Who created Python?",
    answer="Guido van Rossum created Python.",
)

print(f"Overall: {result.overall_verdict} ({result.overall_score:.2f})")
# Overall: PASS (0.85)

print(f"Faithfulness: {result.faithfulness.verdict}")
print(f"Hallucination: {result.hallucination.verdict}")

Performance:

  • ~20-30s per evaluation on Apple M2 16GB
  • Prometheus 2 Q5_K_M: ~5GB RAM usage
  • keep_alive enabled by default (prevents model unloading between requests)

Installation:

# Install Prometheus 2 (~5GB, runs on 16GB RAM)
ollama pull hf.co/RichardErkhov/prometheus-eval_-_prometheus-7b-v2.0-gguf:Q5_K_M

Medical Mode

Reduce false positives in healthcare RAG evaluation with automatic medical abbreviation normalization.

from ragnarok_ai import LLMJudge

# Enable medical mode
judge = LLMJudge(medical_mode=True)

# "CHF" and "congestive heart failure" are now treated as equivalent
result = await judge.evaluate_faithfulness(
    context="Patient diagnosed with CHF.",
    question="What condition does the patient have?",
    answer="Patient has congestive heart failure.",
)
# Without medical_mode: may flag as unfaithful (text mismatch)
# With medical_mode: correctly identifies as faithful

Features:

  • 350+ medical abbreviations (CHF, MI, COPD, DVT...)
  • Context-aware disambiguation (MS = multiple sclerosis vs mitral stenosis)
  • Multiple formats: dotted (q.d.), slash (s/p), mixed-case (SpO2)
  • False positive filtering (OR, US, IT stay unchanged)

Also works with FaithfulnessEvaluator(llm, medical_mode=True).

Contributed by @harish1120
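The normalization idea can be sketched with a tiny expansion table — a toy illustration only, not the library's 350-term, context-aware implementation (real disambiguation of cases like MS needs surrounding context):

```python
import re

# Toy subset of a medical abbreviation table (illustrative only).
ABBREVIATIONS = {
    "CHF": "congestive heart failure",
    "MI": "myocardial infarction",
    "COPD": "chronic obstructive pulmonary disease",
}

def normalize(text: str) -> str:
    """Expand known abbreviations so judge comparisons match on meaning."""
    def expand(match: re.Match) -> str:
        # Unknown all-caps tokens (OR, US, IT...) pass through unchanged.
        return ABBREVIATIONS.get(match.group(0), match.group(0))
    # Word-boundary matching keeps longer tokens like "MIX" untouched.
    return re.sub(r"\b[A-Z]{2,}\b", expand, text)

print(normalize("Patient diagnosed with CHF."))
# Patient diagnosed with congestive heart failure.
```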

System Metrics

  • Latency — End-to-end response time
  • Token usage — Cost tracking for LLM calls
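Latency percentiles of the kind the monitor exports boil down to ranking raw timings; a minimal nearest-rank sketch (illustrative, independent of the library's monitoring code):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample covering p% of the data."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-request latencies in milliseconds.
latencies_ms = [120, 95, 310, 150, 88, 2040, 132, 101]
print(percentile(latencies_ms, 50))  # typical request
print(percentile(latencies_ms, 95))  # tail latency, dominated by outliers
```

Tail percentiles (p95/p99) matter more than averages here, since a single slow LLM call can dwarf the median.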

Cost Tracking (v1.3+)

Track exactly what your evaluations cost:

results = await evaluate(rag, testset, track_cost=True)
print(results.cost)
# +--------------------+------------+----------+
# | Provider           |     Tokens |     Cost |
# +--------------------+------------+----------+
# | ollama (local)     |     45,230 |    $0.00 |
# | openai             |     12,500 |    $0.38 |
# +--------------------+------------+----------+

Local-first advantage: Ollama evaluations cost $0.00.
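The accounting is simply tokens multiplied by a per-token price; a minimal sketch with illustrative numbers (neither the prices nor the model name reflect ragnarok-ai's actual pricing table):

```python
# Illustrative $/1M-token prices (hypothetical, not the library's table).
PRICES_PER_1M = {
    "ollama": 0.0,                # local models cost nothing
    "openai/gpt-4o-mini": 0.60,   # hypothetical blended rate
}

def cost_usd(provider: str, tokens: int) -> float:
    """Dollar cost of `tokens` tokens billed at the provider's rate."""
    return tokens / 1_000_000 * PRICES_PER_1M[provider]

print(f"${cost_usd('ollama', 45_230):.2f}")             # $0.00
print(f"${cost_usd('openai/gpt-4o-mini', 12_500):.4f}")
```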

Jupyter Notebook (v1.3.1+)

Rich HTML display for evaluation results:

from ragnarok_ai.notebook import display, display_comparison

# Full dashboard with metrics, cost, latency
display(results)

# Compare multiple pipelines side-by-side
display_comparison([
    ("Baseline", baseline_results),
    ("Improved", improved_results),
])

Roadmap

Completed

v0.1 — Foundation
  • Project setup & architecture
  • Core retrieval metrics (precision, recall, MRR, NDCG)
  • Ollama adapter
  • Console reporter
  • JSON reporter
  • Basic CLI
  • CI/CD with GitHub Actions
v0.2 — Generation Metrics & Reporting
  • Qdrant adapter
  • Faithfulness evaluator
  • Relevance evaluator
  • Hallucination detection
  • HTML report with drill-down (failed questions, retrieved chunks)
  • Intelligent CI gating (stable metrics fail, LLM judgments warn)
v0.3 — Test Generation & Golden Sets
  • Synthetic question generation
  • Multi-hop question support
  • Adversarial question generation
  • Checkpointing system
  • Golden set support (human-validated, versioned question sets)
  • Baselines library (configs + expected results)
  • NovaTech example dataset for quickstart
v0.4 — Framework Adapters & Observability
  • LangChain integration
  • LangGraph integration
  • Custom RAG protocol support
  • OpenTelemetry export for tracing & debugging
v0.5 — Performance & Scale
  • Async parallelization (max_concurrency parameter)
  • Result caching (MemoryCache, DiskCache, CacheProtocol)
  • Batch processing (BatchEvaluator for 1000+ queries)
  • Progress callbacks (sync and async support)
  • Timeout and retry (timeout, max_retries, retry_delay)
  • Cache error handling (graceful degradation)
v0.6 — Cloud & Local Adapters
  • vLLM adapter (local high-performance inference)
  • OpenAI adapter (optional cloud fallback)
  • Anthropic adapter
  • ChromaDB adapter
  • FAISS adapter (pure local, no server)
v0.7 — Framework Adapters
  • LlamaIndex adapter (Retriever, QueryEngine, Index)
  • DSPy adapter (Retrieve, Module, RAG pattern)
  • Custom RAG support via RAGProtocol
  • Adapter contribution guide
v0.8 — Comparison & Benchmarking
  • Comparison mode (compare() for side-by-side evaluation)
  • Regression detection (alert on quality drop vs baseline)
  • Benchmark history tracking (time-series storage)
  • Diff reports (what changed between runs)
v0.9 — Agent Evaluation
  • AgentProtocol for agent pipelines
  • Tool-use correctness metrics (precision, recall, F1)
  • Multi-step reasoning evaluators (coherence, goal progress, efficiency)
  • ReAct/CoT pattern adapters
  • Trajectory analysis (loops, dead ends, failure detection)
  • Visualization (ASCII, Mermaid, HTML reports)
v1.0 — Production Ready
  • PyPI publish (pip install ragnarok-ai)
  • Stable public API
  • Complete README with examples
  • CHANGELOG.md (v0.1 → v1.0)
v1.1 — CLI Complete
  • ragnarok generate command (synthetic testset generation)
  • ragnarok benchmark command (history tracking, regression detection)
  • Standardized JSON envelope for --json output
  • E2E tests for CLI workflow
  • Trusted Publishing (PyPI OIDC)
v1.2 — LLM-as-Judge
  • LLMJudge class with Prometheus 2 integration
  • Multi-criteria evaluation (faithfulness, relevance, hallucination, completeness)
  • 100% local evaluation with Ollama (Q5_K_M quantization, ~5GB)
  • Rubric-based prompts with 1-5 scoring normalized to 0-1
  • Detailed explanations for each judgment
  • Batch evaluation support
  • Robust JSON parsing for LLM responses (handles incomplete JSON)
  • keep_alive support for Ollama (prevents model unloading between requests)
v1.2.5 — Plugin Architecture
  • Plugin system based on Python entry points
  • PluginRegistry singleton for adapter discovery
  • Dynamic discovery of external plugins via importlib.metadata
  • ragnarok plugins CLI command (list, info, filters)
  • Support for 4 namespaces: llm, vectorstore, framework, evaluator
  • LOCAL/CLOUD classification for all adapters
  • Plugin documentation (docs/PLUGINS.md)
  • E2E plugin test with mock package
v1.3.0 — Cost Tracking
  • Cost tracking module (ragnarok_ai.cost)
  • Pricing table for OpenAI, Anthropic, Groq, Mistral, Together AI
  • Token counting with tiktoken (fallback to estimation)
  • CostTracker class with context manager support
  • track_cost=True parameter in evaluate()
  • Formatted summary table and JSON export
  • Local providers (Ollama, vLLM) = $0.00
  • Automatic tracking in LLM adapters
v1.3.1 — Jupyter Integration
  • Jupyter notebook module (ragnarok_ai.notebook)
  • Rich HTML display for evaluation results
  • Metrics visualization with progress bars
  • Cost breakdown tables
  • Pipeline comparison display
  • Auto-detection of notebook environment
v1.4.0 — More Integrations
  • LLM Adapters: Groq, Mistral, Together AI
  • VectorStore Adapters: Pinecone, Weaviate, Milvus, pgvector
  • Framework Adapters: Haystack, Semantic Kernel
  • Medical Mode: Abbreviation normalizer with 350+ terms (contributed by @harish1120)
  • CLI: ragnarok judge command, --config ragnarok.yaml support
  • Docs: MkDocs documentation site, performance benchmarks
v1.4.1 — Dataset Versioning
  • Shared hashing utilities: Canonical JSON, SHA256, content-based keys
  • TestSet versioning: schema_version, dataset_version, created_at, author, source
  • Dataset diff: ragnarok dataset diff CLI command
  • Stable item keys: metadata.id or content hash fallback
  • Diff report: Added/removed/modified/unchanged detection
v1.5.0 — Enterprise Deployment
  • Kubernetes Helm Chart: Job and CronJob modes, ConfigMap, PVC support
  • Air-Gapped Deployment Guide: Complete documentation for offline environments
  • Docker improvements: Fixed Dockerfile for proper module installation
  • Data Sovereignty: Full support for GDPR, HIPAA, defense environments
v1.6.0 — Production Monitoring
  • MonitorClient: Instrument RAG pipelines with configurable sampling
  • Monitor Daemon: HTTP server with /ingest, /metrics, /health, /stats
  • Prometheus Export: Request counts, success rate, latency percentiles
  • SQLite Storage: 7-day trace retention, 90-day aggregate retention
  • CLI Commands: ragnarok monitor start|stop|status|stats
  • PII Safety: Query hashing (SHA256)
  • Custom Metadata: Tenant/route slicing support
v1.7.0 — Alerting
  • AlertManager: Multi-channel alert dispatch with concurrent sending
  • AlertRule: Threshold-based rules with cooldown support
  • WebhookAlertAdapter: Generic HTTP webhook notifications
  • SlackAlertAdapter: Slack notifications with Block Kit formatting
  • Alert Protocols: Alert, AlertResult, AlertSeverity (INFO, WARNING, CRITICAL)
  • Public Exports: AlertManager, AlertRule, AlertSeverity from package root

Planned

v1.8+

  • More alert adapters (Discord, Email)
  • Drift detection
  • Web UI dashboard

Future

Web UI
  • Basic Web UI (read-only dashboard)
  • Full Web UI dashboard
Developer Experience
Advanced Features
  • Streaming evaluation
  • A/B testing support
  • Dataset versioning
  • Fine-tuning recommendations
  • Multi-modal evaluation (images, audio)
Enterprise (On-Premise)
  • SSO support (SAML, OIDC)
  • Role-based access control
  • Audit logging
  • Air-gapped deployment guide
  • Docker/Kubernetes helm charts

Architecture

RAGnarok-AI Architecture

Project structure:
ragnarok-ai/
├── src/ragnarok_ai/
│   ├── core/           # Types, protocols, exceptions
│   ├── evaluators/     # Metric implementations
│   ├── generators/     # Test set generation
│   ├── adapters/       # LLM, vector store, framework adapters
│   ├── reporters/      # Output formatters (JSON, HTML, console)
│   └── cli/            # Command-line interface
├── tests/              # Test suite (pytest)
├── examples/           # Usage examples
├── benchmarks/         # Performance benchmarks
└── docs/               # Documentation

Development

# Setup
uv pip install -e ".[dev]"
pre-commit install

# Run checks
pytest                    # Tests
pytest --cov=ragnarok_ai  # With coverage
ruff check . --fix        # Lint
ruff format .             # Format
mypy src/                 # Type check

Advanced Usage

Importing Types

For advanced use cases (custom RAG implementations, type hints), import types directly from submodules:

# Core types
from ragnarok_ai.core.types import Document, Query, RAGResponse, TestSet

# Protocols (for implementing custom adapters)
from ragnarok_ai.core.protocols import RAGProtocol, LLMProtocol, VectorStoreProtocol

# Evaluators
from ragnarok_ai.evaluators import FaithfulnessEvaluator, RelevanceEvaluator

# Adapters
from ragnarok_ai.adapters.llm import OllamaLLM, OpenAILLM
from ragnarok_ai.adapters.vectorstore import ChromaVectorStore, QdrantVectorStore

Implementing a Custom RAG

from ragnarok_ai.core.protocols import RAGProtocol
from ragnarok_ai.core.types import RAGResponse, Document

class MyCustomRAG:
    """Custom RAG implementing the RAGProtocol."""

    async def query(self, question: str, k: int = 5) -> RAGResponse:
        # Your retrieval logic here
        docs = await self.retrieve(question, k)
        answer = await self.generate(question, docs)

        return RAGResponse(
            answer=answer,
            retrieved_docs=[
                Document(id=d.id, content=d.text, metadata=d.meta)
                for d in docs
            ],
        )

# Use with ragnarok-ai
from ragnarok_ai import evaluate

results = await evaluate(
    rag_pipeline=MyCustomRAG(),
    testset=testset,
    metrics=["retrieval", "faithfulness"],
)

Feedback

Your feedback helps improve RAGnarok-AI. Pick the right channel:

| Type | Link |
| --- | --- |
| Bug report | Report a bug |
| Feedback / UX | Share feedback |
| Feature request | Request a feature |
| Questions / Ideas | Discussions |

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

Priority areas for contributions:

  • Framework adapters (Haystack, Semantic Kernel)
  • Agent evaluation features
  • Streaming evaluation support
  • Multi-modal evaluation
  • Documentation & examples

License

RAGnarok-AI is dual-licensed:

| License | Use Case |
| --- | --- |
| AGPL-3.0 | Open source projects, personal use, research |
| Commercial | Proprietary software, SaaS, organizations with AGPL restrictions |

Why dual licensing?

  • AGPL ensures improvements stay open-source
  • Commercial license enables enterprise adoption without copyleft obligations

For commercial licensing inquiries: abdel.touati@gmail.com


Acknowledgments

Built out of frustration with complex evaluation setups. We wanted something that just works — locally, fast, and without API keys.


Built with ❤️ in Lyon, France
