
Generate QA datasets & evaluate RAG systems in 2 commands. Privacy-first, works with any LLM (local or cloud). Async, fast, zero config.



Python 3.9+ • Works with Ollama


🔒 Privacy-First • ⚡ Async & Fast • 🤖 Any LLM • 🏠 Local or Cloud



⚡ 2-Line RAG Evaluation

# Step 1: Generate QA pairs from your docs
ragscore generate docs/

# Step 2: Evaluate your RAG system
ragscore evaluate http://localhost:8000/query

That's it. The evaluate step reports overall accuracy, the average judge score, and each QA pair your system answered incorrectly, for example:

============================================================
✅ EXCELLENT: 85/100 correct (85.0%)
Average Score: 4.20/5.0
============================================================

❌ 15 Incorrect Pairs:

  1. Q: "What is RAG?"
     Score: 2/5 - Factually incorrect

  2. Q: "How does retrieval work?"
     Score: 3/5 - Incomplete answer

🚀 Quick Start

Install

pip install ragscore              # Core (works with Ollama)
pip install "ragscore[openai]"    # + OpenAI support
pip install "ragscore[all]"       # + All providers

Generate QA Pairs

# Set API key (or use local Ollama - no key needed!)
export OPENAI_API_KEY="sk-..."

# Generate from any document
ragscore generate paper.pdf
ragscore generate docs/*.pdf --concurrency 10
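The --concurrency flag suggests that generation issues several LLM requests at once. The sketch below shows the common asyncio pattern for capping the number of in-flight requests with a semaphore. It is an illustration under that assumption, not RAGScore's actual implementation, and fake_llm_call and generate_all are hypothetical stand-ins for real model requests.

```python
import asyncio
import random


async def fake_llm_call(chunk: str) -> str:
    """Hypothetical stand-in for a real LLM request; it just sleeps briefly."""
    await asyncio.sleep(random.uniform(0.01, 0.03))
    return f"QA pairs for: {chunk[:20]}"


async def generate_all(chunks: list[str], concurrency: int) -> tuple[list[str], int]:
    """Run one call per chunk with at most `concurrency` calls in flight.

    Returns the results (in input order) and the peak number of
    simultaneous calls observed, so the limit can be verified.
    """
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(chunk: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `concurrency` calls are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_llm_call(chunk)
            finally:
                in_flight -= 1

    # gather preserves input order even though calls finish out of order
    results = await asyncio.gather(*(worker(c) for c in chunks))
    return list(results), peak


if __name__ == "__main__":
    demo_chunks = [f"chunk {i}" for i in range(50)]
    demo_results, demo_peak = asyncio.run(generate_all(demo_chunks, concurrency=10))
    print(len(demo_results), "results, peak concurrency", demo_peak)
```

Raising the limit shortens wall-clock time until the provider's rate limits (or, for a local model, your hardware) become the bottleneck.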

Evaluate Your RAG

# Point to your RAG endpoint
ragscore evaluate http://localhost:8000/query

# Custom options
ragscore evaluate http://api/ask --model gpt-4o --output results.json
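If you don't yet have an HTTP endpoint to point ragscore evaluate at, a stub like the one below can serve as a starting point; it uses only the Python standard library. The request and response shapes (a POST with a JSON body {"question": ...} answered by {"answer": ...}) are assumptions, not a documented contract, so check what RAGScore actually sends before relying on them. answer_question, QueryHandler, and make_server are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer_question(question: str) -> str:
    """Placeholder for your real retrieval + generation pipeline."""
    return f"(stub) You asked: {question}"


class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/query":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # ASSUMPTION: the question arrives in a "question" field.
        question = payload.get("question", "")
        # ASSUMPTION: the answer is returned in an "answer" field.
        body = json.dumps({"answer": answer_question(question)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a server bound to localhost:port."""
    return HTTPServer(("localhost", port), QueryHandler)
```

Start it with make_server(8000).serve_forever(), then run ragscore evaluate http://localhost:8000/query from another terminal.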

🏠 100% Private with Local LLMs

# Use Ollama - no API keys, no cloud, 100% private
ollama pull llama3.1
ragscore generate confidential_docs/*.pdf
ragscore evaluate http://localhost:8000/query

Perfect for: Healthcare 🏥 • Legal ⚖️ • Finance 🏦 • Research 🔬


🔌 Supported LLMs

| Provider | Setup | Notes |
|---|---|---|
| Ollama | ollama serve | Local, free, private |
| OpenAI | export OPENAI_API_KEY="sk-..." | Best quality |
| Anthropic | export ANTHROPIC_API_KEY="..." | Long context |
| DashScope | export DASHSCOPE_API_KEY="..." | Qwen models |
| vLLM | export LLM_BASE_URL="..." | Production-grade |
| Any OpenAI-compatible | export LLM_BASE_URL="..." | Groq, Together, etc. |
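Because the cloud providers are selected through environment variables, a quick pre-flight check can confirm which ones are visible to the process before a long run. This hypothetical helper only reports which variables from the table are set; it does not reproduce how RAGScore chooses between providers when more than one is configured.

```python
import os
from typing import Mapping

# Variable names taken from the provider table above.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "DashScope": "DASHSCOPE_API_KEY",
    "OpenAI-compatible (vLLM, Groq, Together, etc.)": "LLM_BASE_URL",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers whose variable is set to a non-empty value.

    Ollama is not listed because it needs no variable, only `ollama serve`.
    This says nothing about which provider ragscore would actually pick.
    """
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print(found or "No cloud provider variables set; expecting a local Ollama server.")
```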

📊 Output Formats

Generated QA Pairs (output/generated_qas.jsonl)

{
  "id": "abc123",
  "question": "What is RAG?",
  "answer": "RAG (Retrieval-Augmented Generation) combines...",
  "rationale": "This is explicitly stated in the introduction...",
  "support_span": "RAG systems retrieve relevant documents...",
  "difficulty": "medium",
  "source_path": "docs/rag_intro.pdf"
}
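The .jsonl extension indicates JSON Lines, where each record occupies a single line (the record above is pretty-printed for readability). A minimal standard-library loader might look like this; load_qas and summarize are hypothetical helper names, and only the field names shown above are assumed.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Union


def load_qas(path: Union[str, Path]) -> list[dict]:
    """Read one JSON object per non-blank line (the JSON Lines format)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return records


def summarize(records: list[dict]) -> Counter:
    """Count QA pairs per "difficulty" value; missing values count as "unknown"."""
    return Counter(r.get("difficulty", "unknown") for r in records)
```

For example, summarize(load_qas("output/generated_qas.jsonl")) counts the generated pairs per difficulty level.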

Evaluation Results (--output results.json)

{
  "summary": {
    "total": 100,
    "correct": 85,
    "incorrect": 15,
    "accuracy": 0.85,
    "avg_score": 4.2
  },
  "incorrect_pairs": [
    {
      "question": "What is RAG?",
      "golden_answer": "RAG combines retrieval with generation...",
      "rag_answer": "RAG is a database system.",
      "score": 2,
      "reason": "Factually incorrect - RAG is not a database"
    }
  ]
}
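Since the results file is plain JSON, it is easy to post-process, for instance to fail a CI job when accuracy drops or to surface the lowest-scoring answers. The sketch below relies only on the fields shown above; the function names and the 0.8 threshold are illustrative choices, not part of RAGScore.

```python
import json
from typing import Any


def load_results(path: str) -> dict[str, Any]:
    """Load the file written by `ragscore evaluate ... --output results.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def passes(results: dict[str, Any], min_accuracy: float = 0.8) -> bool:
    """True if summary.accuracy meets the threshold (0.8 is an arbitrary example)."""
    return results["summary"]["accuracy"] >= min_accuracy


def worst_pairs(results: dict[str, Any], n: int = 5) -> list[dict[str, Any]]:
    """The n incorrect pairs with the lowest judge score, lowest first."""
    return sorted(results.get("incorrect_pairs", []), key=lambda p: p["score"])[:n]


def report(results: dict[str, Any], n: int = 5) -> str:
    """One summary line followed by the n worst incorrect pairs."""
    s = results["summary"]
    lines = [f"{s['correct']}/{s['total']} correct ({s['accuracy']:.1%}), avg {s['avg_score']:.2f}/5"]
    for p in worst_pairs(results, n):
        lines.append(f"  [{p['score']}/5] {p['question']} -> {p['reason']}")
    return "\n".join(lines)
```

In a CI script, exiting non-zero when passes(load_results("results.json")) is False turns the evaluation into a regression gate.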

🧪 Python API

from ragscore import run_pipeline, run_evaluation

# Generate QA pairs
run_pipeline(paths=["docs/"], concurrency=10)

# Evaluate RAG
results = run_evaluation(
    endpoint="http://localhost:8000/query",
    model="gpt-4o",  # LLM for judging
)
print(f"Accuracy: {results.accuracy:.1%}")

🤖 AI Agent Integration

RAGScore is designed for AI agents and automation:

# Structured CLI with predictable output
ragscore generate docs/ --concurrency 5
ragscore evaluate http://api/query --output results.json

# Exit codes: 0 = success, 1 = error
# JSON output for programmatic parsing
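Because the CLI signals failure through its exit code, it can be driven from a script with subprocess. The hypothetical run_cli wrapper below raises when a command exits non-zero and otherwise returns the captured output; the command list is passed in, so nothing here depends on RAGScore internals.

```python
import subprocess
from typing import Sequence


def run_cli(cmd: Sequence[str], timeout: float = 3600) -> subprocess.CompletedProcess:
    """Run a command, capture stdout/stderr as text, and raise on a non-zero exit.

    Per the README, ragscore exits 0 on success and 1 on error.
    """
    proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with {proc.returncode}: {proc.stderr.strip()}")
    return proc
```

For example: run_cli(["ragscore", "evaluate", "http://localhost:8000/query", "--output", "results.json"]), after which results.json can be parsed as JSON.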

CLI Reference:

| Command | Description |
|---|---|
| ragscore generate <paths> | Generate QA pairs from documents |
| ragscore evaluate <endpoint> | Evaluate RAG against golden QAs |
| ragscore --help | Show all commands and options |
| ragscore generate --help | Show generate options |
| ragscore evaluate --help | Show evaluate options |

⚙️ Configuration

Zero config required. Optional environment variables:

export RAGSCORE_CHUNK_SIZE=512          # Chunk size for documents
export RAGSCORE_QUESTIONS_PER_CHUNK=5   # QAs per chunk
export RAGSCORE_WORK_DIR=/path/to/dir   # Working directory
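Chunk size and questions per chunk together determine roughly how many QA pairs a run produces. The arithmetic below is a back-of-the-envelope sketch, assuming non-overlapping chunks measured in the same unit as the document lengths and a full quota of questions from every chunk. RAGScore's actual chunking (unit, overlap, defaults) is not specified here; the fallback values in this hypothetical read_settings helper simply mirror the examples above.

```python
import math
import os
from typing import Mapping

# HYPOTHETICAL fallbacks copied from the example values above;
# ragscore's real defaults may differ.
DEFAULT_CHUNK_SIZE = 512
DEFAULT_QUESTIONS_PER_CHUNK = 5


def read_settings(env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Return (chunk_size, questions_per_chunk) from the environment."""
    chunk_size = int(env.get("RAGSCORE_CHUNK_SIZE", DEFAULT_CHUNK_SIZE))
    per_chunk = int(env.get("RAGSCORE_QUESTIONS_PER_CHUNK", DEFAULT_QUESTIONS_PER_CHUNK))
    return chunk_size, per_chunk


def estimate_qa_count(doc_lengths: list[int], chunk_size: int, questions_per_chunk: int) -> int:
    """Rough QA-pair count: non-overlapping chunks, full quota per chunk."""
    chunks = sum(math.ceil(n / chunk_size) for n in doc_lengths if n > 0)
    return chunks * questions_per_chunk
```

With the example settings, a 1,000-unit document and a 512-unit document give 2 + 1 = 3 chunks, or about 15 QA pairs.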

🔐 Privacy & Security

| Data | Cloud LLM | Local LLM |
|---|---|---|
| Documents | ✅ Local | ✅ Local |
| Text chunks | ⚠️ Sent to LLM | ✅ Local |
| Generated QAs | ✅ Local | ✅ Local |
| Evaluation results | ✅ Local | ✅ Local |

Compliance: GDPR ✅ • HIPAA ✅ (with local LLMs) • SOC 2 ✅


🧪 Development

git clone https://github.com/HZYAI/RagScore.git
cd RagScore
pip install -e ".[dev,all]"
pytest



⭐ Star us on GitHub if RAGScore helps you!
Made with ❤️ for the RAG community



Download files


Source Distribution

ragscore-0.6.0.tar.gz (48.7 kB)

Built Distribution


ragscore-0.6.0-py3-none-any.whl (50.6 kB)

File details

Details for the file ragscore-0.6.0.tar.gz.

File metadata

  • Download URL: ragscore-0.6.0.tar.gz
  • Size: 48.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ragscore-0.6.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | d37b2279f8e5d6b38c45b67415b959891917bf3535a2f36e71952725855d281f |
| MD5 | 89d3dd3db122d7053ef7214a20e79581 |
| BLAKE2b-256 | 3c18699168f3e90e832dfa0edeb525fa6947f12755a47f0d3d547ec20ccc9fb0 |


Provenance

The following attestation bundles were made for ragscore-0.6.0.tar.gz:

Publisher: ci.yml on HZYAI/RagScore


File details

Details for the file ragscore-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: ragscore-0.6.0-py3-none-any.whl
  • Size: 50.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ragscore-0.6.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2515de0ca990eea02d325cfb77bf47e3ddc8e294d7d705364953851c627e107 |
| MD5 | df8499c26d78473922e2a15462eb8a58 |
| BLAKE2b-256 | d81f7483aa530059fafec9888e4428a1421301c6339b47fd3a697c3d09964a88 |


Provenance

The following attestation bundles were made for ragscore-0.6.0-py3-none-any.whl:

Publisher: ci.yml on HZYAI/RagScore

