ragscore

Generate QA datasets & evaluate RAG systems in 2 commands. Privacy-first, works with any LLM (local or cloud). Async, fast, zero config.

🔒 Privacy-First • ⚡ Async & Fast • 🤖 Any LLM • 🏠 Local or Cloud

⚡ 2-Line RAG Evaluation

# Step 1: Generate QA pairs from your docs
ragscore generate docs/

# Step 2: Evaluate your RAG system
ragscore evaluate http://localhost:8000/query

That's it. The evaluate step prints an accuracy summary and lists the QA pairs your RAG system answered incorrectly:

============================================================
✅ EXCELLENT: 85/100 correct (85.0%)
Average Score: 4.20/5.0
============================================================

❌ 15 Incorrect Pairs:

  1. Q: "What is RAG?"
     Score: 2/5 - Factually incorrect

  2. Q: "How does retrieval work?"
     Score: 3/5 - Incomplete answer

🚀 Quick Start

Install

pip install ragscore              # Core (works with Ollama)
pip install "ragscore[openai]"    # + OpenAI support
pip install "ragscore[notebook]"  # + Jupyter/Colab support
pip install "ragscore[all]"       # + All providers

Option 1: Python API (Notebook-Friendly)

Perfect for Jupyter, Colab, and rapid iteration. Get instant visualizations.

from ragscore import quick_test

# 1. Audit your RAG in one line
result = quick_test(
    endpoint="http://localhost:8000/query",  # Your RAG API
    docs="docs/",                            # Your documents
    n=10,                                    # Number of test questions
)

# 2. See the report
result.plot()

# 3. Inspect failures
bad_rows = result.df[result.df['score'] < 3]
display(bad_rows[['question', 'rag_answer', 'reason']])

Rich Object API:

- `result.accuracy` - Accuracy score
- `result.df` - Pandas DataFrame of all results
- `result.plot()` - 3-panel visualization
- `result.corrections` - List of items to fix

Option 2: CLI (Production)

Generate QA Pairs

# Set API key (or use local Ollama - no key needed!)
export OPENAI_API_KEY="sk-..."

# Generate from any document
ragscore generate paper.pdf
ragscore generate docs/*.pdf --concurrency 10

Evaluate Your RAG

# Point to your RAG endpoint
ragscore evaluate http://localhost:8000/query

# Custom options
ragscore evaluate http://api/ask --model gpt-4o --output results.json

🏠 100% Private with Local LLMs

# Use Ollama - no API keys, no cloud, 100% private
ollama pull llama3.1
ragscore generate confidential_docs/*.pdf
ragscore evaluate http://localhost:8000/query

Perfect for: Healthcare 🏥 • Legal ⚖️ • Finance 🏦 • Research 🔬

🔌 Supported LLMs

Provider	Setup	Notes
Ollama	`ollama serve`	Local, free, private
OpenAI	`export OPENAI_API_KEY="sk-..."`	Best quality
Anthropic	`export ANTHROPIC_API_KEY="..."`	Long context
DashScope	`export DASHSCOPE_API_KEY="..."`	Qwen models
vLLM	`export LLM_BASE_URL="..."`	Production-grade
Any OpenAI-compatible	`export LLM_BASE_URL="..."`	Groq, Together, etc.
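
Before a long run it can help to check which of these variables are actually set. The following is a minimal sketch using only the variable names from the table above; `PROVIDER_ENV_VARS` and `configured_providers` are hypothetical names for illustration, and the check says nothing about how ragscore itself selects a provider.

```python
import os

# Environment variables named in the table above. Ollama needs none.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "DashScope": "DASHSCOPE_API_KEY",
    "OpenAI-compatible endpoint (vLLM, Groq, Together, ...)": "LLM_BASE_URL",
}


def configured_providers(env=None):
    """Return the providers whose environment variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Credentials found for:", ", ".join(found))
    else:
        print("No provider variables set; a local Ollama server would be needed.")
```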

📊 Output Formats

Generated QA Pairs (`output/generated_qas.jsonl`)

{
  "id": "abc123",
  "question": "What is RAG?",
  "answer": "RAG (Retrieval-Augmented Generation) combines...",
  "rationale": "This is explicitly stated in the introduction...",
  "support_span": "RAG systems retrieve relevant documents...",
  "difficulty": "medium",
  "source_path": "docs/rag_intro.pdf"
}
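
Because the file holds one JSON object per line, it can be inspected with the standard library alone. The sketch below assumes every record carries the fields shown above; `load_qa_pairs` and `count_by_difficulty` are hypothetical helper names, not part of the ragscore API.

```python
import json
from collections import Counter
from pathlib import Path


def load_qa_pairs(path):
    """Read one JSON object per non-empty line (JSONL)."""
    pairs = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                pairs.append(json.loads(line))
    return pairs


def count_by_difficulty(pairs):
    """Tally records by "difficulty"; records without the field count as "unknown"."""
    return Counter(pair.get("difficulty", "unknown") for pair in pairs)


if __name__ == "__main__":
    qa_file = Path("output/generated_qas.jsonl")
    if qa_file.exists():
        pairs = load_qa_pairs(qa_file)
        print(f"{len(pairs)} QA pairs loaded")
        print(dict(count_by_difficulty(pairs)))
```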

Evaluation Results (`--output results.json`)

{
  "summary": {
    "total": 100,
    "correct": 85,
    "incorrect": 15,
    "accuracy": 0.85,
    "avg_score": 4.2
  },
  "incorrect_pairs": [
    {
      "question": "What is RAG?",
      "golden_answer": "RAG combines retrieval with generation...",
      "rag_answer": "RAG is a database system.",
      "score": 2,
      "reason": "Factually incorrect - RAG is not a database"
    }
  ]
}
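
The summary block lends itself to an automated quality gate, for example in CI. The sketch below assumes the results file has exactly the structure shown above; `check_results`, `failing_questions`, and the 0.8 accuracy threshold are illustrative choices, not ragscore defaults.

```python
import json


def check_results(results, min_accuracy=0.8):
    """Return (passed, messages) for a parsed results document."""
    summary = results["summary"]
    messages = []

    # The counts should be internally consistent before the accuracy is trusted.
    if summary["correct"] + summary["incorrect"] != summary["total"]:
        messages.append("correct + incorrect does not equal total")

    accuracy = summary["correct"] / summary["total"] if summary["total"] else 0.0
    if accuracy < min_accuracy:
        messages.append(f"accuracy {accuracy:.1%} is below {min_accuracy:.1%}")

    return not messages, messages


def failing_questions(results, max_score=3):
    """List questions from "incorrect_pairs" whose score is at or below max_score."""
    return [
        pair["question"]
        for pair in results.get("incorrect_pairs", [])
        if pair["score"] <= max_score
    ]


if __name__ == "__main__":
    sample = json.loads("""
    {
      "summary": {"total": 100, "correct": 85, "incorrect": 15,
                  "accuracy": 0.85, "avg_score": 4.2},
      "incorrect_pairs": [
        {"question": "What is RAG?", "score": 2,
         "reason": "Factually incorrect - RAG is not a database"}
      ]
    }
    """)
    passed, messages = check_results(sample)
    print("PASS" if passed else "FAIL", messages)
    print(failing_questions(sample))
```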

🧪 Python API

from ragscore import run_pipeline, run_evaluation

# Generate QA pairs
run_pipeline(paths=["docs/"], concurrency=10)

# Evaluate RAG
results = run_evaluation(
    endpoint="http://localhost:8000/query",
    model="gpt-4o",  # LLM for judging
)
print(f"Accuracy: {results.accuracy:.1%}")

🤖 AI Agent Integration

RAGScore is designed for AI agents and automation:

# Structured CLI with predictable output
ragscore generate docs/ --concurrency 5
ragscore evaluate http://api/query --output results.json

# Exit codes: 0 = success, 1 = error
# JSON output for programmatic parsing

CLI Reference:

Command	Description
`ragscore generate <paths>`	Generate QA pairs from documents
`ragscore evaluate <endpoint>`	Evaluate RAG against golden QAs
`ragscore --help`	Show all commands and options
`ragscore generate --help`	Show generate options
`ragscore evaluate --help`	Show evaluate options

⚙️ Configuration

Zero config required. Optional environment variables:

export RAGSCORE_CHUNK_SIZE=512          # Chunk size for documents
export RAGSCORE_QUESTIONS_PER_CHUNK=5   # QAs per chunk
export RAGSCORE_WORK_DIR=/path/to/dir   # Working directory
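
When ragscore is launched from a Python script rather than a shell, the same variables can be set on a copy of the environment. The helper below is a sketch; `ragscore_env` is a hypothetical name, and only the three variable names listed above are assumed to exist.

```python
import os


def ragscore_env(chunk_size=None, questions_per_chunk=None, work_dir=None, base=None):
    """Copy an environment mapping and set only the options that were given."""
    env = dict(os.environ if base is None else base)
    options = {
        "RAGSCORE_CHUNK_SIZE": chunk_size,
        "RAGSCORE_QUESTIONS_PER_CHUNK": questions_per_chunk,
        "RAGSCORE_WORK_DIR": work_dir,
    }
    for name, value in options.items():
        if value is not None:
            env[name] = str(value)  # environment values must be strings
    return env
```

The returned mapping can be passed as the `env=` argument of `subprocess.run`, which leaves the parent process's environment untouched.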

🔐 Privacy & Security

Data	Cloud LLM	Local LLM
Documents	✅ Local	✅ Local
Text chunks	⚠️ Sent to LLM	✅ Local
Generated QAs	✅ Local	✅ Local
Evaluation results	✅ Local	✅ Local

Compliance: GDPR ✅ • HIPAA ✅ (with local LLMs) • SOC 2 ✅

🧪 Development

git clone https://github.com/HZYAI/RagScore.git
cd RagScore
pip install -e ".[dev,all]"
pytest

🔗 Links

GitHub • PyPI • Issues • Discussions

⭐ Star us on GitHub if RAGScore helps you!
Made with ❤️ for the RAG community

Release history

Version	Released
0.8.2	Apr 11, 2026
0.8.0	Mar 13, 2026
0.7.8	Mar 12, 2026
0.7.7	Feb 27, 2026
0.7.6	Feb 27, 2026
0.7.5	Feb 27, 2026
0.7.4	Feb 21, 2026
0.7.3	Feb 19, 2026
0.7.2	Feb 15, 2026
0.7.1	Feb 15, 2026
0.7.0	Feb 15, 2026
0.6.10	Feb 11, 2026
0.6.9	Feb 11, 2026
0.6.8	Feb 11, 2026
0.6.7	Feb 9, 2026
0.6.6	Feb 7, 2026
0.6.5	Feb 7, 2026
0.6.4	Feb 7, 2026
0.6.3	Jan 25, 2026
0.6.2	Jan 25, 2026
0.6.1 (this version)	Jan 24, 2026
0.6.0	Jan 24, 2026
0.5.2	Jan 23, 2026
0.5.1	Jan 21, 2026
0.4.7	Jan 4, 2026
0.4.6	Jan 4, 2026
0.4.4	Jan 3, 2026
0.4.3	Dec 28, 2025
0.4.0	Dec 27, 2025
0.3.1	Dec 27, 2025
0.2.2	Dec 27, 2025
0.1.4	Dec 27, 2025
0.1.3	Dec 27, 2025
0.1.2	Dec 26, 2025
0.1.1	Dec 26, 2025
0.1.0	Dec 26, 2025

Download files

Source Distribution

ragscore-0.6.1.tar.gz (49.2 kB)

Uploaded Jan 24, 2026 Source

Built Distribution

ragscore-0.6.1-py3-none-any.whl (50.9 kB)

Uploaded Jan 24, 2026 Python 3

File details

Details for the file ragscore-0.6.1.tar.gz.

File metadata

Download URL: ragscore-0.6.1.tar.gz
Upload date: Jan 24, 2026
Size: 49.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ragscore-0.6.1.tar.gz
Algorithm	Hash digest
SHA256	`46d6d4f0447fcd0e0a1b119ce8dc21c75075ef1a30235bac48275970d594871f`
MD5	`00952e3a05d47b68aca9fd7472460ed3`
BLAKE2b-256	`0ab4699b2440b0f8636d9ae07620cfa6675b85aa3e762e7ebbf920109e1e0361`
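
A downloaded archive can be checked against the published SHA256 digest with Python's `hashlib`. The sketch below assumes the file was saved under its original name in the current directory; `sha256_of` and `matches` are hypothetical helper names.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's SHA256 with the published digest, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    import os

    expected = "46d6d4f0447fcd0e0a1b119ce8dc21c75075ef1a30235bac48275970d594871f"
    archive = "ragscore-0.6.1.tar.gz"
    if os.path.exists(archive):
        print("OK" if matches(archive, expected) else "MISMATCH")
```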

Provenance

The following attestation bundles were made for ragscore-0.6.1.tar.gz:

Publisher: ci.yml on HZYAI/RagScore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ragscore-0.6.1.tar.gz
- Subject digest: 46d6d4f0447fcd0e0a1b119ce8dc21c75075ef1a30235bac48275970d594871f
- Sigstore transparency entry: 849834871
- Sigstore integration time: Jan 24, 2026
Source repository:
- Permalink: HZYAI/RagScore@24f3340126a3d2647d1fe2d949be14e6ef478d18
- Branch / Tag: refs/tags/v0.6.1
- Owner: https://github.com/HZYAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@24f3340126a3d2647d1fe2d949be14e6ef478d18
- Trigger Event: push

File details

Details for the file ragscore-0.6.1-py3-none-any.whl.

File metadata

Download URL: ragscore-0.6.1-py3-none-any.whl
Upload date: Jan 24, 2026
Size: 50.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ragscore-0.6.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f5dbc5a4ad6dce330bd47c586b7f4aaeb16a750f5db32030d50ae8f9b1a74c9b`
MD5	`ccd76c0b5d49022bd62ad50c25629139`
BLAKE2b-256	`ffebd09429e8fc3b4c0a15438b7370ff34e6925433f2b43f264038191e850997`

Provenance

The following attestation bundles were made for ragscore-0.6.1-py3-none-any.whl:

Publisher: ci.yml on HZYAI/RagScore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ragscore-0.6.1-py3-none-any.whl
- Subject digest: f5dbc5a4ad6dce330bd47c586b7f4aaeb16a750f5db32030d50ae8f9b1a74c9b
- Sigstore transparency entry: 849834872
- Sigstore integration time: Jan 24, 2026
Source repository:
- Permalink: HZYAI/RagScore@24f3340126a3d2647d1fe2d949be14e6ef478d18
- Branch / Tag: refs/tags/v0.6.1
- Owner: https://github.com/HZYAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@24f3340126a3d2647d1fe2d949be14e6ef478d18
- Trigger Event: push
