
This library allows for granular testing of LLM applications based on expert input.


ragpill logo

Stop believing your chatbot. Take the ragpill.



ragpill is an evaluation framework for LLM agents and RAG pipelines. Define facts, sources, and tool-call expectations, then find out what your AI actually does.

What is RAGPill?

RAGPill helps you:

  • Create test datasets from CSV files - Easy collaboration with domain experts
  • Define custom evaluators - Add domain-specific knowledge to evaluations
  • Track results in MLflow - Full experiment tracking and tracing
  • Follow best practices - Opinionated design guides you to robust testing

It specializes in "offline" evaluation of LLM-based systems: it is designed to run in your CI/CD pipeline or as scheduled tests, not as real-time monitoring.
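To give a feel for the CSV-based workflow, here is a minimal, standard-library-only sketch of loading test cases from a CSV file. The column names below are hypothetical; the real format and column meanings are documented in the CSV Adapter Guide.

```python
import csv
import io

# Hypothetical CSV layout: one test case per row, with tags separated
# by semicolons. The actual column names are defined by the CSV adapter.
CSV_TEXT = """\
name,inputs,expected_output,tags
capital_q,What is the capital of France?,Paris,basic_logic
report_q,Who wrote the 2023 annual report?,Finance team,retrieval;time-aware-rag
"""

def load_cases(text: str) -> list[dict]:
    """Parse CSV rows into plain dicts, splitting the tags column."""
    cases = []
    for row in csv.DictReader(io.StringIO(text)):
        row["tags"] = row["tags"].split(";") if row["tags"] else []
        cases.append(row)
    return cases

cases = load_cases(CSV_TEXT)
print(cases[0]["name"])  # capital_q
```

Keeping the file in this shape lets domain experts edit test cases in a spreadsheet while the test set stays in version control.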

Core Philosophy

This documentation focuses heavily on the LLM Judge evaluator, even though it should be the last evaluator you reach for: prefer deterministic evaluators (regex, exact match) whenever possible. For deterministic tests, plenty of tooling already exists, such as pytest (yes, we like the 'code-first' approach).
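As a sketch of what a deterministic evaluator can look like (a plain function for illustration, not ragpill's actual API):

```python
import re
from typing import Callable

def regex_evaluator(pattern: str) -> Callable[[str], bool]:
    """Build a pass/fail evaluator that checks an output against a regex."""
    compiled = re.compile(pattern)
    def evaluate(output: str) -> bool:
        return compiled.search(output) is not None
    return evaluate

# Example: the answer must contain an ISO-style date.
contains_date = regex_evaluator(r"\b\d{4}-\d{2}-\d{2}\b")
print(contains_date("Released on 2024-05-01"))  # True
print(contains_date("Released last spring"))    # False
```

A check like this is cheap, fully reproducible, and needs no model call, which is why it should be preferred over an LLM judge whenever the criterion can be expressed deterministically.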

Expert-Defined Attributes

LLM judges usually lack the context awareness to judge which discrepancies between chatbot answers and expected answers are relevant, especially in specialized fields like law, engineering, and science, where words have precise definitions.

Domain experts should define specific attributes and criteria for evaluation.

Binary Evaluations

We use boolean pass/fail values only, not scoring scales (1-10), because:

  • Scales are arbitrary and often decided by LLMs
  • Binary decisions are more stable and reproducible (although LLMs of course remain probabilistic)
  • Easier to track and reason about over time

Tags and Attributes for Organization

Evaluators can have:

  • Tags: Categorical labels for filtering (e.g., retrieval, time-aware-rag, basic_logic)
  • Attributes: Key-value metadata for categorization (e.g., importance: high, scope: Phase1)

Metrics are automatically calculated per tag and attribute.
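Conceptually, the per-tag and per-attribute metrics are just pass rates over boolean results. The following sketch shows the aggregation idea (ragpill computes this automatically; the data layout here is illustrative):

```python
from collections import defaultdict

# Each result: (passed, tags, attributes) — booleans only, per the
# binary-evaluation philosophy above.
results = [
    (True,  ["retrieval"],                {"importance": "high"}),
    (False, ["retrieval", "basic_logic"], {"importance": "high"}),
    (True,  ["basic_logic"],              {"importance": "low"}),
]

def pass_rates(results: list) -> dict:
    """Compute the pass rate for every tag and attribute key-value pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [passed, total]
    for passed, tags, attrs in results:
        keys = list(tags) + [f"{k}: {v}" for k, v in attrs.items()]
        for key in keys:
            totals[key][0] += int(passed)
            totals[key][1] += 1
    return {key: p / t for key, (p, t) in totals.items()}

rates = pass_rates(results)
print(rates["retrieval"])  # 0.5
```

Because every result is a boolean, the aggregated numbers have an unambiguous meaning (fraction of passing cases), which would not hold if evaluators returned scores on arbitrary scales.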


Key Concepts

As this library is built on pydantic-ai evals, please have a look at the pydantic-ai evals documentation.

Key Components

  • Dataset: From pydantic-ai, contains test cases with inputs, evaluators, and metadata
  • Evaluators: Check if outputs meet criteria (LLMJudge, regex matchers, custom evaluators)
  • MLflow Integration: Wraps execution, traces runs, evaluates outputs, uploads results
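At a conceptual level, the evaluation loop ties these components together simply: run the task on each case's inputs, then apply every evaluator to the output. The stdlib-only sketch below mimics that flow (real runs go through pydantic-ai evals Dataset objects and the MLflow wrapper; the names here are illustrative):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    name: str
    inputs: str
    evaluators: list  # each: Callable[[str], bool]

@dataclass
class Dataset:
    cases: list

def run(dataset: Dataset, task: Callable[[str], str]) -> dict:
    """Call the task on each case's inputs and apply every evaluator."""
    report = {}
    for case in dataset.cases:
        output = task(case.inputs)
        report[case.name] = all(ev(output) for ev in case.evaluators)
    return report

ds = Dataset(cases=[
    Case("echo", "hello", [lambda out: "hello" in out]),
])
print(run(ds, lambda text: text.upper()))  # {'echo': False}
```

In ragpill, the MLflow integration wraps this loop so that each task invocation is traced and each evaluator verdict is uploaded alongside the run.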

Features

  • Great MLflow Integration: Traces your agent/function execution to MLflow with evaluations in the native format
  • CSV/Excel Adapter: Load test cases from CSV files with evaluator configurations
  • Flexible Evaluators: Built-in LLM judges, regex matchers, and easy custom evaluator creation
  • Metrics per Tags/Attributes: Automatic metric calculation for each tag and attribute combination
  • Type Safety: Built on pydantic-ai with full type safety throughout


Best Practices

[!TIP] TDD Mindset — Begin by defining a test set with potential users before you even start developing the solution. This enables clear expectation management and progress tracking.

[!TIP] Create Multiple Testsets — It can make sense to keep a core set of tests that runs quickly and inexpensively for day-to-day development, plus an exhaustive dataset integrated into your CI/CD that runs before deploying to prod.

[!TIP] Separate Evaluation Experiments — Create dedicated MLflow experiments for evaluations. Don't mix evaluation traces with production traces.

[!TIP] Use Domain Experts — Have domain experts define evaluation criteria rather than relying solely on generic LLM judges.

[!TIP] Version Your Tests — Keep test datasets in version control alongside your code.

Documentation

Full documentation is available at joelgotsch.github.io/ragpill/latest including:

  • Installation Guide: Setup instructions
  • Quickstart Tutorial: Run your first evaluation
  • CSV Adapter Guide: Learn the CSV format and column meanings
  • Evaluators Guide: Create custom evaluators
  • MLflow Integration: Advanced MLflow usage
  • API Reference: Complete API documentation

Roadmap

  • Adapter for testset from CSV
  • Documentation via mkdocs
  • Evaluators for sources and regex
  • Repeat Task Evaluations (run task multiple times and evaluate with threshold)
  • Adapter for task from CSV (upload to mlflow)
  • Create demo video
  • CI/CD (tests, build package, publish docs)
  • Global evaluators from CSV (empty input)
  • Track git-commit hash in experiment
  • Tests with mlflow server
  • Dependency injection for llm, input_to_key functions
  • pytest integration

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
