Rubric Kit

Generate high-quality rubrics based on custom dimensions, descriptors, criteria, and scoring systems. Create, refine, and apply evaluation rubrics powered by AI.
Features
- Rubric Generation - Create rubrics from Q&A pairs or chat sessions
- Multi-Judge Panel - Multiple LLMs with consensus mechanisms (quorum, majority, unanimous)
- Multi-Provider Support - OpenAI, Google Vertex AI, IBM WatsonX, Anthropic, Ollama, and 100+ providers via LiteLLM
- Flexible Grading - Binary (pass/fail) and score-based (0-3 scale) grading
- Tool Call Validation - Define required, optional, and prohibited tool calls
- PDF Reports - Comprehensive reports with charts and breakdowns
- Export Formats - YAML (source of truth), PDF, CSV, JSON
- Self-Contained Outputs - Re-run evaluations from previous results
- Cost Estimation - Dry-run mode to estimate costs before running evaluations
- Metrics Tracking - Token usage, latency, and cost tracking per LLM call
Installation
Requires Python 3.10 or higher.
pip install rubric-kit
For development:
git clone https://github.com/narmaku/rubric-kit
cd rubric-kit
pip install -e ".[dev]"
Quick Start
export OPENAI_API_KEY="your-api-key"
# Generate a rubric from Q&A
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml
# Evaluate a chat session
rubric-kit evaluate --from-chat-session chat.txt --rubric-file rubric.yaml --output-file results.yaml
# Export to PDF
rubric-kit export results.yaml --format pdf --output report.pdf
LLM Provider Setup
Rubric Kit uses LiteLLM to support 100+ LLM providers. API keys are configured via environment variables (never in config files or CLI arguments).
Popular Providers
| Provider | Model Format | Environment Variables |
|---|---|---|
| OpenAI | gpt-4, gpt-4o | OPENAI_API_KEY |
| Google AI Studio | gemini/gemini-2.5-flash | GEMINI_API_KEY |
| Google Vertex AI | vertex_ai/gemini-2.5-flash | gcloud auth or GOOGLE_APPLICATION_CREDENTIALS |
| IBM WatsonX | watsonx/meta-llama/llama-3-8b-instruct | WATSONX_APIKEY, WATSONX_PROJECT_ID |
| Anthropic | claude-3-5-sonnet-20241022 | ANTHROPIC_API_KEY |
| Ollama (local) | ollama/llama3.1 | None (uses localhost:11434) |
| Ollama (remote) | ollama/granite4 | OLLAMA_API_BASE |
| Azure OpenAI | azure/gpt-4 | AZURE_API_KEY, AZURE_API_BASE |
📚 Full documentation: See LiteLLM Providers for the complete list of supported providers, model formats, and required environment variables.
CLI Examples
# OpenAI
export OPENAI_API_KEY="sk-..."
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model gpt-4o
# Google AI Studio (Gemini)
export GEMINI_API_KEY="..."
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model gemini/gemini-2.5-flash
# Google Vertex AI
gcloud auth application-default login
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model vertex_ai/gemini-2.5-flash
# IBM WatsonX
export WATSONX_APIKEY="..."
export WATSONX_PROJECT_ID="..."
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model watsonx/meta-llama/llama-3-8b-instruct
# Local Ollama
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model ollama/llama3
OpenAI-Compatible Endpoints
For custom OpenAI-compatible endpoints (vLLM, LocalAI, etc.), use base_url:
judges:
  - name: custom-endpoint
    model: mistral-7b # Model name as expected by your endpoint
    base_url: http://your-endpoint:8000/v1
Commands
| Command | Description |
|---|---|
| generate | Create a rubric from a Q&A pair or chat session |
| evaluate | Evaluate content against a rubric (outputs YAML) |
| refine | Improve an existing rubric with AI feedback |
| export | Convert YAML to PDF, CSV, or JSON |
| rerun | Re-evaluate using settings from a previous output |
| arena | Compare multiple contestants against the same rubric |
Use rubric-kit <command> --help for detailed options.
Common Options
| Flag | Commands | Description |
|---|---|---|
| --dry-run | evaluate, generate | Estimate costs without making LLM calls |
| --no-metrics | evaluate, generate, refine | Disable metrics collection in output |
| --include-call-log | evaluate | Include detailed per-call metrics in output |
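The arithmetic behind a --dry-run estimate can be sketched as follows. This is an illustrative sketch only: the estimate_cost helper and the per-1K-token prices are hypothetical placeholders, not part of rubric-kit, and real estimates depend on each provider's current pricing.

```python
# Sketch of dry-run cost estimation: multiply expected token counts by
# per-token prices. Prices below are made-up placeholders.

PRICES_PER_1K = {
    # hypothetical USD prices per 1000 tokens: (prompt, completion)
    "gpt-4o": (0.0025, 0.010),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return an estimated cost in USD for a batch of LLM calls."""
    prompt_price, completion_price = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * prompt_price + (completion_tokens / 1000) * completion_price

cost = estimate_cost("gpt-4o", prompt_tokens=2500, completion_tokens=800)
print(f"estimated_cost_usd: {cost:.4f}")
```

Summing this per planned call, before any call is made, is all a dry run needs.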
YAML Formats
See examples/ for complete format examples:
- rubric.example.yaml - Rubric with dimensions and criteria
- judge_panel.example.yaml - Multi-judge configuration
- dimensions.example.yaml - Predefined dimensions
- arena.example.yaml - Arena competition spec
Rubric Structure
Below is a very basic rubric that shows how one is composed. For rubric best practices, see the RUBRICS file.
dimensions:
  - factual_correctness: Evaluates factual accuracy
    grading_type: binary
  - quality: Evaluates response quality
    grading_type: score
    scores:
      0: Poor
      1: Fair
      2: Good
      3: Excellent
criteria:
  accuracy_check:
    category: Output
    weight: 3
    dimension: factual_correctness
    criterion: Response must correctly state X.
  tool_usage:
    category: Tools
    weight: 2
    dimension: tool_use
    tool_calls:
      respect_order: false
      required:
        - get_info:
            min_calls: 1
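How the weights above might combine into a total can be sketched as follows. This is an illustrative assumption, not rubric-kit's documented formula: here a score-based criterion contributes weight × score (capped at weight × top scale value), and a binary criterion contributes its full weight on pass and 0 on fail.

```python
# Illustrative weighted-scoring sketch (assumed formula, see lead-in).
criteria = [
    {"name": "accuracy_check", "weight": 3, "type": "binary", "result": "pass"},
    {"name": "tool_usage", "weight": 2, "type": "score", "score": 2, "max": 3},
]

total = 0
maximum = 0
for c in criteria:
    if c["type"] == "binary":
        total += c["weight"] * (1 if c["result"] == "pass" else 0)
        maximum += c["weight"]
    else:
        total += c["weight"] * c["score"]
        maximum += c["weight"] * c["max"]

percentage = 100 * total / maximum
print(total, maximum, round(percentage, 1))
```

Under this assumption, the two criteria above yield 7 of a possible 9 points.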
Judge Panel
judge_panel:
  judges:
    - name: ChatGPT-4o
      model: gpt-4o
    - name: Gemini-2.5-Flash
      model: vertex_ai/gemini-2.5-flash
    - name: Claude-4.5-Sonnet
      model: anthropic/claude-4-5-sonnet
  execution:
    mode: parallel # sequential, parallel, batched
  consensus:
    mode: majority # unanimous, majority, quorum
    on_no_consensus: fail # fail, median, most_common
Output
Evaluation always produces a self-contained YAML file:
results:
  - criterion_name: accuracy_check
    result: pass
    score: 3
    reason: The response correctly stated X.
summary:
  total_score: 15
  max_score: 18
  percentage: 83.3
rubric: { ... } # Full rubric for reference
judge_panel: { ... } # Judge configuration used
input: { ... } # Input content (Q&A or chat session)
metadata:
  timestamp: 2025-01-20T10:30:00
metrics: # LLM usage metrics (unless --no-metrics)
  summary:
    total_calls: 5
    prompt_tokens: 2500
    completion_tokens: 800
    estimated_cost_usd: 0.0425
Development
# Run tests
pytest
# Run with coverage
pytest --cov=rubric_kit --cov-report=html
# Format code
black rubric_kit tests
License
See LICENSE file.