Generate high-quality rubrics from custom dimensions, descriptors, criteria, and scoring systems.

Maintainers

narmaku

Project description

Rubric Kit

Rubric framework. Create, refine, and apply evaluation rubrics powered by AI.

Features

Rubric Generation - Create rubrics from Q&A pairs or chat sessions
Multi-Judge Panel - Multiple LLMs with consensus mechanisms (quorum, majority, unanimous)
Multi-Provider Support - OpenAI, Google Vertex AI, IBM WatsonX, Anthropic, Ollama, and 100+ providers via LiteLLM
Flexible Grading - Binary (pass/fail) and score-based (0-3 scale) grading
Tool Call Validation - Define required, optional, and prohibited tool calls
PDF Reports - Comprehensive reports with charts and breakdowns
Export Formats - YAML (source of truth), PDF, CSV, JSON
Self-Contained Outputs - Re-run evaluations from previous results
Cost Estimation - Dry-run mode to estimate costs before running evaluations
Metrics Tracking - Token usage, latency, and cost tracking per LLM call

Installation

Requires Python 3.10 or higher.

pip install rubric-kit

For development:

git clone https://github.com/narmaku/rubric-kit
cd rubric-kit
pip install -e ".[dev]"

Quick Start

export OPENAI_API_KEY="your-api-key"

# Generate a rubric from Q&A
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml

# Evaluate a chat session
rubric-kit evaluate --from-chat-session chat.txt --rubric-file rubric.yaml --output-file results.yaml

# Export to PDF
rubric-kit export results.yaml --format pdf --output report.pdf
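
To evaluate several chat sessions against one rubric, the `evaluate` step can be scripted. The following is a minimal sketch that only assembles the flags shown in the Quick Start; the helper names `evaluate_command` and `evaluate_all`, and the `.results.yaml` output naming, are hypothetical and not part of Rubric Kit.

```python
import subprocess


def evaluate_command(chat_file, rubric_file, output_file):
    """Build the argv list for `rubric-kit evaluate` from the Quick Start flags."""
    return [
        "rubric-kit", "evaluate",
        "--from-chat-session", chat_file,
        "--rubric-file", rubric_file,
        "--output-file", output_file,
    ]


def evaluate_all(chat_files, rubric_file):
    """Run one evaluation per chat file; raises CalledProcessError on failure."""
    for chat_file in chat_files:
        # Hypothetical naming scheme: chat.txt -> chat.results.yaml
        output_file = chat_file.rsplit(".", 1)[0] + ".results.yaml"
        subprocess.run(evaluate_command(chat_file, rubric_file, output_file), check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.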

LLM Provider Setup

Rubric Kit uses LiteLLM to support 100+ LLM providers. API keys are configured via environment variables (never in config files or CLI arguments).

Some Supported Providers (most popular)

Provider	Model Format	Environment Variables
OpenAI	`gpt-4`, `gpt-4o`	`OPENAI_API_KEY`
Google AI Studio	`gemini/gemini-2.5-flash`	`GEMINI_API_KEY`
Google Vertex AI	`vertex_ai/gemini-2.5-flash`	`gcloud auth` or `GOOGLE_APPLICATION_CREDENTIALS`
IBM WatsonX	`watsonx/meta-llama/llama-3-8b-instruct`	`WATSONX_APIKEY`, `WATSONX_PROJECT_ID`
Anthropic	`claude-3-5-sonnet-20241022`	`ANTHROPIC_API_KEY`
Ollama (local)	`ollama/llama3.1`	None (uses `localhost:11434`)
Ollama (remote)	`ollama/granite4`	`OLLAMA_API_BASE`
Azure OpenAI	`azure/gpt-4`	`AZURE_API_KEY`, `AZURE_API_BASE`

📚 Full documentation: See LiteLLM Providers for the complete list of supported providers, model formats, and required environment variables.
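
Because keys come only from the environment, it can help to check for them before starting a long run. The sketch below copies the variable names from the provider table; the prefix matching and the function name `missing_env_vars` are assumptions for illustration, not how Rubric Kit or LiteLLM actually resolve credentials.

```python
import os

# Variable names copied from the provider table; the prefix-based lookup
# is an illustrative assumption, not the library's real resolution logic.
REQUIRED_ENV = {
    "gemini/": ["GEMINI_API_KEY"],
    "watsonx/": ["WATSONX_APIKEY", "WATSONX_PROJECT_ID"],
    "azure/": ["AZURE_API_KEY", "AZURE_API_BASE"],
    "claude-": ["ANTHROPIC_API_KEY"],
    "gpt-": ["OPENAI_API_KEY"],
}


def missing_env_vars(model, env=None):
    """Return the variables from the table that are unset for this model."""
    env = os.environ if env is None else env
    for prefix, names in REQUIRED_ENV.items():
        if model.startswith(prefix):
            return [name for name in names if not env.get(name)]
    # No entry: e.g. local Ollama (no key) or Vertex AI (gcloud auth).
    return []
```

For example, `missing_env_vars("gpt-4o")` returns `["OPENAI_API_KEY"]` when that variable is not exported, and an empty list once it is.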

CLI Examples

# OpenAI
export OPENAI_API_KEY="sk-..."
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model gpt-4o

# Google AI Studio (Gemini)
export GEMINI_API_KEY="..."
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model gemini/gemini-2.5-flash

# Google Vertex AI
gcloud auth application-default login
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model vertex_ai/gemini-2.5-flash

# IBM WatsonX
export WATSONX_APIKEY="..."
export WATSONX_PROJECT_ID="..."
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model watsonx/meta-llama/llama-3-8b-instruct

# Local Ollama
rubric-kit generate --from-qna qna.yaml --output-file rubric.yaml --model ollama/llama3

OpenAI-Compatible Endpoints

For custom OpenAI-compatible endpoints (vLLM, LocalAI, etc.), use `base_url`:

judges:
  - name: custom-endpoint
    model: mistral-7b  # Model name as expected by your endpoint
    base_url: http://your-endpoint:8000/v1

Commands

Command	Description
`generate`	Create a rubric from a Q&A pair or chat session
`evaluate`	Evaluate content against a rubric (outputs YAML)
`refine`	Improve an existing rubric with AI feedback
`export`	Convert YAML to PDF, CSV, or JSON
`rerun`	Re-evaluate using settings from previous output
`arena`	Compare multiple contestants against the same rubric

Use `rubric-kit <command> --help` for detailed options.

Common Options

Flag	Commands	Description
`--dry-run`	evaluate, generate	Estimate costs without making LLM calls
`--no-metrics`	evaluate, generate, refine	Disable metrics collection in output
`--include-call-log`	evaluate	Include detailed per-call metrics in output

YAML Formats

See examples/ for complete format examples:

rubric.example.yaml - Rubric with dimensions and criteria
judge_panel.example.yaml - Multi-judge configuration
dimensions.example.yaml - Predefined dimensions
arena.example.yaml - Arena competition spec

Rubric Structure

The following minimal rubric shows how a rubric is composed: a list of dimensions, and a set of criteria that each reference one dimension.

TODO: For rubric best practices, please read the RUBRICS file.

dimensions:
  - factual_correctness: Evaluates factual accuracy
    grading_type: binary

  - quality: Evaluates response quality
    grading_type: score
    scores:
      0: Poor
      1: Fair
      2: Good
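      3: Excellent

  # Declared so that the tool_usage criterion has a dimension to reference
  - tool_use: Evaluates tool usage
    grading_type: binary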

criteria:
  accuracy_check:
    category: Output
    weight: 3
    dimension: factual_correctness
    criterion: Response must correctly state X.

  tool_usage:
    category: Tools
    weight: 2
    dimension: tool_use
    tool_calls:
      respect_order: false
      required:
        - get_info:
            min_calls: 1
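
To make the `tool_calls` fields concrete, here is a minimal sketch of how such a spec could be checked against the tool names observed in a session. The semantics are assumptions: `min_calls` is read as a minimum occurrence count, and `respect_order` as "required tools must first appear in the listed order". The function `check_tool_calls` is hypothetical and is not Rubric Kit's implementation.

```python
from collections import Counter


def check_tool_calls(observed, required, prohibited=(), respect_order=False):
    """Return a list of problems; an empty list means the calls pass.

    observed:   tool names in the order they were called
    required:   list of (tool_name, min_calls) pairs
    prohibited: tool names that must not appear
    """
    counts = Counter(observed)
    problems = []
    for name, min_calls in required:
        if counts[name] < min_calls:
            problems.append(f"{name}: expected >= {min_calls} call(s), saw {counts[name]}")
    for name in prohibited:
        if counts[name]:
            problems.append(f"{name}: prohibited but called {counts[name]} time(s)")
    if respect_order:
        # Assumed meaning: required tools must first appear in the listed order.
        firsts = [observed.index(name) for name, _ in required if name in observed]
        if firsts != sorted(firsts):
            problems.append("required tools were called out of order")
    return problems
```

With the rubric above, `check_tool_calls(["search", "get_info"], [("get_info", 1)])` returns an empty list, because `get_info` was called at least once and order is not enforced.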

Judge Panel

judge_panel:
  judges:
    - name: ChatGPT-4o
      model: gpt-4o
    - name: Gemini-2.5-Flash
      model: vertex_ai/gemini-2.5-flash
    - name: Claude-4.5-Sonnet
      model: anthropic/claude-sonnet-4-5

  execution:
    mode: parallel  # sequential, parallel, batched

  consensus:
    mode: majority  # unanimous, majority, quorum
    on_no_consensus: fail  # fail, median, most_common
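
The consensus settings can be read as a small voting rule. The sketch below is one plausible interpretation, not Rubric Kit's actual logic: `unanimous` needs every judge to agree, `majority` needs more than half, and `quorum` needs a caller-supplied number of agreeing judges. The `quorum` parameter, the function name `resolve`, and the choice to treat `fail` as the lowest grade are all assumptions.

```python
from collections import Counter
from statistics import median_low


def resolve(votes, mode="majority", quorum=None, on_no_consensus="fail"):
    """Combine per-judge votes (e.g. scores 0-3, or 1/0 for pass/fail).

    Returns (value, reached), where `reached` says whether consensus was met.
    """
    value, count = Counter(votes).most_common(1)[0]
    needed = {
        "unanimous": len(votes),
        "majority": len(votes) // 2 + 1,
        "quorum": quorum,
    }[mode]
    if needed is None:
        raise ValueError("quorum mode requires a quorum size")
    if count >= needed:
        return value, True
    # Fallbacks, named after the on_no_consensus options in the config.
    if on_no_consensus == "median":
        return median_low(votes), False
    if on_no_consensus == "most_common":
        return value, False
    return 0, False  # "fail": treated here as the lowest grade
```

For three judges voting `[3, 3, 2]`, `majority` yields `(3, True)`, while `unanimous` with `on_no_consensus="fail"` yields `(0, False)`.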

Output

Evaluation always produces a self-contained YAML file:

results:
  - criterion_name: accuracy_check
    result: pass
    score: 3
    reason: The response correctly stated X.

summary:
  total_score: 15
  max_score: 18
  percentage: 83.3

rubric: { ... }       # Full rubric for reference
judge_panel: { ... }  # Judge configuration used
input: { ... }        # Input content (Q&A or chat session)
metadata:
  timestamp: 2025-01-20T10:30:00
  metrics:            # LLM usage metrics (unless --no-metrics)
    summary:
      total_calls: 5
      prompt_tokens: 2500
      completion_tokens: 800
      estimated_cost_usd: 0.0425
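
The `summary` block is simple arithmetic over the two totals. As a sketch, assuming the percentage is rounded to one decimal place (which matches the sample output but is not confirmed by the source), it can be rebuilt like this; `summarize` is a hypothetical helper:

```python
def summarize(total_score, max_score):
    """Rebuild the `summary` block from the two score totals."""
    # Guard against an empty rubric, where max_score would be 0.
    percentage = round(100 * total_score / max_score, 1) if max_score else 0.0
    return {
        "total_score": total_score,
        "max_score": max_score,
        "percentage": percentage,
    }


print(summarize(15, 18))
# {'total_score': 15, 'max_score': 18, 'percentage': 83.3}
```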

Development

# Run tests
pytest

# Run with coverage
pytest --cov=rubric_kit --cov-report=html

# Format code
black rubric_kit tests

License

See LICENSE file.

Release history

Version	Release date
0.2.0 (this version)	Mar 2, 2026
0.1.6	Jan 20, 2026
0.1.5	Jan 20, 2026
0.1.4	Dec 17, 2025
0.1.3	Dec 10, 2025
0.1.2	Dec 2, 2025
0.1.0	Nov 28, 2025

Download files

Source Distribution

rubric_kit-0.2.0.tar.gz (153.1 kB)

Uploaded Mar 2, 2026 Source

Built Distribution

rubric_kit-0.2.0-py3-none-any.whl (105.4 kB)

Uploaded Mar 2, 2026 Python 3

File details

Details for the file rubric_kit-0.2.0.tar.gz.

File metadata

Download URL: rubric_kit-0.2.0.tar.gz
Upload date: Mar 2, 2026
Size: 153.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rubric_kit-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8f3af808af6c86872a580bb26ff55016bfa404c0147f6271312949bf5fd63756`
MD5	`2c0249dcfafd15e392ac2fe8e52dd23b`
BLAKE2b-256	`c546dacbb528d349c06617aac5a897fe701d8354fa6331a438d5068c65b53ddc`

Provenance

The following attestation bundles were made for rubric_kit-0.2.0.tar.gz:

Publisher: release.yaml on narmaku/rubric-kit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rubric_kit-0.2.0.tar.gz
- Subject digest: 8f3af808af6c86872a580bb26ff55016bfa404c0147f6271312949bf5fd63756
- Sigstore transparency entry: 1008346505
- Sigstore integration time: Mar 2, 2026
Source repository:
- Permalink: narmaku/rubric-kit@f8d9ce39298a33cb0783dbf509e741e691262a20
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/narmaku
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@f8d9ce39298a33cb0783dbf509e741e691262a20
- Trigger Event: push

File details

Details for the file rubric_kit-0.2.0-py3-none-any.whl.

File metadata

Download URL: rubric_kit-0.2.0-py3-none-any.whl
Upload date: Mar 2, 2026
Size: 105.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rubric_kit-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`94d8d77ee57e83dd8bbf0dc33bb198f84e55a1816b0002be1df1b26f1b70c827`
MD5	`7150abb44ba8ac76f31184c65fcd6668`
BLAKE2b-256	`10420dbd0174a1926144d9cd4ed25aadd6af940a856d5163b1e667a1ce0aa610`

Provenance

The following attestation bundles were made for rubric_kit-0.2.0-py3-none-any.whl:

Publisher: release.yaml on narmaku/rubric-kit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rubric_kit-0.2.0-py3-none-any.whl
- Subject digest: 94d8d77ee57e83dd8bbf0dc33bb198f84e55a1816b0002be1df1b26f1b70c827
- Sigstore transparency entry: 1008346508
- Sigstore integration time: Mar 2, 2026
Source repository:
- Permalink: narmaku/rubric-kit@f8d9ce39298a33cb0783dbf509e741e691262a20
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/narmaku
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@f8d9ce39298a33cb0783dbf509e741e691262a20
- Trigger Event: push
