rubric

Rubrics

A Python library for LLM-based evaluation using weighted rubrics.

Installation

uv add rubric

Usage

import asyncio
import os
from openai import AsyncOpenAI
from rubric import Rubric
from autograders.per_criterion_grader import PerCriterionGrader

async def generate_with_async_openai(system_prompt: str, user_prompt: str) -> str:
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=400,
        temperature=0.0,
    )
    return response.choices[0].message.content or ""

async def main():
    # Each criterion pairs a requirement with a weight; the negative weight
    # marks an undesirable trait (here, factual errors).
    rubric = Rubric.from_dict([
        {"weight": 1.0, "requirement": "Output includes a clear introduction"},
        {"weight": 2.0, "requirement": "Analysis is supported by specific examples"},
        {"weight": -1.0, "requirement": "Contains factual errors"},
    ])

    # The grader sends its prompts through the generate_fn defined above.
    grader = PerCriterionGrader(generate_fn=generate_with_async_openai)

    result = await rubric.grade(
        to_grade="Your text to evaluate...",
        autograder=grader,
    )

    print(f"Score: {result.score}/100")
    for criterion in result.report:
        print(f"  {criterion.verdict}: {criterion.requirement}")

asyncio.run(main())

Autograder Strategies

  • PerCriterionGrader - Evaluates each criterion with its own LLM call; the calls run in parallel
  • PerCriterionOneShotGrader - Evaluates all criteria in a single LLM call
  • RubricAsJudgeGrader - Holistic evaluation; the LLM returns the final score directly
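
To make the per-criterion strategy concrete, here is a minimal sketch of the pattern: one prompt per criterion, dispatched concurrently with asyncio.gather. This is not the library's internal code. The prompt wording, the MET/UNMET reply convention, and the names judge_criterion, judge_all, and fake_generate are all assumptions made for illustration.

```python
import asyncio


async def fake_generate(system_prompt: str, user_prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call: answers MET when the
    # criterion line (the first line of the prompt) mentions "introduction".
    await asyncio.sleep(0)
    criterion_line = user_prompt.splitlines()[0]
    return "MET" if "introduction" in criterion_line else "UNMET"


async def judge_criterion(generate_fn, requirement: str, text: str) -> bool:
    # One LLM call per criterion (assumed prompt format, not the library's).
    system_prompt = "Reply with MET or UNMET only."
    user_prompt = f"Criterion: {requirement}\n\nText:\n{text}"
    reply = await generate_fn(system_prompt, user_prompt)
    return reply.strip().upper() == "MET"


async def judge_all(generate_fn, requirements: list[str], text: str) -> list[bool]:
    # Schedule every criterion at once; gather preserves input order.
    tasks = [judge_criterion(generate_fn, r, text) for r in requirements]
    return list(await asyncio.gather(*tasks))
```

A one-shot strategy would instead place all requirements in a single prompt and parse one combined reply, trading per-criterion isolation for fewer calls.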

Customization

You can customize grading at multiple levels:

1. Custom generate_fn (most common). Pass any function that takes (system_prompt, user_prompt) and returns a string. Use any LLM provider (OpenAI, Anthropic, local models, etc.):

grader = PerCriterionGrader(generate_fn=your_custom_function)
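
Because a generate_fn is just a callable taking (system_prompt, user_prompt), it can be wrapped before it is handed to a grader. The sketch below shows one such wrapper, an in-memory cache, assuming an async generate_fn like the one in the usage example. The names with_cache and GenerateFn are hypothetical and not part of the library.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed shape of a generate_fn: async (system_prompt, user_prompt) -> str.
GenerateFn = Callable[[str, str], Awaitable[str]]


def with_cache(generate_fn: GenerateFn) -> GenerateFn:
    """Wrap a generate_fn so that repeated identical prompts reuse the first reply."""
    cache: dict[tuple[str, str], str] = {}

    async def cached(system_prompt: str, user_prompt: str) -> str:
        key = (system_prompt, user_prompt)
        # Concurrent first-time calls with the same key may both miss;
        # that costs an extra call but does not affect correctness.
        if key not in cache:
            cache[key] = await generate_fn(system_prompt, user_prompt)
        return cache[key]

    return cached
```

It would be used as PerCriterionGrader(generate_fn=with_cache(generate_with_async_openai)). Caching is mainly sensible with deterministic settings such as temperature=0.0.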

2. Override specific methods. Subclass any autograder and override:

  • judge() - How rubric criteria are evaluated against the text
  • generate() - How prompts are constructed and the LLM is called; typically calls generate_fn
  • aggregation() - How individual criterion scores are combined

3. Full control. Override the entire grade() method for complete end-to-end control over the grading process.
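
As an illustration of what an aggregation step might compute, the sketch below combines per-criterion verdicts into a 0-100 score. The formula is an assumption, not the library's documented behavior: the source shows only weights and a score printed out of 100. The names CriterionResult and weighted_score are hypothetical, and the method signatures of the real autograder classes are not shown here.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    weight: float  # positive for desired traits, negative for penalties
    met: bool      # whether the judge decided the requirement holds


def weighted_score(results: list[CriterionResult]) -> float:
    """Assumed formula: earned weight over total positive weight, clamped to 0-100."""
    positive_total = sum(r.weight for r in results if r.weight > 0)
    if positive_total == 0:
        return 0.0
    # A met criterion contributes its weight, so a met penalty subtracts.
    earned = sum(r.weight for r in results if r.met)
    ratio = earned / positive_total
    return 100.0 * min(1.0, max(0.0, ratio))
```

With the weights from the usage example (1.0, 2.0, -1.0), meeting both positive criteria while also triggering the penalty would earn 2.0 of a possible 3.0 under this assumed formula.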

Loading Rubrics

rubric = Rubric.from_dict([...])
rubric = Rubric.from_json('{"criteria": [...]}')
rubric = Rubric.from_yaml('...')
rubric = Rubric.from_file('rubric.yaml')
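
The from_json example above suggests a top-level "criteria" key. A small sketch of building such a string with the standard library follows, assuming each entry uses the same weight and requirement keys as the from_dict example. The helper criteria_to_json is hypothetical, and the exact schema the library accepts is not confirmed by the source.

```python
import json

criteria = [
    {"weight": 1.0, "requirement": "Output includes a clear introduction"},
    {"weight": 2.0, "requirement": "Analysis is supported by specific examples"},
    {"weight": -1.0, "requirement": "Contains factual errors"},
]


def criteria_to_json(criteria: list[dict]) -> str:
    """Check each entry has a numeric weight and a non-empty requirement, then serialize."""
    for c in criteria:
        if not isinstance(c.get("weight"), (int, float)) or not c.get("requirement"):
            raise ValueError(f"malformed criterion: {c!r}")
    # Assumed wrapper key, taken from the from_json example.
    return json.dumps({"criteria": criteria}, indent=2)


rubric_json = criteria_to_json(criteria)
```

The resulting string could then be passed as Rubric.from_json(rubric_json), provided the assumed schema matches.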

Requirements

  • Python 3.13+
  • An LLM API (e.g., OpenAI, OpenRouter) - set appropriate API keys as environment variables

License

MIT License - see LICENSE file for details.
